Author: Pavel216989

Description: International Data Analysis Olympiad - online round
Primary language: Jupyter Notebook
Repository: git://github.com/Pavel216989/idao.git
Created: 2019-01-17T18:08:00Z
Project page: https://github.com/Pavel216989/idao


International Data Analysis Olympiad 2019 - online round

Results

25th place on the first track (out of 1315 participants)

40th place on the overall leaderboard (sum of track 1 and track 2, out of 1315 participants)

Task description:

The task is to build a classifier that distinguishes muons from non-muons in the LHCb detector.

Full task description + features explanation

Solution description:

Stacking of 2 models:

  1. LightGBM model trained on categorical features encoded natively by LightGBM.

    Hyperparameters: {'max_depth': 7, 'objective': 'binary', 'learning_rate': 0.2, 'num_leaves': 64, 'min_data_in_leaf': 15, 'num_iterations': 90}

  2. LightGBM model trained on one-hot-encoded categorical features.

    Hyperparameters: {'max_depth': 9, 'objective': 'binary', 'learning_rate': 0.2, 'num_leaves': 128, 'min_data_in_leaf': 15, 'num_iterations': 90}
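
The stacking step can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function names `out_of_fold_predictions` and `build_meta_features`, the number of folds, and the use of out-of-fold predictions as meta-features are all assumptions, since the README does not describe how the meta-model inputs were produced. Each base model gets its own feature matrix, mirroring the two encodings listed above.

```python
import numpy as np


def out_of_fold_predictions(make_model, X, y, n_splits=5, seed=0):
    """Return out-of-fold positive-class probabilities for one base model.

    make_model is a zero-argument callable returning a fresh estimator with
    scikit-learn style fit(X, y) and predict_proba(X) methods (assumption).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once and cut them into n_splits folds.
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    oof = np.zeros(len(y), dtype=float)
    for k, valid_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        # Each row is predicted by a model that never saw it during training.
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
    return oof


def build_meta_features(model_factories, feature_sets, y, n_splits=5, seed=0):
    """Stack one out-of-fold prediction column per base model.

    feature_sets[i] is the feature matrix for model_factories[i]; the same
    seed gives every base model identical folds.
    """
    columns = [
        out_of_fold_predictions(make_model, X, y, n_splits, seed)
        for make_model, X in zip(model_factories, feature_sets)
    ]
    return np.column_stack(columns)
```

With LightGBM installed, a factory for the first model could be something like `lambda: lightgbm.LGBMClassifier(max_depth=7, num_leaves=64, learning_rate=0.2, n_estimators=90, min_child_samples=15)`, where `n_estimators` and `min_child_samples` are the scikit-learn-style aliases of `num_iterations` and `min_data_in_leaf`. The resulting two-column matrix would then be the training input for whatever meta-model is chosen.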

Model

Features generation

  1. P_PT = P - PT.

    The difference between the total momentum (P) and the transverse momentum (PT), i.e. the momentum component perpendicular to the beam.

  2. P_PT_P = (P - PT) / P.

    Same as above, normalized by momentum.

  3. closest_{x/y/T/z/dx/dy}_per_station.

    The {X, Y, Z} position, timing (T), and coordinate uncertainty (dx, dy; also known as pad size) of the closest hit in each of the 4 stations.

  4. absMatchedHit{X/Y}{0/1/2/3}.

    Absolute value of hit {X/Y} coordinates for each of 4 stations.
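
The features listed above could be generated along these lines. This is a hedged sketch rather than the notebook code: the input column names `P`, `PT` and `MatchedHit_X[0]` ... `MatchedHit_Y[3]` are assumed to follow the competition dataset, the helper names are hypothetical, and "closest" is assumed to mean closest in the X-Y plane to the track position extrapolated to each station, which the README does not state.

```python
import numpy as np
import pandas as pd

N_STATIONS = 4


def add_momentum_features(df):
    """Add P_PT = P - PT and P_PT_P = (P - PT) / P."""
    out = df.copy()
    out["P_PT"] = out["P"] - out["PT"]
    out["P_PT_P"] = out["P_PT"] / out["P"]
    return out


def add_abs_hit_features(df):
    """Add absMatchedHit{X/Y}{0..3} from the matched hit coordinates."""
    out = df.copy()
    for axis in ("X", "Y"):
        for station in range(N_STATIONS):
            source = f"MatchedHit_{axis}[{station}]"  # assumed column name
            out[f"absMatchedHit{axis}{station}"] = out[source].abs()
    return out


def closest_hit_per_station(hit_x, hit_y, hit_station, track_x, track_y):
    """Return, for one event, the index of the closest hit in each station.

    hit_x, hit_y, hit_station describe the event's hits; track_x, track_y
    hold one reference position per station. Stations without hits get -1.
    """
    hit_x = np.asarray(hit_x, dtype=float)
    hit_y = np.asarray(hit_y, dtype=float)
    hit_station = np.asarray(hit_station)
    closest = np.full(N_STATIONS, -1, dtype=int)
    for station in range(N_STATIONS):
        idx = np.flatnonzero(hit_station == station)
        if idx.size == 0:
            continue
        # Squared distance is enough to find the minimum.
        dist2 = (hit_x[idx] - track_x[station]) ** 2 + (hit_y[idx] - track_y[station]) ** 2
        closest[station] = idx[np.argmin(dist2)]
    return closest
```

The indices returned by `closest_hit_per_station` can be used to pick the x, y, T, z, dx and dy values of the selected hit in each station, giving the `closest_*_per_station` columns; stations with index -1 would need a fill value of the user's choosing.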

Files:

Notebooks:

  1. closest_hits_generator.ipynb - generates the closest-hit features and saves them to a file.

  2. LGBM.ipynb - first LGBM model.

  3. LGBM_dummies.ipynb - second LGBM model.

  4. Main.ipynb - meta-model.