Project author: pplonski
Project description: Genetic Algorithm Feature Engineering
Language: Python
Project URL: git://github.com/pplonski/gafe.git


GAFE - Genetic Algorithm Feature Engineering
A simple algorithm for engineering new features.
- GAFE tries different combinations of features with the operators `+`, `-`, `*`
- adds them to your dataset
- re-evaluates the classifier performance with the new features
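To make the idea of combining features concrete, here is a minimal sketch in plain Python. It is not GAFE's actual code: the helper names (`feature_name`, `evaluate_feature`, `add_feature`), the row-as-dict data layout, and the strict left-to-right evaluation order are all assumptions made for illustration.

```python
import operator

# The three operators named in the text; this mapping is an illustrative assumption.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def feature_name(columns, ops):
    """Build a readable name such as 'feature1-feature2-feature3'."""
    name = columns[0]
    for op, col in zip(ops, columns[1:]):
        name += op + col
    return name


def evaluate_feature(row, columns, ops):
    """Apply the operators strictly left to right to the values of one row."""
    value = row[columns[0]]
    for op, col in zip(ops, columns[1:]):
        value = OPERATORS[op](value, row[col])
    return value


def add_feature(rows, columns, ops):
    """Return copies of the rows with the new feature appended as a column."""
    name = feature_name(columns, ops)
    return [{**row, name: evaluate_feature(row, columns, ops)} for row in rows]


rows = [
    {"feature1": 5.0, "feature2": 2.0, "feature3": 1.0},
    {"feature1": 1.0, "feature2": 4.0, "feature3": 3.0},
]
extended = add_feature(rows, ["feature1", "feature2", "feature3"], ["-", "-"])
print(extended[0]["feature1-feature2-feature3"])  # 2.0
```

Note that left-to-right evaluation ignores the usual precedence of `*` over `+` and `-`. Whether GAFE does the same is not stated in this README.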
Example
1. In a binary classification problem, you have a dataset with the following 20 features: `feature1`, `feature2`, `feature3`, …, `feature20`, and a binary target column.
2. GAFE computes the base score for your dataset using a Random Forest (32 trees), 5-fold CV and negative log loss.
3. The algorithm starts with a random population of new feature sets. Each new feature set contains from `new_features_lower_cnt` to `new_features_upper_cnt` new features. Each new feature is a combination of original features with the operators `+`, `-`, `*`; for example, a new feature can look like `feature1-feature2-feature3`.
4. Each new feature set is scored with the same classifier as in step 2. The original and new features are concatenated for scoring.
5. The genetic algorithm is applied to mutate the new feature sets to find better features.
6. At the end, the best feature set is selected based on classifier performance.
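The search loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not GAFE's implementation: the function names, the population size, the number of generations, the "keep the better half, refill with mutants" selection rule, and the cap of three terms per feature are all hypothetical. The `score` argument stands in for the classifier evaluation (in GAFE's description, the cross-validated negative log loss of a Random Forest, where higher is better); a toy scorer is used here so the sketch runs without any dependencies.

```python
import random

OPS = ["+", "-", "*"]


def random_feature(columns, rng, max_terms=3):
    """One new feature: 2..max_terms original columns joined by random operators."""
    n_terms = rng.randint(2, max_terms)
    cols = tuple(rng.choice(columns) for _ in range(n_terms))
    ops = tuple(rng.choice(OPS) for _ in range(n_terms - 1))
    return (cols, ops)


def random_feature_set(columns, lower_cnt, upper_cnt, rng):
    """A feature set holds between lower_cnt and upper_cnt new features."""
    count = rng.randint(lower_cnt, upper_cnt)
    return [random_feature(columns, rng) for _ in range(count)]


def mutate(feature_set, columns, rng):
    """Copy the set and replace one randomly chosen feature with a fresh one."""
    child = list(feature_set)
    child[rng.randrange(len(child))] = random_feature(columns, rng)
    return child


def evolve(columns, score, lower_cnt=2, upper_cnt=4,
           pop_size=10, generations=5, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_feature_set(columns, lower_cnt, upper_cnt, rng)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)  # higher score is better
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), columns, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=score)


def toy_score(feature_set):
    """Toy stand-in for the classifier score: counts '*' operators."""
    return sum(ops.count("*") for _, ops in feature_set)


columns = [f"feature{i}" for i in range(1, 21)]
best = evolve(columns, toy_score)
print(best)
```

In a real run, `score` would build the new columns, concatenate them with the original features, and return the cross-validated score, so each call is far more expensive than the toy scorer and caching scores per feature set would likely be worthwhile.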