Regression and Classification Using Decision Tree, Random Tree, Bootstrap Aggregating, and Boosting.
Based on project 3 in the Georgia Tech Spring 2020 course Machine Learning for Trading by Prof. Tucker Balch.
sys.argv[1]
File name of the dataset, passed as a command-line argument. Data must be in a CSV file, with each column a feature and the label in the last column.
split_factor
Fraction of the data assigned to the training set; the remainder is used for testing.
learner_type
Type of learner: decision tree ('dt') or random tree ('rt').
leaf
Minimum leaf size; any branch containing this many leaves or fewer is replaced by a single leaf whose value is the average of the removed leaves.
tol
Tolerance for grouping leaves by their labels; any branch whose leaves differ from their average by no more than this tolerance is replaced by a single leaf with that average as its value.
bags
Number of bags used for bootstrap aggregating; set this value to zero to disable bagging.
bag_factor
Number of data points in each bag, expressed as a fraction of the number of training samples.
boost
Whether boosting should be used.
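The leaf and tol parameters both control when a branch collapses into a single leaf. A minimal sketch of that stopping test, assuming the data is a NumPy array with the label in the last column (the helper name is hypothetical, not the project's actual code):

```python
import numpy as np

def is_leaf(data, leaf=1, tol=1.0e-6):
    """Hypothetical stopping test: a node becomes a leaf when it holds
    `leaf` or fewer samples, or when every label lies within `tol` of
    the labels' average."""
    labels = data[:, -1]                       # label is the last column
    if data.shape[0] <= leaf:                  # too few samples to split
        return True
    if np.max(np.abs(labels - labels.mean())) <= tol:
        return True                            # labels effectively constant
    return False

# Three samples with identical labels collapse into one leaf
node = np.array([[0.1, 2.0], [0.2, 2.0], [0.3, 2.0]])
print(is_leaf(node, leaf=1, tol=1.0e-6))  # True
```

When the test returns True, the branch is replaced by a single leaf whose value is the average of labels.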
All examples are for the file istanbul.csv. Correlation results are obtained by averaging 20 runs.
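The per-run correlation between predictions and true labels can be averaged with NumPy; a small sketch, assuming each run yields a (predictions, labels) pair (the helper name is an assumption):

```python
import numpy as np

def avg_correlation(run_results):
    """Average the prediction/label correlation coefficient over runs.
    `run_results` is a list of (predictions, labels) array pairs."""
    corrs = [np.corrcoef(pred, y)[0, 1] for pred, y in run_results]
    return float(np.mean(corrs))

# Two toy runs whose predictions are perfectly linear in the labels
runs = [(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])),
        (np.array([1.0, 2.0, 3.0]), np.array([1.1, 2.1, 3.1]))]
print(avg_correlation(runs))  # close to 1.0 for these toy runs
```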
Single decision tree:
split_factor = 0.7
learner_type = 'dt'
leaf = 1
tol = 1.0e-6
bags = 0
bag_factor = 1.0
boost = False
Single random tree:
split_factor = 0.7
learner_type = 'rt'
leaf = 1
tol = 1.0e-6
bags = 0
bag_factor = 1.0
boost = False
Decision tree with leaf size 10:
split_factor = 0.7
learner_type = 'dt'
leaf = 10
tol = 1.0e-6
bags = 0
bag_factor = 1.0
boost = False
Decision tree with tolerance 1.0e-2:
split_factor = 0.7
learner_type = 'dt'
leaf = 1
tol = 1.0e-2
bags = 0
bag_factor = 1.0
boost = False
Decision tree with bootstrap aggregating over 10 bags:
split_factor = 0.7
learner_type = 'dt'
leaf = 1
tol = 1.0e-6
bags = 10
bag_factor = 1.0
boost = False
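The bags and bag_factor settings above describe how the bootstrap samples are drawn. A minimal sketch of that sampling step, assuming NumPy and a hypothetical helper name; one learner would then be trained per bag and their predictions averaged:

```python
import numpy as np

rng = np.random.default_rng(0)

def bag_indices(n_train, bags=10, bag_factor=1.0):
    """Sketch of bootstrap aggregating: draw `bags` index samples, each
    of size bag_factor * n_train, with replacement, from the training
    set (hypothetical helper, not the project's actual code)."""
    size = int(bag_factor * n_train)
    return [rng.integers(0, n_train, size) for _ in range(bags)]

# With bags = 10 and bag_factor = 1.0, each bag has as many rows
# as the training set, drawn with replacement.
samples = bag_indices(100, bags=10, bag_factor=1.0)
print(len(samples), samples[0].shape)  # 10 (100,)
```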