Project author: Shrsh

Project description:
Random Forests on income classification
Primary language: Jupyter Notebook
Repository: git://github.com/Shrsh/UCI--Adult-Data-Set.git
Created: 2018-05-28T07:45:32Z
Project page: https://github.com/Shrsh/UCI--Adult-Data-Set

License:



UCI--Adult-Data-Set

This data was extracted from the UCI Machine Learning Repository (Adult data set):

http://archive.ics.uci.edu/ml/machine-learning-databases/adult
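The files at this location (assumed here to be `adult.data` and `adult.test`) are comma-separated, with a space after each comma and no header row. Below is a minimal sketch of parsing text in that format with Python's standard `csv` module. The column names follow the dataset's `adult.names` description. `income` is our own name for the class column, and `parse_adult` is a hypothetical helper, not code from this repository. It also skips blank lines and lines starting with `|`, on the assumption that the first line of `adult.test` is such a comment.

```python
import csv
import io

# Column names as listed in adult.names; "income" is our own label for the class column.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

def parse_adult(text):
    """Hypothetical helper: parse adult.data-style text into a list of dicts."""
    rows = []
    # skipinitialspace drops the space that follows each comma in the raw files.
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        # Skip blank lines and comment lines (assumed first line of adult.test).
        if not fields or fields[0].startswith("|"):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows

# An illustrative line in the raw file format.
sample = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
)
print(parse_adult(sample)[0]["occupation"])  # Adm-clerical
```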

Test-Train Split

The data was split into training and test sets using MLC++ GenCVFiles (2/3 train, 1/3 test, chosen at random).

48842 instances with a mix of continuous and discrete attributes (train = 32561, test = 16281).
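Reproducing a split of the same shape does not require MLC++. The sketch below uses Python's standard `random` module as a stand-in, which is an assumption and not the tool the dataset authors used. It shuffles the rows with a fixed seed and cuts at 2/3. With 48842 rows it gives the same 32561/16281 sizes as the official split, but the assignment of individual rows will differ. `random_split` is a hypothetical helper.

```python
import random

def random_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of rows and cut it into (train, test) at train_fraction.

    A stand-in for MLC++ GenCVFiles, not a reimplementation of it.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = random_split(range(48842))
print(len(train), len(test))  # 32561 16281
```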

45222 instances remain if instances with unknown values are removed (train = 30162, test = 15060).
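In the raw files, unknown values are written as `?`. Dropping every row that contains one accounts for the 48842 - 45222 = 3620 removed instances implied by the counts above: 2399 from train and 1221 from test. The sketch assumes rows are dicts of string fields, as a CSV reader would produce. `drop_unknown` is a hypothetical helper.

```python
def drop_unknown(rows, missing="?"):
    """Keep only rows in which no field equals the missing-value marker."""
    # strip() tolerates a leading space left in fields that were not trimmed on read.
    return [row for row in rows
            if not any(value.strip() == missing for value in row.values())]

rows = [
    {"workclass": "Private", "occupation": "Sales", "native-country": "United-States"},
    {"workclass": "?", "occupation": "?", "native-country": "United-States"},
    {"workclass": "Private", "occupation": "Tech-support", "native-country": " ?"},
]
print(len(drop_unknown(rows)))  # 1
```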

The prediction task is to determine whether a person earns over 50K a year.
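Before training, the target has to become a binary label. This sketch assumes the class column holds the strings `>50K` and `<=50K`, as in the UCI files. It also assumes that labels in the test file carry a trailing period (e.g. `>50K.`). `encode_income` is a hypothetical helper.

```python
def encode_income(label):
    """Map an income label to 1 if it denotes over 50K a year, else 0."""
    # Test-file labels are assumed to end with "." (e.g. ">50K."), so strip it.
    cleaned = label.strip().rstrip(".")
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")

print([encode_income(s) for s in ["<=50K", ">50K", ">50K.", " <=50K."]])  # [0, 1, 1, 0]
```

Raising on unexpected strings, rather than defaulting to 0, surfaces stray labels early instead of silently mislabeling them.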
Results:

| Algorithm                   | Accuracy           |
|-----------------------------|--------------------|
| SVM ('rbf' kernel)          | 0.7986448220064725 |
| SVM ('linear' kernel)       | 0.8180622977346278 |
| Decision Trees              | 0.8518406148867314 |
| Bagging with Decision Trees | 0.8973503236245954 |
| Random Forest               | 0.8701456310679612 |
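The Bagging and Random Forest entries both combine many trees, each trained on a bootstrap sample, by majority vote. The pure-Python sketch below shows bagging. To stay short it uses depth-1 trees (decision stumps) on numeric features instead of full decision trees. It therefore illustrates the technique and does not reproduce the notebook's models or the accuracies above. The `accuracy` helper assumes the Accuracy column is the fraction of correctly classified instances. All function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (ties go to the label encountered first)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y):
    """Depth-1 tree: choose the (feature, midpoint threshold) with the fewest training errors."""
    label = majority(y)
    # Fallback: a constant predictor that always outputs the majority label.
    best_errors = sum(t != label for t in y)
    best = (None, None, label, label)
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for x, t in zip(X, y) if x[f] <= threshold]
            right = [t for x, t in zip(X, y) if x[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(t != left_label for t in left)
                      + sum(t != right_label for t in right))
            if errors < best_errors:
                best_errors = errors
                best = (f, threshold, left_label, right_label)
    return best

def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    if f is None:  # constant predictor
        return left_label
    return left_label if x[f] <= threshold else right_label

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Fit each stump on a bootstrap sample: n rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Combine the ensemble by majority vote."""
    return majority(predict_stump(m, x) for m in models)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data: one numeric feature, two well-separated classes.
X = [[v] for v in range(20, 31)] + [[v] for v in range(50, 61)]
y = [0] * 11 + [1] * 11
models = fit_bagging(X, y)
print(accuracy(y, [predict_bagging(models, x) for x in X]))  # 1.0
```

A random forest would also consider only a random subset of features at each split, which decorrelates the trees further. That step is omitted here.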