A prediction model that uses logistic regression and gradient boosting to classify population income.
The model predicts whether a person’s income exceeds $50,000 a year.
The dataset is extracted from the 1994 Census database and is available from the UCI Machine Learning Repository. It contains 48,842 rows and 14 attributes, including age, gender, occupation, and the number of hours worked per week.
Algorithm | Accuracy | Area under the curve |
---|---|---|
Logistic regression | 81.63% | 0.862 |
Gradient boosting machine | 82.58% | 0.881 |
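
A comparison like the one in the table can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code. It assumes scikit-learn is installed, that "area under the curve" means ROC AUC, and it uses synthetic stand-in data rather than the Census data, so its numbers will not match the table. The helper name `evaluate_models`, the 75/25 split, and all hyperparameters are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_models(X, y, seed=0):
    """Fit both classifiers and return {name: (accuracy, auc)} on a held-out split."""
    # Hold out 25% of the rows for evaluation (an assumed split, not from the source).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Gradient boosting machine": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        # Probability of the positive class, needed for ROC AUC.
        positive_prob = model.predict_proba(X_test)[:, 1]
        results[name] = (
            accuracy_score(y_test, predicted),
            roc_auc_score(y_test, positive_prob),
        )
    return results


# Synthetic stand-in data (assumption): 14 numeric features, arbitrary class balance.
X, y = make_classification(
    n_samples=2000,
    n_features=14,
    n_informative=6,
    weights=[0.75, 0.25],
    random_state=0,
)
results = evaluate_models(X, y)
for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.2%}, AUC={auc:.3f}")
```

To run this on the real data, the categorical attributes (occupation, for example) would first need to be encoded numerically before being passed to `evaluate_models`.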