Author: nisharangnani

Description:
A prediction model that uses logistic regression and gradient boosting to classify population income.
Language: Jupyter Notebook
Repository: git://github.com/nisharangnani/adult-income-prediction.git
Created: 2020-05-17T21:36:41Z
Project page: https://github.com/nisharangnani/adult-income-prediction

License: MIT License

Adult Income Prediction

A binary classification model that predicts whether a person's annual income exceeds $50,000.

Dataset

The dataset is extracted from the 1994 US Census database and is available from the UCI Machine Learning Repository. It contains 48,842 rows and 14 attributes, such as age, gender, occupation, and the number of hours worked per week.

Approach

  • Exploratory data analysis (uni-variate and bi-variate)
  • Data preprocessing (deduplication, handling missing values)
  • Classification using logistic regression
  • Classification using a gradient boosting machine
  • Feature engineering
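
The preprocessing step above (deduplication, handling missing values) could look roughly like the sketch below. It is not taken from the notebook: the sample rows, the column names, and the choice to drop incomplete rows rather than impute them are all assumptions made for illustration. The one detail that comes from the dataset itself is that the UCI Adult files mark missing values with "?".

```python
import csv
import io

# Hypothetical sample; the column names and values are made up for
# illustration (the raw UCI files ship without a header row).
SAMPLE = """age,workclass,occupation,hours_per_week,income
39,State-gov,Adm-clerical,40,<=50K
39,State-gov,Adm-clerical,40,<=50K
50,?,Exec-managerial,13,>50K
28,Private,?,40,<=50K
"""

MISSING = "?"  # missing-value marker used in the UCI Adult files


def load_rows(text):
    """Parse CSV text into a list of dicts with whitespace stripped."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key: value.strip() for key, value in row.items()} for row in reader]


def deduplicate(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_missing(rows):
    """Drop rows with a missing value (an assumption; imputation is another option)."""
    return [row for row in rows if MISSING not in row.values()]


rows = load_rows(SAMPLE)
unique_rows = deduplicate(rows)
clean_rows = drop_missing(unique_rows)
print(len(rows), len(unique_rows), len(clean_rows))
```

Dropping incomplete rows is the simplest policy; filling in the most frequent category per column is a common alternative when the missing share is large.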

Results

Algorithm                  Accuracy  Area under the curve
Logistic regression        81.63%    0.862
Gradient boosting machine  82.58%    0.881
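
For readers unfamiliar with the two metrics, the sketch below shows one way they can be computed from true labels and predicted scores. It assumes the reported area under the curve is the ROC AUC and that accuracy uses a 0.5 decision threshold; neither is stated in the source. The labels and scores are made-up toy values, not the project's predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive example
    scores higher than a random negative one (ties count as half)."""
    positives = [s for truth, s in zip(y_true, scores) if truth == 1]
    negatives = [s for truth, s in zip(y_true, scores) if truth == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 means income over $50,000, 0 means at or below it.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} auc={auc:.3f}")  # accuracy=0.833 auc=0.889
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two numbers are reported side by side.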