Repo for the Adult Census Income Project
Exploratory Data Analysis, outlier identification and data cleaning.
Modelling using KNeighbors, Logistic Regression, Random Forest, CatBoost and XGBoost classifiers.
Hyperparameter tuning using RandomizedSearchCV and GridSearchCV.
Metrics evaluation and Feature Importance.
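The two-stage tuning mentioned above could look roughly like the sketch below. It is not the project's actual configuration: the synthetic data, the Random Forest estimator, and every grid value are assumptions chosen only to keep the example small and self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the preprocessed census features (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Stage 1: sample a few combinations from a broad, illustrative grid.
broad_grid = {
    "n_estimators": [10, 20, 40],
    "max_depth": [None, 3, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=broad_grid,
    n_iter=5,
    cv=3,
    random_state=42,
)
random_search.fit(X, y)

# Stage 2: exhaustive search over a narrow grid built around the best sample.
best = random_search.best_params_
narrow_grid = {
    "n_estimators": [best["n_estimators"]],
    "max_depth": [best["max_depth"]],
    "min_samples_leaf": sorted({1, best["min_samples_leaf"]}),
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=narrow_grid,
    cv=3,
)
grid_search.fit(X, y)

print(grid_search.best_params_, round(grid_search.best_score_, 3))
```

Running the randomized search first keeps the exhaustive grid small, which matters for slower estimators such as the boosted models.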
Python Version: 3.8.2
Packages: pandas, NumPy, Matplotlib, Seaborn, scikit-learn, XGBoost, CatBoost
The EDA shows the distribution of the data and the relationships between different features. Below are a few highlights from the graphs:
Create a preprocess_data(df) function that performs transformations on the DataFrame passed as a parameter and returns its converted version. The changes the function makes are listed below:
Split the data into train and test sets.
A fit_and_score(model) function to instantiate and compare the accuracy of different estimators simultaneously.
Metrics evaluation using cross-validation (precision, recall and F1 scores), ROC curve and AUC, confusion matrix and classification report.
Feature Importance
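The modelling and evaluation steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the data is synthetic, the feature names are made up, and this fit_and_score variant takes a dictionary of estimators plus the data splits as arguments, which is an assumption about how the original function is structured. XGBoost and CatBoost are left out only to keep the dependencies to scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed census data (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each estimator and return its accuracy on the test split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


models = {
    "KNN": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=42),
}
scores = fit_and_score(models, X_train, X_test, y_train, y_test)
print(scores)

# Cross-validated precision, recall and F1 for one of the fitted models.
clf = models["RandomForest"]
cv_metrics = {
    metric: cross_val_score(clf, X, y, cv=5, scoring=metric).mean()
    for metric in ("precision", "recall", "f1")
}
print(cv_metrics)

# AUC from predicted probabilities, confusion matrix and classification report.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
cm = confusion_matrix(y_test, clf.predict(X_test))
print(auc)
print(cm)
print(classification_report(y_test, clf.predict(X_test)))

# Feature importance, sorted from most to least important.
importances = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
print(importances)
```

Tree-based models expose feature_importances_ directly; for Logistic Regression the coefficients in coef_ would play a similar role.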