Project author: acdick
Project description:
Performance comparison of classification algorithms
Language: Python
Project URL: git://github.com/acdick/endangered_species_classification.git
Classification of Endangered Species
Predicting the Federal Listing Status of U.S. plant and animal species.
U.S. Fish & Wildlife Service
Data for this project was collected from the U.S. Fish & Wildlife Service.
https://ecos.fws.gov
U.S. Forest Service
Table B-11: Forest Land Area
https://www.fs.fed.us/sites/default/files/fs_media/fs_document/publication-15817-usda-forest-service-fia-annual-report-508.pdf
U.S. Environmental Protection Agency
Air Quality Index
https://aqs.epa.gov/aqsweb/airdata/download_files.html#Annual
Exploratory Data Analysis
Species Group
- Dropped species group features that represented less than 1% of the records
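
The 1% cutoff can be sketched in plain Python. This is only an illustration under assumptions: the helper names `frequent_categories` and `one_hot`, the `min_share` parameter, and the toy counts are hypothetical and not taken from the project, and the source does not say whether the cutoff was applied before or after encoding. Here rare categories simply never get an indicator column.

```python
from collections import Counter

def frequent_categories(values, min_share=0.01):
    """Return the categories that cover at least `min_share` of all values."""
    counts = Counter(values)
    total = len(values)
    return sorted(c for c, n in counts.items() if n / total >= min_share)

def one_hot(values, categories):
    """Encode each value as 0/1 indicators over the kept categories only."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical toy data: 200 records, one group appears once (0.5%).
values = ["Flowering Plants"] * 120 + ["Birds"] * 79 + ["Lichens"] * 1

kept = frequent_categories(values)   # categories at or above the 1% cutoff
encoded = one_hot(values, kept)      # the rare group becomes an all-zero row
print(kept)
```

The same helper would apply to the state columns; only the input values change.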

State Distribution
- Dropped state features that represented less than 1% of the records

VIP Distribution

Classification Models
- Dummy Classifier
- Logistic Regression
- K Nearest Neighbors
- Decision Tree
- Random Forest
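
A minimal sketch of how these five models could be compared by F1 score with scikit-learn. The synthetic data from `make_classification`, the binary target, the 80/20 class weights, and all hyper-parameters are assumptions for illustration. The project's actual features, target encoding, and settings are not shown in this README.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the species data (assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

models = {
    "Dummy Classifier": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model and score the minority (positive) class on held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), zero_division=0)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.3f}")
```

The dummy classifier always predicts the majority class, so its F1 on the minority class is zero. That gives a floor against which the other models can be read.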
Baseline Model


Best Training Model by F1 Score

Balanced Class Model with SMOTE Oversampling
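
SMOTE balances classes by creating synthetic minority samples on the line segments between a minority point and its nearest minority neighbors. The sketch below is a simplified NumPy re-implementation of that idea, not the project's code. The function name `smote_sketch`, its parameters, and the toy data are hypothetical. In practice a library implementation such as `SMOTE` from imbalanced-learn is the more likely choice, and oversampling should be applied to the training split only.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances among minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)           # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)            # random minority sample
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one of its neighbors
    gap = rng.random((n_new, 1))                              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Hypothetical toy data: 90 majority points, 10 minority points.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(90, 2))
X_minor = rng.normal(3.0, 1.0, size=(10, 2))

X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))
print(X_bal.shape, np.bincount(y_bal))
```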


Best Training Model by F1 Score

Tuned Hyper-Parameter and Balanced Class Model
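
One common way to tune hyper-parameters against F1 is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data. The choice of a random forest, the parameter grid, and the fold count are assumptions, since the README does not list the grid that was actually searched. The oversampling step is left out here to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the species data (assumption).
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.8, 0.2], random_state=0)

# Hypothetical grid; every combination is scored by 3-fold cross-validated F1.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1", cv=3)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```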

Best Training Model by F1 Score

Most important features:
- States
- Species Groups
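
The README does not say how feature importance was measured. One possible approach, sketched here as an assumption, is to read `feature_importances_` from a fitted random forest and sum the values per feature family. The column names, the `state_`/`group_` prefixes, and the synthetic label are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical one-hot columns for two feature families.
feature_names = ["state_CA", "state_HI", "group_Birds", "group_Flowering Plants"]

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, len(feature_names)))
y = X[:, 1] | X[:, 3]   # synthetic label driven by one state and one group

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sum impurity-based importances over all columns sharing a prefix.
totals = {}
for name, importance in zip(feature_names, forest.feature_importances_):
    family = name.split("_", 1)[0]
    totals[family] = totals.get(family, 0.0) + importance

for family, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{family}: {total:.3f}")
```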
