Author: acdick

Description: Performance comparison of classification algorithms
Language: Python
Repository: git://github.com/acdick/endangered_species_classification.git
Created: 2019-05-06T18:41:38Z
Project page: https://github.com/acdick/endangered_species_classification

License: MIT License


Classification of Endangered Species

Predicting the Federal Listing Status of U.S. plant and animal species.

U.S. Fish & Wildlife Service

Data for this project was collected from the U.S. Fish & Wildlife Service.

https://ecos.fws.gov

U.S. Forest Service

Table B-11: Forest Land Area

https://www.fs.fed.us/sites/default/files/fs_media/fs_document/publication-15817-usda-forest-service-fia-annual-report-508.pdf

U.S. Environmental Protection Agency

Air Quality Index

https://aqs.epa.gov/aqsweb/airdata/download_files.html#Annual

Exploratory Data Analysis

Species Group

  • Dropped species group features that each represented less than 1% of the population

[Figure: Species Group Distribution]

State Distribution

  • Dropped state features that each represented less than 1% of the population

[Figure: State Distribution]
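The 1% filter applied to the species group and state features could be sketched roughly as follows. This is only one reading of the step: it assumes the features are one-hot (dummy) columns and that "population" means the rows of the dataset. The helper name `drop_rare_dummies`, the column name `species_group`, and the toy rows are hypothetical and not taken from the repository.

```python
import pandas as pd

def drop_rare_dummies(df, column, min_share=0.01):
    """One-hot encode `column` and keep only dummies covering >= min_share of rows."""
    dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
    shares = dummies.mean()  # fraction of rows that fall in each category
    return dummies.loc[:, shares >= min_share]

# Hypothetical toy data: 200 rows, one group holds a single row (0.5%)
species = pd.DataFrame({
    "species_group": ["Flowering Plants"] * 150 + ["Fishes"] * 49 + ["Lichens"] * 1
})

encoded = drop_rare_dummies(species, "species_group")
print(list(encoded.columns))
```

The same helper could be called with a state column to apply the identical threshold there.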

VIP Distribution

[Figure: VIP Distribution]

Classification Models

  • Dummy Classifier
  • Logistic Regression
  • K Nearest Neighbors
  • Decision Tree
  • Random Forest
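A minimal sketch of how these five models might be compared by F1 score with scikit-learn. The data is synthetic (`make_classification`), the hyper-parameters are library defaults or arbitrary picks, and 5-fold cross-validation is an assumption; the project's own features, settings, and evaluation split are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real data (roughly 80% / 20%)
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)

models = {
    "Dummy Classifier": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# F1 on the positive (minority) class; score 0 when nothing is predicted positive
f1 = make_scorer(f1_score, zero_division=0)

f1_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring=f1).mean()
    for name, model in models.items()
}

for name, score in sorted(f1_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```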

Baseline Model

Class Imbalance

[Figure: Baseline]
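Why F1 rather than accuracy is a sensible yardstick under class imbalance can be illustrated with a small hand calculation. The labels below are made up (90 negatives, 10 positives) and the helper `precision_recall_f1` is hypothetical; none of the numbers are project results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class from raw labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up imbalanced labels: 90 negatives followed by 10 positives
y_true = [0] * 90 + [1] * 10

# A majority-class baseline predicts 0 for every row
majority_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, majority_pred)) / len(y_true)
_, _, majority_f1 = precision_recall_f1(y_true, majority_pred)

# A model with 4 false alarms that finds 8 of the 10 positives
better_pred = [1] * 4 + [0] * 86 + [1] * 8 + [0] * 2
_, _, better_f1 = precision_recall_f1(y_true, better_pred)

print(accuracy, majority_f1, round(better_f1, 3))
```

The baseline reaches 90% accuracy without identifying a single positive case, which is why its F1 is zero.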

Best Training Model by F1 Score

  • K Nearest Neighbors

[Figure: Baseline Train KNN]

Balanced Class Model with SMOTE Oversampling

Class Balance

[Figure: Balanced]
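The idea behind SMOTE oversampling can be sketched in plain NumPy: each synthetic minority row is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. This is a simplified illustration, not the project's code; the function `smote_oversample`, its parameters, and the toy data are hypothetical. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used, and only on the training split.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other minority rows
    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # random minority row per new sample
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy data: 90 majority rows and 10 minority rows in two dimensions
rng = np.random.default_rng(42)
X_majority = rng.normal(0.0, 1.0, size=(90, 2))
X_minority = rng.normal(3.0, 1.0, size=(10, 2))

X_synth = smote_oversample(X_minority, n_new=len(X_majority) - len(X_minority))
X_balanced = np.vstack([X_majority, X_minority, X_synth])
y_balanced = np.array([0] * len(X_majority) + [1] * (len(X_minority) + len(X_synth)))

print(np.bincount(y_balanced))
```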

Best Training Model by F1 Score

  • Decision Tree

[Figure: Balanced Train Decision Tree]

Tuned Hyper-Parameter and Balanced Class Model

[Figure: Balanced and Tuned]

Best Training Model by F1 Score

  • Logistic Regression

[Figure: Balanced and Tuned Train Logistic]
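Hyper-parameter tuning scored by F1 might look like the grid search below. The grid values, the 5-fold cross-validation, and the synthetic data are assumptions made for illustration; the parameters actually tuned in the project are not listed above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the balanced training data
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Hypothetical grid over the inverse regularization strength C
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # select the setting with the best cross-validated F1
    cv=5,
)
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 3))
```

The fitted `search.best_estimator_` can then be evaluated once on the held-out test split.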

Most important features:

States

  • Idaho
  • Hawaii
  • Wyoming

Species Groups

  • Insects
  • Crustaceans

[Figure: Feature Importance]
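One common way to rank features for a logistic regression is by the absolute size of its coefficients. Whether the project derived its ranking this way is an assumption. The sketch below uses made-up one-hot columns with placeholder names (`state_A`, `group_A`, and so on) and a toy target, so its output says nothing about the real states or species groups.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical one-hot features; the names are placeholders only
X = pd.DataFrame(
    rng.integers(0, 2, size=(300, 4)),
    columns=["state_A", "state_B", "group_A", "group_B"],
)

# Toy target that depends only on state_A and group_A
y = ((X["state_A"] == 1) & (X["group_A"] == 0)).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Rank features by the absolute value of their coefficients
coefs = pd.Series(model.coef_[0], index=X.columns)
importance = coefs.reindex(coefs.abs().sort_values(ascending=False).index)

print(importance.round(2))
```

Coefficient magnitudes are only comparable when the features share a scale, which holds for 0/1 dummy columns like these.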