Project author: christisarv

Project description:
This project uses tropical storm data to predict whether a detected weather disturbance will be a severe hurricane.
Primary language: Jupyter Notebook
Project URL: git://github.com/christisarv/tropical-storm-classification-project.git


Tropical Storm Classification Modeling Project

Author: Christie Sarver

Overview

The goal of this project is to build a model that takes in tropical cyclone tracking data and accurately classifies whether a reading indicates a severe tropical storm or a less disruptive disturbance.

The Data

The data for this project comes from the National Oceanic and Atmospheric Administration’s International Best Track Archive for Climate Stewardship (IBTrACS) project. The goal of IBTrACS is to make tropical cyclone best track data available to aid understanding of the distribution, frequency, and intensity of tropical cyclones worldwide.

Because IBTrACS is a global source, its data is pulled from many separate agencies worldwide and therefore has many columns that are duplicative, inconsistent, or difficult to interpret. This analysis relied on the data documentation saved in this repository. The Data Visualizations notebook includes different ways of examining the features, as well as a function to map out the specific storms in the data set.

Source

Business Problem

The resulting model is intended for meteorologists, who can use it to understand whether an incoming storm is a major threat to a certain area and then inform news agencies, local governments, and the public so they can prepare accordingly.

storm_map.jpg

Methodology

Due to the complexity of this data set, a large part of this project was cleaning and exploring the data with reference to its documentation, located in this repository. Before modeling, relevant features were selected, including:

  • Location-based indicators: latitude, longitude, basin, distance to land
  • Weather conditions: wind speed, storm speed, storm direction
  • Times of occurrence: year, week of year
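The feature selection above could be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's code: the column names (`ISO_TIME`, `LAT`, `LON`, `BASIN`, `DIST2LAND`, `WMO_WIND`, `STORM_SPEED`, `STORM_DIR`) are assumed from the IBTrACS documentation, the helper `select_features` is hypothetical, and the inline rows are made-up stand-ins for the real data.

```python
import pandas as pd

# Assumed IBTrACS column names; the notebook's actual choices may differ.
FEATURE_COLUMNS = [
    "LAT", "LON", "BASIN", "DIST2LAND",      # location-based indicators
    "WMO_WIND", "STORM_SPEED", "STORM_DIR",  # weather conditions
]

# Made-up sample rows standing in for the IBTrACS readings.
raw = pd.DataFrame({
    "ISO_TIME": ["2005-08-28 12:00:00", "2005-08-28 18:00:00", "2012-10-29 00:00:00"],
    "LAT": [26.3, 27.2, 36.9],
    "LON": [-88.6, -89.2, -71.0],
    "BASIN": ["NA", "NA", "NA"],
    "DIST2LAND": [300, 200, 150],
    "WMO_WIND": [145, 140, 80],
    "STORM_SPEED": [10, 11, 16],
    "STORM_DIR": [300, 325, 330],
})


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the chosen columns and derive time features."""
    out = df[FEATURE_COLUMNS].copy()
    times = pd.to_datetime(df["ISO_TIME"])
    out["year"] = times.dt.year
    # ISO week number of each reading (1 to 53).
    out["week_of_year"] = times.dt.isocalendar().week.astype(int)
    return out


features = select_features(raw)
print(features[["BASIN", "year", "week_of_year"]])
```

One thing to watch if loading the CSV with `pd.read_csv`: by default pandas parses the string `NA` as a missing value, which would blank out a North Atlantic basin code, so passing `keep_default_na=False` (with explicit `na_values`) may be needed.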

Several model types were run to determine the best methodology. The primary scoring metric was recall, chosen to reduce false negatives, i.e. instances where a severe storm is misclassified as a non-threat. The models were also evaluated on accuracy and precision.
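To make the three metrics concrete, here is a small worked sketch in plain Python. It assumes label 1 marks a severe storm and label 0 a non-threat (the encoding is an assumption for illustration), and the labels are made up. The function `classification_scores` is hypothetical and not taken from the notebook.

```python
def classification_scores(y_true, y_pred):
    """Compute recall, precision, and accuracy for binary labels (1 = severe, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # severe, flagged severe
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # severe, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-threat, flagged severe
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-threat, correctly ignored
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up labels: one severe reading is missed, two non-threats are over-flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
scores = classification_scores(y_true, y_pred)
print(scores)
```

Recall is the only one of the three whose denominator counts missed severe storms (`fn`), which is why it is the natural primary metric when a missed storm is the costliest error.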

Results

Based on the model evaluations, which are detailed in the notebook, the final model chosen is a boosted model using AdaBoost. The strengths of this model include its high recall score compared to the other models and its high overall accuracy.

Areas where the model could be improved include reducing overall error as shown in the confusion matrix below.

confusion_matrix.png
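For readers who want to reproduce this kind of evaluation, the following is a minimal sketch using scikit-learn's `AdaBoostClassifier`. It runs on synthetic data from `make_classification`, not on the IBTrACS readings, and the hyperparameters are illustrative assumptions; the notebook's actual settings and scores are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the storm readings (assumption: binary target, 1 = severe).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Illustrative hyperparameters, not the notebook's tuned values.
model = AdaBoostClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

recall = recall_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes, in label order [0, 1].
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"recall={recall:.3f} accuracy={accuracy:.3f}")
print(f"false negatives (missed severe storms): {fn}")
```

In a confusion matrix laid out this way, the bottom-left cell (`fn`) is the count of severe readings the model missed, which is the error the recall metric penalizes.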

Business Objective Results

The project achieved the goal of creating a highly accurate model with an emphasis on a high recall score, which can be used to classify tropical storms. Due to the challenges with the data, there are future improvements that can be made if more resources are allotted to this project.

Conclusions & Future Work

The AdaBoost model can be further tuned to reduce error, and other boosting methods can be explored to see how they compare. It is also recommended to revisit the feature selection and potentially remove more of the less trustworthy features to get better predictions.
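One possible way to approach that tuning and comparison is sketched below with scikit-learn's `GridSearchCV` and `cross_val_score`, both scored on recall. The data is synthetic, the parameter grid is a small assumed example, and `GradientBoostingClassifier` is just one candidate for "other boosting methods"; none of this reflects results from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in data (assumption: binary target, 1 = severe).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [25, 50], "learning_rate": [0.5, 1.0]}
search = GridSearchCV(
    AdaBoostClassifier(random_state=0),
    param_grid,
    scoring="recall",  # optimize for few missed severe storms
    cv=3,
)
search.fit(X, y)

# Compare against another boosting method using the same metric and folds.
gb_recall = cross_val_score(
    GradientBoostingClassifier(random_state=0), X, y, scoring="recall", cv=3
).mean()

print("best AdaBoost params:", search.best_params_)
print(f"AdaBoost recall={search.best_score_:.3f}  GradientBoosting recall={gb_recall:.3f}")
```

Scoring both searches on recall keeps the comparison aligned with the project's primary metric; precision and accuracy would still need checking before swapping models.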

The data set itself presented several challenges. For future work, it is recommended to work more closely with NOAA, or to take more time to examine and understand NOAA's data and methodology, in order to improve the data that is piped into this model. This may include examining the different sources of the data as well as the data gathering process.

Lastly, while this project analyzed individual readings in the data, it is recommended to analyze data grouped by storm, provided more instances of the 0 class can be added to the data set, to take a more holistic look at storm patterns.
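A storm-level view of that kind could start from a simple pandas `groupby`, as in the sketch below. It assumes `SID` is the per-storm identifier column (as in the IBTrACS documentation) and uses a hypothetical `label` column for the per-reading class; the rows are made up for illustration.

```python
import pandas as pd

# Made-up readings: storm "A" has three readings, storm "B" has two.
readings = pd.DataFrame({
    "SID": ["A", "A", "A", "B", "B"],        # assumed storm identifier column
    "WMO_WIND": [30, 65, 90, 25, 40],
    "DIST2LAND": [500, 300, 50, 800, 600],
    "label": [0, 1, 1, 0, 0],                # hypothetical per-reading class
})

# Collapse individual readings into one row per storm.
per_storm = readings.groupby("SID").agg(
    n_readings=("label", "size"),
    max_wind=("WMO_WIND", "max"),
    min_dist2land=("DIST2LAND", "min"),
    ever_severe=("label", "max"),  # 1 if any reading of the storm was class 1
)
print(per_storm)
```

Which aggregates are meaningful (peak wind, closest approach to land, and so on) is a modeling choice; the ones here are only examples.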

For More Information

Please reference the Jupyter Notebook or review this presentation.

Repository Structure

  ├── images
  ├── Data Classification_Predicting Tropical Storms.ipynb
  ├── Data Visualizations.ipynb
  ├── IBTrACS_version4_Technical_Details.pdf
  ├── Data Classification Presentation.pdf
  ├── README.md

Thank you!