This project aims to organize, structure, preprocess, and analyze the data, then feed it to a machine learning model to find statistics and meaningful figures that can help us predict the what, when, and where of upcoming catastrophes.
Machine learning holds great potential in disaster management: predictive analytics can help raise alerts about upcoming calamities.
We worked on this project with the government of Valle del Cauca, in particular with the National Unit for Disaster Risk Management, which is participating in a data science convocation for the Secretary of Information and Communications Technology.
The data comes as 22 Excel files, one per year from 1998 to 2019. Each file structures the data differently: they do not share the same columns, and some have more columns than others.
The files had different structures: some were missing columns, and others carried additional information not found in the rest. So the first thing we did was find the columns shared by every file and build a unified structure. Each of us created a structure for his assigned files, and applying the same technique to those structures led us to the final structure presented below:
Once we had the final structure for all 22 files, we started the cleaning process using SageMaker.
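The column-unification step that preceded cleaning can be sketched roughly as follows. This is a minimal illustration, not the code from the project notebook: the helper names (`normalize`, `shared_columns`, `unify`), the header normalization rule, and the `source_year` column are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lower-case a header and collapse whitespace so near-identical names match."""
    return " ".join(str(name).strip().lower().split())


def shared_columns(columns_per_file: Iterable[Iterable[str]]) -> List[str]:
    """Return the normalized column names present in every file.

    The order follows the first file so the result is deterministic.
    """
    normalized = [[normalize(c) for c in cols] for cols in columns_per_file]
    if not normalized:
        return []
    common = set(normalized[0]).intersection(*map(set, normalized[1:]))
    # dict.fromkeys drops duplicate headers while preserving order.
    return [c for c in dict.fromkeys(normalized[0]) if c in common]


def unify(frames_by_year: Dict[int, "pandas.DataFrame"]) -> "pandas.DataFrame":
    """Keep only the shared columns of each yearly table and stack them.

    `source_year` is a hypothetical extra column recording which file a row
    came from; it is not part of the original data.
    """
    import pandas as pd  # imported here so the helpers above work without pandas

    if not frames_by_year:
        return pd.DataFrame()
    renamed = {year: df.rename(columns=normalize) for year, df in frames_by_year.items()}
    cols = shared_columns(df.columns for df in renamed.values())
    parts = [df[cols].assign(source_year=year) for year, df in renamed.items()]
    return pd.concat(parts, ignore_index=True)
```

Each yearly table could be loaded with `pandas.read_excel` and passed to `unify` as a dictionary keyed by year. Taking the intersection of columns is only one option; it discards any information that is not present in all files.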
We cleaned the data on AWS SageMaker using Python with Pandas and SQL, and used AWS S3 for storage.
We used the SVM (support vector machine) algorithm to build the model.
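For readers unfamiliar with SVMs, the sketch below shows what training one can look like. It assumes scikit-learn, which this README does not name, and it runs on synthetic stand-in data: the two numeric features and the event labels are hypothetical, not the project's actual features, target, or results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the project's dataset): two hypothetical
# numeric features per record and a hypothetical event-type label.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[2.0, 1.0], scale=0.5, size=(40, 2)),
    rng.normal(loc=[8.0, 5.0], scale=0.5, size=(40, 2)),
])
y = np.array(["flood"] * 40 + ["landslide"] * 40)

# Hold out a quarter of the records to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features first: SVMs are sensitive to feature magnitudes.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The kernel and the `C` value here are scikit-learn defaults chosen for illustration; the settings actually used are in the project notebook.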
The whole process is documented in this project's notebook.
Amine Neifer - GitHub / LinkedIn
Victor Arteaega - GitHub / LinkedIn
Pablo Andres Urbano De la Cruz - GitHub / LinkedIn