Author: sudip-padhye

Description:
Visualize geolocation data by addressing outliers

Language: Jupyter Notebook

Homepage: https://www.coursera.org/learn/clustering-geolocation-data-intelligently-python/home/week/1

Repository: git://github.com/sudip-padhye/Clustering-Geolocation-Data-Intelligently-using-HDBSCAN.git

Created: 2020-08-10T01:22:46Z
Community: https://github.com/sudip-padhye/Clustering-Geolocation-Data-Intelligently-using-HDBSCAN
License:

Clustering Geolocation Data Intelligently in Python

We have the locations of taxi ranks and want to identify the key clusters among them, so that a service station can be built in each cluster for all taxis operating in that region.
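As a rough illustration of this goal, the sketch below groups a handful of made-up coordinates with scikit-learn's `KMeans` and treats each cluster centre as a candidate service station site. The coordinates, the choice of two clusters, and the variable names are all assumptions for illustration; they are not taken from the project's dataset or notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical taxi rank coordinates (latitude, longitude); not the project's data.
ranks = np.array([
    [-26.20, 28.04], [-26.21, 28.05], [-26.19, 28.03],   # one dense area
    [-25.75, 28.19], [-25.74, 28.20], [-25.76, 28.18],   # another dense area
])

# Assumed number of clusters; in practice this value would need tuning.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ranks)

labels = model.labels_              # cluster index assigned to each rank
stations = model.cluster_centers_   # one candidate station site per cluster

for cluster_id, (lat, lon) in enumerate(stations):
    members = int((labels == cluster_id).sum())
    print(f"cluster {cluster_id}: {members} ranks, centre at ({lat:.3f}, {lon:.3f})")
```

Note that `KMeans` measures Euclidean distance on the raw degree values, which is only an approximation of distance on the ground.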

Prerequisites

Basic Matplotlib skills for plotting 2-D data clearly.
Basic understanding of Pandas and how to use it for data manipulation.
Familiarity with the concepts behind clustering algorithms.

Project Outline

Exploratory Data Analysis
Visualizing Geographical Data
Clustering Strength / Performance Metric
K-Means Clustering


DBSCAN
HDBSCAN
Addressing Outliers
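The later steps of this outline (a performance metric, density-based clustering, and addressing outliers) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: it uses scikit-learn's `DBSCAN` with made-up coordinates and parameter values, scores the result with `silhouette_score`, and then assigns each noise point to the cluster of its nearest clustered neighbour with `KNeighborsClassifier`, which is one possible way of addressing outliers. The `hdbscan` package also labels noise points as `-1`, so the same pattern should carry over to HDBSCAN output.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical coordinates (latitude, longitude): two dense groups plus two stray points.
points = np.array([
    [-26.200, 28.040], [-26.201, 28.041], [-26.199, 28.039], [-26.202, 28.042],
    [-25.750, 28.190], [-25.751, 28.191], [-25.749, 28.189], [-25.752, 28.192],
    [-26.150, 28.080],   # stray point, nearer the first group
    [-25.800, 28.150],   # stray point, nearer the second group
])

# eps is expressed in degrees here; both parameter values are assumptions.
labels = DBSCAN(eps=0.01, min_samples=3).fit_predict(points)
is_outlier = labels == -1   # DBSCAN marks noise points with the label -1

# Score only the clustered points; excluding noise is one of several possible conventions.
score = silhouette_score(points[~is_outlier], labels[~is_outlier])
print(f"silhouette (noise excluded): {score:.3f}")

# Address outliers by giving each one the label of its nearest clustered point.
final_labels = labels.copy()
if is_outlier.any():
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(points[~is_outlier], labels[~is_outlier])
    final_labels[is_outlier] = knn.predict(points[is_outlier])

print("before:", labels.tolist())
print("after: ", final_labels.tolist())
```

Whether reassigning outliers is appropriate depends on the use case; a very remote point may be better left unassigned than forced into a distant cluster.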


Further Reading

For additional reading, see the documentation for
K-Means, [DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html), and
[HDBSCAN](https://hdbscan.readthedocs.io/en/latest/) clustering.

It may also be useful to look at the [other forms of clustering](https://scikit-learn.org/stable/modules/clustering.html) that are commonly used and available in the scikit-learn library. The HDBSCAN documentation also includes [a good methodology](https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html) for choosing a clustering algorithm based on your dataset and other constraints.


