Project author: mtzmonica
Project description:
Basic text analytics demo using Project Gutenberg data.
Language: Python
Project URL: git://github.com/mtzmonica/Text-Analytics-Demo.git
Text Analytics I
Implementation of various basic text analytics techniques and algorithms.
Implemented Techniques/Algorithms:
- Data Scraping
- Normalization/Preprocessing
- Vectorization: Feature Extraction
  - Bag of Words
  - TF-IDF
- Document Similarity
- Document Clustering Algorithms
- Classification Algorithms
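
To make the vectorization and similarity items above concrete, here is a minimal standard-library sketch of bag-of-words counts, TF-IDF weighting, and cosine similarity between documents. It is not taken from the repository: the function names, the toy documents, and the specific IDF variant (`log(N / df)`, without smoothing) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return one Counter of raw term counts per document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(bags):
    """Weight each raw count by log(N / df); an assumed, unsmoothed IDF variant."""
    n_docs = len(bags)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy documents (hypothetical, not Project Gutenberg data).
docs = [
    "the whale swam in the sea",
    "the whale dived under the sea",
    "a rabbit ran down the hole",
]
bags = bag_of_words(docs)
weights = tf_idf(bags)
sim_01 = cosine(weights[0], weights[1])  # documents sharing "whale" and "sea"
sim_02 = cosine(weights[0], weights[2])  # documents sharing only "the"
```

Because "the" occurs in every document, its IDF is zero under this variant, so it contributes nothing to the similarity; the first two documents end up more similar to each other than either is to the third.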
Visualizations
- Word Clouds: simple term-frequency distribution built with NLTK and the Word_Cloud library. (data: Project Gutenberg)
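
As a rough illustration of the term-frequency step behind a word cloud, the sketch below counts tokens with the standard library only. It is an assumption-laden stand-in, not the repository's code: the stopword set, the sample sentence, and the `term_frequencies` helper are hypothetical, and the project itself is described as using NLTK and Word_Cloud for this.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real run would likely use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "in", "to"}


def term_frequencies(text, stopwords=STOPWORDS):
    """Count alphabetic tokens in the text, skipping stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Hypothetical sample text standing in for a Project Gutenberg book.
sample = "Call me Ishmael. The sea, the sea, and the whale of the sea."
freqs = term_frequencies(sample)
top_terms = freqs.most_common(2)  # most frequent terms first
```

A frequency mapping like `freqs` is the kind of input a word-cloud renderer scales its font sizes from; with the wordcloud package installed, it could presumably be passed to `WordCloud().generate_from_frequencies(freqs)`.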