Project author: mtzmonica

Project description:
Basic text analytics demo using Project Gutenberg data.
Language: Python
Repository: git://github.com/mtzmonica/Text-Analytics-Demo.git
Created: 2017-03-07T04:31:50Z
Project community: https://github.com/mtzmonica/Text-Analytics-Demo

License:

Text Analytics I

Implementation of various basic text analytics techniques and algorithms.

Implemented Techniques/Algorithms:

  • Data Scraping
  • Normalization/Preprocessing
    • Tokenization
  • Vectorization: Feature Extraction
    • Bag of Words
    • TF-IDF
  • Unsupervised Learning:
    • Document Similarity
      • Cosine Similarity
    • Document Clustering Algorithms
  • Supervised Learning:
    • Classification Algorithms
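
The tokenization, bag-of-words, TF-IDF, and cosine-similarity steps listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's own code: the function names, the toy corpus, and the exact weighting (term count divided by document length, times log(N / document frequency)) are all assumptions chosen for clarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(tokens):
    """Map each term to its raw count in one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / document frequency).
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs_tokens:
        counts = bag_of_words(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, not taken from the project data.
docs = [
    "The whale swam in the sea.",
    "The whale hunted in the deep sea.",
    "A garden party in the afternoon.",
]
vectors = tf_idf([tokenize(d) for d in docs])
print(cosine_similarity(vectors[0], vectors[1]))  # two whale sentences
print(cosine_similarity(vectors[0], vectors[2]))  # whale vs. garden party
```

Terms that occur in every document (here "the" and "in") get an IDF of zero under this weighting, so they contribute nothing to the similarity. The first and third sentences share only those two terms, so their similarity is zero, while the two whale sentences score above zero.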

Visualizations

  • Word Clouds: simple term-frequency distributions built with NLTK and the Word_Cloud library (data: Project Gutenberg).
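
A word cloud sizes each word by how often it occurs, so the underlying data is a term-frequency table. The sketch below computes one with the standard library only. It is an assumption-laden stand-in for the NLTK and Word_Cloud calls the project uses: the stop-word set, the `term_frequencies` helper, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word set (an assumption; NLTK ships a fuller list).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "it", "was"}


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stop-word terms with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical sample text standing in for a Gutenberg book.
sample = "The whale and the sea. The whale dived, and the sea closed over the whale."
top_terms = term_frequencies(sample, top_n=2)
print(top_terms)  # [('whale', 3), ('sea', 2)]
```

The resulting (term, count) pairs are the kind of input a word-cloud renderer would scale into font sizes.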