Author: ishaan007

Description:
NLP in Python: vector space modelling and document classification
Language: Jupyter Notebook
Repository: git://github.com/ishaan007/vector_space_modelling.git
Created: 2017-03-19T11:01:39Z
Project page: https://github.com/ishaan007/vector_space_modelling


Text classification (Blog Link: @IshaanArora95/document-feature-extraction-and-classification-53f0e813d2d3#.3maxyvobf)

Document classification

  a. Feature extraction
     (i) TF-IDF
     (ii) Word embeddings using doc2vec
  b. Classification
     (i) Logistic Regression
     (ii) Naive Bayes (Multinomial and Gaussian)
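
The TF-IDF and Multinomial Naive Bayes steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not necessarily how the notebooks implement it: the names `tokenize`, `TfidfFeatures` and `MultinomialNB` are hypothetical, the corpus is a made-up toy example rather than the Reuters data, the idf uses the smoothed form log((1 + N) / (1 + df)) + 1 (one of several common variants), and the classifier treats TF-IDF weights as fractional counts with Laplace smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class TfidfFeatures:
    """Hypothetical helper: maps documents to {term: tf * idf} dicts."""

    def fit(self, docs):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        # Document frequency: number of documents containing each term.
        df = Counter(t for tokens in tokenized for t in set(tokens))
        # Smoothed idf (an assumed variant; others exist).
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1
                    for t, c in df.items()}
        return self

    def transform(self, docs):
        # Terms never seen during fit are dropped.
        return [
            {t: tf * self.idf[t]
             for t, tf in Counter(tokenize(d)).items() if t in self.idf}
            for d in docs
        ]


class MultinomialNB:
    """Multinomial Naive Bayes over weighted term vectors, Laplace-smoothed."""

    def fit(self, vectors, labels):
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)
        for vec, label in zip(vectors, labels):
            self.weights[label].update(vec)  # sum the weights per class
        self.vocab = {t for w in self.weights.values() for t in w}
        return self

    def predict(self, vec):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.weights[label].values())
            score = math.log(count / n_docs)  # log prior
            for term, weight in vec.items():
                if term in self.vocab:
                    p = (self.weights[label][term] + 1) / (total + len(self.vocab))
                    score += weight * math.log(p)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy corpus, not the Reuters data.
train_docs = [
    "wheat corn harvest grain",
    "grain exports wheat prices",
    "bank interest rates loan",
    "loan rates central bank",
]
train_labels = ["grain", "grain", "money", "money"]

features = TfidfFeatures().fit(train_docs)
train_vectors = features.transform(train_docs)
model = MultinomialNB().fit(train_vectors, train_labels)
prediction = model.predict(features.transform(["wheat grain prices rise"])[0])
```

In this sketch a term that appears in fewer documents (such as "corn") receives a higher weight than one spread across several documents (such as "wheat"), which is the effect TF-IDF is meant to produce. For real data a library implementation would usually be preferable to hand-rolled code like this.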

Token classification: TODO

Data

Reuters News data

Cleaned Reuters data

Results

Document Classification

Token Classification: TODO

Doc2Vec self-trained model

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D