Project author: ishaan007
Project description:
NLP in Python: vector space modelling and document classification
Primary language: Jupyter Notebook
Project URL: git://github.com/ishaan007/vector_space_modelling.git
Text classification (Blog Link: @IshaanArora95/document-feature-extraction-and-classification-53f0e813d2d3#.3maxyvobf)
Document classification
a. Feature extraction
(i) TF-IDF
(ii) word embeddings using doc2vec
b. Classification
(i) Logistic Regression
(ii) Naive Bayes (Multinomial and Gaussian)
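The pipeline outlined above (TF-IDF features, then a Naive Bayes classifier) can be illustrated with a small standard-library sketch. This is not the code from the notebooks: the names `tokenize`, `tfidf_vectors`, and `TinyMultinomialNB` are hypothetical, the TF-IDF weighting assumed here is the plain `tf * log(N / df)` variant, and the toy sentences are invented for illustration rather than taken from the Reuters data. As is usual for multinomial Naive Bayes, the classifier works on raw term counts with Laplace smoothing rather than on the TF-IDF weights.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


class TinyMultinomialNB:
    """Multinomial Naive Bayes over term counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed default of 1)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        # Terms never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label in self.classes:
            total = sum(self.term_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / n_docs)
            for t in tokens:
                score += math.log((self.term_counts[label][t] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Invented toy corpus, for illustration only.
    train_docs = [
        "crude oil supply falls",
        "oil exports rise",
        "wheat harvest grows",
        "grain and wheat shipments",
    ]
    train_labels = ["crude", "crude", "grain", "grain"]
    print(tfidf_vectors(train_docs)[0])
    clf = TinyMultinomialNB().fit(train_docs, train_labels)
    print(clf.predict("oil supply"))
```

For real experiments, a library implementation (for example a TF-IDF vectorizer and Naive Bayes or logistic regression classifier from scikit-learn, or Doc2Vec from gensim) would normally replace these hand-rolled pieces; the sketch only shows what each stage computes.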
Token classification: TODO
Data
Reuters News data
Cleaned Reuters data
Results
Document Classification

Token Classification: TODO
Model Links
Doc2Vec self trained model
Contributing
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request :D