Project author: singh-l

Project description:
Clustering text using text vectorization
Primary language: Jupyter Notebook
Repository URL: git://github.com/singh-l/Clustering_Repo.git
Created: 2020-03-29T00:31:29Z
Project community: https://github.com/singh-l/Clustering_Repo

License: MIT License



Clustering_Repo

Website

Clustering Text: a comparison between available text vectorization techniques*


Author: Lovedeep Singh

Abstract. Clustering is of fundamental importance in the field of unsupervised learning. We have always needed to categorize data with respect to some parameters. This becomes quite challenging as the amount of jargon, which requires expert domain knowledge, and the amount of data both increase. Sometimes we do not even know enough about the data to divide it into categories, and we have no past experience from which to train a classification model. This paper presents a comparative study of the techniques available for clustering text data using only text vectorization methods.
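To make "clustering text using only text vectorization" concrete, here is a minimal, dependency-free sketch of one possible pairing: TF-IDF vectors grouped by k-means with cosine similarity. This is an illustration only, not necessarily one of the techniques compared in the paper. The function names (`tfidf_vectors`, `cosine`, `kmeans`), the sample documents, and the farthest-point initialisation are all assumptions made for the example. In a notebook, a library such as scikit-learn would normally replace the hand-written parts.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse, L2-normalised TF-IDF vector (a dict)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF, so every term keeps a positive weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iterations=10):
    """Hypothetical k-means variant that assigns by cosine similarity."""
    # Deterministic farthest-point initialisation (an assumption of this sketch):
    # start from document 0, then add the document least similar to the
    # centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the old centroid if a cluster is empty
            total = {}
            for v in members:
                for t, w in v.items():
                    total[t] = total.get(t, 0.0) + w
            centroids[j] = {t: w / len(members) for t, w in total.items()}
    return labels


# Made-up sample documents on two topics, for illustration only.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten play",
    "kitten and cat sleep",
    "stock market prices fall",
    "market traders sell stock",
    "stock prices rise again",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

No labeled training data is involved: the grouping comes entirely from how the documents are vectorized and compared, so swapping `tfidf_vectors` for a different vectorization method changes the clusters while the rest of the pipeline stays the same.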


Proceedings published in AISC-Springer