Project author: ngathan

Project description:
Cleaning social media text data
Language: R
Project URL: git://github.com/ngathan/text_analysis_templates.git
Created: 2020-07-22T20:33:01Z
Project community: https://github.com/ngathan/text_analysis_templates

License:


Text Data Templates

This repo contains R scripts for cleaning and preparing text data for further analysis. I will also provide simple templates for some popular text analysis methods, such as Word2Vec and topic modeling (LDA or structural topic modeling, STM).

In general, my text-cleaning process is as follows:

  1. remove emojis
  2. remove URLs
  3. remove text in language(s) that you don’t use in the final analysis
  4. remove spam
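
The four steps above could be sketched in base R roughly as follows. This is a minimal illustration, not the scripts from this repo: the helper names (`remove_emojis`, `remove_urls`, `squish`, `clean_posts`) are hypothetical, the emoji pattern is an approximation, the language detector is left as a user-supplied function, and "spam" is assumed here to mean empty or exactly repeated posts.

```r
# Hypothetical helpers, base R only; names and heuristics are illustrative.

remove_emojis <- function(x) {
  # \p{So} ("Symbol, other") covers most emoji but also signs such as the
  # copyright sign; modifier and joiner characters may be left behind.
  gsub("\\p{So}", "", x, perl = TRUE)
}

remove_urls <- function(x) {
  # Drop anything starting with http://, https:// or www. up to the next space.
  gsub("(https?://|www\\.)\\S+", "", x, perl = TRUE)
}

# Collapse runs of whitespace and trim both ends.
squish <- function(x) trimws(gsub("\\s+", " ", x))

clean_posts <- function(posts, keep_lang = NULL, detect_lang = NULL) {
  posts <- squish(remove_urls(remove_emojis(posts)))
  # Optional language filter: detect_lang is any function that returns one
  # language code per post (for example cld2::detect_language).
  if (!is.null(keep_lang) && !is.null(detect_lang)) {
    lang <- detect_lang(posts)
    posts <- posts[!is.na(lang) & lang %in% keep_lang]
  }
  # Crude spam heuristic (assumption): drop empty posts and exact repeats,
  # compared case-insensitively.
  posts <- posts[nzchar(posts)]
  posts[!duplicated(tolower(posts))]
}

posts <- c("Great day \U0001F600 https://example.com/a",
           "great day",
           "Buy now www.spam.example",
           "Buy now")
cleaned <- clean_posts(posts)
print(cleaned)
```

Real spam filtering usually needs more than duplicate removal (for example account-level or keyword rules), so the last step is best treated as a placeholder.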

Description of text data

  1. top words
  2. bigrams
  3. trigrams
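
As a rough base-R illustration of these three descriptions, the sketch below counts the most frequent words, bigrams, and trigrams. The function names (`tokenize`, `ngrams`, `top_terms`) are hypothetical, and the tokenizer assumes lowercase English text (it splits on anything other than a-z, digits, and apostrophes).

```r
# Hypothetical helpers, base R only.

tokenize <- function(text) {
  # Assumption: English text; every other character acts as a separator.
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9']+"))
  tokens[nzchar(tokens)]
}

ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

top_terms <- function(docs, n = 1, k = 10) {
  # Build n-grams per document so they never span two posts.
  grams <- unlist(lapply(docs, function(d) ngrams(tokenize(d), n)))
  counts <- sort(table(grams), decreasing = TRUE)
  counts[seq_len(min(k, length(counts)))]
}

docs <- c("the cat sat", "the cat ran", "the dog")
top_words    <- top_terms(docs, n = 1, k = 3)
top_bigrams  <- top_terms(docs, n = 2, k = 3)
top_trigrams <- top_terms(docs, n = 3, k = 3)
print(top_words)
```

For larger corpora, a dedicated tokenization package is likely a better fit than this loop-based version.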

Topic modeling

  1. LDA
  2. STM

Word2Vec