
ngathan/text_analysis_templates

Text Data Templates

This repo contains R scripts for cleaning and preparing text data for further analysis. I will also provide simple templates for some popular text analysis methods, such as Word2Vec and topic modeling (latent Dirichlet allocation, LDA, or structural topic modeling, STM).

In general, my text-cleaning process is as follows:

  1. remove emojis
  2. remove URLs
  3. remove text in language(s) that you don't use in the final analysis
  4. remove spam
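
The four steps above can be sketched in base R roughly as follows. This is a minimal illustration, not the scripts in this repo: the `posts` data frame, its `text` and `lang` columns, the `clean_text` helper, and the spam keyword pattern are all hypothetical. It assumes each row already carries a language code from some language detector, and it treats keyword matches and exact duplicates as spam, which is only a crude heuristic.

```r
# Hypothetical example data; the column names `text` and `lang` are assumptions.
posts <- data.frame(
  text = c(
    "Great product \U0001F600 see https://example.com/review",
    "Me encanta este producto",
    "WIN A FREE PHONE!!! click www.spam.example now",
    "The battery lasts two days"
  ),
  lang = c("en", "es", "en", "en"),
  stringsAsFactors = FALSE
)

# Hypothetical helper covering the four cleaning steps in order.
clean_text <- function(df, keep_lang = "en",
                       spam_pattern = "free phone|click here") {
  x <- df$text

  # 1. Remove emojis: drop characters in common emoji and symbol
  #    code-point ranges (this list of ranges is not exhaustive).
  x <- gsub("[\\x{1F000}-\\x{1FAFF}\\x{2600}-\\x{27BF}\\x{FE0F}]", "", x,
            perl = TRUE)

  # 2. Remove URLs that start with http://, https://, or www.
  x <- gsub("(https?://|www\\.)\\S+", "", x, perl = TRUE)

  # Tidy up the whitespace left behind by the removals.
  df$text <- trimws(gsub("\\s+", " ", x))

  # 3. Keep only rows in the language(s) used for the final analysis.
  df <- df[df$lang %in% keep_lang, , drop = FALSE]

  # 4. Remove spam: keyword matches and exact duplicates (crude heuristic).
  is_spam <- grepl(spam_pattern, df$text, ignore.case = TRUE) |
    duplicated(tolower(df$text))
  df <- df[!is_spam, , drop = FALSE]

  rownames(df) <- NULL
  df
}

cleaned <- clean_text(posts)
print(cleaned$text)
```

In practice the spam rule is the part most likely to need tuning per dataset; the keyword pattern here is only a placeholder.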

Description of text data

  1. top words
  2. bigrams
  3. trigrams
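
As a rough illustration of these three descriptive summaries, the sketch below counts the most frequent words, bigrams, and trigrams using base R only. The `docs` vector and the helpers `tokenize`, `ngrams`, and `top_ngrams` are hypothetical names introduced for this example, and the tokenizer is deliberately simple (lowercase, split on anything that is not a letter, digit, or apostrophe).

```r
# Hypothetical example documents.
docs <- c("the cat sat on the mat", "the cat ate the fish")

# Lowercase and split each document into word tokens.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z0-9']+")
  lapply(tokens, function(t) t[nzchar(t)])
}

# Build all n-grams (as space-joined strings) from one token vector.
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams within each document and return the k most frequent overall.
top_ngrams <- function(docs, n = 1, k = 5) {
  grams <- unlist(lapply(tokenize(docs), ngrams, n = n))
  counts <- sort(table(grams), decreasing = TRUE)
  head(counts, k)
}

top_words    <- top_ngrams(docs, n = 1)  # top words
top_bigrams  <- top_ngrams(docs, n = 2)  # bigrams
top_trigrams <- top_ngrams(docs, n = 3)  # trigrams
print(top_words)
print(top_bigrams)
```

N-grams are built per document here, so no bigram or trigram spans the boundary between two documents. Stop words are not removed in this sketch; for real data that step usually comes before counting.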

Topic modeling

  1. LDA
  2. STM

Word2Vec
