neilchalk/Text-Mining

Text-Classification---R

Classifying documents into categories

Document Classification using R

The file "Topic Modelling" is based on the blog post Topic Modeling in R.

Before running on Windows, I had some setup to do.

The values from the Twitter app need to go into these lines in "Topic Modelling":

Consumer_key <- "YOUR_CONSUMER_KEY"
Consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token <- "YOUR_ACCESS_TOKEN"
access_token_secret <- "YOUR_TOKEN_SECRET"
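If you would rather not paste secrets into the script, one possible alternative is to read them from environment variables. The sketch below is an assumption, not part of the repository: the variable names (`TWITTER_CONSUMER_KEY` and so on) and both helper functions are hypothetical. The only twitteR call used is `setup_twitter_oauth()`.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables (set them in ~/.Renviron or in the shell before starting R).
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key        = "TWITTER_CONSUMER_KEY",   # assumed variable names
    consumer_secret     = "TWITTER_CONSUMER_SECRET",
    access_token        = "TWITTER_ACCESS_TOKEN",
    access_token_secret = "TWITTER_TOKEN_SECRET"
  )
  creds <- Sys.getenv(vars, unset = "")
  names(creds) <- names(vars)

  # Fail early with a clear message if anything is missing.
  missing <- names(creds)[creds == ""]
  if (length(missing) > 0) {
    stop("Missing Twitter credentials: ", paste(missing, collapse = ", "))
  }
  as.list(creds)
}

# Hypothetical helper: pass the credentials to twitteR.
# Defined here but not called, because it needs the twitteR package.
connect_to_twitter <- function(creds = read_twitter_credentials()) {
  twitteR::setup_twitter_oauth(
    creds$consumer_key,
    creds$consumer_secret,
    creds$access_token,
    creds$access_token_secret
  )
}
```

Calling `connect_to_twitter()` once at the top of a session would then take the place of the hard-coded values.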

Then, in the console, make sure all the required packages are installed (these commands are in the script "prereqs.R"):

install.packages("twitteR")
install.packages("tm")
install.packages("wordcloud")
install.packages("slam")
install.packages("topicmodels")
install.packages("base64enc")
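Re-running those lines reinstalls packages you already have. A minimal sketch of installing only what is missing follows; the helper names `missing_pkgs` and `install_missing` are hypothetical and not part of "prereqs.R".

```r
# The six packages listed above.
required <- c("twitteR", "tm", "wordcloud", "slam", "topicmodels", "base64enc")

# Hypothetical helper: return the packages that cannot be loaded.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
install_missing <- function(pkgs = required) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Run `install_missing()` in the console to use it; it does nothing if everything is already installed.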

Now you are ready to run "Topic Modelling" in RStudio! There are two lines near the head of the file that can be used to tweak the number and type of results:

numTweets <- 900

tweetData <- searchTwitter("flight", n=numTweets, lang="en")

numTweets is used in the Twitter search and later to set the seed for the analysis algorithm.
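To see why the seed matters, here is a small base R illustration (not taken from the script): with the same seed, a random procedure gives the same result on every run, which is what makes the analysis repeatable. The variable names `first_draw` and `second_draw` are made up for the example.

```r
numTweets <- 900

# Same seed, same random draws.
set.seed(numTweets)
first_draw <- sample(1:100, 5)

set.seed(numTweets)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE

# For a topic model, the topicmodels package accepts a seed through the
# control list, e.g. LDA(dtm, k = 5, control = list(seed = numTweets)),
# where dtm is a document-term matrix and k = 5 is an arbitrary choice here.
```

Changing numTweets therefore changes both how many tweets are fetched and the seed, so results from runs with different values are not directly comparable.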

tweetData holds the data that we want to analyse. Here I have done a simple search for the text "flight" in English. See Getting Data via twitteR for more ideas. Look in "topic-exploration.R" and "Sentiment.R" for more ways to explore the data downloaded in "Topic Modelling.R".
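As a rough sketch of trying other searches, the search can be wrapped in a small function so the term, count and language are easy to change. The wrapper `fetch_tweets` is hypothetical and not in the repository; it uses twitteR's `searchTwitter()` and `twListToDF()`, and it needs the Twitter credentials to be set up first.

```r
# Hypothetical wrapper around the search used in "Topic Modelling".
# Returns a data frame with one row per tweet.
fetch_tweets <- function(term, n = 900, lang = "en") {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("Package 'twitteR' is not installed.")
  }
  tweets <- twitteR::searchTwitter(term, n = n, lang = lang)
  twitteR::twListToDF(tweets)  # flatten the list of status objects
}
```

For example, `fetch_tweets("delay", n = 200)` would search for a different word with a smaller sample; the search term here is only an illustration.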
