
vineetdhanawat/twitter-sentiment-analysis


Twitter Sentiment Analysis - BITS Pilani

Introduction

IFTTT was used to monitor Twitter for the following keywords: #BITSPilani, #BITSGoa, #BITSHyd, #BITSDubai, #BITSAA, #BITS, #Pilani. The results contain a lot of noise because of the generic 'BITS' keyword. Since IFTTT has stopped supporting live Twitter searches, use https://zapier.com/ to create your own dataset for analysis.

All of these services send emails in one format or another, which are easy to parse. If you only have a list of tweets, use the Twitter API to fetch their texts.

IFTTT Email format

ifttt

via task 618721:
http://ifttt.com/tasks/618721

Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
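Since the email format above is regular, a parser can be quite small. The following is a minimal sketch, not the script used for this dataset: it assumes every email body contains one line ending in a `twitter.com/<user>/status/<id>` URL, and the names `Tweet` and `parse_ifttt_email` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches the status URL that closes the tweet line in the sample email.
STATUS_URL = re.compile(r"https?://twitter\.com/(\w+)/status/(\d+)")


class Tweet(NamedTuple):
    user: str
    status_id: str
    text: str


def parse_ifttt_email(body: str) -> Optional[Tweet]:
    """Return the first tweet found in an IFTTT email body, or None."""
    for line in body.splitlines():
        match = STATUS_URL.search(line)
        if match:
            # Everything before the status URL is taken as the tweet text.
            text = line[:match.start()].strip()
            return Tweet(user=match.group(1), status_id=match.group(2), text=text)
    return None


SAMPLE = """via task 618721:
http://ifttt.com/tasks/618721

Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
"""

tweet = parse_ifttt_email(SAMPLE)
```

The extracted `status_id` is kept as a string so that long IDs are preserved exactly; it can also serve as the key when looking tweets up through the Twitter API.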

Usage

  1. The Twitter training dataset is taken from http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/ .
  2. Parsed and formatted training datasets of 1.5M and 0.1M tweets are included.
  3. The BITS Pilani dataset contains tweets from January 20, 2012 to September 27, 2012.
  4. Use RapidMiner 5.3 with -Xms2048m -Xmx3072m for faster calculations. SVM is much slower than the other models, so avoid training it on more than the 0.1M-tweet dataset.
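If a different subset of the 1.5M-tweet file is wanted, for example to keep SVM training time manageable, a uniform random sample can be drawn in one pass. This is a sketch using reservoir sampling; the function names, the file paths in the usage comment, and the assumption that the file has a single header row are all hypothetical and not taken from this repository.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Uniformly sample up to k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def subsample_csv(src_path, dst_path, k, seed=0):
    """Copy the header row plus k randomly chosen rows to dst_path."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        header = next(src, "")  # assumes the first line is a header
        rows = reservoir_sample(src, k, seed)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(header)
        dst.writelines(rows)


# Hypothetical usage (paths are placeholders):
# subsample_csv("training_1.5M.csv", "training_0.1M.csv", k=100_000)
```

Reservoir sampling keeps only `k` rows in memory, so the full file never has to be loaded at once.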

SVM model (~20 hours)

Performance Vector

|  | true 0 | true 1 | class precision |
| --- | --- | --- | --- |
| pred. 0 | 24042 | 9922 | 70.79% |
| pred. 1 | 19482 | 46537 | 70.49% |
| class recall | 55.24% | 82.43% | |
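The precision and recall figures in the performance vector follow directly from the four cell counts. As a worked check, the sketch below recomputes them, reading the SVM cells as 24042, 9922, 19482 and 46537 (rows are predictions, columns are true labels); the function name `performance_vector` is hypothetical.

```python
def performance_vector(pred0_true0, pred0_true1, pred1_true0, pred1_true1):
    """Compute per-class precision and recall plus overall accuracy."""
    total = pred0_true0 + pred0_true1 + pred1_true0 + pred1_true1
    return {
        # Precision: correct predictions divided by the row total.
        "precision_0": pred0_true0 / (pred0_true0 + pred0_true1),
        "precision_1": pred1_true1 / (pred1_true0 + pred1_true1),
        # Recall: correct predictions divided by the column total.
        "recall_0": pred0_true0 / (pred0_true0 + pred1_true0),
        "recall_1": pred1_true1 / (pred0_true1 + pred1_true1),
        # Accuracy: diagonal divided by all examples.
        "accuracy": (pred0_true0 + pred1_true1) / total,
    }


svm = performance_vector(24042, 9922, 19482, 46537)
for name, value in svm.items():
    print(f"{name}: {value:.2%}")
```

The four cells sum to 99983 examples, which gives the SVM an overall accuracy of roughly 70.6%.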

Stats

Top 10 Positive and Negative words

| Positive word | Weight | Negative word | Weight |
| --- | --- | --- | --- |
| thank | 0.06800427050495744 | sad | 0.06904954519705979 |
| love | 0.04238921785592977 | miss | 0.06799716497097386 |
| good | 0.03864780316342833 | sorri | 0.06447410364223946 |
| great | 0.03332699835307452 | wish | 0.04964308132602499 |
| quot | 0.028049576202737663 | suck | 0.04549754050714666 |
| welcom | 0.028045093611976712 | bad | 0.03882145370669514 |
| awesom | 0.027883840586310205 | hate | 0.038814744730334146 |
| haha | 0.027711586964757735 | work | 0.038456277249749565 |
| nice | 0.026502431781819224 | poor | 0.03537374379337165 |
| happi | 0.024842171425360552 | want | 0.03312521661076012 |

Sentiment Ratio

Positive Tweets: 4759
Negative Tweets: 1552

Naive Bayes (~4 hours)

Performance Vector

|  | true 0 | true 1 | class precision |
| --- | --- | --- | --- |
| pred. 0 | 34413 | 36884 | 48.27% |
| pred. 1 | 9111 | 19575 | 68.24% |
| class recall | 79.07% | 34.67% | |

Sentiment Ratio

Positive Tweets: 3436
Negative Tweets: 2875

License

MIT: http://vineetdhanawat.mit-license.org/