Twitter Sentiment Analysis

  • retrieve tweets using Spark Streaming
  • language detection
  • sentiment analysis (StanfordNLP; see the sketch after this list)
  • index tweets in Elasticsearch
  • live dashboard using Kibana
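
The sentiment step relies on Stanford CoreNLP's sentiment annotator. The detectSentiment helper that shows up in the code below is not reproduced in this README; the following is only a minimal sketch of what such a helper can look like (object and method names are illustrative, not necessarily the project's, and the CoreNLP "models" artifact must be on the classpath):

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.StanfordCoreNLP
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.JavaConverters._

object SentimentSketch {

  // sentiment needs the parser, hence "parse" in the annotator list
  private val props = new Properties()
  props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
  private lazy val pipeline = new StanfordCoreNLP(props)

  // Returns 0 (very negative) .. 4 (very positive) for the longest sentence of the text.
  def detectSentiment(text: String): Int = {
    val annotation = pipeline.process(text)
    val sentences = annotation.get(classOf[CoreAnnotations.SentencesAnnotation]).asScala
    sentences
      .map { sentence =>
        val tree = sentence.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])
        (sentence.toString.length, RNNCoreAnnotations.getPredictedClass(tree))
      }
      .sortBy { case (length, _) => -length }
      .headOption
      .map { case (_, score) => score }
      .getOrElse(2) // neutral when the text has no parsable sentence
  }
}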

UPDATE Alonso

First of all, thank you to Vincent Spiewak, the original author of this project.

I am running this code on macOS Sierra, so the easiest way to install the Kibana and Elasticsearch dependencies is with Homebrew:

brew install kibana

brew install elasticsearch

Go to spark.apache.org and download the 1.5.2 release. The current release is spark-2.0.0-bin-hadoop2.7.tgz, but it does not work at the moment because of this exception:

at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:42). 
...

thrown by this portion of the code:

tweets.foreachRDD { (rdd, time) =>
  rdd.map(t => {
    // turn each twitter4j Status into a document for Elasticsearch
    Map(
      "user" -> t.getUser.getScreenName,
      "created_at" -> t.getCreatedAt.toInstant.toString,
      "location" -> Option(t.getGeoLocation).map(geo => s"${geo.getLatitude},${geo.getLongitude}"),
      "text" -> t.getText,
      "hashtags" -> t.getHashtagEntities.map(_.getText),
      "retweet" -> t.getRetweetCount,
      "language" -> detectLanguage(t.getText),
      "sentiment" -> detectSentiment(t.getText).toString
    )
  }).saveToEs("twitter/tweet") // index/type "twitter/tweet" in Elasticsearch
}

The failing call is, of course, the saveToEs method.
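
For reference, saveToEs is not a Spark method: it is added to RDDs by the elasticsearch-spark connector. A minimal sketch of the import and Spark configuration that make it work against a local node (the settings shown are standard elasticsearch-hadoop options; the values are only an assumption for a single local Elasticsearch instance):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings saveToEs into scope on RDDs

val conf = new SparkConf()
  .setAppName("TwitterSentimentAnalysis")
  .set("es.nodes", "localhost")           // where Elasticsearch listens
  .set("es.port", "9200")
  .set("es.index.auto.create", "true")    // create the twitter/tweet index on first write

val sc = new SparkContext(conf)
// any RDD of Maps (or case classes) can now be written with rdd.saveToEs("twitter/tweet")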

Anyway, decompress the 1.5.2 version wherever you want.

Go to https://apps.twitter.com, create an app, and save the consumer_key, consumer_secret, access_token and access_token_secret values on your machine.
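
The assembly jar receives those four values as command-line arguments (see the spark-submit call below). Internally they end up in twitter4j; the sketch below shows a common way to wire them up before opening the stream, assuming the standard spark-streaming-twitter TwitterUtils API for Spark 1.x (the helper name and the 5-second batch interval are illustrative, not the project's exact code):

import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.twitter.TwitterUtils
import twitter4j.Status

// Hypothetical helper: the real project may wire this differently.
def openTwitterStream(sc: SparkContext,
                      consumerKey: String, consumerSecret: String,
                      accessToken: String, accessTokenSecret: String,
                      filters: Seq[String]): (StreamingContext, ReceiverInputDStream[Status]) = {
  // twitter4j picks up its OAuth credentials from system properties
  System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
  System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
  System.setProperty("twitter4j.oauth.accessToken", accessToken)
  System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

  val ssc = new StreamingContext(sc, Seconds(5))              // 5-second micro-batches
  val tweets = TwitterUtils.createStream(ssc, None, filters)  // DStream of twitter4j.Status
  (ssc, tweets)
}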

To run the code:

Execute elasticsearch:

$:~ aironman$ elasticsearch 
[2016-10-03 16:49:49,879][INFO ][node                     ] [Rex Mundi] version[2.4.0], pid[52071], build[ce9f0c7/2016-08-29T09:14:17Z]
[2016-10-03 16:49:49,880][INFO ][node                     ] [Rex Mundi] initializing ...
[2016-10-03 16:49:50,425][INFO ][plugins                  ] [Rex Mundi] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2016-10-03 16:49:50,454][INFO ][env                      ] [Rex Mundi] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [33.6gb], net total_space [464.7gb], spins? [unknown], types [hfs]
[2016-10-03 16:49:50,454][INFO ][env                      ] [Rex Mundi] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-10-03 16:49:50,455][WARN ][env                      ] [Rex Mundi] max file descriptors [10240] for elasticsearch process likely too low, consider increasing to at least [65536]
[2016-10-03 16:49:52,143][INFO ][node                     ] [Rex Mundi] initialized
[2016-10-03 16:49:52,143][INFO ][node                     ] [Rex Mundi] starting ...
[2016-10-03 16:49:52,220][INFO ][transport                ] [Rex Mundi] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2016-10-03 16:49:52,224][INFO ][discovery                ] [Rex Mundi] elasticsearch_aironman/vzEnJrBbRbqkQGdGts10tQ
[2016-10-03 16:49:55,252][INFO ][cluster.service          ] [Rex Mundi] new_master {Rex Mundi}{vzEnJrBbRbqkQGdGts10tQ}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-10-03 16:49:55,264][INFO ][http                     ] [Rex Mundi] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2016-10-03 16:49:55,264][INFO ][node                     ] [Rex Mundi] started
[2016-10-03 16:49:55,316][INFO ][gateway                  ] [Rex Mundi] recovered [3] indices into cluster_state
[2016-10-03 16:49:55,827][INFO ][cluster.routing.allocation] [Rex Mundi] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[library][2]] ...]).
[2016-10-03 16:50:25,279][WARN ][cluster.routing.allocation.decider] [Rex Mundi] high disk watermark [90%] exceeded on [vzEnJrBbRbqkQGdGts10tQ][Rex Mundi][/usr/local/var/elasticsearch/elasticsearch_aironman/nodes/0] free: 33.6gb[7.2%], shards will be relocated away from this node
...

Execute Kibana; it is as simple as typing kibana in a terminal tab:

$:clasificacion_tweets aironman$ kibana
log   [17:14:59.962] [info][status][plugin:kibana] Status changed from uninitialized to green - Ready
log   [17:14:59.988] [info][status][plugin:elasticsearch] Status changed from uninitialized to yellow - Waiting for Elasticsearch
log   [17:15:00.001] [info][status][plugin:kbn_vislib_vis_types] Status changed from uninitialized to green - Ready
log   [17:15:00.006] [info][status][plugin:markdown_vis] Status changed from uninitialized to green - Ready
log   [17:15:00.014] [info][status][plugin:metric_vis] Status changed from uninitialized to green - Ready
log   [17:15:00.019] [info][status][plugin:spyModes] Status changed from uninitialized to green - Ready
log   [17:15:00.023] [info][status][plugin:statusPage] Status changed from uninitialized to green - Ready
log   [17:15:00.027] [info][status][plugin:table_vis] Status changed from uninitialized to green - Ready
log   [17:15:00.040] [info][listening] Server running at http://0.0.0.0:5601
log   [17:15:00.046] [info][status][plugin:elasticsearch] Status changed from yellow to green - Kibana index ready

To run the job, clone the project and build it with sbt as usual:

git clone https://github.com/alonsoir/twitter-sentiment-analysis.git

cd twitter-sentiment-analysis

sbt clean package assembly 
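
The exact versions live in the project's build.sbt; the moving parts are roughly the ones below. This is only an illustrative sketch, not the project's actual build definition, so double-check it against the repository (the Scala 2.10 version matches the target/scala-2.10 path used later):

// build.sbt (illustrative sketch)
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming"         % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.5.2",  // TwitterUtils / twitter4j
  "org.elasticsearch" %% "elasticsearch-spark"    % "2.4.0",  // provides saveToEs
  "edu.stanford.nlp"  %  "stanford-corenlp"       % "3.5.2",
  "edu.stanford.nlp"  %  "stanford-corenlp"       % "3.5.2" classifier "models" // sentiment models
)

Then submit the assembly with spark-submit: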


$HOME/spark-1.5.2/bin/spark-submit \
    --class com.github.vspiewak.TwitterSentimentAnalysis \
    --master local[2] \
    target/scala-2.10/twitter-sentiment-analysis-assembly-0.1-SNAPSHOT.jar \
    <consumer_key> \
    <consumer_secret> \
    <access_token> \
    <access_token_secret> \
    [<filters>]

For example, do you want to know what is going on with FARC and paro?

$HOME/spark-1.5.2/bin/spark-submit \
    --class com.github.vspiewak.TwitterSentimentAnalysis \
    --master local[2] \
    target/scala-2.10/twitter-sentiment-analysis-assembly-0.1-SNAPSHOT.jar \
    <consumer_key> \
    <consumer_secret> \
    <access_token> \
    <access_token_secret> \
    FARC paro
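
The trailing words are passed through to the Twitter stream as track filters, so only tweets matching FARC or paro are received. In the driver, the argument list is presumably split along these lines (a sketch with a hypothetical helper, not the project's exact parsing):

// the four OAuth values come first; everything else becomes the track filter list
def parseArgs(args: Array[String]): (String, String, String, String, Seq[String]) = {
  val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret, filters @ _*) = args
  (consumerKey, consumerSecret, accessToken, accessTokenSecret, filters)
}
// parseArgs(Array("ck", "cs", "at", "ats", "FARC", "paro"))._5 == Seq("FARC", "paro")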

Docker setup (To test!)

Configure the Twitter OAuth credentials in the docker-compose.yml file.

After installing Docker and Docker Compose, you can launch the stack with the following commands:

docker-compose build
docker-compose up

After a while, Elasticsearch, Kibana, and Spark should be up: Kibana on http://localhost:5601 and Elasticsearch on http://localhost:9200.
