This project demonstrates how to use real-time social media data to get a quick glimpse of what people are saying about a specific topic right now.
- Download streaming data from Twitter.
- Perform Latent Dirichlet Allocation (LDA) on the data to quickly summarise the different sub-topics.
This was put together during the Data Science HK unhackathon on 2017-08-13.
The Python dependencies are listed in requirements.txt. Install them with:
pip install -r requirements.txt
Run download.py to start downloading tweets:
python src/download.py
The downloaded data is saved to tweetdb.sqlite, a SQLite database, in a table called tweets.
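To check that the download produced something, the database can be inspected from Python using only the standard library. The sketch below is not part of the repository: the function name `summarise_tweets_table` is hypothetical, and because the column layout of the tweets table is not documented here, the columns are read from the database rather than assumed.

```python
import sqlite3


def summarise_tweets_table(db_path="tweetdb.sqlite", table="tweets"):
    """Return (column_names, row_count) for a table in a SQLite file.

    Hypothetical helper for illustration; not part of src/download.py.
    """
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info yields one row per column; the name is at index 1.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if not columns:
            raise ValueError(f"table {table!r} not found in {db_path}")
        (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return columns, row_count
    finally:
        conn.close()
```

Calling `summarise_tweets_table()` from the directory that contains tweetdb.sqlite returns the column names and the number of tweets stored so far. Note that `sqlite3.connect` creates an empty file if the path does not exist, so it is worth checking the path first.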
The following R packages are required:
- tidyverse
- tidytext
- TODO
- src/sql_to_csv.R - converts the SQLite database into a CSV file called tweets.csv
- src/word_cloud.R - shows a word cloud of the tweets
- src/topic_model.R - fits a topic model to the tweets using Latent Dirichlet Allocation
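For anyone who would rather stay in Python for the conversion step, the SQLite-to-CSV export can be approximated with the standard library. This is a minimal sketch and not a port of src/sql_to_csv.R: the function name `export_table_to_csv` is hypothetical, and the header row is taken from whatever columns the table actually has.

```python
import csv
import sqlite3


def export_table_to_csv(db_path="tweetdb.sqlite", csv_path="tweets.csv", table="tweets"):
    """Write every row of `table` to `csv_path`; return the number of data rows.

    Hypothetical helper for illustration; the R script may behave differently.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        # cursor.description holds one tuple per column; the name is at index 0.
        header = [col[0] for col in cursor.description]
        written = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            for row in cursor:
                writer.writerow(row)
                written += 1
        return written
    finally:
        conn.close()
```

The resulting file has one header line followed by one line per tweet, and the csv module quotes any text that contains commas or line breaks.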