Tweets Filtering; Tweets Preprocessing; Tweet Representation; Sentiment Analysis; Geographical Analysis

bright1993ff66/Social-Media-Data-Analysis

Analysis of Geo-coded Social Media Data in HK

1. Introduction

This repository shows how to analyze geo-coded social media data posted in Hong Kong. The general procedure is as follows:

  1. Tweet filtering. For more information, please check the following Jupyter notebooks:

  2. Tweet text preprocessing

  3. Generate tweet representations using FastText word embeddings trained on sentiment140

  4. Manually label the sentiment of 5000 tweets randomly sampled from our tweet dataset

  5. Build sentiment analysis classifiers and run cross-validation. To see how the word embedding model is trained on sentiment140, check the train_word_vectors_from_sentiment140 folder. To generate the representation of each tweet in our own dataset, see the emoji2vec notebook or the script get_tweet_representation.py

  6. Cross-sectional analysis and longitudinal analysis

  7. Difference-in-differences analysis

  8. Result visualization (word cloud, topic modelling, etc.)
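A few of the steps above can be sketched in plain Python. This is only a minimal illustration, not the code used in this repository: the bounding box values, the tweet field names (`text`, `lat`, `lon`), the cleaning rules, and the toy `word_vectors` dictionary (standing in for a trained FastText model) are all assumptions made for the example.

```python
import re

# Approximate bounding box for Hong Kong (assumed values, for illustration only).
HK_LAT = (22.15, 22.57)
HK_LON = (113.83, 114.41)

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def in_hong_kong(lat, lon):
    """Step 1 (filtering): True if the point lies inside the assumed box."""
    return HK_LAT[0] <= lat <= HK_LAT[1] and HK_LON[0] <= lon <= HK_LON[1]


def clean_tweet(text):
    """Step 2 (preprocessing): drop URLs and @mentions, lowercase, squeeze spaces."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.lower().split())


def average_vector(tokens, word_vectors):
    """Step 3 (representation): mean of the vectors of the known tokens.

    `word_vectors` is a hypothetical dict mapping token -> list of floats.
    Returns None when no token has a vector.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Step 7: change in the treatment group minus change in the control group."""
    return (treat_post - treat_pre) - (control_post - control_pre)
```

For example, applying `in_hong_kong` and then `clean_tweet` to a list of tweet dictionaries keeps only the cleaned text of tweets posted inside the box, and `average_vector(clean_tweet(text).split(), word_vectors)` turns one tweet into a fixed-length vector that a classifier can consume.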

2. Prerequisite Python Packages

This project uses Python 3.5 to analyze the tweets. You can install all required packages by running the following command in the command line:

pip install -r requirements.txt

However, the code in the transit_non_transit_comparison folder needs the ArcPy package for the geographical analysis. ArcPy is distributed with ArcGIS, so it can only be imported after ArcGIS is installed, and it runs on the Python version bundled with that installation (Python 2.7 for ArcGIS Desktop).
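Because ArcPy is only present on machines with ArcGIS, a script can guard the import so that it fails with a clear message elsewhere. The sketch below is an assumption about how such a guard might look, not code from this repository; `load_arcpy` is a hypothetical helper name.

```python
def load_arcpy():
    """Return the arcpy module if it can be imported, otherwise None."""
    try:
        import arcpy  # only available where ArcGIS is installed
    except ImportError:
        return None
    return arcpy


arcpy_module = load_arcpy()
if arcpy_module is None:
    print("arcpy not found: run the geographical analysis from the ArcGIS Python environment.")
```

The try/except form works under both Python 2 and Python 3, which matters here because the rest of the project and the ArcGIS environment may use different interpreters.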

3. Some Results

To be continued.

License

MIT
