Analysis of Geo-coded Social Media Data in HK

1. Introduction

In this repository, I show how to analyze geo-coded social media data posted in Hong Kong. The general procedure is as follows:

- Tweet filtering. For details, see the tweet_filtering_process notebook.
- Tweet text preprocessing:
  - See the clean_the_text_sample notebook for how to obtain the raw Chinese tweet text.
  - See the tweet cleaning notebook for how the tweets are cleaned, translated and preprocessed in this work.
- Generate a representation of each tweet using FastText word embeddings trained on the Sentiment140 dataset. The train_word_vectors_from_sentiment140 folder shows how the word embedding model is trained; the emoji2vec notebook and get_tweet_representation.py show how the representation of each tweet in our own dataset is generated.
- Manually label the sentiment of 5,000 tweets randomly sampled from our tweet dataset.
- Build sentiment analysis classifiers and evaluate them with cross-validation.
- Cross-sectional and longitudinal analysis.
- Difference-in-differences analysis.
- Result visualization (word clouds, topic modelling, etc.).
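The tweet-representation step can be illustrated with a minimal sketch, assuming each tweet is represented by the average of the word vectors of its tokens. This is a common choice, but the actual get_tweet_representation.py may combine vectors differently (for example, by also using emoji vectors from emoji2vec). The function name tweet_representation and the three-dimensional toy vectors are hypothetical; pure Python is used so the sketch runs without gensim or NumPy.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings standing in for FastText vectors trained on
# Sentiment140; real vectors have many more dimensions (e.g. 100 or 300).
Embeddings = Dict[str, List[float]]


def tweet_representation(tokens: Sequence[str], vectors: Embeddings, dim: int) -> List[float]:
    """Average the vectors of the tokens found in the vocabulary.

    Tokens without a vector are skipped; a tweet with no known tokens
    maps to the zero vector of length `dim`.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every known vector together
    return [sum(values) / len(known) for values in zip(*known)]


if __name__ == "__main__":
    toy_vectors = {
        "love": [1.0, 0.5, 0.0],
        "mtr": [0.0, 1.0, 0.5],
        "crowded": [-1.0, 0.5, 1.0],
    }
    tokens = ["love", "the", "mtr"]  # "the" has no vector and is ignored
    print(tweet_representation(tokens, toy_vectors, dim=3))  # prints [0.5, 0.75, 0.25]
```

Because FastText builds word vectors from character n-grams, a trained FastText model can also produce vectors for words it never saw during training, so a real pipeline need not drop unknown tokens the way this toy lookup table does.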

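The difference-in-differences step compares the change in mean sentiment in a treatment group with the change in a control group over the same period: DiD = (treated after - treated before) - (control after - control before). The sketch below computes this basic 2x2 estimate, assuming, for illustration only, that tweets have already been split into a hypothetical treatment group (e.g. tweets posted near a transit station affected by some intervention) and a control group, each observed before and after the intervention, with numeric sentiment scores. The function name did_estimate and all scores are made up.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Key: (group, period), where group is "treatment" or "control" and period is
# "before" or "after". Values are per-tweet sentiment scores (hypothetical).
Scores = Dict[Tuple[str, str], List[float]]


def did_estimate(scores: Scores) -> float:
    """Return the 2x2 difference-in-differences estimate of mean sentiment."""
    m = {key: mean(values) for key, values in scores.items()}
    treated_change = m[("treatment", "after")] - m[("treatment", "before")]
    control_change = m[("control", "after")] - m[("control", "before")]
    # The control group's change stands in for what would have happened to
    # the treatment group without the intervention.
    return treated_change - control_change


if __name__ == "__main__":
    # Made-up scores (1 = positive, 0 = neutral, -1 = negative), for illustration only
    example = {
        ("treatment", "before"): [0, 1, -1, 0],
        ("treatment", "after"): [1, 1, 0, 1],
        ("control", "before"): [0, 0, 1, -1],
        ("control", "after"): [0, 1, 0, 0],
    }
    print(did_estimate(example))  # prints 0.5
```

Fitting an OLS regression of sentiment on a group dummy, a period dummy and their interaction gives the same point estimate as the interaction coefficient, together with standard errors; the difference of means is shown here only to make the arithmetic explicit.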
2. Prerequisite Python Packages

In this project, I use Python 3.5 to analyze the tweets. You can install all required packages by running the following command:

pip install -r requirements.txt

However, the geographical analysis in the transit_non_transit_comparision folder requires the ArcPy package. This package requires Python 2 and can only be imported after ArcGIS has been installed.
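Because the ArcPy scripts need a different interpreter from the rest of the project, it can help to check whether ArcPy is importable before running them. The following is a minimal sketch (the function name arcpy_available is hypothetical); it avoids type hints and f-strings so that it runs under both Python 2.7 and Python 3.

```python
import importlib
import sys


def arcpy_available():
    """Return True if the ArcPy module can be imported in this interpreter."""
    try:
        importlib.import_module("arcpy")
    except ImportError:
        # ArcPy is not installed for this interpreter
        return False
    return True


if __name__ == "__main__":
    version = "{}.{}".format(*sys.version_info[:2])
    if arcpy_available():
        print("ArcPy found (Python {}).".format(version))
    else:
        print("ArcPy not found (Python {}): run the transit_non_transit_comparision "
              "scripts with the Python interpreter installed with ArcGIS.".format(version))
```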

3. Some Results

To be continued...

License

MIT

bright1993ff66/Social-Media-Data-Analysis

Folders and files

Latest commit

History

Repository files navigation

Analysis of Geo-coded Social Media Data in HK

1. Introduction

2. Prerequisite Python Packages

3. Some Results

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages