KwokHing/Exploratory-Data-Analysis-on-SMRT-Tweets: Demo of exploratory data analysis (EDA) on train service disruptions, based on tweets (user-generated content) scraped from the train operator SMRT's Twitter account

Project Overview

This demo gives a brief introduction to performing a rudimentary analysis of train service disruptions in Singapore. The data were scraped from SMRT's Twitter account and from Wikipedia, which provides the relevant train station information such as station names and codes. The demo covers:

scraping of data from a website (Twitter) using Selenium
scraping of tabular data from a website (Wikipedia) using XPath
exploratory data analysis (EDA) on the scraped data
data cleaning, preparation and processing
loading of .shp (shape) files into Python
geospatial analysis of the frequency of service disruptions using Folium & Leaflet
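The XPath-based table scraping step can be sketched as follows. This is a minimal sketch, not the notebook's code: to stay self-contained it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup, so the sample table is hypothetical. A real Wikipedia page would more likely need a lenient HTML parser such as `lxml.html`, whose elements expose a full `.xpath()` method. The column order (code first, then name) and the function name `parse_station_table` are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_station_table(html: str) -> List[Dict[str, str]]:
    """Extract station code/name pairs from a simple, well-formed HTML table.

    Assumes (hypothetically) that column 1 holds the code and column 2 the name.
    """
    root = ET.fromstring(html)
    rows = []
    # ".//tr" is part of ElementTree's XPath subset: every <tr> at any depth.
    for tr in root.findall(".//tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        # Header rows use <th> cells, so they yield no <td> cells and are skipped.
        if len(cells) >= 2:
            rows.append({"code": cells[0], "name": cells[1]})
    return rows


sample = """<table>
  <tr><th>Code</th><th>Name</th></tr>
  <tr><td>NS1</td><td>Jurong East</td></tr>
  <tr><td>EW24</td><td>Jurong East</td></tr>
</table>"""
print(parse_station_table(sample))
```

Each data row becomes a small dictionary, which can later be joined against station names found in the tweets.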

There are two primary methods of extracting SMRT tweets from Twitter. The first is to use the official Twitter API; the second is to scrape the information directly from the HTML of SMRT's official Twitter page (https://twitter.com/smrt_singapore). Because the Twitter API limits how many tweets can be retrieved, and a substantial number of SMRT tweets (approximately 4000) were expected, the second method was used to work around the API's rate limits.
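Scraping the page instead of calling the API means the page has to be scrolled repeatedly so that older tweets load. A minimal sketch of such a scroll loop follows, assuming a Selenium WebDriver instance (`execute_script` is part of its API) and assuming that loading has finished once the page height stops growing. The function name, the `on_round` hook and the default pause are illustrative and not taken from the notebook. The hook exists because the Twitter page may drop earlier tweets from the DOM while scrolling, in which case tweets would need to be collected on every round rather than once at the end.

```python
import time
from typing import Callable, Optional


def scroll_until_exhausted(
    driver,
    pause: float = 2.0,
    max_rounds: int = 500,
    on_round: Optional[Callable[[], None]] = None,
) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is assumed to be a Selenium WebDriver. Returns the number of
    scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # wait for the next batch of tweets to load
        if on_round is not None:
            on_round()  # hypothetical hook: harvest the tweets currently in the DOM
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height unchanged: assume the timeline has been exhausted
        last_height = new_height
    return rounds
```

The `max_rounds` cap guards against pages that keep loading indefinitely; a fixed `pause` is the simplest choice, though waiting for a specific element to appear would be more robust on slow connections.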

This code was submitted as a web scraping project for NTU's WKW H6752 - Data Extraction Techniques module.

Getting started

Open 1_scrape_tweets.ipynb and 2_geospatial_EDA_tweets.ipynb in a Jupyter Notebook environment or in Google Colab. The notebooks contain further technical details.

1_scrape_tweets.ipynb shows the steps taken to scrape tweets from Twitter using Selenium
2_geospatial_EDA_tweets.ipynb shows the steps taken to generate a heat map of the frequency of train breakdowns
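Once the tweets and the station list are available, the per-station frequencies behind the heat map can be computed. The sketch below is one possible approach, not necessarily the notebook's: it counts the tweets that mention each station name (case-insensitive keyword matching, at most once per tweet) and turns the counts into `[lat, lon, weight]` triples, the point format accepted by `folium.plugins.HeatMap`. The function names are hypothetical, and the station coordinates are assumed to come from another source, for example the loaded shapefiles.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_station_mentions(tweets: Iterable[str], station_names: Iterable[str]) -> Counter:
    """Count the tweets that mention each station name (at most once per tweet)."""
    # One case-insensitive, word-bounded pattern per station, so a name does
    # not match inside a longer word.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in station_names
    }
    counts: Counter = Counter()
    for tweet in tweets:
        for name, pattern in patterns.items():
            if pattern.search(tweet):
                counts[name] += 1
    return counts


def to_heatmap_points(
    counts: Counter, coords: Dict[str, Tuple[float, float]]
) -> List[List[float]]:
    """Build [lat, lon, weight] points; stations without known coordinates are skipped."""
    points = []
    for name, n in counts.items():
        if name in coords:
            lat, lon = coords[name]
            points.append([lat, lon, float(n)])
    return points
```

With folium installed, the resulting points could then be drawn with `HeatMap(points).add_to(m)` on a `folium.Map` instance. Keyword matching will also count tweets that mention a station without reporting a disruption, so filtering tweets for disruption-related wording first would likely be needed.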

Improvements

Extend the scraping to SBS and generate a heat map of SBS train breakdowns as well.

Folders and files

data
images
utils
1_scrape_tweets.ipynb
2_geospatial_EDA_tweets.ipynb
LICENSE
README.md