ATT Fall Case Competition Code

This folder contains the code and a portion of the sample data used for the Big Data AT&T Fall Case Competition. The project ranked in the top 5 of the competition.

The goal of this customer insights project is to identify the top customer concerns, analyze customer sentiment toward AT&T, and recommend strategies for a CRM system. The project consumes documents from several social media sources and applies a range of natural language processing techniques and models. It is written in R and Python.

Summary:

Collect and preprocess 50,000+ reviews and tweets using APIs and Python.
Identify top customer concerns from social media feeds (LDA).
Analyze and predict customer tweet sentiment (SVM, TF-IDF).
Rank the overall service quality of AT&T retail stores in the Dallas area with a custom ranking algorithm.
Present the findings visually in a CRM recommendation engine built on the Tableau platform.

![alt text][logo]

[logo]: https://github.com/fairypp/ATT_Fall_Case_Competition_Code/blob/master/overall_rank.png
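
The Summary lists TF-IDF as the feature weighting behind the sentiment model. As a rough illustration of what that weighting does, here is a minimal pure-Python sketch. It is not the repository's R implementation: the function name `tfidf`, the exact formula variant (length-normalized term frequency times `log(N / df)`), and the sample tweets are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    This is one simple variant; smoothed or normalized forms are also common.
    """
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tweets, for illustration only (not taken from the sample data).
tweets = [
    "slow service at the store".split(),
    "great service today".split(),
    "store was closed".split(),
]
vectors = tfidf(tweets)
```

Terms that appear in fewer documents (such as "slow" here) get a higher weight than terms shared across documents (such as "service"), which is what makes the resulting vectors useful as classifier input.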

Project Structure:

Code Notes:

ATT_LDA.R : extract customer service topics with LDA.
Corr.R : compute the correlation matrix of the demographic factors.
Preprocess.R : normalize all collected review ratings and prepare the training corpus for sentiment prediction.
Sentiment.R : predict tweet sentiment with Maximum Entropy and SVM classifiers.
TwitterPublicData.R : fetch historical Twitter data through the Twitter APIs.
TwitterStreamData.R : fetch real-time streaming Twitter data through the Twitter APIs.
File “mystopwords.txt” is used for text preprocessing.
fetch_google.py : fetch Google reviews through the Google Search APIs.
fetch_yelp.py : fetch a portion of the Yelp reviews through the Yelp APIs.
File "top 100 populated cities in US.txt" stores geographic information for the 100 most populous US cities and is read by fetch_google.py.
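
Preprocess.R is described above as normalizing the collected review ratings. The exact method is not documented here, so the following is only a sketch of one common approach, min-max scaling onto [0, 1]. The function name `normalize_rating`, the source names, and the scale bounds are hypothetical placeholders, not values taken from the project.

```python
def normalize_rating(rating, low, high):
    """Map a rating from the scale [low, high] onto [0, 1] (min-max scaling)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} outside [{low}, {high}]")
    return (rating - low) / (high - low)


# Hypothetical rows: (source, rating, scale minimum, scale maximum).
reviews = [
    ("source_a", 4.0, 1, 5),
    ("source_b", 7.0, 0, 10),
]
normalized = [
    (source, normalize_rating(rating, low, high))
    for source, rating, low, high in reviews
]
```

Putting every rating on the same scale is what allows reviews from different platforms to be combined into a single training corpus or ranking.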

Sample Data Notes:

ATT_dallas_rank_YGF.csv : overall ranks of AT&T retail stores in the Dallas area from three main social media platforms (Yelp, Google, and Facebook), plus other information such as ZIP code, store address, latitude, and longitude.
ATT_dallas_reviews.csv : sample review data for AT&T retail stores in the Dallas area, collected from Yelp, Google, and Facebook.
ATT_US_reviews.csv : sample review data for AT&T retail stores across the US, collected from Google reviews.
Demographic.csv : demographic information collected for the Dallas area.
realtime_twitter.csv : sample Twitter streaming data.
TMobileHelp_twitter_users.csv : sample Twitter data about related users.
LDA 15 TopicsToTerms.xlsx : customer service topics extracted by LDA.

Limitations and Known Issues:

Because of limitations in the Yelp APIs, a Chrome tool called Web Scraper was used to fetch all Yelp reviews from web pages. For the same reason, all Facebook reviews were fetched with the same tool.
Because of the competition's time limit, no custom web-scraping implementation was developed.

Input file paths are hardcoded. They can easily be changed to command-line parameters.
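
One way to replace the hardcoded paths in the Python scripts is the standard-library `argparse` module. The sketch below is an assumption about how that could look, not code from the repository: the flag names `--cities` and `--output` and the default output file name are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse file paths from the command line instead of hardcoding them."""
    parser = argparse.ArgumentParser(description="Fetch review data.")
    # Hypothetical flags; the actual scripts define no CLI options.
    parser.add_argument("--cities", required=True,
                        help="path to the file listing cities to query")
    parser.add_argument("--output", default="google_reviews.csv",
                        help="path of the CSV file to write")
    return parser.parse_args(argv)


# Passing an explicit list here keeps the example self-contained;
# a real script would call parse_args() to read sys.argv.
args = parse_args(["--cities", "top 100 populated cities in US.txt"])
```

A script adapted this way could then be run as, for example, `python fetch_google.py --cities "top 100 populated cities in US.txt" --output reviews.csv`.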
