CFPB Complaints Analysis

This project is the final deliverable for the SIADS 699 Capstone course.

Data Source: https://www.consumerfinance.gov/data-research/consumer-complaints/

The data source is owned by the CFPB and can be accessed either by downloading a CSV file or through the API. For this project, I downloaded the dataset as a CSV file. The file is larger than 1 GB, so it is not included in this repository; a smaller CSV can be downloaded from the link above.
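Because the full export is larger than 1 GB, it can help to stream it row by row instead of loading it all at once. The following is a minimal sketch, not the code used in the notebooks: the function name `extract_narratives` is hypothetical, and the column name `Consumer complaint narrative` is an assumption about the CSV header that should be checked against your download.

```python
import csv

# Assumed column name in the CFPB CSV export; verify against your file's header.
NARRATIVE_COL = "Consumer complaint narrative"


def extract_narratives(src, dst, limit=None):
    """Stream rows from src to dst, keeping only complaints that have a narrative.

    src and dst are open text file objects. Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        # Rows without a narrative are useless for topic modeling, so skip them.
        if (row.get(NARRATIVE_COL) or "").strip():
            writer.writerow(row)
            kept += 1
            if limit is not None and kept >= limit:
                break
    return kept


# Example usage (file names are placeholders):
# with open("complaints.csv", newline="", encoding="utf-8") as src, \
#      open("complaints_small.csv", "w", newline="", encoding="utf-8") as dst:
#     extract_narratives(src, dst, limit=50_000)
```

Only one row is held in memory at a time, so the same function works for the full export and for the smaller download.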

General Information

The project focuses on CFPB consumer complaints; the data is publicly available on the CFPB website. The CFPB collects complaints and forwards them to the financial services companies for a response. We are interested in analyzing complaints over time, company responses, and ongoing trends in the complaint data.
For this project, we focused on topic extraction from complaint narratives.
The topic model and the final output interface are intended to help the CFPB speed up complaint processing and track topic trends and patterns in complaint narratives. The interface also makes it easy to predict the main topic of any new complaint narrative.
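As a rough illustration of "predicting the main topic" and "tracking topic trends", the sketch below assumes the model returns, for each narrative, a list of `(topic_id, probability)` pairs. The helper names `dominant_topic` and `topic_trend_by_month` and the sample numbers are made up for this example; they are not taken from the notebooks.

```python
from collections import Counter, defaultdict
from datetime import date


def dominant_topic(topic_probs):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(topic_probs, key=lambda pair: pair[1])


def topic_trend_by_month(records):
    """Count dominant topics per month.

    records: iterable of (date_received, topic_probs) pairs.
    Returns {"YYYY-MM": Counter({topic_id: number_of_complaints})}.
    """
    trend = defaultdict(Counter)
    for received, topic_probs in records:
        topic_id, _ = dominant_topic(topic_probs)
        trend[received.strftime("%Y-%m")][topic_id] += 1
    return dict(trend)


# Hypothetical model output for three complaints (made-up numbers).
sample = [
    (date(2021, 1, 5), [(0, 0.7), (1, 0.3)]),
    (date(2021, 1, 20), [(0, 0.6), (1, 0.4)]),
    (date(2021, 2, 1), [(0, 0.2), (1, 0.8)]),
]
trend = topic_trend_by_month(sample)
```

Plotting the per-month counts for each topic is then one way to see whether a topic is growing or fading over time.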

Technologies/Environment Used

AWS SageMaker
Great Lakes (University of Michigan HPC cluster)

Setup

The repository includes a requirements.txt that lists all the packages needed for the project. A separate environment.yml is needed only so that Binder can display the interface; it is not required if you run the project on your own machine.

Structure

The project's output consists of three main Jupyter notebooks, plus supporting files.

EDA - this Jupyter notebook contains the exploratory data analysis and focuses on visualizing metrics such as timely responses to complaints and the time gap between when a complaint is received and when it is sent to the company.
Data Cleaning & Topic Modeling - this is the main notebook; it contains the data cleaning steps and the topic modeling.
Output Interface - this is where we apply the model: it accepts a new complaint narrative and predicts its main topic.
requirements.txt - this file lists all the packages and libraries needed for the project.
All other files - the trained models, id2word dictionaries, and bigrams are saved so that, if you do not want to re-run the project from scratch, they can simply be loaded to show the results.
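The two metrics named for the EDA notebook (timely response and the gap between receipt and forwarding) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the column name `Timely response?`, its `Yes`/`No` values, and the `YYYY-MM-DD` date format are assumptions about the CSV export that should be verified against the actual file.

```python
from datetime import datetime


def days_to_forward(received, sent, fmt="%Y-%m-%d"):
    """Days between the date a complaint was received and the date it was sent on.

    fmt is an assumed date format; pass a different one if your file differs.
    """
    return (datetime.strptime(sent, fmt) - datetime.strptime(received, fmt)).days


def timely_rate(rows, column="Timely response?"):
    """Share of complaints answered on time, ignoring rows where the flag is missing."""
    flags = [row[column] for row in rows if row.get(column)]
    if not flags:
        return 0.0
    return sum(flag == "Yes" for flag in flags) / len(flags)
```

For example, `days_to_forward("2021-03-01", "2021-03-04")` gives a gap of 3 days.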

Project Status

Project is: complete

Room for Improvement

The number of topics is currently chosen based on the coherence value; a future improvement could be to use clustering as the method for picking the most appropriate number of topics.
The final interface currently only takes a complaint narrative and predicts its main topic. Due to server limitations, additional visualizations could not be shown, so in the future we could consider switching to another platform.
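To make the coherence-based selection concrete, here is a small standard-library sketch. The README does not say which coherence measure the notebooks use; UMass coherence is used here only as a stand-in because it is easy to write out by hand. The names `umass_coherence` and `pick_num_topics` are hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's top words, most probable first.
    documents: list of tokenized documents (lists of words).
    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs where w_j ranks above w_i,
    with D counting the documents that contain the given word(s).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j:  # skip words that never occur, to avoid dividing by zero
                score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


def pick_num_topics(candidates, documents):
    """Pick the number of topics whose topics have the highest mean coherence.

    candidates: {num_topics: [top_words_of_topic_1, top_words_of_topic_2, ...]}
    """
    def mean_coherence(topics):
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))
```

Words that tend to appear in the same complaints score higher than words that never co-occur, so the candidate model with the most internally consistent topics wins.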
