fangyf113/Capstone_699_fangyf

Binder

CFPB Complaints Analysis

This project is the final deliverable for the SIADS 699 Capstone course.

The data source is owned by the CFPB and can be accessed either by downloading a CSV file or through the API. For this project, I downloaded the dataset as a CSV file. Because that file is larger than 1 GB, it is not included in this repository; a smaller CSV can be downloaded through the link above.
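Since the full export is larger than 1 GB, it can help to stream it row by row instead of loading it all at once. The following is a minimal sketch using only the standard library; the helper name `iter_narratives`, the sample rows, and the column name `Consumer complaint narrative` are assumptions for illustration and should be checked against the header of the actual download.

```python
import csv
import io


def iter_narratives(lines, column="Consumer complaint narrative"):
    """Yield non-empty narratives one row at a time (column name is an assumption)."""
    reader = csv.DictReader(lines)
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:
            yield text


# Tiny in-memory stand-in for the real export (hypothetical rows).
sample = io.StringIO(
    "Complaint ID,Product,Consumer complaint narrative\n"
    "1,Mortgage,My payment was applied late.\n"
    "2,Credit card,\n"
    "3,Debt collection,I was called about a debt I do not owe.\n"
)

narratives = list(iter_narratives(sample))
print(len(narratives))
```

With the real file, the same function could be fed an open file handle, for example `open("complaints.csv", newline="", encoding="utf-8")`, where `complaints.csv` is a placeholder file name.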

Languages and Tools:

AWS, pandas

General Information

  • The project focuses on CFPB complaints; the data is made publicly available through the CFPB website. The CFPB collects the complaints, then sends them to the financial services companies for a response. We are interested in analyzing the complaints over time, company responses, and ongoing trends in the complaint data.

  • For this project, we focused on topic extraction from complaint narratives.

  • With the topic model and the final output interface, the intention is to help the CFPB accelerate complaint processing and track topic trends and patterns in complaint narratives. The interface also makes it easy to predict the main topic of any new complaint narrative.
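Predicting the main topic of a new narrative comes down to taking the highest-weight topic from the model's output. The sketch below assumes the model returns a list of `(topic_id, probability)` pairs, which is the shape gensim's `LdaModel.get_document_topics` produces; whether this project uses that exact call is an assumption, and `dominant_topic` and the example weights are hypothetical.

```python
def dominant_topic(topic_probs):
    """Return the (topic_id, probability) pair with the highest probability.

    `topic_probs` is assumed to be a list of (topic_id, probability) pairs.
    Returns None when the list is empty.
    """
    if not topic_probs:
        return None
    return max(topic_probs, key=lambda pair: pair[1])


# Hypothetical topic distribution for one complaint narrative.
example = [(0, 0.12), (3, 0.71), (5, 0.17)]
topic_id, probability = dominant_topic(example)
print(topic_id, probability)
```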

Technologies/Environment Used

  • AWS SageMaker
  • Great Lakes

Setup

A requirements.txt file in the folder specifies all the packages needed for the project. A separate requirement.yml file is needed only for the Binder setup that displays the interface; it is not required if you run the project on your own laptop.

Structure

Three main Jupyter notebooks are the output of this project; the supporting files are listed after them.

  1. EDA - this Jupyter notebook contains the exploratory data analysis and focuses on visualizing metrics such as timely complaint responses and the time gap between when a complaint is received and when it is sent to the company.
  2. Data Cleaning & Topic Modeling - this is the main notebook; it contains the data cleaning steps and the topic modeling.
  3. Output-interface - this is where we apply the model: it accepts a new complaint narrative and predicts its main topic.
  4. requirement.txt - this file lists all the packages and libraries needed for the project.
  5. All other files - the models, id2words, and bigrams are saved so that, if you would rather not re-run the project from scratch, they can simply be loaded to show the results.
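The two EDA metrics mentioned in the first item, the timely response rate and the gap between receipt and forwarding, can be sketched as follows. The column names `Date received`, `Date sent to company`, and `Timely response?`, the ISO date format, and the sample rows are all assumptions for illustration, not the notebook's actual code.

```python
from datetime import date
from statistics import mean

# Hypothetical rows; real values would come from the complaints CSV.
rows = [
    {"Date received": "2021-03-01", "Date sent to company": "2021-03-01", "Timely response?": "Yes"},
    {"Date received": "2021-03-02", "Date sent to company": "2021-03-05", "Timely response?": "Yes"},
    {"Date received": "2021-03-10", "Date sent to company": "2021-03-12", "Timely response?": "No"},
]


def gap_days(row):
    """Days between a complaint being received and being sent to the company."""
    received = date.fromisoformat(row["Date received"])
    sent = date.fromisoformat(row["Date sent to company"])
    return (sent - received).days


gaps = [gap_days(r) for r in rows]
# Share of complaints flagged as answered on time.
timely_rate = mean(1 if r["Timely response?"] == "Yes" else 0 for r in rows)
print(gaps, round(timely_rate, 2))
```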

Project Status

Project is: complete

Room for Improvement

  • The number of topics is currently chosen by coherence value; a future improvement could use clustering to pick the most appropriate number of topics.
  • The final interface currently takes only a complaint narrative and predicts its main topic. Because of server limitations, additional visualizations could not be shown; in the future, we could consider switching to another platform.
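For the coherence-based selection mentioned in the first point, one simple approach is to score several candidate topic counts and keep the best one. The sketch below assumes the coherence scores have already been computed elsewhere; the function name `best_num_topics` and the scores are made-up placeholders, not results from this project.

```python
def best_num_topics(coherence_by_k):
    """Pick the topic count with the highest coherence.

    Ties are broken in favor of the smaller number of topics.
    """
    return max(coherence_by_k.items(), key=lambda kv: (kv[1], -kv[0]))[0]


# Placeholder scores keyed by number of topics (illustrative only).
scores = {5: 0.41, 10: 0.48, 15: 0.48, 20: 0.45}
chosen_k = best_num_topics(scores)
print(chosen_k)
```

Preferring the smaller count on ties is a design choice made here to keep the topics easier to interpret; other tie-breaking rules are equally valid.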
