Financial Statement Analysis

In the role of a business or investment analyst, financial statements are relied on heavily as they are key factors when determining the overall success of a company and if an investment will provide a return on investment. Over recent years, different supervised machine learning models have been applied to help gain insights from the large number of datasets available.
The problem that is being explored in this project is how to help classify and separate different companies in various industries based on the financial information provided in 10K filings. The objective is to separate companies from various industries into different clusters and identify the key attributes in each cluster to explore any similarities between companies.

Application

A live version of the page can be found here: https://qmind-fsa.netlify.app/

On the clusters page, each company is listed along with their cluster number. By clicking on each different company, you'll be able to see other similar companies that are in the same cluster, along with some of the characteristics that were used to form those clusters. A copy of the shapley values plot is also included. You may also search for companies using their ticker symbol or their name, using the Search Box located on the top-right of the page.

Technical Details

The dataset was first processed by removing all NaN values by various strategies, such as removing attributes and rows that had high NaN rates.
Various clustering techniques were used, such as k-means clustering, DBSCAN, expectation maximum, and more. After normalizing the data, it was found that there was still a lot of variation in attributes for the companies. This led to many companies being marked as noise, causing many clustering techniques such as DBSCAN to be rendered useless.
However, clustering using Affinity Propagation and Gaussian Mixture allowed for clear distinct clusters to form.
This final model uses Affinity Propagation clustering to form the company groups.
Some levels of supervised learning using Random Forests and Shapley values were also used to find some of the factors involved in forming the clusters. The summary of the top factors impacting all clusters can be found below (analyzed using Shapley Values):

Team Members:

George Trieu (Computer Engineering @ Queen's)
Jackson Kehoe (Computer Engineering @ Queen's)
Alexia Tecsa (Commerce @ Queen's)
Raisa Sayed (Commerce @ Queen's)
Nicolas Wills (Computer Engineering @ Queen's)

Dataset:

https://www.kaggle.com/cnic92/200-financial-indicators-of-us-stocks-20142018

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
Jupyter		Jupyter
data		data
fsa-ui-dev		fsa-ui-dev
processed_data		processed_data
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Jupyter

Jupyter

data

data

fsa-ui-dev

fsa-ui-dev

processed_data

processed_data

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Financial Statement Analysis

Application

Technical Details

Team Members:

Dataset:

About

Releases

Packages

Contributors 3

Languages

geotrieu/fsa

Folders and files

Latest commit

History

Repository files navigation

Financial Statement Analysis

Application

Technical Details

Team Members:

Dataset:

About

Resources

Stars

Watchers

Forks

Languages