
Performing NLP on Transcripts of Multiple Companies' Earnings Calls

Seeking Alpha is a website that provides information on the stock market. Among its stock-market-related services are transcripts of earnings calls held by a variety of companies. These calls are updates on how a company's earnings fared over a period of time, usually a fiscal quarter.

Problem Statement

The client wanted the earnings call transcripts scraped from Seeking Alpha and analyzed for sentiment and complexity.

Data Collection

I gathered the data using Selenium. I started by collecting the URLs of all the transcripts I wanted, then visited each transcript page individually and scraped them one at a time.
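A minimal sketch of the URL-collection step follows. The listing URL and CSS selector are assumptions for illustration; Seeking Alpha's actual markup differs and changes over time.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://seekingalpha.com/earnings/earnings-call-transcripts")  # assumed listing page

# Assumed selector: transcript links on the listing page point at /article/ URLs.
transcript_urls = [link.get_attribute("href")
                   for link in driver.find_elements(By.CSS_SELECTOR, "a[href*='/article/']")]
driver.quit()
```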

In order to avoid having the website question my code's humanity, I started by changing the user-agent string. When a single user agent stopped working, I had the browser close and reopen with a new user agent for every pull. Eventually I also had to add a 20-second sleep timer, and the full scrape of a single URL took almost a full minute. This meant collecting the number of transcripts I wanted took a few days.
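A minimal sketch of the per-URL scrape with user-agent rotation and the sleep timer, assuming Chrome; the pool of user-agent strings and the page selector are hypothetical stand-ins:

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Hypothetical pool of user-agent strings to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def scrape_transcript(url):
    """Open a fresh browser with a new user agent and pull one transcript."""
    opts = Options()
    opts.add_argument("--user-agent=" + random.choice(USER_AGENTS))
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        time.sleep(20)  # crude rate limit so the site is not hammered
        return driver.find_element(By.TAG_NAME, "article").text  # assumed transcript container
    finally:
        driver.quit()  # close the browser so the next pull starts with a clean identity
```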

Data Analysis

This project gave me the opportunity to learn about the Gunning Fog formula for complexity. The full formula is as follows:

    Gunning Fog index = 0.4 × [ (words / sentences) + 100 × (complex words / words) ]

where "complex words" are those with three or more syllables.

The result is meant to represent the number of years of formal schooling one would need to easily understand the text; a score of 12 would correspond to a high-school senior. In practice, however, many transcripts score 24 or higher, far beyond any plausible grade level, so the score is best treated as a relative complexity measure rather than a literal years-of-schooling estimate.
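For illustration, here is a self-contained sketch of the index using a crude vowel-group syllable heuristic; real syllable counting is harder, so the scores are approximate:

```python
import re

def count_syllables(word):
    # Crude heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

print(gunning_fog("We delivered substantial operational efficiencies. "
                  "Revenue exceeded expectations."))
```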

Results

I was able to provide the information requested, creating Complexity and Sentiment score columns for both the speech and Q-and-A portions of each transcript.
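The write-up does not name the sentiment scorer, so the sketch below stands in with NLTK's VADER on a toy two-column frame; the original project may well have used a different tool:

```python
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

# Toy frame standing in for the scraped speech and Q-and-A text.
df = pd.DataFrame({"speech": ["We delivered record revenue this quarter."],
                   "qa": ["Margins were weaker than we had hoped."]})

analyzer = SentimentIntensityAnalyzer()
for part in ("speech", "qa"):
    df[part + "_sentiment"] = df[part].map(
        lambda text: analyzer.polarity_scores(text)["compound"])
print(df)
```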

Future Steps

  • Setting up an AWS instance to run the code in the cloud, so the work no longer requires a computer on hand, then slowly scraping the remaining transcripts from that instance. The original scraping was done on my personal computer, and at roughly one transcript per minute the job would be best automated on somebody else's machine.
  • Storing the scraped data in a SQL server (see the sketch after this list). The data for the 6,000 transcripts I scraped exceeded GitHub's 100 MB file limit in several of its formats, so a collection of over 100,000 transcripts would again be best kept on somebody else's machine, such as Amazon's or another cloud service's.
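As a sketch of that storage step, assuming a hosted Postgres instance; the connection string and file name are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; any hosted SQL database would do.
engine = create_engine("postgresql://user:password@db-host:5432/transcripts")

df = pd.read_csv("transcripts.csv")  # assumed local scrape output
df.to_sql("earnings_calls", engine, if_exists="append", index=False)
```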
