Earnings Call NLP Strategy V1

ABSTRACT

Publicly traded companies are prohibited from fabricating information or deceiving investors in earnings calls, which makes the calls a useful input for stock valuation. Earnings calls may contain patterns that a machine learning algorithm can identify and use to extrapolate the direction of future stock movements. Various feature extraction techniques are used to convert earnings call transcripts (text) into machine-readable formats (vectors). The main feature extraction methods are TF-IDF with cosine similarity, sentiment analysis using the Loughran-McDonald dictionary, and various text complexity metrics. The features are then passed to a Random Forest classifier, with both binary (one-vs-rest) and multi-class approaches implemented. The former achieved an average accuracy of 83%, whereas the multi-class methods achieved 45% and 74% accuracy, depending on the label range.
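To make the TF-IDF and cosine similarity step concrete, here is a minimal standard-library sketch. It is not the code from the notebooks: the function names, the tokenizer, the smoothed IDF formula, and the example sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Term frequency times a smoothed IDF (an assumed variant), so a
            # term that appears in every document keeps a nonzero weight.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for full transcripts.
transcripts = [
    "revenue grew strongly this quarter",
    "revenue grew modestly this quarter",
    "litigation risk and impairment charges",
]
vectors = tfidf_vectors(transcripts)
print(cosine_similarity(vectors[0], vectors[1]))  # overlapping wording
print(cosine_similarity(vectors[0], vectors[2]))  # no shared terms
```

One assumed use of such a score is to compare a company's latest call with its previous one, so that a low similarity flags a change in language; the abstract does not state which pairs of texts were compared.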

CONCLUSION

This model has several weaknesses. Firstly, its dataset is not large (18,280). Secondly, it has limited features for training (only various sentiment, prose, and text complexity measures), so it does not consider other elements that may affect a stock's price movement. Because the existing features are limited and qualitative, the model cannot give a precise percentage estimate, which explains the difficulty with a regression model. Since share price movements do not depend solely on earnings calls and also rely on external information (press releases, financials, sentiment, etc.), improvements must be made to achieve higher accuracy.

The model may improve by gaining the ability to read financial statements and spot differences from past statements (e.g. new footnotes and new risk factors in 10-K and 10-Q filings). Another improvement is to include analyst ratings as well as the general sentiment towards a specific ticker (and its changes over time) by scraping content from Twitter, Reddit, and StockTwits. Technical analysis can also be applied (e.g. data on volume, MACD, and RSI) by pulling data from Yahoo Finance using the yfinance library, which may raise the accuracy of the multi-class classifiers and could potentially provide enough information for regression models.
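As an illustration of the kind of technical feature mentioned above, the sketch below computes MACD from a list of closing prices using only the standard library. The function names and the price series are hypothetical, the 12/26/9 spans are the conventional defaults rather than values taken from this project, and downloading the prices (for example with yfinance) is deliberately left out.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    The average is seeded with the first value (an assumed convention).
    """
    alpha = 2 / (span + 1)
    averaged = []
    for value in values:
        if not averaged:
            averaged.append(value)
        else:
            averaged.append(alpha * value + (1 - alpha) * averaged[-1])
    return averaged


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of closing prices."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    # MACD line: fast EMA minus slow EMA of the closing price.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: EMA of the MACD line itself.
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Hypothetical steadily rising closing prices.
closes = [100.0 + day for day in range(60)]
macd_line, signal_line, histogram = macd(closes)
print(macd_line[-1], signal_line[-1], histogram[-1])
```

The last value of each series on the day of the call could be appended to the text features as extra columns, which is one possible way to combine the two sources.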

Nonetheless, at present, an effective strategy is to go long on shares predicted to overperform (labelled 1.0) and to short those predicted to underperform (by reversing the labels, i.e. switching '<' to '>=' in the binary classification model), while placing emphasis on companies that operate in the same industry. However, it is prudent to expand the current dataset and to address class imbalance using methods such as over- or under-sampling, or by using a classification model with a weighted loss function.
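A minimal sketch of random over-sampling, one of the balancing methods named above, is shown here using only the standard library. The function name, the toy feature rows, and the fixed seed are assumptions for illustration. Over-sampling would normally be applied to the training split only, so that duplicated rows do not leak into the test set.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical feature rows: one overperformer (1.0) and three others (0.0).
rows = [[0.9, 0.1], [0.2, 0.7], [0.3, 0.6], [0.1, 0.8]]
labels = [1.0, 0.0, 0.0, 0.0]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Under-sampling works the same way in reverse, by randomly dropping rows of the larger classes, at the cost of discarding data from an already small dataset.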

Furthermore, my conclusions may not be applicable in the real world, since the data I selected for classification measures the percent change from day 0 to day 50. This is unrealistic, as stocks often move in pre-market or after-hours trading almost immediately after earnings are announced. Because transcripts often take 1-2 days to be published, we cannot capitalise on the day 0 bounce. Nonetheless, it is a good starting point.
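The effect of the publication delay can be illustrated by measuring the return from a later entry day. The sketch below is hypothetical: the function name and the price series are invented, and entering on day 2 is only one reading of the 1-2 day delay.

```python
def percent_change(closes, entry_day, exit_day):
    """Percent change in closing price between two day offsets,
    where index 0 is the day of the earnings call."""
    entry_price = closes[entry_day]
    exit_price = closes[exit_day]
    return (exit_price - entry_price) / entry_price * 100


# Hypothetical closes: a jump right after the call, then a slow drift upward.
closes = [100.0, 110.0] + [112.0] * 48 + [121.0]

label_window = percent_change(closes, 0, 50)     # window used for the labels
tradable_window = percent_change(closes, 2, 50)  # entry once the transcript is out
print(label_window, tradable_window)
```

In this made-up series most of the move happens before the transcript is available, so the tradable return is much smaller than the one the labels are built on.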
