
My Master Thesis: Developing a financial market sentiment analysis model for social media content


moritzwilksch/MasterThesis


🎓 Master Thesis

📄 Published in the International Journal of Data Management and Data Insights as PyFin-sentiment: Towards a machine-learning-based model for deriving sentiment from financial tweets

My Master Thesis. The actual write-up can be found in this other repo.

The Project

Developing a sentiment analysis model for financial social media posts

The Problem

There is a large body of research on sentiment analysis models for social media posts (Hutto & Gilbert, 2014; Barbieri et al., 2020) and on sentiment analysis of financial texts such as news and corporate filings (Loughran & McDonald, 2011; Araci, 2019). However, research on financial social media posts (think StockTwits, Reddit r/wallstreetbets, and Twitter) is limited.

The Status-Quo

Researchers often use sentiment models from the adjacent domains of finance or generic social media. We therefore benchmark the most common models: VADER (Hutto & Gilbert, 2014), NTUSD-Fin (Chen et al., 2018), FinBERT (Araci, 2019), and TwitterRoBERTa (Barbieri et al., 2020).

The Solution

We collect and label 10,000 tweets and train a variety of sentiment analysis models, comparing their performance and compute footprints. The detailed methodology can be found here. The final models will be open-sourced and available for anyone to use as pyFin-sentiment: a Python package for sentiment analysis of financial social media posts.

Performance

On Tweets

Out-of-sample ROC AUC of proposed and existing models on the collected dataset of 10,000 tweets.

image
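
As a rough illustration of the metric reported here: binary ROC AUC can be computed by comparing the scores of every positive/negative pair, and a multi-class value can be obtained by averaging one-vs-rest AUCs. The sketch below is an assumption about how such a number could be computed, not the evaluation code used in the thesis. The function names, class labels, and toy probabilities are hypothetical.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison (Mann-Whitney U formulation).

    labels: iterable of 0/1, scores: iterable of floats (higher = more positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels, probas, classes):
    """Unweighted mean of one-vs-rest AUCs (an assumed averaging scheme)."""
    aucs = []
    for k, cls in enumerate(classes):
        binary = [1 if y == cls else 0 for y in labels]
        aucs.append(roc_auc(binary, [p[k] for p in probas]))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: three classes, one predicted probability row per post.
classes = ["bullish", "bearish", "neutral"]
y_true = ["bullish", "bearish", "neutral"]
y_proba = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]
print(macro_ovr_auc(y_true, y_proba, classes))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for a dataset of 10,000 tweets.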

On StockTwits Posts

Out-of-sample ROC AUC of proposed and existing models on a dataset of StockTwits posts.

Using the Fin-SoMe dataset compiled by Chen et al. (2020).

image

Resourcefulness

Measured as inference time per sample (ms) on a system with an AMD Ryzen 5 3600 CPU and 64 GB of RAM.

image
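
A per-sample inference time of this kind could be measured roughly as follows. This is a minimal sketch under stated assumptions, not the benchmarking code behind the figure: `mean_inference_ms` and `dummy_predict` are hypothetical names, and the stand-in "model" is a trivial keyword rule used only so the example runs on its own.

```python
import time


def mean_inference_ms(predict, samples, repeats=5):
    """Average wall-clock time per sample in milliseconds.

    predict: callable taking a list of texts (assumed batch interface).
    """
    predict(samples)  # warm-up call so one-time setup cost is not measured
    start = time.perf_counter()
    for _ in range(repeats):
        predict(samples)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / (repeats * len(samples))


def dummy_predict(texts):
    # Hypothetical stand-in for a real sentiment model.
    return ["bullish" if "buy" in t.lower() else "neutral" for t in texts]


posts = ["Buy $TSLA now", "Market closed today"] * 100
print(f"{mean_inference_ms(dummy_predict, posts):.6f} ms per sample")
```

Averaging over several repeats and discarding a warm-up call reduces noise from caching and lazy initialization, though absolute numbers will still depend on the hardware.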

pyFin-Sentiment

This work set out to publish a usable model artifact to provide future research with more accurate sentiment assessments. We therefore publish the proposed logistic regression model in an easy-to-use Python library called pyFin-Sentiment.

References

  1. Araci, D. (2019). Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
  2. Barbieri, F., Camacho-Collados, J., Neves, L., & Espinosa-Anke, L. (2020). Tweeteval: Unified benchmark and comparative evaluation for tweet classification. arXiv preprint arXiv:2010.12421.
  3. Chen, C.-C., Huang, H.-H., & Chen, H.-H. (2018). Ntusd-fin: a market sentiment dictionary for financial social media data applications. In Proceedings of the 1st financial narrative processing workshop (fnp 2018).
  4. Chen, C.-C., Huang, H.-H., & Chen, H.-H. (2020). Issues and perspectives from 10,000 annotated financial social media data. In Proceedings of the 12th language resources and evaluation conference (pp. 6106–6110).
  5. Hutto, C., & Gilbert, E. (2014). Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the international aaai conference on web and social media (Vol. 8, pp. 216–225).
  6. Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65.
