sohomghosh/FinRAD_Financial_Readability_Assessment_Dataset

FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability

This repository contains two samples (sample-1, sample-2) from the dataset described in the paper: FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability (accepted at The Financial Narrative Processing Workshop co-located with LREC-2022, Marseille, France).

The repository also includes the data collection & cleaning scripts, the embedding extraction & model development script, and a starter example. You can download the model along with the weights from Hugging Face.

The embeddings & labels of the full dataset are available in the embeddings_and_labels directory. Several model artifacts, obtained by training classifiers such as Logistic Regression, GBM, and Random Forest on the entire dataset, are available in the models directory.

To access the raw version of the full dataset, please send a request by filling this form. You can also re-create the raw datasets using the data collection & cleaning scripts.

Metadata of FinRAD

Primary Columns:
"terms": This is the financial term
"definitions": This is the definition corresponding to the financial term
"source": This represents the source from which the term and the definition have been obtained.
"assigned_readability": This is the manually assigned readability. 0 means not readable, 1 means readable.

Other Columns:
"flesch_reading_ease", "flesch_kincaid_grade", "smog_index", "coleman_liau_index", "automated_readability_index", "dale_chall_readability_score", "linsear_write_formula", "gunning_fog"
These are readability scores extracted using the textstat library
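
As a rough illustration of how these score columns could be recomputed for a single definition, here is a minimal sketch. It assumes each column name corresponds to the textstat function of the same name; the helper `readability_scores` is hypothetical and is not taken from the repository's scripts.

```python
# Score columns listed above. Assumption: each column is produced by the
# textstat function with the same name.
SCORE_COLUMNS = [
    "flesch_reading_ease",
    "flesch_kincaid_grade",
    "smog_index",
    "coleman_liau_index",
    "automated_readability_index",
    "dale_chall_readability_score",
    "linsear_write_formula",
    "gunning_fog",
]


def readability_scores(text: str) -> dict:
    """Hypothetical helper: compute every score column for one definition."""
    # Imported here so the column list can be reused without textstat installed.
    import textstat  # third-party: pip install textstat

    # Look up each scoring function by name and apply it to the text.
    return {name: getattr(textstat, name)(text) for name in SCORE_COLUMNS}
```

Calling `readability_scores(definition)` returns a dictionary keyed by column name. The exact values may vary with the installed textstat version.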

Metadata of source

| Tag | Description | Assigned Readability |
| --- | --- | --- |
| prin | Principles of Corporate Finance by Richard A. Brealey, Stewart C. Myers, Franklin Allen | 0 |
| zvi | Investments by Zvi Bodie, Alex Kane, Alan J. Marcus | 0 |
| sam | Economics Textbook by Paul Samuelson and William Nordhaus | 1 |
| opod | Options, Futures, and Other Derivatives, Global Edition by John C. Hull | 0 |
| fmi | Financial Markets and Institutions by Frederic S. Mishkin, Stanley Eakins | 0 |
| ncert_keec111 | NCERT Indian Economic Development Economics Class 11 | 1 |
| ncert_kest | NCERT Statistics for Economics Class 12 | 1 |
| ncert | NCERT Introduction to MacroEconomics Class 12 | 1 |
| ncert_class12_econ | NCERT Introduction to MicroEconomics Class 12 | 1 |
| investopedia | Investopedia Data Dictionary | 1 |
| economist | The Economist terms dictionary | 1 |
| 6_8_louis | Glossary of Economics and Personal Finance Terms from Federal Reserve Bank of St. Louis | 1 |
| 9_12_louis | Glossary of Economics and Personal Finance Terms from Federal Reserve Bank of St. Louis | 1 |
| pre_louis | Glossary of Economics and Personal Finance Terms from Federal Reserve Bank of St. Louis | 1 |
| palgrave | The Palgrave Macmillan Dictionary of Finance, Investment and Banking by Erik Banks | 0 |
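
The source listing above can be read as a lookup from source tag to label. The sketch below assumes that "assigned_readability" is determined entirely by the "source" tag, as the listing suggests. The names `SOURCE_READABILITY` and `label_for_source` are illustrative and do not come from the repository.

```python
# Source tags and their assigned readability, copied from the listing above
# (0 = not readable, 1 = readable).
SOURCE_READABILITY = {
    "prin": 0,
    "zvi": 0,
    "sam": 1,
    "opod": 0,
    "fmi": 0,
    "ncert_keec111": 1,
    "ncert_kest": 1,
    "ncert": 1,
    "ncert_class12_econ": 1,
    "investopedia": 1,
    "economist": 1,
    "6_8_louis": 1,
    "9_12_louis": 1,
    "pre_louis": 1,
    "palgrave": 0,
}


def label_for_source(tag: str) -> int:
    """Hypothetical helper: return the readability label for a source tag."""
    try:
        return SOURCE_READABILITY[tag]
    except KeyError:
        # Fail loudly on tags that are not in the listing.
        raise ValueError(f"unknown source tag: {tag!r}") from None
```

A lookup like this could serve as a sanity check that every row's "assigned_readability" agrees with its "source".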

Citing & Authors

If you find this repository helpful, feel free to cite our publication, FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability:

@InProceedings{ghosh-EtAl:2022:FNP,
  author    = {Ghosh, Sohom  and  Sengupta, Shovon  and  Naskar, Sudip Kumar and  Singh, Sunny Kumar},
  title     = {FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability},
  booktitle      = {Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages     = {1--9},
  url       = {http://www.lrec-conf.org/proceedings/lrec2022/workshops/FNP/pdf/2022.fnp-1.1.pdf}
}

You can also cite our demo/tool presented at ICON 2021. The artifacts of this demo are available in the old_model_FinRead directory.
New model trained on 13K+ instances (using Logistic Regression): HuggingFace Spaces link
Old model trained on 8K+ instances (using lightgbm classifier): Google Colab link

@inproceedings{ghosh-etal-2021-finread,
    title = "{F}in{R}ead: A Transfer Learning Based Tool to Assess Readability of Definitions of Financial Terms",
    author = "Ghosh, Sohom  and
      Sengupta, Shovon  and
      Naskar, Sudip  and
      Singh, Sunny Kumar",
    booktitle = "Proceedings of the 18th International Conference on Natural Language Processing (ICON)",
    month = dec,
    year = "2021",
    address = "National Institute of Technology Silchar, Silchar, India",
    publisher = "NLP Association of India (NLPAI)",
    url = "https://aclanthology.org/2021.icon-main.81",
    pages = "658--659"
    }

Contact: sohom1ghosh@gmail.com

For any part of this work to which the license is applicable, this work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. See LICENSE.CC-BY-NC-SA-4.0.
