SUTD ISTD 2021 50.034 Introduction to Probability & Statistics 1D Project

jamestiotio/pns

Topic: Non-Transitivity Property of Pearson's Correlation Coefficient

Code style: black

Team Members:

Repository Details

This repository houses the scripts and datasets used in our short study, in which we look for a real-world example that showcases the non-transitivity of Pearson's correlation coefficient among 3 random variables: X can be strongly correlated with Y, and Y with Z, without X being correlated with Z.
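To make the property concrete, here is a minimal, self-contained sketch using a toy dataset that is not part of this repository. The helper names `pearson` and `lower_bound` are hypothetical and are not taken from `main.py`. The bound used is the standard one that follows from the 3x3 correlation matrix being positive semidefinite: r_xz >= r_xy * r_yz - sqrt((1 - r_xy^2) * (1 - r_yz^2)).

```python
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def lower_bound(r_xy, r_yz):
    """Smallest r_xz compatible with the given r_xy and r_yz."""
    return r_xy * r_yz - sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


# Toy data (an assumption for illustration): x and z are uncorrelated,
# and y is simply their sum, so y is tied to both of them.
x = [1, 1, -1, -1]
z = [1, -1, 1, -1]
y = [xi + zi for xi, zi in zip(x, z)]

r_xy = pearson(x, y)
r_yz = pearson(y, z)
r_xz = pearson(x, z)

print(round(r_xy, 3))  # 0.707
print(round(r_yz, 3))  # 0.707
print(round(r_xz, 3))  # 0.0

# The bound only forces r_xz > 0 when r_xy**2 + r_yz**2 > 1.
print(lower_bound(r_xy, r_yz) > 0)  # False
```

Both pairwise correlations with y sit just above 0.7, yet x and z are uncorrelated. This also suggests why a cutoff near 0.7 is interesting: two correlations of exactly 0.7 give 0.49 + 0.49 = 0.98, which is still below 1, so positivity of the third correlation is not guaranteed.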

Usage

These are the available flags and options:

$ python3 main.py --help
Usage: main.py [-h] -f FILE [-t [THRESHOLD]] [-o OUTPUT]

Conveniently select appropriate/relevant triplets of random variables to
demonstrate the non-transitivity property of Pearson's correlation
coefficient.

Options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE, --dataset FILE, --csv FILE
                        the CSV dataset input file to be processed (default:
                        None)
  -t [THRESHOLD], --threshold [THRESHOLD]
                        the threshold for the correlation coefficient strength
                        to be considered/taken into account (default: 0.7)
  -o OUTPUT, --output OUTPUT
                        the base non-indexed output image filename to save the
                        correlation matrix plot(s) to (default: None)
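
As a rough illustration of what a triplet search with a threshold could look like, the sketch below uses pandas. It is not the code in `main.py`: the function name `find_non_transitive_triplets`, the `weak` cutoff, and the selection rule (two links at or above the threshold, third link near zero) are all assumptions made for this example.

```python
from itertools import combinations

import pandas as pd


def find_non_transitive_triplets(df, threshold=0.7, weak=0.1):
    """Return (a, b, c, r_ab, r_bc, r_ac) tuples where a-b and b-c are
    strongly positively correlated but a-c is only weakly correlated.

    Hypothetical selection rule, assumed for illustration only.
    """
    # Pearson correlation matrix over the numeric columns only.
    corr = df.select_dtypes(include="number").corr(method="pearson")
    triplets = []
    for a, c in combinations(corr.columns, 2):
        for b in corr.columns:
            if b in (a, c):
                continue
            r_ab = corr.loc[a, b]
            r_bc = corr.loc[b, c]
            r_ac = corr.loc[a, c]
            if r_ab >= threshold and r_bc >= threshold and abs(r_ac) <= weak:
                triplets.append((a, b, c, r_ab, r_bc, r_ac))
    return triplets


# Toy data standing in for a CSV loaded with pd.read_csv(...).
toy = pd.DataFrame(
    {
        "x": [1, 1, -1, -1],
        "y": [2, 0, 0, -2],
        "z": [1, -1, 1, -1],
    }
)

for a, b, c, r_ab, r_bc, r_ac in find_non_transitive_triplets(toy):
    print(f"{a}-{b}: {r_ab:.3f}, {b}-{c}: {r_bc:.3f}, {a}-{c}: {r_ac:.3f}")
```

With a real dataset, the `toy` frame would be replaced by the result of reading the file passed to `-f`, and the threshold by the value passed to `-t`.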

Acknowledgements

  • IEEE dataset obtained from this research paper.
  • Top 100 most valuable GitHub repositories list obtained from this article.
  • Kaggle housing prices dataset obtained from either here or here (they have the same training dataset).
  • India's graduate admissions dataset obtained from here.
  • Financial indicators of US stocks (2018) dataset obtained from here.
  • Pokemon dataset obtained from here.
  • Rolling correlation matrix of the prices of cryptocurrencies over time can be retrieved from here.
  • Latest worldwide COVID-19 per-country statistics (retrieved on 11 March 2021) provided by Our World in Data here.