
dang-trung/crypto-return-predictor




Cryptocurrency Returns Predictor

An Application of Random Forest!

License: MIT

Project Description

Introduction

  • Objective: Project for my internship at Research Center VERA, Ca' Foscari University of Venice.

  • Abstract: Uses sentiment-based features to predict cryptocurrency returns. Models used: Random Forest Classifier, Random Forest Regressor, and a VAR time-series model. Analysis timeframe: 28/11/2014 - 25/07/2020.

  • Status: Completed.

Methods Used

  • Random Forests (Regressor & Classifier)
  • Principal Component Analysis
  • Vector Autoregression (VAR) model
  • Sentiment Indicators (retrieved from my graduation thesis)
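
For orientation, here is a minimal sketch of fitting the two Random Forest variants with scikit-learn. It assumes a feature matrix `X` of lagged sentiment measures and a vector `y` of daily returns; the helper name `fit_forests`, the hyperparameters (`n_estimators=100`, `random_state`), and the synthetic data are illustrative placeholders, not the project's actual settings. The classifier is trained on the sign of the return, which matches the three classes (negative, unchanged, positive) in the results below.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def fit_forests(X_train: np.ndarray, y_train: np.ndarray, seed: int = 0):
    """Fit a regressor on daily returns and a classifier on their direction."""
    # Regressor: predicts the return itself
    reg = RandomForestRegressor(n_estimators=100, random_state=seed)
    reg.fit(X_train, y_train)
    # Classifier: predicts the sign of the return (-1 negative, 0 unchanged, +1 positive)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, np.sign(y_train).astype(int))
    return reg, clf

# Synthetic stand-in data: 200 days, 10 features (e.g. lagged sentiment measures)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 0.01 * X[:, 0] + rng.normal(scale=0.01, size=200)

# Chronological split: first 75% of days for training, last 25% for testing
split = int(len(X) * 0.75)
reg, clf = fit_forests(X[:split], y[:split])
pred_returns = reg.predict(X[split:])
pred_labels = clf.predict(X[split:])
```

The split is chronological rather than shuffled, so the models never train on days that come after the test days; the 75/25 proportion mirrors the test share stated in this README.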

Dependencies

  • Python 3
  • numpy==1.18.5
  • pandas==1.0.5
  • scikit-learn==0.23.2
  • statsmodels==0.12.0
  • plotly==4.9.0

Interesting Results to Keep You Reading

Backtesting strategies based on 3 models:

  • Generate trading signals: go long when the predicted return is > 0, go short when it is < 0, and stay out otherwise.
  • Test period (25% of the dataset): 05/03/2019 - 25/07/2020
  • The RF Classifier strategy significantly outperforms both other model-based strategies as well as the simple buy-and-hold strategy.
  • Download the interactive version.
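
The signal rule above fits in a few lines. This is a minimal sketch, assuming the prediction for day t is available before day t's return is realized and ignoring transaction costs (neither detail is stated here); `backtest` is a hypothetical helper. Because `np.sign` maps positive values to +1 and negative values to -1, the same function works for regressor outputs (predicted returns) and classifier outputs (labels -1, 0, +1).

```python
import numpy as np
import pandas as pd

def backtest(predicted: pd.Series, realized: pd.Series) -> pd.Series:
    """Daily strategy returns: long if predicted > 0, short if < 0, flat otherwise."""
    position = np.sign(predicted)  # +1 long, -1 short, 0 wait
    return position * realized

# Made-up numbers, for illustration only
pred = pd.Series([0.01, -0.02, 0.0, 0.03])
real = pd.Series([0.02, -0.01, 0.05, -0.01])
strategy = backtest(pred, real)
print(strategy.tolist())  # [0.02, 0.01, 0.0, -0.01]
print(f"strategy mean: {strategy.mean():.4f}, buy-and-hold mean: {real.mean():.4f}")
```

Comparing the mean of `strategy` with the mean of `real` is the buy-and-hold comparison in its simplest form.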


Getting Started

How to Run

  1. Clone this repo:
    git clone https://github.com/dang-trung/crypto-return-predictor

  2. Create your environment (virtualenv):
    virtualenv -p python3 venv
    source venv/bin/activate (bash) or venv\Scripts\activate (Windows)
    (venv) cd crypto-return-predictor
    (venv) pip install -e .

    Or (conda):
    conda env create -f environment.yml
    conda activate crypto-return-predictor

  3. Run in terminal:
    python -m crypto_return_predictor

Dependent Variable/Target

Cryptocurrency market returns, computed from the market index CRIX (retrieved here; see Trimborn & Härdle (2018) or the authors' website for how the index is constructed).
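
A minimal sketch of turning an index level series into the target. It assumes daily log returns (the README does not say whether log or simple returns are used) and a three-way sign label for the classifier; `daily_log_returns` and `direction_labels` are hypothetical helper names, and the index levels are toy values, not CRIX data.

```python
import numpy as np
import pandas as pd

def daily_log_returns(index_level: pd.Series) -> pd.Series:
    """r_t = ln(P_t / P_{t-1}); the first day has no predecessor and is dropped."""
    return np.log(index_level / index_level.shift(1)).dropna()

def direction_labels(returns: pd.Series) -> pd.Series:
    """Map returns to classes: -1 = negative, 0 = unchanged, +1 = positive."""
    return np.sign(returns).astype(int)

# Toy index levels, not real CRIX data
levels = pd.Series([100.0, 110.0, 110.0, 99.0],
                   index=pd.date_range("2020-01-01", periods=4, freq="D"))
r = daily_log_returns(levels)
print(direction_labels(r).tolist())  # [1, 0, -1]
```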

Sentiment Measures

  • Sentiment scores of messages on StockTwits, Reddit submissions, and Reddit comments
  • Message volume on StockTwits, Reddit submissions, and Reddit comments
  • Market volatility index VCRIX (construction described in Kolesnikova (2018); retrieved here)
  • Market trading volume (retrieved using Nomics Public API)

Read more on how these sentiment measures were retrieved in my graduation thesis or its GitHub repo.

Feature Selection

  • For the VAR model: lagged values (up to 5 lags) of the first principal component of all 9 sentiment measures.
  • For the Random Forests: lagged values (up to 5 lags) of the sentiment measures.
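
The two feature sets above can be sketched with pandas and NumPy. This is an illustration under assumptions, not the project's code: the helper names are hypothetical, the measures are standardized before PCA (the README does not say whether they are), and the principal component is computed with NumPy's SVD, which gives the same scores as scikit-learn's `PCA(n_components=1)` on standardized data up to an arbitrary sign.

```python
import numpy as np
import pandas as pd

def add_lags(df: pd.DataFrame, max_lag: int = 5) -> pd.DataFrame:
    """Return <col>_lag1 ... <col>_lag<max_lag> for every column of df."""
    lagged = {
        f"{col}_lag{k}": df[col].shift(k)
        for col in df.columns
        for k in range(1, max_lag + 1)
    }
    # The first max_lag rows lack a full lag history, so they are dropped
    return pd.DataFrame(lagged, index=df.index).dropna()

def first_principal_component(df: pd.DataFrame) -> pd.Series:
    """Scores on the first principal component of the standardized columns."""
    z = (df - df.mean()) / df.std(ddof=0)
    # Rows of vt are principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)
    return pd.Series(z.to_numpy() @ vt[0], index=df.index, name="pc1")

# Toy data standing in for the sentiment measures
rng = np.random.default_rng(1)
measures = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
rf_features = add_lags(measures, max_lag=5)
pc1 = first_principal_component(measures)
print(rf_features.shape)  # (25, 15)
```

A VAR(5) fitted on the return series together with `pc1` (for example with `statsmodels.tsa.api.VAR(data).fit(5)`) builds lags 1 to 5 of every variable internally, so `pc1` itself needs no manual lagging.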

Results (Test Period)

Ordered by performance (from best to worst):

  1. Random Forest Classifier:
     • Accuracy: 61.86%
     • Confusion matrix (rows: actual class, columns: predicted class):

      Actual \ Predicted   Negative  Unchanged  Positive
      Negative                  145          0        97
      Unchanged                   1          0         0
      Positive                   96          0       170

     • Backtesting daily returns: ~91 bps
  2. VAR(5):
     • Accuracy: 54.62%
     • Confusion matrix (rows: actual class, columns: predicted class):

      Actual \ Predicted   Negative  Unchanged  Positive
      Negative                   57          0       185
      Unchanged                   0          0         1
      Positive                   45          0       221

     • Backtesting daily returns: ~48 bps
  3. Random Forest Regressor:
     • Accuracy: 56.19%
     • Confusion matrix (rows: actual class, columns: predicted class):

      Actual \ Predicted   Negative  Unchanged  Positive
      Negative                  222          0        20
      Unchanged                   1          0         0
      Positive                  202          0        64

     • Backtesting daily returns: ~19 bps (only slightly better than holding the CRIX index)
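
Accuracy here is the share of test days whose direction was predicted correctly: the diagonal of the confusion matrix divided by its total. A small NumPy check using the VAR(5) counts above (`accuracy_from_confusion` is a hypothetical helper). Since the diagonal is the same whichever axis holds the actual class, the matrix orientation does not affect this number.

```python
import numpy as np

def accuracy_from_confusion(cm: np.ndarray) -> float:
    """Fraction of correct predictions: diagonal sum over total count."""
    return float(np.trace(cm) / cm.sum())

# VAR(5) test-period counts, classes ordered Negative, Unchanged, Positive
var5 = np.array([[57, 0, 185],
                 [0, 0, 1],
                 [45, 0, 221]])
print(f"{accuracy_from_confusion(var5):.2%}")  # 54.62%
```

All three matrices sum to 509 test days, so each reported accuracy is the diagonal count divided by 509 (278 correct days for VAR(5)).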

Read More

For a fuller account of the project, see the report.
