Improving S&P stock prediction with time series stock similarity

Overview

Stock market prediction with forecasting algorithms is a popular topic these days where most of the forecasting algorithms train only on data collected on a particular stock. In this work, we enriched the stock data with related stocks just as a professional trader would have done to improve the stock prediction models. We tested five different similarities functions and found co-integration similarity to have the best improvement on the prediction model. We evaluate the models on seven S&P stocks from various industries over five years period. The prediction model we trained on similar stocks had significantly better results with 0.55 mean accuracy, and 19.782 profit compare to the state of the art model with an accuracy of 0.52 and profit of 6.6.

Framework

In order to evaluate if stocks similarity improves a baseline model, we conduct two-step experiments (back-testing) to evaluate different types of configurations. The first experiment goal is to come up with a processing pipeline and a baseline model. The second experiment is to evaluate how different stock similarity functions influence the baseline model.

A configuration tree of all the setup to be optimize and evaluate in workflow pipeline

The S&P dataset contains daily historical data for all the S&P (Standard & Poor) 500 stock market index companies from 2012 to 2017. The features given are date, open price, closing price, highest price, lowest price, volume, and the short name of the stock. The S&P is an American stock index of the largest companies listed in NYSE or NASDAQ, maintained by S&P Dow Jones Indices. It covers about 80 percent of the American equity market by capitalization.

We apply the evaluation process on stocks from different industries:

Consumer (Disney - DIS, Coca Cola KO)
Health (Johnson and Johnson - JNJ)
Industrial (General electric - GE , 3M - MMM)
Information technology (Google - GOOGL)
Financial (JP Morgan - JPM)

The validation folds are set to five and prepared for each stock separately.

Results

The evaluation metrics are accuracy score and F1 score; we calculate each metric per class (increase/decrease) and average it to one score. To evaluate the model profit, we implement a simple Buy & Hold algorithm that applies a long or short position regarding the model price prediction. We also measure the risk of the strategy with the Sharp ratio.

Experiment 1 processing parameters results - transformation function, features and temporal modeling. (rows - configuration , columns - metrics and color - profit scale)

Experiment 1 prediction parameters results - prediction model, Horizon and Value (rows - configuration, columns - prediction value with metrics and color - profit scale

Experiment 2 random selection compare - a profit comparison between SAX and co-integration similarities on top 50 stocks and random stock selection

Experiment 2 folds profit per stock - a profit comparison between top 50 stocks from co-integration similarity(orange) and 100 random stock selection enhancement (Blue) for each stock (x axis) in different folds (y axis)

for more information on the methods results, and deeper analysis of the stock similarities check our paper pdf.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
archive		archive
sandp500		sandp500
sax_py		sax_py
utils		utils
v1_old		v1_old
Constants.py		Constants.py
README.md		README.md
configurations.png		configurations.png
data_preperation.py		data_preperation.py
evaluations.csv		evaluations.csv
exp1_models.png		exp1_models.png
exp1_prep.png		exp1_prep.png
exp2_profit_plot.png		exp2_profit_plot.png
exp2_rand_sim.png		exp2_rand_sim.png
full_paper.pdf		full_paper.pdf
run_experiments.py		run_experiments.py
similarity_functions.py		similarity_functions.py
support_funcs.py		support_funcs.py
trade_strategies.py		trade_strategies.py

liorsidi/sp500-stock-similarity-time-series

Folders and files

Latest commit

History

Repository files navigation

Improving S&P stock prediction with time series stock similarity

Overview

Framework

Results

About

Resources

Stars

Watchers

Forks

Languages