About the Project

Rolling LSTM modelling framework for stock price prediction using candlestick data, technical indicators, and a macroeconomic indicator.

Requirements & Run

  1. Install Python >= 3.9 and the latest pip, preferably using Miniconda.

  2. Install the required packages with pip: pip install -r requirements.txt

  3. Download the data and place it in data/raw. Otherwise, the sample data already included in the repository is used.

  4. Run main.py


Description of modules

1. src.data.preprocessing

Input: raw datasets (csv): config[raw] -> data/raw

This module preprocesses and joins the datasets 
(OHLCV, Initial Claims (ICSA), and technical indicators) and transforms the target variable. 

Output: joined dataset (pkl, csv) config[prep][JoinedDfPkl] -> data/input/joined.pkl
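
The join described above can be sketched in pandas. The column names, the example indicator (a 14-day SMA), and the log-return target below are illustrative assumptions, not necessarily the repository's exact choices:

```python
import numpy as np
import pandas as pd

def join_datasets(ohlcv: pd.DataFrame, icsa: pd.DataFrame) -> pd.DataFrame:
    """Join daily OHLCV data with weekly ICSA and add derived columns (sketch)."""
    ohlcv = ohlcv.copy()
    # example technical indicator: 14-day simple moving average of Close
    ohlcv["sma_14"] = ohlcv["Close"].rolling(14).mean()
    # example target transform: one-day log return of Close
    ohlcv["target"] = np.log(ohlcv["Close"] / ohlcv["Close"].shift(1))
    # ICSA is weekly; forward-fill it onto the daily index before joining
    joined = ohlcv.join(icsa.reindex(ohlcv.index, method="ffill"))
    # drop rows made incomplete by the rolling window / shift
    return joined.dropna()
```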

2. src.data.windowSplit

Input: joined dataset (pkl) config[prep][JoinedDfPkl] -> data/input/joined.pkl

This module splits the dataset into train and test windows. 
There are 3 parameters to consider: lookback, train-window, and test-window.
First, the data is divided into train-test windows such that the training period of each window 
rolls forward over the test period of the previous window (see diagram #1 below).

[Diagram #1: train-test window split]

Moreover, each train and test period is also handled using the rolling window approach (see diagram #2 below).

[Diagram #2: rolling window within each train/test period]

This approach uses a lookback period, which lets the model train on small batches of recent data.
With the default config, the code should generate arrays with the following dimensions:
Train window (features, targets): (N, train, look_back, n_feat), (N, train, look_forward, n_targets)
Test window (features, targets): (N, test, look_back, n_feat), (N, test, look_forward, n_targets)
where:
- N                  = resulting number of train-test windows
- look_back          = look-back period for feature matrix in each train window
- look_forward       = how many days ahead the model predicts the target (default = 1) (the target period in the diagram above)
- n_feat, n_targets  = number of features / targets in joined dataset
- train, test        = train, test periods

Default settings example:
Train window dimensions (features, targets): (70, 504, 63, 19), (70, 504, 1, 1)
Test window dimensions (features, targets): (70, 126, 63, 19), (70, 126, 1, 1)

Output: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
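
A minimal NumPy sketch of this split logic, assuming each new window simply advances by the test period (the shapes follow the convention above; the repository's implementation may differ in details):

```python
import numpy as np

def window_split(features, targets, train=504, test=126,
                 look_back=63, look_forward=1):
    """Rolling train/test window split (sketch of src.data.windowSplit).

    features: (T, n_feat), targets: (T, n_targets)
    Returns arrays shaped
      (N, train, look_back, n_feat), (N, train, look_forward, n_targets),
      (N, test,  look_back, n_feat), (N, test,  look_forward, n_targets).
    """
    def samples(start, n):
        # each sample sees look_back days of features and predicts
        # the next look_forward days of targets
        X = np.stack([features[start + i : start + i + look_back]
                      for i in range(n)])
        y = np.stack([targets[start + i + look_back :
                              start + i + look_back + look_forward]
                      for i in range(n)])
        return X, y

    Xtr, ytr, Xte, yte = [], [], [], []
    start = 0
    # each new window's training period rolls over the previous test period
    while start + train + test + look_back + look_forward - 1 <= len(features):
        a, b = samples(start, train)
        c, d = samples(start + train, test)
        Xtr.append(a); ytr.append(b); Xte.append(c); yte.append(d)
        start += test
    return (np.stack(Xtr), np.stack(ytr), np.stack(Xte), np.stack(yte))
```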

3. src.model.modelFitPredict

Input: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl

The module uses a Keras Sequential model:
it builds the network, trains it on window data, and generates predictions using hyperparameters from config.ini. 

Output:

  • numpy array of predictions (pkl): config[prep][PredictionsArray] -> data/output/latest_preds.pkl
  • data-to-evaluate (csv, pkl): data/output/model_eval_data_<timestamp>.pkl
  • model configuration (json): reports/model_config_<timestamp>.json
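
A minimal sketch of the kind of Sequential network this module builds; the layer sizes, dropout, and optimizer settings below are placeholders, since the real values come from config.ini:

```python
import numpy as np
from tensorflow import keras

def build_model(look_back, n_feat, units=64, dropout=0.2, learning_rate=1e-3):
    """Build a one-step-ahead LSTM regressor (hyperparameters are placeholders)."""
    model = keras.Sequential([
        keras.Input(shape=(look_back, n_feat)),
        keras.layers.LSTM(units),       # single LSTM layer over the lookback window
        keras.layers.Dropout(dropout),  # regularization before the output layer
        keras.layers.Dense(1),          # one-step-ahead target prediction
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    return model
```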

3.1. src.model.performanceMetrics

Input: data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl

Calculates the Equity Line and performance metrics: 
- Annualized Return Ratio, 
- Annualized Standard Deviation, 
- Information Ratio, 
- Maximum Loss Duration

Output:

  • Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
  • Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl
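
Under standard definitions (the repository's exact formulas may differ), the Equity Line and these metrics can be sketched as:

```python
import numpy as np

def performance_metrics(returns, periods_per_year=252):
    """Equity line and performance metrics from daily strategy returns (sketch)."""
    equity = np.cumprod(1 + returns)                 # equity line, starting capital = 1
    years = len(returns) / periods_per_year
    arc = equity[-1] ** (1 / years) - 1              # Annualized Return Ratio
    asd = returns.std(ddof=1) * np.sqrt(periods_per_year)  # Annualized Std Dev
    ir = arc / asd if asd > 0 else np.nan            # Information Ratio
    # Maximum Loss Duration: longest run of days spent below a previous equity peak
    peak = np.maximum.accumulate(equity)
    mld, run = 0, 0
    for below in equity < peak:
        run = run + 1 if below else 0
        mld = max(mld, run)
    return equity, {"ARC": arc, "ASD": asd, "IR": ir, "MLD": mld}
```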

4. src.visualization.plotResults

Input:

  • data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl
  • window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
  • model configuration (json): reports/model_config_<timestamp>.json
  • Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
  • Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl
Visualizes results. 
Includes information about model configuration, comparison between real vs predicted data, and performance metrics.

Output:

  • Equity Line plot (png): reports/figures/equity_line_<timestamp>.png
  • Predictions histogram (png): reports/figures/predictions_histogram_<timestamp>.png

Remarks

Further improvements to be included:

  • Averaging results over many runs (a random seed cannot currently be set due to the large number of stochastic processes)

  • Hyperparameter tuning between windows

  • A real-time approach

License

MIT License | Copyright (c) 2021 Jan Androsiuk