
andeceneu4more/rl-stoks-pick


Trading with Reinforcement Learning

This project improves on the solution initially proposed here

In trading we have an action space of size 3: Buy, Sell, and Sit.

We set the experience replay memory to a deque holding 2000 elements. We create an empty inventory list that contains the stocks we have already bought. We also set the gamma parameter to 0.95; this discount factor balances immediate rewards against long-term returns.
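These pieces can be sketched as a minimal agent skeleton (the `Agent` class name and field names are illustrative, not taken from the repo):

```python
from collections import deque

class Agent:
    def __init__(self):
        # Replay memory: stores (state, action, reward, next_state, done)
        # tuples; the deque silently discards the oldest once 2000 are stored.
        self.memory = deque(maxlen=2000)
        # Stocks currently held (bought but not yet sold).
        self.inventory = []
        # Discount factor: weights future rewards against immediate ones.
        self.gamma = 0.95

agent = Agent()
agent.memory.append(("s0", 0, 0.0, "s1", False))
print(len(agent.memory))  # 1
```

Using `deque(maxlen=2000)` means old transitions are evicted automatically, so the buffer never needs explicit trimming.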

The epsilon parameter determines whether we take a random action or let the model choose one. We start it at 1.0 so that the agent acts randomly in the beginning, while the model is still untrained. Over time we want fewer random actions and mostly the trained model's choices, so we decay epsilon toward epsilon_final = 0.01.
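A minimal epsilon-greedy sketch of that schedule (the decay rate of 0.995 is an illustrative assumption; only the 1.0 start and 0.01 floor come from the text):

```python
import random

EPSILON_START = 1.0   # fully random at the beginning
EPSILON_FINAL = 0.01  # mostly greedy once trained
EPSILON_DECAY = 0.995 # multiplicative decay per step (illustrative value)

def select_action(q_values, epsilon):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # e.g. 0=Sit, 1=Buy, 2=Sell
    return max(range(len(q_values)), key=lambda a: q_values[a])

epsilon = EPSILON_START
for _ in range(1000):
    # Decay each step, but never drop below the floor.
    epsilon = max(EPSILON_FINAL, epsilon * EPSILON_DECAY)
print(round(epsilon, 2))  # has decayed to the 0.01 floor
```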

Let's first look at how we can translate the problem of stock market trading to a reinforcement learning environment.

Each point on a stock graph is just a floating-point number that represents the stock price at a given time. Our task is to predict what happens in the next period, and as mentioned there are 3 possible actions: buy, sell, or sit. Framed naively, this is a regression problem: with window_size = 5 we would use 5 past states to predict a continuous target. Instead of predicting real numbers, however, we predict one of our 3 actions.
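One common way to turn such a price window into a state vector, used in q-trader-style agents like the one this project builds on, is to feed the network sigmoid-squashed differences of consecutive prices. The exact featurization below is an assumption, not confirmed from the repo:

```python
import math

def get_state(prices, t, window_size=5):
    """State at time t: sigmoid of the differences between the last
    `window_size` consecutive prices (an assumed featurization)."""
    start = t - window_size + 1
    # Pad with the first price if the window extends before t = 0.
    block = prices[max(start, 0):t + 1]
    block = [prices[0]] * max(-start, 0) + block
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return [sigmoid(block[i + 1] - block[i]) for i in range(window_size - 1)]

prices = [100.0, 101.5, 101.0, 102.2, 103.0, 102.5]
print(get_state(prices, 5))  # 4 features in (0, 1) for window_size = 5
```

Squashing the differences keeps every feature in (0, 1) regardless of the absolute price level, so the network sees price movement rather than price scale.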

High Priority Tasks

  1. Pipeline architecture design → Assignee: Adrian Iordache, Status: Done
  2. Split train, validation, test and integration development → Assignee: Adrian Iordache, Status: Done
  3. Refactoring trader_agents.py for hyperparameter optimization (LR, loss_fn, optimizer) → Assignee: Adrian Iordache, Status: Done
  4. Monitoring and plotting results (profit, rewards, loss) → Assignee: Adrian Iordache, Status: Done
  5. Adding inference script for valid and test set based on existing models → Assignee: Adrian Iordache, Status: Done
  6. Prioritized Experience Replay (not yet present; a simple replay buffer is needed for the vanilla agent, as in the improved one) - experiments and results for validation and test → Assignee: Manea Andrei, Status: Done
  7. Another evaluation metrics (backtesting, pyfolio, FinRL, Capital Curve) → Assignee: Sichitiu Marian, Gîdea Andrei, Status: Done
  8. OpenAI Gym Integration ? → Assignee: Manea Andrei, Status: Aborted
  9. Vanilla DQN - experiments and results for validation and test → Assignee: Manea Andrei, Adrian Iordache, Status: Done
  10. DQN with fixed targets (target network) - experiments and results for validation and test → Assignee: Manea Andrei, Adrian Iordache, Status: Done
  11. Double DQN - experiments and results for validation and test → Assignee: Manea Andrei, Status: Done
  12. Dueling Double DQN Architectures - experiments and results for validation and test → Assignee: Manea Andrei, Status: Done
  13. Searching for other features as input (stockstats) → Assignee: Dobre Bogdan, Sichitiu Marian, Gîdea Andrei, Status: Done
  14. Improved Estimator Networks, Convolutional 1d vs Fully Connected, maybe LSTM or transformer encoder or transformer encoder-decoder (GitHub Source) → Assignee: Sichitiu Marian, Gîdea Andrei, Status: Done
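The DQN variants in items 9-12 differ mainly in how the bootstrap target is computed. A minimal sketch, using the gamma = 0.95 from above and plain Python lists as stand-ins for network outputs:

```python
GAMMA = 0.95  # discount factor from the agent setup above

def dqn_target(reward, next_q_target, done):
    """Vanilla / fixed-target DQN: max over the (target) network's
    Q-values for the next state."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, done):
    """Double DQN: the online network picks the action, the target
    network evaluates it, which reduces overestimation bias."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + GAMMA * next_q_target[best]

print(dqn_target(1.0, [0.5, 0.8, 0.2], False))                       # 1.76
print(double_dqn_target(1.0, [0.9, 0.1, 0.0], [0.5, 0.8, 0.2], False))  # 1.475
```

Note how the two targets diverge whenever the online and target networks disagree about which next action is best.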

Future Work Tasks

  1. Using news API for prediction fusion with stock data
  2. Throw away all episodes with a reward below the boundary
  3. Random offset to start the experiments
  4. Commission of the broker (0.1%)
  5. Percentage-based reward: reward += 100.0 * (close - self.open_price) / self.open_price
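Items 4 and 5 could be combined as follows; the 0.1% commission and the percentage-reward formula come from the list above, while the surrounding function is an illustrative sketch:

```python
COMMISSION = 0.001  # broker commission of 0.1% per trade (item 4)

def sell_reward(open_price, close):
    """Percentage profit on closing a position (item 5), net of the
    commission paid on both the opening and the closing trade."""
    reward = 100.0 * (close - open_price) / open_price
    reward -= 100.0 * COMMISSION * 2  # 0.1% charged on buy and on sell
    return reward

print(sell_reward(100.0, 102.0))  # 2% gross gain minus 0.2% commission
```

Charging the commission on both legs of the trade keeps the agent from learning strategies that only look profitable before fees.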

Algorithm design (Class Requirements)

Introduction

• What is the problem?

• Why can't any of the existing techniques effectively tackle this problem?

• What is the intuition behind the technique that you developed?

• Techniques to tackle the problem

• Brief review of previous work concerning this problem (i.e., the 4-8 papers that you read)

• Describe the technique that you developed

• Brief description of the existing techniques that you will compare to

Evaluation

• Analyze and compare (empirically or theoretically) your new approach to existing approaches

Conclusion

• Can your new technique effectively tackle the problem?

• What future research do you recommend?
