
Reward shaping in RL algorithms using benchmark returns #1220

Open
arunbharadwaj2009 opened this issue Apr 30, 2024 · 1 comment

Comments

@arunbharadwaj2009

I want to build an RL algorithm that understands the concept of beating a benchmark (say the S&P 500) at the tic (ticker) level. If a tic consistently beats the benchmark, the algorithm should prefer to pick that tic more often than a tic that keeps losing to the benchmark.

How should I make this happen?

Can I set up a feature that checks on a monthly basis whether a tic beat the benchmark and sends this as a signal to the RL algorithm? It could be a binary feature or a numeric one (the delta between the tic's and the benchmark's monthly return). But even then, this would just be a feature and would not actually alter the reward signal. How do I alter the reward signal to achieve this?
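
One direct way to alter the reward is reward shaping: instead of (or in addition to) the raw portfolio return, give the agent the excess return over the benchmark at each step, so it is only rewarded when it beats the S&P 500. Below is a minimal sketch assuming a gymnasium-style trading environment; the wrapper name, the `portfolio_value` key in `info`, and the benchmark price array are assumptions to adapt to whatever your environment actually exposes.

```python
import gymnasium as gym
import numpy as np


class ExcessReturnWrapper(gym.Wrapper):
    """Replace the raw reward with the portfolio's excess return over a benchmark."""

    def __init__(self, env, benchmark_prices, scale=1.0):
        super().__init__(env)
        # benchmark_prices: 1-D array of benchmark index levels, one per env step (assumption)
        self.benchmark_prices = np.asarray(benchmark_prices, dtype=float)
        self.scale = scale
        self._t = 0
        self._prev_portfolio_value = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._t = 0
        # Assumes the env reports its portfolio value in `info`; rename as needed.
        self._prev_portfolio_value = info.get("portfolio_value", 1.0)
        return obs, info

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        self._t += 1

        portfolio_value = info.get("portfolio_value", self._prev_portfolio_value)
        portfolio_ret = portfolio_value / self._prev_portfolio_value - 1.0
        bench_ret = (
            self.benchmark_prices[self._t] / self.benchmark_prices[self._t - 1] - 1.0
        )
        self._prev_portfolio_value = portfolio_value

        # Shaped reward: positive only when the portfolio beats the benchmark this step.
        reward = self.scale * (portfolio_ret - bench_ret)
        return obs, reward, terminated, truncated, info
```

The `scale` factor controls how strongly the shaping term matters; you could also add the excess-return term to the original reward rather than replacing it.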

@zhumingpassional
Collaborator

After training an agent, introduce a feature that acts as a mask, i.e., it is 1 if the tic is beating the benchmark and 0 otherwise.
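
A minimal sketch of computing that mask with pandas, assuming a long-format price DataFrame with `date`, `tic`, and `close` columns and a benchmark close-price series indexed by date (the column names and the helper name are assumptions):

```python
import pandas as pd


def add_beat_benchmark_mask(df: pd.DataFrame, benchmark: pd.Series) -> pd.DataFrame:
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])

    # Monthly return per tic ("ME" = month-end; use "M" on older pandas).
    monthly = (
        df.set_index("date")
        .groupby("tic")["close"]
        .resample("ME")
        .last()
        .groupby(level="tic")
        .pct_change()
        .rename("tic_ret")
        .reset_index()
    )

    # Monthly return of the benchmark.
    bench_ret = benchmark.resample("ME").last().pct_change().rename("bench_ret")
    monthly = monthly.merge(bench_ret, left_on="date", right_index=True, how="left")

    # Binary mask: 1 if the tic beat the benchmark that month, 0 otherwise.
    monthly["beat_benchmark"] = (monthly["tic_ret"] > monthly["bench_ret"]).astype(int)

    # Broadcast the monthly mask back onto the daily rows.
    df["month"] = df["date"].dt.to_period("M")
    monthly["month"] = monthly["date"].dt.to_period("M")
    df = df.merge(
        monthly[["tic", "month", "beat_benchmark"]], on=["tic", "month"], how="left"
    )
    return df.drop(columns="month")
```

Note that the mask above is computed from the same month it describes; in practice you would shift it by one month per tic so the feature only uses information available at decision time.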
