
Reward shaping in RL algorithms using benchmark returns #1220

Open
arunbharadwaj2009 opened this issue Apr 30, 2024 · 1 comment

Comments

@arunbharadwaj2009

I want to build an RL algorithm that understands the concept of beating a benchmark (say the S&P 500) at the tic (ticker) level. If a tic consistently beats the benchmark, the algorithm should prefer to pick that tic more often than a tic that keeps losing to the benchmark.

How should I make this happen?

Can I set up a feature that checks on a monthly basis whether a tic beat the benchmark and sends this as a signal to the RL algorithm? It could be a binary feature or a numeric one (the delta between the tic's and the benchmark's monthly return). But even then, this would just be a feature and would not actually alter the reward signal. How do I alter the reward signal to achieve this?
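
One direct way to alter the reward is reward shaping: instead of (or in addition to) the raw portfolio return, give the agent the excess return over the benchmark at each step, so it is only rewarded when it beats the S&P 500. Below is a minimal sketch assuming a gymnasium-style trading environment; the wrapper name, the `portfolio_value` key in `info`, and the benchmark price array are assumptions to adapt to whatever your environment actually exposes.

```python
import gymnasium as gym
import numpy as np


class ExcessReturnWrapper(gym.Wrapper):
    """Replace the raw reward with the portfolio's excess return over a benchmark."""

    def __init__(self, env, benchmark_prices, scale=1.0):
        super().__init__(env)
        # benchmark_prices: 1-D array of benchmark index levels, one per env step (assumption)
        self.benchmark_prices = np.asarray(benchmark_prices, dtype=float)
        self.scale = scale
        self._t = 0
        self._prev_portfolio_value = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._t = 0
        # Assumes the env reports its portfolio value in `info`; rename as needed.
        self._prev_portfolio_value = info.get("portfolio_value", 1.0)
        return obs, info

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        self._t += 1

        portfolio_value = info.get("portfolio_value", self._prev_portfolio_value)
        portfolio_ret = portfolio_value / self._prev_portfolio_value - 1.0
        bench_ret = (
            self.benchmark_prices[self._t] / self.benchmark_prices[self._t - 1] - 1.0
        )
        self._prev_portfolio_value = portfolio_value

        # Shaped reward: positive only when the portfolio beats the benchmark this step.
        reward = self.scale * (portfolio_ret - bench_ret)
        return obs, reward, terminated, truncated, info
```

The `scale` factor controls how strongly the shaping term matters; you could also add the excess-return term to the original reward rather than replacing it.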

@zhumingpassional
Collaborator

After training an agent, introduce a feature that acts as a mask, i.e., it is 1 if the tic is beating the benchmark and 0 otherwise.
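
A minimal sketch of computing that mask with pandas, assuming a long-format price DataFrame with `date`, `tic`, and `close` columns and a benchmark close-price series indexed by date (the column names and the helper name are assumptions):

```python
import pandas as pd


def add_beat_benchmark_mask(df: pd.DataFrame, benchmark: pd.Series) -> pd.DataFrame:
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])

    # Monthly return per tic ("ME" = month-end; use "M" on older pandas).
    monthly = (
        df.set_index("date")
        .groupby("tic")["close"]
        .resample("ME")
        .last()
        .groupby(level="tic")
        .pct_change()
        .rename("tic_ret")
        .reset_index()
    )

    # Monthly return of the benchmark.
    bench_ret = benchmark.resample("ME").last().pct_change().rename("bench_ret")
    monthly = monthly.merge(bench_ret, left_on="date", right_index=True, how="left")

    # Binary mask: 1 if the tic beat the benchmark that month, 0 otherwise.
    monthly["beat_benchmark"] = (monthly["tic_ret"] > monthly["bench_ret"]).astype(int)

    # Broadcast the monthly mask back onto the daily rows.
    df["month"] = df["date"].dt.to_period("M")
    monthly["month"] = monthly["date"].dt.to_period("M")
    df = df.merge(
        monthly[["tic", "month", "beat_benchmark"]], on=["tic", "month"], how="left"
    )
    return df.drop(columns="month")
```

Note that the mask above is computed from the same month it describes; in practice you would shift it by one month per tic so the feature only uses information available at decision time.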
