
Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation #9859

Open
2302680972 opened this issue Feb 23, 2024 · 4 comments
Labels
freqAI Issues and PR's related to freqAI

Comments

@2302680972

Describe your environment

  • Operating system: Windows 11
  • Python Version: 3.9.18 (python -V)
  • CCXT version: using docker with latest version of freqtrade (pip freeze | grep ccxt)
  • Freqtrade Version: 2024.1 (freqtrade -V or docker compose run --rm freqtrade -V for Freqtrade running in docker)

Describe the enhancement

I'm exploring the reinforcement learning aspect of Freqai and have a suggestion regarding the calculate_reward function. I believe that utilizing columns in the DataFrame that are not designated as features might offer some benefits for training the model.

I'm currently experimenting with adding a new column in the feature_engineering_standard method that marks the positions of local price highs and lows, without including it as a feature. My intention is to use this information in the calculate_reward function.
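
A minimal sketch of what this marking step could look like (the column name "extreme_marker", the scipy.signal.argrelextrema detection, and the window size are my own illustrative choices, not freqtrade API; since the column is not prefixed with "%", FreqAI will not treat it as a feature):

```python
# Illustrative sketch only: column name, extrema detection and window size are
# arbitrary choices. A column without the "%" prefix is not picked up as a feature.
import numpy as np
from pandas import DataFrame
from scipy.signal import argrelextrema


def feature_engineering_standard(self, dataframe: DataFrame, metadata: dict, **kwargs) -> DataFrame:
    window = 5  # candles on each side used to confirm an extreme -> this looks into the future
    highs = argrelextrema(dataframe["high"].values, np.greater_equal, order=window)[0]
    lows = argrelextrema(dataframe["low"].values, np.less_equal, order=window)[0]

    dataframe["extreme_marker"] = 0  # 0 = neither a local high nor a local low
    dataframe.iloc[highs, dataframe.columns.get_loc("extreme_marker")] = 1   # local high
    dataframe.iloc[lows, dataframe.columns.get_loc("extreme_marker")] = -1   # local low
    return dataframe
```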

Clearly, directly incorporating this result as a feature would introduce lookahead bias. Determining the highest and lowest prices requires data from several candles into the future, which is unacceptable for features.

Therefore, I propose to use this information in the reward calculation without including it as a feature. During the initial training process, where the model solely relies on historical data, these local optima values could potentially enhance training efficiency by reinforcing the correlation between correct actions and current features. This could accelerate early-stage training to some extent.

Although this mechanism would be ineffective in subsequent real-time runs due to the lack of future price data, it could still serve as a valuable aid during model training.

Unfortunately, the DataFrame available in the MyRLEnv class only contains the columns designated as features, which prevents me from implementing this idea in practice.
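
Hypothetically, if the environment carried the full dataframe (say, as a made-up self.raw_df attribute next to the feature-only self.df), the reward usage I have in mind would look roughly like this (Actions and self._current_tick as in the 5-action base environment; this is a sketch of the requested behaviour, not working code today):

```python
# Hypothetical sketch: self.raw_df and "extreme_marker" do not exist in the current
# environment; they stand for the non-feature data this enhancement asks to carry in.
def calculate_reward(self, action: int) -> float:
    marker = self.raw_df["extreme_marker"].iloc[self._current_tick]  # hypothetical attribute

    # reward entries placed at marked local extremes...
    if action == Actions.Long_enter.value and marker == -1:
        return 2.0
    if action == Actions.Short_enter.value and marker == 1:
        return 2.0
    # ...and mildly penalise entries placed against them
    if action in (Actions.Long_enter.value, Actions.Short_enter.value) and marker != 0:
        return -1.0
    return 0.0
```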

Is it feasible in the current version of Freqai to utilize values from the DataFrame in the calculate_reward function without including them as features? I believe this would greatly benefit model training, as the features would still be derived solely from past data, while the reward function provides a more accurate evaluation.

@xmatthias xmatthias added the freqAI Issues and PR's related to freqAI label Feb 24, 2024
@2302680972 (Author)

Here is another question:

Would the current state of the model - such as the current position type, trading time, floating profit and loss, etc. - be added as features and used for training?

When we evaluate the model's behavior using the reward function, does the model have the ability to understand its own state? Are aspects like the current position and floating profit and loss included as part of the features observable by the model?

Clarifying the above questions would be helpful for building a good model. If real-time state information is part of the features, the model could adjust its next actions based on the current investment situation. For example, when adjusting positions is not allowed, the model may be less inclined to issue opening signals while a position is already open.

If positions and floating profit and loss in other coins could also be included as features, then the model might have the potential to learn to control how it allocates capital across different positions.

@robcaulk (Member)

Hello,

For your first question regarding carrying non features into the environment, this is certainly possible, but it requires development. We are open to accepting a PR on this if you feel it is important to you. Please go ahead and submit the PR for review.

Regarding your second question: yes, you can include state info as features in your model during dry/live runs. Please have a look at the documentation, where we outline all the functionality of Reinforcement Learning and the associated parameters:

https://www.freqtrade.io/en/stable/freqai-parameter-table/#reinforcement-learning-parameters
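
For reference, the switch described on that page is add_state_info inside rl_config; when enabled, the current position, trade duration and current profit are appended to the feature set during dry/live runs (it is automatically disabled for backtesting). A minimal config excerpt, assuming the standard FreqAI config layout:

```json
"freqai": {
    "rl_config": {
        "add_state_info": true
    }
}
```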

cheers,

rob

@xmatthias xmatthias changed the title Suggestion for Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation Mar 26, 2024
@xsa-dev (Contributor) commented Apr 27, 2024

The developer suggests adding functionality that would allow using values from the DataFrame in the calculate_reward function without including them as features. This would enhance model training since the reward evaluation would be more accurate, utilizing additional information about local price extremes without the bias of peeking into the future.

Hey there! Could you possibly include a code snippet about the issue you're facing? It would help make the problem clearer.

@richardjozsa (Contributor) commented May 2, 2024

The environment does append that data (such as profit/loss and trade duration) to the features; it is already there in the code, you just need to read it.

The idea is good and already possible: the self.prices table holds the raw prices (close, open, high, low), and you are free to use them to calculate whatever indicators you need, even if they are not in the features table (I've done it many times). This is also easy to see by reading the code. :)
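
A rough sketch of that approach, assuming the base environment's self.prices (raw candles with open/high/low/close) and self._current_tick attributes and the 5-action Actions enum; it only looks backwards, in line with the caveat below:

```python
# Rough sketch: derive extra information from self.prices inside the reward,
# even though none of it appears in the feature table.
def calculate_reward(self, action: int) -> float:
    tick = self._current_tick
    # lowest low over the previous 20 candles (backward-looking only)
    recent_low = self.prices["low"].iloc[max(0, tick - 20): tick + 1].min()
    close = self.prices["close"].iloc[tick]

    # encourage long entries placed within 0.2% of that recent low
    if action == Actions.Long_enter.value and close <= recent_low * 1.002:
        return 1.0
    return 0.0
```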

From an RL perspective, it might not be a good idea to reward the agent based on future information, as you introduce more partial observability for your agent.
