
Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation #9859

Open
2302680972 opened this issue Feb 23, 2024 · 4 comments
Labels
freqAI Issues and PR's related to freqAI

Comments

@2302680972

Describe your environment

  • Operating system: Windows 11
  • Python Version: 3.9.18 (python -V)
  • CCXT version: using docker with latest version of freqtrade (pip freeze | grep ccxt)
  • Freqtrade Version: 2024.1 (freqtrade -V or docker compose run --rm freqtrade -V for Freqtrade running in docker)

Describe the enhancement

I'm exploring the reinforcement learning aspect of Freqai and have a suggestion regarding the calculate_reward function. I believe that utilizing columns in the DataFrame that are not designated as features might offer some benefits for training the model.

I'm currently experimenting with adding a new column in the feature_engineering_standard method that marks the positions of local price highs and lows, without including it as a feature. My intention is to use this information in the calculate_reward function.
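
A minimal sketch of what this marking step could look like (the column name "extreme_marker", the scipy.signal.argrelextrema detection, and the window size are my own illustrative choices, not freqtrade API; since the column is not prefixed with "%", FreqAI will not treat it as a feature):

```python
# Illustrative sketch only: column name, extrema detection and window size are
# arbitrary choices. A column without the "%" prefix is not picked up as a feature.
import numpy as np
from pandas import DataFrame
from scipy.signal import argrelextrema


def feature_engineering_standard(self, dataframe: DataFrame, metadata: dict, **kwargs) -> DataFrame:
    window = 5  # candles on each side used to confirm an extreme -> this looks into the future
    highs = argrelextrema(dataframe["high"].values, np.greater_equal, order=window)[0]
    lows = argrelextrema(dataframe["low"].values, np.less_equal, order=window)[0]

    dataframe["extreme_marker"] = 0  # 0 = neither a local high nor a local low
    dataframe.iloc[highs, dataframe.columns.get_loc("extreme_marker")] = 1   # local high
    dataframe.iloc[lows, dataframe.columns.get_loc("extreme_marker")] = -1   # local low
    return dataframe
```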

Clearly, directly incorporating this result as a feature would introduce lookahead bias. Determining the highest and lowest prices requires data from several candles into the future, which is unacceptable for features.

Therefore, I propose to use this information in the reward calculation without including it as a feature. During the initial training process, where the model solely relies on historical data, these local optima values could potentially enhance training efficiency by reinforcing the correlation between correct actions and current features. This could accelerate early-stage training to some extent.

Although this mechanism would be ineffective in subsequent real-time runs due to the lack of future price data, it could still serve as a valuable aid during model training.

Unfortunately, the DataFrame available in the MyRLEnv class only contains the columns designated as features, which prevents me from implementing this idea in practice.
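
Hypothetically, if the environment carried the full dataframe (say, as a made-up self.raw_df attribute next to the feature-only self.df), the reward usage I have in mind would look roughly like this (Actions and self._current_tick as in the 5-action base environment; this is a sketch of the requested behaviour, not working code today):

```python
# Hypothetical sketch: self.raw_df and "extreme_marker" do not exist in the current
# environment; they stand for the non-feature data this enhancement asks to carry in.
def calculate_reward(self, action: int) -> float:
    marker = self.raw_df["extreme_marker"].iloc[self._current_tick]  # hypothetical attribute

    # reward entries placed at marked local extremes...
    if action == Actions.Long_enter.value and marker == -1:
        return 2.0
    if action == Actions.Short_enter.value and marker == 1:
        return 2.0
    # ...and mildly penalise entries placed against them
    if action in (Actions.Long_enter.value, Actions.Short_enter.value) and marker != 0:
        return -1.0
    return 0.0
```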

Is it feasible in the current version of Freqai to utilize values from the DataFrame in the calculate_reward function without including them as features? I believe this would greatly benefit model training, as the features would still be derived solely from past data, while the reward function provides a more accurate evaluation.

@xmatthias xmatthias added the freqAI Issues and PR's related to freqAI label Feb 24, 2024
@2302680972 (Author)

Here is another question:

Would the current state of the model - such as the current position type, trading time, floating profit and loss, etc. - be added as features and used for training?

When we evaluate the model's behavior using the reward function, does the model have the ability to understand its own state? Are aspects like the current position and floating profit and loss included as part of the features observable by the model?

Clarifying the above questions would be helpful for building a good model. If real-time state information is part of the features, the model could adjust its next actions based on the current investment situation. For example, when adjusting positions is not allowed, the model may be less inclined to issue opening signals while a position is already open.

If positions and floating profit and loss in other coins could also be included as features, then the model might have the potential to learn to control how it allocates capital across different positions.

@robcaulk (Member)

Hello,

For your first question regarding carrying non features into the environment, this is certainly possible, but it requires development. We are open to accepting a PR on this if you feel it is important to you. Please go ahead and submit the PR for review.

Regarding your second question: yes, you can include state info as features in your model during dry/live runs. Please have a look at the documentation, where we outline all the functionality of Reinforcement Learning and the associated parameters:

https://www.freqtrade.io/en/stable/freqai-parameter-table/#reinforcement-learning-parameters
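
For reference, the switch described on that page is add_state_info inside rl_config; when enabled, the current position, trade duration and current profit are appended to the feature set during dry/live runs (it is automatically disabled for backtesting). A minimal config excerpt, assuming the standard FreqAI config layout:

```json
"freqai": {
    "rl_config": {
        "add_state_info": true
    }
}
```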

cheers,

rob

@xmatthias xmatthias changed the title Suggestion for Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation Mar 26, 2024
@xsa-dev (Contributor) commented Apr 27, 2024

The developer suggests adding functionality that would allow using values from the DataFrame in the calculate_reward function without including them as features. This would enhance model training since the reward evaluation would be more accurate, utilizing additional information about local price extremes without the bias of peeking into the future.

Hey there! Could you possibly include a code snippet about the issue you're facing? It would help make the problem clearer.

@richardjozsa (Contributor) commented May 2, 2024

The environment does append that data (such as profit/loss and trade duration) to the features; it is already there in the code, you just need to read it.

The idea is good and already possible: the self.prices table holds the raw prices (close, open, high, low), and you are free to use them to calculate whatever indicators you need, even if they are not in the features table (I've done it many times). This is also easy to see by reading the code. :)
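
A rough sketch of that approach, assuming the base environment's self.prices (raw candles with open/high/low/close) and self._current_tick attributes and the 5-action Actions enum; it only looks backwards, in line with the caveat below:

```python
# Rough sketch: derive extra information from self.prices inside the reward,
# even though none of it appears in the feature table.
def calculate_reward(self, action: int) -> float:
    tick = self._current_tick
    # lowest low over the previous 20 candles (backward-looking only)
    recent_low = self.prices["low"].iloc[max(0, tick - 20): tick + 1].min()
    close = self.prices["close"].iloc[tick]

    # encourage long entries placed within 0.2% of that recent low
    if action == Actions.Long_enter.value and close <= recent_low * 1.002:
        return 1.0
    return 0.0
```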

From an RL perspective, it might not be a good idea to reward the agent based on future information, as you introduce more partial observability for your agent.
