
optimize dtypes for hyperopt and backtesting to decrease memory usage #9305

Open · wants to merge 17 commits into base: develop
Conversation

@AliSayyah AliSayyah commented Oct 15, 2023

Summary

Optimizes the dtypes of historical data to decrease RAM usage.

Quick changelog

  • updated pandas to version 2.1.1 to prevent the loss of meaningful decimals.
  • added a function to downcast dataframes using pd.to_numeric.
  • added Bottleneck to requirements; pandas recommends this library because it accelerates certain NaN operations using specialized Cython routines, achieving a large speedup.
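A minimal sketch of such a downcast helper, assuming only that float64 columns are passed through pd.to_numeric (the actual implementation in the PR may differ):

```python
import pandas as pd

def reduce_dataframe_footprint(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast float64 columns to the smallest float dtype that fits.

    pd.to_numeric with downcast="float" picks float32 where possible.
    """
    for col in df.select_dtypes(include=["float64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df
```

Since float32 uses 4 bytes per value instead of 8, this roughly halves the footprint of every downcast column.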

What's new?

Decreases RAM usage, especially when many indicators are used. pandas defaults to float64, but most columns can be downcast to float32.

Problems?

Two failing tests could be looked into more deeply to understand the cause of failure. I could use some help understanding this behavior. I haven't seen anything else.

dependabot bot and others added 11 commits October 14, 2023 14:28
Bumps [pandas](https://github.com/pandas-dev/pandas) from 2.0.3 to 2.1.1.
- [Release notes](https://github.com/pandas-dev/pandas/releases)
- [Commits](pandas-dev/pandas@v2.0.3...v2.1.1)

---
updated-dependencies:
- dependency-name: pandas
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [orjson](https://github.com/ijl/orjson) from 3.9.7 to 3.9.9.
- [Release notes](https://github.com/ijl/orjson/releases)
- [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md)
- [Commits](ijl/orjson@3.9.7...3.9.9)

---
updated-dependencies:
- dependency-name: orjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
@xmatthias xmatthias left a comment
Will have a proper look once these are addressed.


AliSayyah commented Oct 15, 2023

Hyperopt logs for SampleStrategy with one pair; these should give a good insight into the gains.

2023-10-15 16:42:11,744 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 9.63 MB
2023-10-15 16:42:11,750 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 6.42 MB
2023-10-15 16:42:11,750 - freqtrade.optimize.backtesting - INFO - Loading data from 2021-08-31 07:20:00 up to 2023-09-01 00:00:00 (730 days).
2023-10-15 16:42:11,750 - freqtrade.optimize.hyperopt - INFO - Dataload complete. Calculating indicators
2023-10-15 16:42:11,901 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 33.72 MB
2023-10-15 16:42:11,941 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 20.07 MB
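For reference, "Memory usage" figures like the ones in these logs can be computed with pandas' own accounting; this is an illustrative sketch (the exact logging in history_utils may differ):

```python
import numpy as np
import pandas as pd

# roughly 2 years of 5m candles with a handful of float64 columns
df = pd.DataFrame(np.random.rand(210_000, 6),
                  columns=["open", "high", "low", "close", "volume", "rsi"])

mem_mb = df.memory_usage(deep=True).sum() / 1024 ** 2
print(f"Memory usage of dataframe is {mem_mb:.2f} MB")

# downcast every float64 column to float32 where possible
for col in df.columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

mem_mb = df.memory_usage(deep=True).sum() / 1024 ** 2
print(f"Memory usage after optimization is: {mem_mb:.2f} MB")
```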


AliSayyah commented Oct 15, 2023

My custom strategy with lots of indicators and 60 pairs (these logs are from the indicator-calculation step):

2023-10-15 18:12:39,643 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 154.48 MB
2023-10-15 18:12:39,794 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 120.13 MB
2023-10-15 18:12:51,101 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:12:51,239 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:02,388 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:02,520 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:13,591 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:13,725 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:24,864 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:25,004 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:36,377 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:36,517 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 85.58 MB
...

Substantial memory savings can be seen.

@xmatthias

> my custom strategy with lots of indicators and 60 pair: (these are for indicator calculation step)

On how many candles? Unfortunately that's not visible in the logs - so 730 days can be 730 candles, or, with 1m candles, north of 730,000 candles.
It's also not immediately clear what each log entry is for.

In reality, we'll want to benchmark 3 things, I think, to have something comparable:

  • with this enhancement "as is" - in 3 places in the strategy (eventually also with intermediate results; it's not immediately clear what the logs above show)
  • without the calls in populate_entry and exit (which is how I'd apply this)

For each, we'd also want the timing (how long did it take to reduce the size, once or 3 times).

I'd ignore hyperopt directly - we can extrapolate hyperopt from backtesting results, as we know it'll simply execute the 2nd and 3rd steps (populate_entry_trend() and populate_exit_trend()) over and over again.
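To collect the timing asked for above, a simple sketch using time.perf_counter around one downcast pass (the dataframe size and column count here are arbitrary, not taken from the PR):

```python
import time
import numpy as np
import pandas as pd

# synthetic stand-in for an indicator-laden dataframe
df = pd.DataFrame(np.random.rand(100_000, 20))

start = time.perf_counter()
for col in df.columns:
    df[col] = pd.to_numeric(df[col], downcast="float")
elapsed = time.perf_counter() - start

print(f"Downcast of shape {df.shape} took {elapsed * 1000:.1f} ms")
```

Running this once per call site (and three times in a row, to mimic repeated invocation) would give comparable numbers for each of the benchmark variants.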

@xmatthias xmatthias left a comment
I'm not a huge fan of how this is done (mostly, the change in history_utils).

Debugging failing test shows the reason:

[image: debugger screenshot of the failing test]

The "first" df.head() is

    # Additions at the top of the page / top of the function to fix output
    pd.set_option('display.precision', 15)
    pd.set_option('display.max_columns', 1000)
    pd.set_option('display.expand_frame_repr', False)

The open/high/low/close values change.
The reason is probably clear - it's a rounding issue - but it highlights the reason (and importance) to exclude the OHLCV columns.

While this is a small absolute change, the data no longer corresponds to the original exchange candles - without the ability for the user to opt out of this.
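The rounding issue is easy to reproduce: float32 keeps only about 7 significant digits, so a typical price value no longer round-trips exactly (the sample price below is made up for illustration).

```python
import numpy as np

price = 27123.12345678             # e.g. a close value as delivered by the exchange
as_f32 = float(np.float32(price))  # what the candle becomes after downcasting

print(price, as_f32)
# the values differ below the cent level - small, but no longer the exchange value
```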

I think we should remove the call in this location (allow loading of the data "as is").
In all other cases, the function should be called with skip_original - so as not to modify the exchange data.
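A sketch of that opt-out: the skip_original flag name comes from the comment above, while the exact column list is an assumption based on freqtrade's standard candle columns.

```python
import pandas as pd

# columns that come straight from the exchange and must stay bit-exact
OHLCV_COLUMNS = {"open", "high", "low", "close", "volume"}

def reduce_dataframe_footprint(df: pd.DataFrame, skip_original: bool = False) -> pd.DataFrame:
    """Downcast float64 columns, optionally leaving exchange data untouched."""
    for col in df.select_dtypes(include=["float64"]).columns:
        if skip_original and col in OHLCV_COLUMNS:
            continue  # keep original exchange candles unmodified
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df
```

With skip_original=True, only derived indicator columns shrink, so the loaded candles still match what the exchange delivered.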

    if not hist.empty:
        hist = reduce_dataframe_footprint(hist)
I'm not a huge fan of putting this here (mostly because it's non-optional, but also because it changes values in a wrong way - see the other comment for details).
