
optimize dtypes for hyperopt and backtesting to decrease memory usage #9305

Open · wants to merge 17 commits into base: develop
Conversation

@AliSayyah AliSayyah commented Oct 15, 2023

Summary

Optimizes the dtypes of historical data to decrease RAM usage.

Quick changelog

  • updated pandas to version 2.1.1 to prevent the loss of meaningful decimals.
  • added a function to downcast dataframes using pd.to_numeric.
  • added Bottleneck to requirements; pandas recommends this library because it accelerates certain NaN operations using specialized Cython routines, achieving a large speedup.
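A minimal sketch of such a downcast helper, assuming only that float64 columns are passed through pd.to_numeric (the actual implementation in the PR may differ):

```python
import pandas as pd

def reduce_dataframe_footprint(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast float64 columns to the smallest float dtype that fits.

    pd.to_numeric with downcast="float" picks float32 where possible.
    """
    for col in df.select_dtypes(include=["float64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df
```

Since float32 uses 4 bytes per value instead of 8, this roughly halves the footprint of every downcast column.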

What's new?

Decreases RAM usage, especially when many indicators are used. pandas defaults to float64, but most columns can be downcast to float32.

Problems?

Two failing tests could be looked into more deeply to understand the cause of failure. I could use some help understanding this behavior. I haven't seen anything else.

dependabot bot and others added 11 commits October 14, 2023 14:28
Bumps [pandas](https://github.com/pandas-dev/pandas) from 2.0.3 to 2.1.1.
- [Release notes](https://github.com/pandas-dev/pandas/releases)
- [Commits](pandas-dev/pandas@v2.0.3...v2.1.1)

---
updated-dependencies:
- dependency-name: pandas
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [orjson](https://github.com/ijl/orjson) from 3.9.7 to 3.9.9.
- [Release notes](https://github.com/ijl/orjson/releases)
- [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md)
- [Commits](ijl/orjson@3.9.7...3.9.9)

---
updated-dependencies:
- dependency-name: orjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
@xmatthias xmatthias left a comment
Will have a proper look once these are addressed.


AliSayyah commented Oct 15, 2023

Hyperopt logs for SampleStrategy with one pair; these should give a good insight into the gains.

2023-10-15 16:42:11,744 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 9.63 MB
2023-10-15 16:42:11,750 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 6.42 MB
2023-10-15 16:42:11,750 - freqtrade.optimize.backtesting - INFO - Loading data from 2021-08-31 07:20:00 up to 2023-09-01 00:00:00 (730 days).
2023-10-15 16:42:11,750 - freqtrade.optimize.hyperopt - INFO - Dataload complete. Calculating indicators
2023-10-15 16:42:11,901 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 33.72 MB
2023-10-15 16:42:11,941 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 20.07 MB
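For reference, "Memory usage" figures like the ones in these logs can be computed with pandas' own accounting; this is an illustrative sketch (the exact logging in history_utils may differ):

```python
import numpy as np
import pandas as pd

# roughly 2 years of 5m candles with a handful of float64 columns
df = pd.DataFrame(np.random.rand(210_000, 6),
                  columns=["open", "high", "low", "close", "volume", "rsi"])

mem_mb = df.memory_usage(deep=True).sum() / 1024 ** 2
print(f"Memory usage of dataframe is {mem_mb:.2f} MB")

# downcast every float64 column to float32 where possible
for col in df.columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

mem_mb = df.memory_usage(deep=True).sum() / 1024 ** 2
print(f"Memory usage after optimization is: {mem_mb:.2f} MB")
```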


AliSayyah commented Oct 15, 2023

My custom strategy with lots of indicators and 60 pairs (these logs are from the indicator-calculation step):

2023-10-15 18:12:39,643 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 154.48 MB
2023-10-15 18:12:39,794 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 120.13 MB
2023-10-15 18:12:51,101 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:12:51,239 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:02,388 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:02,520 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:13,591 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:13,725 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:24,864 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:25,004 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 86.38 MB
2023-10-15 18:13:36,377 - freqtrade.data.history.history_utils - INFO - Memory usage of dataframe is 149.66 MB
2023-10-15 18:13:36,517 - freqtrade.data.history.history_utils - INFO - Memory usage after optimization is: 85.58 MB
...

Substantial memory savings can be seen.

@xmatthias

> my custom strategy with lots of indicators and 60 pair: (these are for indicator calculation step)

On how many candles? Unfortunately that's not visible in the logs - so 730 days can be 730 candles, or, with 1m candles, north of 730,000 candles.
It's also not immediately clear what each log entry is for.

In reality, we'll want to benchmark 3 things, I think, to have something comparable:

  • with this enhancement "as is" - in 3 places in the strategy (eventually also with intermediate results; it's not immediately clear what the logs above show)
  • without the calls in populate_entry and exit (which is how I'd apply this)

For each, we'd also want the timing (how long did it take to reduce the size, once or 3 times).

I'd ignore hyperopt directly - we can extrapolate hyperopt from backtesting results, as we know it'll simply execute the 2nd and 3rd steps (populate_entry_trend() and populate_exit_trend()) over and over again.
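To collect the timing asked for above, a simple sketch using time.perf_counter around one downcast pass (the dataframe size and column count here are arbitrary, not taken from the PR):

```python
import time
import numpy as np
import pandas as pd

# synthetic stand-in for an indicator-laden dataframe
df = pd.DataFrame(np.random.rand(100_000, 20))

start = time.perf_counter()
for col in df.columns:
    df[col] = pd.to_numeric(df[col], downcast="float")
elapsed = time.perf_counter() - start

print(f"Downcast of shape {df.shape} took {elapsed * 1000:.1f} ms")
```

Running this once per call site (and three times in a row, to mimic repeated invocation) would give comparable numbers for each of the benchmark variants.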

@xmatthias xmatthias left a comment
I'm not a huge fan of how this is done (mostly, the change in history_utils).

Debugging failing test shows the reason:

[image: debugger screenshot of the failing test]

The "first" df.head() is

    # Additions at the top of the page / top of the function to fix output
    pd.set_option('display.precision', 15)
    pd.set_option('display.max_columns', 1000)
    pd.set_option('display.expand_frame_repr', False)

The open/high/low/close values change.
The reason is probably clear - it's a rounding issue - but it highlights the reason (and importance) to exclude the OHLCV columns.

While this is a small absolute change, the data no longer corresponds to the original exchange candles - without the ability for the user to opt out of this.
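The rounding issue is easy to reproduce: float32 keeps only about 7 significant digits, so a typical price value no longer round-trips exactly (the sample price below is made up for illustration).

```python
import numpy as np

price = 27123.12345678             # e.g. a close value as delivered by the exchange
as_f32 = float(np.float32(price))  # what the candle becomes after downcasting

print(price, as_f32)
# the values differ below the cent level - small, but no longer the exchange value
```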

I think we should remove the call in this location (allow loading of the data "as is").
In all other cases, the function should be called with skip_original - so as not to modify the exchange data.
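A sketch of that opt-out: the skip_original flag name comes from the comment above, while the exact column list is an assumption based on freqtrade's standard candle columns.

```python
import pandas as pd

# columns that come straight from the exchange and must stay bit-exact
OHLCV_COLUMNS = {"open", "high", "low", "close", "volume"}

def reduce_dataframe_footprint(df: pd.DataFrame, skip_original: bool = False) -> pd.DataFrame:
    """Downcast float64 columns, optionally leaving exchange data untouched."""
    for col in df.select_dtypes(include=["float64"]).columns:
        if skip_original and col in OHLCV_COLUMNS:
            continue  # keep original exchange candles unmodified
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df
```

With skip_original=True, only derived indicator columns shrink, so the loaded candles still match what the exchange delivered.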

    if not hist.empty:
        hist = reduce_dataframe_footprint(hist)
I'm not a huge fan of putting this here (mostly because it's non-optional, but also because it changes values in a wrong way - see the other comment for details).
