High memory consumption in python xgboost #5474

Closed
pplonski opened this issue Apr 2, 2020 · 10 comments

@pplonski

pplonski commented Apr 2, 2020

I'm working on a Python AutoML package, and one of my users reported very high memory usage while using xgboost.

I investigated xgboost's memory consumption; you can find the notebook here. From the code, you can see that the model allocates over 7 GB of RAM. When I save the model to disk (only 5 kB!) and then load it back, I free a huge amount of RAM.

It looks to me like xgboost stores a copy of the training data in its internal structures. Am I right?

Is there any way to slim down xgboost's memory usage? Do you think saving the model to disk and then loading it back is a reasonable way to handle this?
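A minimal sketch of the save/reload workaround described above, with synthetic stand-in data (the original notebook isn't shown here); the file name is illustrative:

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in for the real training data.
X = np.random.rand(100_000, 50)
y = np.random.randint(2, size=100_000)

dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=50)

# Workaround: serialize the small model, drop the in-memory objects
# (including the DMatrix copy of the data), then load the model back.
bst.save_model("model.json")
del bst, dtrain

bst = xgb.Booster()
bst.load_model("model.json")
```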

@trivialfis
Member

trivialfis commented Apr 2, 2020

@pplonski We are trying to eliminate the copy for the histogram algorithm. It's a work in progress. For GPU it is mostly done: #5420 #5465

The CPU implementation still needs some more work.

@SmirnovEgorRu
Contributor

SmirnovEgorRu commented Apr 2, 2020

@pplonski, we have also reduced memory consumption on the CPU in PR #5334, but for the 'hist' method only. For now it's included in master only, but I hope it will be part of the next release.

| Memory, KB | Airline | Higgs1m |
| --- | --- | --- |
| Before | 28,311,860 | 1,907,812 |
| #5334 | 16,218,404 | 1,155,156 |
| Reduction | 1.75× | 1.65× |

I agree with @trivialfis, there are still many things to do in this area.
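For reference, the 'hist' method discussed here is selected through the tree_method parameter; a minimal sketch with synthetic data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 20)
y = np.random.rand(10_000)

dtrain = xgb.DMatrix(X, label=y)
# 'hist' is the histogram-based CPU algorithm that PR #5334 optimizes.
bst = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10)
```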

@dhruvrnaik

Hi, I have recently faced a similar high-memory problem with xgboost. I am using 'gpu_hist' for training.

I notice large spikes in system memory when the train() method is executed, which leads to my Jupyter kernel crashing.

  1. Is it correct to say that xgboost is making a copy of my data in system RAM (even though I'm using 'gpu_hist')?
  2. I was under the assumption that xgboost loads the entire training data onto the GPU. Is that also incorrect?

@trivialfis
Member

The memory usage has since improved a lot with inplace prediction and DeviceQuantileDMatrix. Feel free to try them out. One remaining issue is that the CPU algorithm sometimes uses memory linear in n_threads * n_features.
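A sketch of both features, assuming a CUDA device and cupy are installed (DeviceQuantileDMatrix was later renamed QuantileDMatrix):

```python
import cupy as cp
import xgboost as xgb

# Synthetic data already resident in GPU memory.
X = cp.random.rand(100_000, 50)
y = cp.random.randint(2, size=100_000)

# DeviceQuantileDMatrix quantizes directly from device memory,
# avoiding an extra copy of the training data.
dtrain = xgb.DeviceQuantileDMatrix(X, label=y)
bst = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=10)

# Inplace prediction skips building a DMatrix for inference.
preds = bst.inplace_predict(X)
```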

@newmanwang

newmanwang commented Nov 23, 2021

xgboost.train() also consumes a lot of memory (not GPU memory) when making a copy of the Booster as the returned model; in my case, 9 GB before bst.copy() and 34 GB after.
How about making the copy optional?

@trivialfis
Member

How about making the copy optional?

I'm working on it.

@newmanwang

newmanwang commented Dec 4, 2021

I am not really sure whether Booster.save_model() would produce exactly the same model file if bst.copy() were omitted when xgboost.train() returns. Would it be safe for xgboost.train() to simply return bst directly, without the copy, if nothing happens between xgboost.train() and Booster.save_model()? I'm hoping it would make no difference to predictions from the resulting model. xgboost-1.5.1 @trivialfis

@trivialfis
Member

The copy is exact; no change happens during the copy.
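A quick way to check this yourself (a sketch with synthetic data; if the copy is exact, the raw serialized bytes should be identical):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1_000, 10)
y = np.random.rand(1_000)
bst = xgb.train({"objective": "reg:squarederror"}, xgb.DMatrix(X, label=y), num_boost_round=5)

# The copy should serialize to exactly the same bytes as the original.
assert bytes(bst.save_raw()) == bytes(bst.copy().save_raw())
```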

@newmanwang

The copy is exact; no change happens during the copy.

Thanks for the reply, really appreciate it. Looking forward to the upcoming release!

@trivialfis trivialfis added this to 2.0 TODO in 2.0 Roadmap via automation Apr 28, 2022
@trivialfis trivialfis moved this from 2.0 TODO to 2.0 In Progress in 2.0 Roadmap May 20, 2022
@trivialfis
Member

We have implemented QuantileDMatrix for the hist tree method and made it the default for the sklearn interface; along with that, inplace prediction is used for the sklearn interface whenever feasible. Beyond these, some additional memory-reduction optimizations for hist have been implemented. Lastly, UBJSON is now the default serialization format for pickle. I think we can close this issue and treat further memory reduction as future optimization work.
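A sketch of what this looks like from user code in 2.0, with synthetic data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(50_000, 30)
y = np.random.randint(2, size=50_000)

# Explicit QuantileDMatrix with the low-level API ...
dtrain = xgb.QuantileDMatrix(X, label=y)
bst = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10)

# ... while the sklearn interface builds one internally and uses
# inplace prediction when feasible.
clf = xgb.XGBClassifier(tree_method="hist", n_estimators=10).fit(X, y)
preds = clf.predict(X)
```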

2.0 Roadmap automation moved this from 2.0 In Progress to 2.0 Done Oct 20, 2022