
[MRG] Reuse grower and splitter memory #88

Closed
wants to merge 1 commit

Conversation

NicolasHug (Collaborator):

Closes #81

Instead of creating a new grower and a new splitter for every tree, we reuse the existing objects and thus the already-allocated arrays (ordered_gradients, partition, etc.).
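A minimal sketch of this reuse pattern (the class name and the reset() method follow the PR description, but the attributes and signature here are illustrative, not pygbm's actual implementation):

```python
import numpy as np

class SplittingContext:
    """Illustrative reusable splitting context: buffers are allocated
    once and refilled in place for every new tree."""

    def __init__(self, n_samples):
        # Allocated once, reused across trees.
        self.ordered_gradients = np.empty(n_samples, dtype=np.float32)
        self.ordered_hessians = np.empty(n_samples, dtype=np.float32)
        self.partition = np.arange(n_samples, dtype=np.uint32)

    def reset(self, gradients, hessians):
        # Refill the existing buffers instead of reallocating them.
        self.ordered_gradients[:] = gradients
        self.ordered_hessians[:] = hessians
        self.partition[:] = np.arange(self.partition.shape[0],
                                      dtype=np.uint32)

n = 1000
ctx = SplittingContext(n)
buf_id = id(ctx.partition)
ctx.reset(np.zeros(n, dtype=np.float32), np.ones(n, dtype=np.float32))
assert id(ctx.partition) == buf_id  # same memory, no new allocation
```

The point is that reset() only performs in-place assignments, so no array is reallocated between trees.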

I didn't observe any significant speed improvement with:

benchmarks/bench_higgs_boson.py --no-lightgbm  --n-trees 100  --subsample 1000000

but that might come with parallelization.

I haven't run any memory usage benchmark; I'm not sure memory_profiler is really suited for this.

@ogrisel if you're OK with this in general I'll fix the tests (compare_lightgbm is already passing) and I will parallelize SplittingContext.reset().

@NicolasHug NicolasHug changed the title [MRG] Reset grower and splitter [MRG] Reuse grower and splitter memory Dec 26, 2018
@@ -86,7 +86,10 @@ def load_data():
n_iter_no_change=None,
random_state=0,
verbose=1)
pygbm_model.fit(data_train, target_train)
@profile
NicolasHug (author):
I'll remove this

ogrisel (Owner) commented Jan 7, 2019

Here is the output of a run of Higgs boson with memory_profiler:

mprof run benchmarks/bench_higgs_boson.py --no-lightgbm --n-trees 100 --n-leaf-nodes 255
mprof plot
  • on master:

(plot: bench_higgs_boson_master)

  • on this branch:

(plot: bench_higgs_boson_pr-88)

So basically there is no significant difference. I think this means that the memory used by the reused splitting context instance is very small compared to the memory used by the histograms stored on each node instance. As far as I know, the linked nodes in the grower tree itself cannot be reused easily without significant refactoring.

ogrisel (Owner) commented Jan 7, 2019

Also note that according to the above chart, the memory usage of pygbm is not that bad even if it's fluctuating because of the garbage collections of the temporary histogram arrays.

NicolasHug (author):

Thanks for the plots!

Indeed it doesn't seem to help much, at least with this number of samples.

I don't think histograms use much memory compared to the splitter though, according to my calculations:

histograms for one iteration:

  • n_bins * n_features * sizeof(HIST_DTYPE) * n_nodes = 255 * 28 * 12 * 255 = 22MB

SplittingContext for one iteration (re-used):

  • there are 5 arrays of size uint32 or float32 (gradients, hessians, partition, left_indices_buffer, right_indices_buffer) so total = 5 * 4 * n_samples = 220MB
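The two estimates above can be sanity-checked with quick arithmetic (assuming the full Higgs boson dataset of roughly 11M samples, which is what the 220 MB figure implies):

```python
# Back-of-the-envelope check of the two memory estimates above.
# Assumed setup: ~11M samples, 28 features, 255 bins, 255 leaf nodes.
n_samples, n_features = 11_000_000, 28
n_bins, n_nodes = 255, 255
# One histogram entry: float32 sum_gradients + float32 sum_hessians
# + uint32 count = 12 bytes.
sizeof_hist_entry = 12

histograms_mb = n_bins * n_features * sizeof_hist_entry * n_nodes / 1e6
# 5 float32/uint32 arrays of n_samples each, 4 bytes per element.
splitter_mb = 5 * 4 * n_samples / 1e6

print(round(histograms_mb))  # -> 22  (MB per iteration, freed by the GC)
print(round(splitter_mb))    # -> 220 (MB, reused across iterations)
```

This matches the 22 MB vs. 220 MB comparison in the comment: the reusable splitter buffers dominate, but they are allocated only once either way.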

That being said, LightGBM uses an LRU cache for the histograms.

Maybe it would be helpful if @maartenbreddels could try this branch on the big dataset that used to cause memory stress? Until then I'm in favor of closing the PR, and maybe resuscitating it later if needed.

ogrisel (Owner) commented Jan 8, 2019

Indeed, you are right about the relative sizes of the data structures. But then I don't understand what causes the fluctuations when running the benchmark on this PR (with the reset-able splitting context variant).

NicolasHug (author):

Well, maybe it's just like you said: the TreeNode objects (which contain the histograms) accumulate and are freed regularly?

maartenbreddels (Contributor):

Happy to test this branch out (maybe this week? can't guarantee). Note that I was using OS X; maybe that matters.

ogrisel (Owner) commented Sep 3, 2020

Here is a possible fix for the accumulated memory usage of the stored histograms that proved very efficient in scikit-learn: scikit-learn/scikit-learn#18242
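The idea in that fix is to recycle histogram arrays through a pool instead of letting them accumulate until garbage collection. A minimal sketch of the recycling idea (the class name and layout here are assumptions for illustration, not scikit-learn's actual implementation):

```python
import numpy as np

class HistogramPool:
    """Illustrative free-list of histogram arrays: a node returns its
    histograms to the pool once its children no longer need them, and
    the next node reuses that memory instead of allocating fresh."""

    def __init__(self, n_features, n_bins):
        # Last axis holds (sum_gradients, sum_hessians, count);
        # a real implementation would use a structured dtype.
        self.shape = (n_features, n_bins, 3)
        self._free = []

    def get(self):
        # Reuse a released array when possible, allocate otherwise.
        if self._free:
            hist = self._free.pop()
            hist[:] = 0.0  # wipe stale contents before reuse
            return hist
        return np.zeros(self.shape, dtype=np.float32)

    def release(self, hist):
        # Called once a node's children have been built.
        self._free.append(hist)

pool = HistogramPool(n_features=28, n_bins=255)
parent = pool.get()
pool.release(parent)
child = pool.get()
assert child is parent  # the parent's memory was recycled
```

With such a pool, the steady-state number of live histogram arrays is bounded by the number of nodes whose children are still being built, which removes the sawtooth fluctuations caused by GC of temporary histograms.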

ogrisel (Owner) commented Sep 3, 2020

Also note that Higgs boson is not the best benchmark for memory efficiency: it has many samples but few features and a single binary target, so relative to the amount of computation it allocates few histograms.

@NicolasHug NicolasHug closed this Jul 22, 2022
Successfully merging this pull request may close these issues:

Reuse grower (and thus the splitter) instead of creating a new one