
[MRG] Reuse grower and splitter memory #88

Closed
wants to merge 1 commit

Conversation

NicolasHug (Collaborator):

Closes #81

Instead of creating a new grower and a new splitter for every tree, we reuse the existing objects and thus the already-allocated arrays (ordered_gradients, partition, etc.).
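A minimal sketch of this reuse pattern (the class name and the reset() method follow the PR description, but the attributes and signature here are illustrative, not pygbm's actual implementation):

```python
import numpy as np

class SplittingContext:
    """Illustrative reusable splitting context: buffers are allocated
    once and refilled in place for every new tree."""

    def __init__(self, n_samples):
        # Allocated once, reused across trees.
        self.ordered_gradients = np.empty(n_samples, dtype=np.float32)
        self.ordered_hessians = np.empty(n_samples, dtype=np.float32)
        self.partition = np.arange(n_samples, dtype=np.uint32)

    def reset(self, gradients, hessians):
        # Refill the existing buffers instead of reallocating them.
        self.ordered_gradients[:] = gradients
        self.ordered_hessians[:] = hessians
        self.partition[:] = np.arange(self.partition.shape[0],
                                      dtype=np.uint32)

n = 1000
ctx = SplittingContext(n)
buf_id = id(ctx.partition)
ctx.reset(np.zeros(n, dtype=np.float32), np.ones(n, dtype=np.float32))
assert id(ctx.partition) == buf_id  # same memory, no new allocation
```

The point is that reset() only performs in-place assignments, so no array is reallocated between trees.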

I didn't observe any significant speed improvement with:

benchmarks/bench_higgs_boson.py --no-lightgbm  --n-trees 100  --subsample 1000000

but that might come with parallelization.

I haven't run any memory usage benchmark; I'm not sure memory_profiler is really suited for this.

@ogrisel if you're OK with this in general I'll fix the tests (compare_lightgbm is already passing) and I will parallelize SplittingContext.reset().

@NicolasHug NicolasHug changed the title [MRG] Reset grower and splitter [MRG] Reuse grower and splitter memory Dec 26, 2018
@@ -86,7 +86,10 @@ def load_data():
n_iter_no_change=None,
random_state=0,
verbose=1)
pygbm_model.fit(data_train, target_train)
@profile
NicolasHug (author):
I'll remove this

ogrisel (Owner) commented Jan 7, 2019

Here is the output of a run of Higgs boson with memory_profiler:

mprof run benchmarks/bench_higgs_boson.py --no-lightgbm --n-trees 100 --n-leaf-nodes 255
mprof plot
  • on master:

(plot: bench_higgs_boson_master)

  • on this branch:

(plot: bench_higgs_boson_pr-88)

So basically there is no significant difference. I think this means that the memory used by the reused splitting context instance is very small compared to the memory used by the histograms stored on each node instance. As far as I know, the linked nodes in the grower tree itself cannot be reused easily without significant refactoring.

ogrisel (Owner) commented Jan 7, 2019

Also note that according to the above chart, the memory usage of pygbm is not that bad even if it's fluctuating because of the garbage collections of the temporary histogram arrays.

NicolasHug (author):

Thanks for the plots!

Indeed it doesn't seem to help much, at least with this number of samples.

I don't think histograms use much memory compared to the splitter though, according to my calculations:

histograms for one iteration:

  • n_bins * n_features * sizeof(HIST_DTYPE) * n_nodes = 255 * 28 * 12 * 255 = 22MB

SplittingContext for one iteration (re-used):

  • there are 5 arrays of size uint32 or float32 (gradients, hessians, partition, left_indices_buffer, right_indices_buffer) so total = 5 * 4 * n_samples = 220MB
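The two estimates above can be sanity-checked with quick arithmetic (assuming the full Higgs boson dataset of roughly 11M samples, which is what the 220 MB figure implies):

```python
# Back-of-the-envelope check of the two memory estimates above.
# Assumed setup: ~11M samples, 28 features, 255 bins, 255 leaf nodes.
n_samples, n_features = 11_000_000, 28
n_bins, n_nodes = 255, 255
# One histogram entry: float32 sum_gradients + float32 sum_hessians
# + uint32 count = 12 bytes.
sizeof_hist_entry = 12

histograms_mb = n_bins * n_features * sizeof_hist_entry * n_nodes / 1e6
# 5 float32/uint32 arrays of n_samples each, 4 bytes per element.
splitter_mb = 5 * 4 * n_samples / 1e6

print(round(histograms_mb))  # -> 22  (MB per iteration, freed by the GC)
print(round(splitter_mb))    # -> 220 (MB, reused across iterations)
```

This matches the 22 MB vs. 220 MB comparison in the comment: the reusable splitter buffers dominate, but they are allocated only once either way.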

That being said, LightGBM uses an LRU cache for the histograms.

Maybe it would be helpful if @maartenbreddels could try this branch on the big dataset that used to cause memory stress? Until then I'm in favor of closing the PR, and maybe resuscitating it later if needed.

ogrisel (Owner) commented Jan 8, 2019

Indeed, you are right about the relative sizes of the data structures. But then I don't understand what causes the fluctuations when running the benchmark on this PR (with the reset-able splitting context variant).

NicolasHug (author):

Well, maybe it's just like you said: the TreeNode objects (which contain the histograms) accumulate and are freed regularly?

maartenbreddels (Contributor):

Happy to test this branch out (maybe this week? can't guarantee). Note that I was using OS X; maybe that matters.

ogrisel (Owner) commented Sep 3, 2020

Here is a possible fix for the accumulated memory usage of the stored histograms that proved very efficient in scikit-learn: scikit-learn/scikit-learn#18242
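The idea in that fix is to recycle histogram arrays through a pool instead of letting them accumulate until garbage collection. A minimal sketch of the recycling idea (the class name and layout here are assumptions for illustration, not scikit-learn's actual implementation):

```python
import numpy as np

class HistogramPool:
    """Illustrative free-list of histogram arrays: a node returns its
    histograms to the pool once its children no longer need them, and
    the next node reuses that memory instead of allocating fresh."""

    def __init__(self, n_features, n_bins):
        # Last axis holds (sum_gradients, sum_hessians, count);
        # a real implementation would use a structured dtype.
        self.shape = (n_features, n_bins, 3)
        self._free = []

    def get(self):
        # Reuse a released array when possible, allocate otherwise.
        if self._free:
            hist = self._free.pop()
            hist[:] = 0.0  # wipe stale contents before reuse
            return hist
        return np.zeros(self.shape, dtype=np.float32)

    def release(self, hist):
        # Called once a node's children have been built.
        self._free.append(hist)

pool = HistogramPool(n_features=28, n_bins=255)
parent = pool.get()
pool.release(parent)
child = pool.get()
assert child is parent  # the parent's memory was recycled
```

With such a pool, the steady-state number of live histogram arrays is bounded by the number of nodes whose children are still being built, which removes the sawtooth fluctuations caused by GC of temporary histograms.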

ogrisel (Owner) commented Sep 3, 2020

Also note that Higgs boson is not the best benchmark for memory efficiency: it has many samples but few features and a single binary target, so relative to the amount of computation it allocates few histograms.

@NicolasHug NicolasHug closed this Jul 22, 2022
Successfully merging this pull request may close these issues:

Reuse grower (and thus the splitter) instead of creating a new one