Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally use left/right indices buffer #83

Open
maartenbreddels opened this issue Dec 18, 2018 · 7 comments
Open

Optionally use left/right indices buffer #83

maartenbreddels opened this issue Dec 18, 2018 · 7 comments

Comments

@maartenbreddels
Copy link
Contributor

continuing from #79

In splitting.py, the left/right_indices_buffer will use up 8GB for 10^9 rows. If that causes swapping, the performance benefit of multithreading (which requires these buffers) are most probably not worth it. Would it be an option to disable this?
Is there also a method that could without the buffer (I'm still wrapping my head around the algo, maybe you already thought about it).

@ogrisel
Copy link
Owner

ogrisel commented Dec 18, 2018

In splitting.py, the left/right_indices_buffer will use up 8GB for 10^9 rows. If that causes swapping, the performance benefit of multithreading (which requires these buffers) are most probably not worth it. Would it be an option to disable this?

I don't see how we could have multithreading at that level anymore. You suggest disabling thread-based parallelism for the split_indices operation? Maybe that could be an option. @NicolasHug might know better how LightGBM does for this part of the code.

@NicolasHug
Copy link
Collaborator

Would it be an option to disable this?

Technically yes... I suppose we could use a single-threaded quick-sort like partitioning scheme.

Is there also a method that could without the buffer

I don't think so, or at least no with the current strategy. those arrays are used to that sample indices don't overwrite each other

@NicolasHug
Copy link
Collaborator

@NicolasHug might know better how LightGBM does for this part of the code

I haven't checked again but I don't think they have an option to disable parallel splitting.

@maartenbreddels could you check if you have the same issue on LightGBM? Note that they are reusing allocated data like we plan to do in #81 so we need to take this into account

@maartenbreddels
Copy link
Contributor Author

There are some parallel in place partition algorithms: http://www.lsi.upc.edu/~lfrias/research/parpar/wea08.pdf
they don't appear super trivial, not something I'd do in 1 evening.

But I think one of the buffers could be avoided, that would already save a bit. Would you be interested in a PR that does either a single threaded split, or uses 1 buffer, or both? I can't promise I can do it, but if the 1 buffer PR makes the code less readable, and that is a reason not to merge, I won't bother.

could you check if you have the same issue on LightGBM?

I cannot use LightGBM as it is now, the current implementation makes at least 2 memory copies, my vaex-ml hack avoids 1 copy, but still the memory usage is excessive.

My plan is to see what is possible with pygbm (much easier to understand, and easier to edit), and possible see how they can be translated to lightgbm.

@NicolasHug
Copy link
Collaborator

I think a single-threaded version would be welcome and should not be too complicated to add.

I'd be curious to know how to avoid using one of the two arrays though!

@maartenbreddels
Copy link
Contributor Author

I think a single-threaded version would be welcome and should not be too complicated to add.

I'll open an PR for that, any guidelines for how this should be configurable?

I'd be curious to know how to avoid using one of the two arrays though!

I thought of using the sample_indices for the 'left' indices, and a scratchpad/buffer for the 'right' indices. Basically, sample_indices takes over the role of left_indices_buffer. That should work right?

@NicolasHug
Copy link
Collaborator

If you can make sure that no entry in samples_indices gets overwritten before it's written into the other buffer then I guess so. But there's the "if" ^^

any guidelines for how this should be configurable?

Let go simple for now, you can try passing a parameter e.g. parallel_splitting from BaseGradientBoosing all the way down to SplittingContext.__init__, and make split_indices dispatch to either split_indices_parallel or split_indices_single_thread. We can work out the details later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants