feat(wip): cupy histograms #1095
(cf. https://docs.cupy.dev/en/stable/reference/generated/cupy.histogramdd.html, which is the standard regular-array API conforming to the numpy version - no boost or anything)
@martindurant - cool, it appears to use the same technique (fill by atomics), so it'll be subject to the same scaling limitations I'm seeing. However, the number of calls to fill is a bit leaner, so maybe it's worth backing an implementation with it. I'll have to try some benchmarks. Otherwise, there's significant functionality missing from the cupy hists that we'll still have to add on top, if it turns out to run faster in the first place.
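For reference, the API under discussion is a drop-in match for numpy's `histogramdd`, so in principle swapping `import numpy` for `import cupy` is enough to move the fill onto the GPU. A minimal sketch using numpy as a stand-in (the data and binning here are illustrative, not from this PR):

```python
import numpy as np  # cupy mirrors this API: `import cupy as np` runs the fill on-device

# Fill a 2-D histogram of 1M points with fixed bin edges.
rng = np.random.default_rng(42)
data = rng.normal(size=(1_000_000, 2))
edges = [np.linspace(-5, 5, 51), np.linspace(-5, 5, 51)]

counts, out_edges = np.histogramdd(data, bins=edges)
print(counts.shape)  # (50, 50) - one count array over the full bin grid
```

With cupy, `counts` stays in GPU memory, which is the point of deferring the device-to-host copy to the last moment.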
Yes, I expect this is the case. dask-histogram also uses boost and adds a numpy-compatible API on top for those that expect it; and of course, it's designed to work with awkward. I expect there's an amount of sharing and refactoring that can eventually be done.
Yeah - I think it is all possible. Right now it's really about getting all the pipe-fittings in place. I'll let you know if there's any clear win in the benchmarks.
@Saransh-cpp: @jpivarski and I talked last Friday and it came up that you might be interested in taking this "pilot project" and turning it into a full-blown UHI-compatible histogramming interface (a la scikit-hep/hist), but for cupy/cuda histograms. What's in this PR has the necessary functionality for HEP and we can convert to scikit-hep/hist, but it would be nice to have a cohesive ecosystem and only convert to CPU memory-space at the last moment. This would grant us more or less infinite scaling.

We also have some ideas towards warpwise-distributed histograms where a (collection of) warps would tend a sub-range of bins so that filling can be done more in parallel. This old implementation description demonstrates that if you stick to a warp (i.e. 32 bins) and replicate histograms to fill in parallel, you can reach 10 GB/s filling rates, because there's no use of atomics.

This also has interesting parallels to cluster-distributed histograms, where a (relatively enormous) histogram could be distributed across a whole dask cluster and scale to 100s of GBs in size or more. This would effectively remove scaling limitations for histograms for the foreseeable future and is probably important for achieving precision HL-LHC analyses.

Anyway - please let us know if you are interested in turning this into a more mature package and possibly adding features to it! We're happy to answer any questions you may have.
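Both the warpwise and cluster-distributed ideas rest on the same property: histogram filling is an associative reduction, so each worker (warp, GPU, or dask partition) can fill a private partial histogram over its own slice of the data, and the partials are then summed. A hedged numpy sketch of that reduction pattern (the `fill_distributed` helper and the chunking are hypothetical stand-ins for dask partitions and a boost/cupy backend):

```python
import numpy as np

def fill_distributed(chunks, edges):
    # Each "worker" fills a partial histogram over its own data chunk;
    # the partials are combined by elementwise summation.
    partials = [np.histogramdd(chunk, bins=edges)[0] for chunk in chunks]
    return sum(partials)

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 2))
edges = [np.linspace(-5, 5, 21)] * 2       # 20 bins per dimension

chunks = np.array_split(data, 4)           # stand-in for dask partitions
dist = fill_distributed(chunks, edges)

ref, _ = np.histogramdd(data, bins=edges)  # single-pass reference fill
assert np.array_equal(dist, ref)           # summed partials match exactly
```

Because the partials are independent, the same pattern scales from 4 local chunks to a cluster-sized histogram sharded across workers, with bin sub-ranges assigned per worker if the histogram itself is too large for one node.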
Work in progress on using cupy in the old coffea-hist package as a demonstrator.