HistGradientBoosting pickle portability between 64bit and 32bit arch #27952
Comments
A while ago, I worked on fixing something similar for the trees; see #21552 for context. I am pretty sure that at the time I realised other estimators were problematic, but I left them for later. From my notes: the common approach is to try to convert attributes at unpickling time in In the meantime, your work-around seems completely fine. I would recommend using And needless to say a PR making it work for
Thanks! That's super useful. Will try and use this as a guide to put together a PR.
Sounds good! Just curious, can you tell a bit more about your use case? Maybe you want to show the prediction of a For completeness, I have been involved in making scikit-learn work better in Pyodide and I am curious what people use it for 😉 For example:
Hey, sorry for the delay in replying. The project we are working on is this one: https://urban-analytics-technology-platform.github.io/demoland-web/ The goal is to let policy makers change land use details in a city and see how that affects several key indicator variables (air pollution, house prices, etc.). We developed the model and train it on UK-wide data, but at inference time we only need to apply it to smaller areas. So we train outside Pyodide and use Pyodide to get the predictions in the browser, where they can be visualized. The core modeling package also has to be available in regular old Python, so we kind of need a solution that works for both. My first attempt at this was using Pyodide to train a model and then store it and use that, but then we end up with two pickle files, one for Pyodide and one for regular Python, which is just a little harder to manage. We also envision training larger models in future and would rather do that outside Pyodide. I was actually surprised how well scikit-learn worked in Pyodide; it was just this one little hiccup, and everything else was pretty smooth.
OK, super interesting, thanks for the info!
Glad to hear that, if you ever bump into other issues, don't hesitate to report them!
Describe the bug
HistGradientBoosting models use np.intp to represent the feature_idx in TreePredictor nodes:
scikit-learn/sklearn/ensemble/_hist_gradient_boosting/common.pyx, lines 19 to 36 in 0f8a777
This seems to cause issues when pickled HistGradientBoosting models trained in a 64-bit environment are used in 32-bit environments (like Pyodide, which is where I encountered this issue).
I know that for a while the other Tree models in sklearn had a similar problem, but I am not 100% sure what the solution was.
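To illustrate why a platform-dependent integer type makes the node layout differ between architectures, here is a minimal sketch. The two-field record is a hypothetical, simplified stand-in and not the actual node dtype from common.pyx; only the NumPy and standard-library calls are real.

```python
import struct

import numpy as np

# np.intp is a platform-sized integer: typically 8 bytes on 64-bit
# builds and 4 bytes on 32-bit builds such as Pyodide's.
intp_size = np.dtype(np.intp).itemsize
native_ssize = struct.calcsize("n")  # size of the C ssize_t on this platform

# Hypothetical record with one platform-sized field (illustration only).
platform_record = np.dtype([("value", np.float64), ("feature_idx", np.intp)])
# The same record as it would look when written on a 64-bit machine.
record_from_64bit = np.dtype([("value", np.float64), ("feature_idx", np.int64)])

# The two layouts only agree where np.intp happens to be 64 bits wide.
same_layout = platform_record == record_from_64bit

print(f"np.intp is {intp_size} bytes here; layouts match: {same_layout}")
```

On a 32-bit interpreter the two dtypes compare unequal, which is the kind of mismatch that can surface when a pickle written on a 64-bit machine is loaded there.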
Would changing the type to be np.uint32 be an acceptable solution here?
Steps/Code to Reproduce
Steps to reproduce
see this repo for a full example: https://github.com/stuartlynn/hist_gradient_boost_bug
Expected Results
The Pyodide code runs and gives the expected output.
Actual Results
Error message
Running the above gives the following error message when trying to execute the Pyodide code
Things I have already checked
Hacky fix
What I found to work is the following: in Pyodide, after loading the model, if we manually change the types of the nodes for the predictors, then the model runs fine. There is an example of this in the example repo.
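The general idea of that workaround can be sketched with plain NumPy. This is not the code from the example repo: the record layouts, field names other than feature_idx, and the helper cast_nodes are hypothetical stand-ins, and the real node record has more fields. It assumes the only difference between the two layouts is the integer width of some fields.

```python
import numpy as np

# Hypothetical, simplified node layouts (the real record has more fields).
NODE_DTYPE_64 = np.dtype(
    [("value", np.float64), ("feature_idx", np.int64), ("left", np.uint32)]
)
NODE_DTYPE_32 = np.dtype(
    [("value", np.float64), ("feature_idx", np.int32), ("left", np.uint32)]
)


def cast_nodes(nodes, target_dtype):
    """Return `nodes` converted to `target_dtype`, refusing lossy casts."""
    if nodes.dtype == target_dtype:
        return nodes
    if nodes.dtype.names != target_dtype.names:
        raise ValueError("field names differ; refusing to cast")
    # astype on structured arrays converts field by field (by position).
    converted = nodes.astype(target_dtype)
    # Guard against silent wrap-around when narrowing integer fields.
    for name in target_dtype.names:
        if not np.array_equal(converted[name], nodes[name]):
            raise ValueError(f"values in field {name!r} do not fit the target type")
    return converted


# Nodes as they might look after unpickling a model trained on 64-bit.
nodes_64 = np.zeros(3, dtype=NODE_DTYPE_64)
nodes_64["feature_idx"] = [0, 5, 2]

nodes_32 = cast_nodes(nodes_64, NODE_DTYPE_32)
print(nodes_32.dtype["feature_idx"], nodes_32["feature_idx"].tolist())
```

In the real case the same kind of cast would be applied to the node array of every predictor in the loaded model; the round-trip check is there so that an out-of-range value raises instead of wrapping silently.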
Versions