Import error when loading a pickled model pulled from Pipeline #16570

Closed
bmulas1535 opened this issue Feb 27, 2020 · 3 comments

bmulas1535 commented Feb 27, 2020

Describe the bug

Pickling a RandomForestClassifier pulled from an sklearn Pipeline results in a ModuleNotFoundError when the pickle is loaded in another notebook. The module named in the error, sklearn.ensemble._forest, does exist but cannot be found.

Steps/Code to Reproduce

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
import pickle
import numpy as np


pipeline = make_pipeline(
    # Other steps in pipeline as well
    RandomForestClassifier(),
)

# Create some fake data
X_train = np.array([[2,8,5],[4,7,2],[1,9,4]])
y_train = np.array([26, 29, 18])
# Train the model
pipeline.fit(X_train, y_train)

# Pickle the model
model = pipeline.named_steps['randomforestclassifier']
with open("model.pkl", "wb") as outfile:
    pickle.dump(model, outfile)

In another notebook:

from sklearn.ensemble import RandomForestClassifier
import pickle


# Attempt to load the pickled model in another file / notebook:
infile = open("model.pkl", "rb")
model = pickle.load(infile)
infile.close()

# It's lonely over here
model

Expected Results

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

Actual Results

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-20-c8d1783e8b58> in <module>
      5 # Attempt to load the pickled model in another file / notebook:
      6 infile = open("model.pkl", "rb")
----> 7 model = pickle.load(infile)
      8 infile.close()

ModuleNotFoundError: No module named 'sklearn.ensemble._forest'

Versions

System:
    python: 3.7.4 (default, Oct  9 2019, 16:55:50)  [GCC 7.4.0]
executable: /usr/local/bin/python3.7
   machine: Linux-4.15.0-88-generic-x86_64-with-debian-buster-sid

Python deps:
       pip: 19.3
setuptools: 40.8.0
   sklearn: 0.21.3
     numpy: 1.17.2
     scipy: 1.3.1
    Cython: None
    pandas: 0.25.1

EDIT: Fix description

bmulas1535 added the Bug label Feb 27, 2020

bmulas1535 (Author) commented

I don't appear to be able to import from sklearn.ensemble._forest. Is this intended?

glemaitre (Member) commented

Could you tell us which version of scikit-learn you used to pickle the pipeline and which version you are using to unpickle it? I assume that you are unpickling in 0.21.3 after pickling in 0.22.1.
If so, you can just update. However, be aware that we don't consider this a bug, because we don't support pickling/unpickling across different scikit-learn versions.
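
Since cross-version unpickling is unsupported, one way to surface a mismatch early is to store the library version next to the model and compare it before loading. The following is only a minimal sketch: `dump_with_version`, `load_with_version_check`, and `VersionMismatchError` are hypothetical helper names, not part of scikit-learn. The version is written as its own pickle record ahead of the payload, so it can be read even when the payload itself could not be unpickled. A plain dict stands in for the fitted model so the sketch runs without scikit-learn; in practice each notebook would pass its own `sklearn.__version__`. The version strings are the ones guessed in the reply above.

```python
import io
import pickle


class VersionMismatchError(RuntimeError):
    """Raised when a pickle was written under a different library version."""


def dump_with_version(obj, fileobj, library_version):
    # Write the version first, as a separate pickle record, so that it can
    # be read back without touching the (possibly unloadable) payload.
    pickle.dump(library_version, fileobj)
    pickle.dump(obj, fileobj)


def load_with_version_check(fileobj, current_version):
    # Read only the version record, and refuse to continue on a mismatch.
    saved_version = pickle.load(fileobj)
    if saved_version != current_version:
        raise VersionMismatchError(
            f"pickle written with version {saved_version}, "
            f"but version {current_version} is installed"
        )
    # Versions agree, so it is reasonable to unpickle the payload.
    return pickle.load(fileobj)


# Stand-in payload; in the scenario above this would be the fitted model.
buffer = io.BytesIO()
dump_with_version({"n_estimators": 100}, buffer, "0.22.1")

buffer.seek(0)
try:
    load_with_version_check(buffer, "0.21.3")
except VersionMismatchError as exc:
    print(exc)
```

A check like this does not make the pickle portable. It only replaces an obscure ModuleNotFoundError with a message that names both versions.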

glemaitre added Question and removed Bug labels Feb 27, 2020

bmulas1535 (Author) commented

That was exactly the issue. I didn't realize that I had started the new notebook in a different kernel. Thank you very much for the prompt response.
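
A quick way to spot this kind of kernel mix-up is to print the interpreter path and the installed package version in each notebook and compare them. The sketch below assumes Python 3.8 or later, where `importlib.metadata` is in the standard library (the 3.7.4 interpreter listed under Versions would need a different approach, such as printing `sklearn.__version__` directly). `describe_environment` is a hypothetical helper name.

```python
import sys
from importlib import metadata  # standard library from Python 3.8


def describe_environment(package_name):
    """Return the interpreter path and installed version of a package."""
    try:
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return {"executable": sys.executable, "version": version}


# Run this in both notebooks; differing values point to different kernels.
print(describe_environment("scikit-learn"))
```

If the two notebooks report different executables or versions, the pickle is being written and read under different environments.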
