Can't pickle ... #11

Open
AdrianSosic opened this issue Nov 1, 2020 · 5 comments

@AdrianSosic

Hi @jcmgray, I'm currently using your awesome package to automate my experiments and noticed a problem related to pickling certain data types. While the cloudpickle backend of joblib should work fine to handle, for example, lambda functions, I get an error when working with certain modules based on torch.

Here is a minimal example:

import xyzpy
import botorch
import torch

@xyzpy.label(['model'])
def fun(a):
	x = torch.tensor([[0.]])
	y = torch.tensor([[0.]])
	return botorch.models.SingleTaskGP(x, y)

combos = dict(
	a=range(10)
)

h = xyzpy.Harvester(fun, 'result')
c = h.Crop('test')
c.sow_combos(combos)
c.grow_missing()
c.reap()

It produces the following error:

_pickle.PicklingError: Can't pickle <function _HomoskedasticNoiseBase.__init__.<locals>.<lambda> at 0x14cd89e18>: it's not found as gpytorch.likelihoods.noise_models._HomoskedasticNoiseBase.__init__.<locals>.<lambda>

Tested with Python 3.7.3 and

botorch==0.2.1
torch==1.6.0
xyzpy==1.0.0
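
The same kind of failure can be reproduced with the standard library alone, without torch or botorch. The sketch below is only an illustration: `NoiseModel` and `can_pickle` are hypothetical names, not part of gpytorch or xyzpy. It shows that an object holding a lambda created inside `__init__` cannot be serialized by the standard `pickle` module, because the lambda's qualified name contains `<locals>` and cannot be looked up by name.

```python
import pickle


class NoiseModel:
    """Hypothetical stand-in for an object that stores a lambda created in __init__."""

    def __init__(self):
        # A lambda defined inside a method has a qualified name containing
        # '<locals>', so pickle cannot find it by name when serializing.
        self.transform = lambda x: x + 1.0


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exact exception type differs between Python versions.
        return False


model = NoiseModel()
print(model.transform.__qualname__)  # shows the '<locals>' part of the name
print(can_pickle(model))             # the object holding the lambda
print(can_pickle({"a": 1.0}))        # a plain dict of numbers
```

Serializers such as cloudpickle and dill pickle functions by value rather than by name, which is why they are usually suggested for this situation.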

After a short search, I found this related post: cornellius-gp/gpytorch#907
A potential solution seems to be using dill instead of pickle. Do you think this option can be added to xyzpy?

For now, my workaround is to remove all problematic variables from the object returned by the function being evaluated, once all internal computations have been completed. However, it would of course be much nicer if xyzpy could handle such objects natively.

Kind regards,
Adrian

@jcmgray
Owner

jcmgray commented Nov 3, 2020

Hi Adrian, thanks for the issue and glad xyzpy is being useful! It should be straightforward and seems useful to add a picklelib arg or something to Crop. I think the only functions called are dumps and loads.

Just as a quick first check you could try switching this line at the top of batch.py:

from joblib.externals import cloudpickle
# to -->
import dill as cloudpickle

and see if everything runs for you?

@AdrianSosic
Author

Hi @jcmgray, thanks for getting in touch. Unfortunately, your suggested change did not resolve the issue but instead raised the following error:

  File "/Users/M280152/Downloads/xyzpy/xyzpy/gen/farming.py", line 631, in Crop
    num_batches=num_batches)
  File "/Users/M280152/Downloads/xyzpy/xyzpy/gen/batch.py", line 226, in __init__
    self._sync_info_from_disk()
  File "/Users/M280152/Downloads/xyzpy/xyzpy/gen/batch.py", line 333, in _sync_info_from_disk
    farmer = None if farmer_pkl is None else pickle.loads(farmer_pkl)
ModuleNotFoundError: No module named '__builtin__'

Any thoughts on this?

@jcmgray
Owner

jcmgray commented Nov 7, 2020

OK that seems to be a separate problem - the farmer_pkl currently is pickled and unpickled by different libraries, which I am surprised currently works. That can be easily fixed.

The main problem is in fact not to do with pickling the function (which is what cloudpickle is currently used for), but with using joblib.dump to write the result inside the grow function, since I had assumed the result would always be numeric types, arrays, etc.

As an easier workaround than your current, you could simply pickle the return yourself:

    return dill.dumps(botorch.models.SingleTaskGP(x, y))

then unpickle on the other end.
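
A minimal sketch of this workaround, written so that it runs with the standard library only. The decorator `returns_serialized` is a hypothetical helper, not part of xyzpy, and it defaults to `pickle.dumps`; under the assumption that dill is installed, `dill.dumps` could be passed instead to handle objects such as the model above.

```python
import functools
import pickle


def returns_serialized(dumps=pickle.dumps):
    """Hypothetical decorator: serialize a function's return value to bytes.

    `dumps` can be any callable mapping an object to bytes, for example
    `dill.dumps` if dill is installed (assumption: not imported here).
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return dumps(fn(*args, **kwargs))
        return wrapper
    return decorator


@returns_serialized()
def fun(a):
    # Stand-in for the model construction in the thread.
    return {"a": a, "weights": [a * 0.5, a * 2.0]}


payload = fun(3)                # bytes, opaque to whatever writes the result
result = pickle.loads(payload)  # "unpickle on the other end"
print(type(payload).__name__, result)
```

The point of the design is that the writer of the result only ever sees a `bytes` object, so it does not need to know how to pickle the original return value.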

And it might be nice to have this as a separate picklelib option as well.

@AdrianSosic
Author

Hi @jcmgray, I see. Is there a particular reason why you are using both cloudpickle and joblib instead of only one of them, i.e. would it be possible to also use dill (e.g. via setting an option) for the grow function?

In any case, I am using your suggested solution at the moment as a workaround, which is indeed much smarter than simply throwing away the objects ;-)

Thanks a lot for your help! Much appreciated!

@jcmgray
Owner

jcmgray commented Nov 12, 2020

The reasoning was I think as follows:

  1. cloudpickle is specialised for saving functions (so is used just for the function), it has some overhead
  2. joblib is specialised for saving arrays (& can't process functions), which is what I generally had envisioned would be returned by the function!

This logic might not be necessary anymore, & I defo agree it would be nice to be able to customize which picklers are used.

I can try and add this at some point (unless you want to!), but it might not be immediately.
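
As a rough illustration of what a configurable pickler could look like, here is a small sketch. It is not xyzpy code: `ResultStore` and its `picklelib` argument are hypothetical, and the only assumption about `picklelib` is that it exposes `dumps` and `loads`, as the standard `pickle` module does (dill and cloudpickle offer the same two functions).

```python
import pickle
import tempfile
from pathlib import Path


class ResultStore:
    """Hypothetical on-disk store with a configurable pickling library.

    `picklelib` is assumed to be any module or object exposing
    `dumps(obj) -> bytes` and `loads(bytes) -> obj`.
    """

    def __init__(self, directory, picklelib=pickle):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.picklelib = picklelib

    def save(self, name, obj):
        # Write with the configured library.
        path = self.directory / f"{name}.pkl"
        path.write_bytes(self.picklelib.dumps(obj))

    def load(self, name):
        # Read with the same library that wrote the file.
        path = self.directory / f"{name}.pkl"
        return self.picklelib.loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    store = ResultStore(tmp)  # a different library could be passed as picklelib
    store.save("batch-0", {"a": 0, "result": [1.0, 2.0]})
    restored = store.load("batch-0")

print(restored)
```

Keeping a single `picklelib` attribute for both directions avoids the situation described earlier in the thread, where data was pickled by one library and unpickled by another.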
