
Some issues when combining deep_architect and ray.tune #8

Open

iacolippo opened this issue Apr 24, 2020 · 4 comments

@iacolippo

Hi, first of all, I'd like to thank you for building and releasing deep_architect.

I am opening this issue because I'd like to use deep_architect together with ray.tune to get the best of both worlds, but I encountered some issues. Feel free to close this if you think it is out of the scope of the project.

My goal is to combine the sampling capabilities of deep_architect with the multiprocessing and logging tools of ray and ray.tune. Therefore I'm using tune.run and tune.Trainable with the searchers, helpers, and modules of deep_architect.

If I write my code with the call to the sampling function inside the _setup method of a tune.Trainable

https://gist.github.com/iacolippo/1262c8afbfd9f5e491add5fbae105afa (line 124)

then I have an issue with ray's tensorboard logging. I'd say this is not an issue with deep_architect, and it shouldn't be too hard to fix in the source code of ray if need be.

If I write my code as ray wants it (config["model"] is the model object, in this case a PytorchModel from deep_architect), then I get a different error:

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

https://gist.github.com/iacolippo/3f815fa90c254f7a065bdc446406233a (note that the () disappeared at line 124)

This might be an issue with deep_architect and multiprocessing, or with PyTorch itself; I don't know, as I didn't dig into it much for lack of time. Here is the traceback:

traceback.log
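
For what it's worth, the error looks like a general PyTorch limitation rather than something specific to deep_architect: deepcopying any tensor that is not a graph leaf raises exactly this RuntimeError. A minimal sketch of the failure mode, independent of ray (names here are mine, for illustration only):

import copy
import torch

leaf = torch.randn(3, requires_grad=True)  # created explicitly by the user
nonleaf = leaf * 2                         # produced by an op, part of a graph

copy.deepcopy(leaf)     # fine: leaf tensors support deepcopy
copy.deepcopy(nonleaf)  # RuntimeError: Only Tensors created explicitly by the
                        # user (graph leaves) support the deepcopy protocol ...

If ray deepcopies the config for every trial, a model holding non-leaf tensors would trigger this when passed through config["model"].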

I am using

-e git+git@github.com:negrinho/deep_architect.git@3427c5d45b0cbdc9c2fe1f4e5213f6961ef41749#egg=deep_architect
ray==0.8.4
torch==1.5.0
torchvision==0.6.0a0+82fd1c8

Stay safe!

@richardliaw

richardliaw commented Apr 25, 2020

Hey there - this seems like a problem with Ray's documentation being unclear.

What if you just did:

import torch
import torch.nn as nn
import torch.optim as optim
from ray import tune

# get_dataloaders and sample_model are the helpers from your gist
class SimpleClassifierTrainable(tune.Trainable):
    def _setup(self, config):
        use_cuda = torch.cuda.is_available()
        self.device = torch.device("cuda" if use_cuda else "cpu")
        self.batch_size = config["batch_size"]
        self.learning_rate = config.get("lr", 0.01)
        self.train_loader, self.val_loader = get_dataloaders(self.batch_size)
        ##############################
        # CREATE MODEL HERE
        model = sample_model(in_features=784, num_classes=10)
        self.model = model.to(self.device)
        ###############################
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = optim.Adam(self.model.parameters(),
                                    lr=self.learning_rate)
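
You'd then launch trials as usual, e.g. (config values below are just placeholders; each trial calls _setup and samples its own model, so nothing model-related needs to go through config):

analysis = tune.run(
    SimpleClassifierTrainable,
    config={"batch_size": 64, "lr": 0.01},
    num_samples=4)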

@negrinho
Owner

negrinho commented Sep 9, 2020

Hi Iacopo. Apologies for the delay. Unfortunately, I haven't been able to dedicate much time to DeepArchitect lately, but I'm looking to resume soon. I'm curious how far you got with DeepArchitect in your work. I'm not familiar with Ray, but I'm happy to integrate some functionality, as it seems widely adopted now. I don't see any inherent problems in using DeepArchitect with Ray, provided that Ray does not need too much information about the workload it is running (e.g., the exact architecture).

@iacolippo
Author

Hi @negrinho

No need to apologize :-) I didn't have much time to work on this either.

The idea would be to use DeepArchitect functions as a sampler for a PyTorch (or TF) model in the tune.run parameter config (see here: https://gist.github.com/iacolippo/3f815fa90c254f7a065bdc446406233a#file-ray_deep_architect_ex2-py-L201). This would make it really easy to scale an architecture search from a single machine to a cluster. Concretely, the pattern would look something like the sketch below.
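
A sketch of that pattern (sample_model is the helper from my gist; this is exactly the usage that currently hits the deepcopy error above, so I haven't verified it end to end):

from ray import tune

config = {
    "batch_size": 64,
    "lr": 0.01,
    # sample a fresh architecture for every trial
    "model": tune.sample_from(
        lambda spec: sample_model(in_features=784, num_classes=10)),
}
tune.run(SimpleClassifierTrainable, config=config, num_samples=4)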

I will have a person working on a closely related project starting in October, so I will hopefully be able to give more detailed information soon.

@negrinho
Owner

negrinho commented Nov 11, 2020 via email
