
Installation and experiment replication [MacOS (M1/ARM)] #12

Open
RMichae1 opened this issue Sep 28, 2023 · 3 comments

@RMichae1

Hello,

I'm having trouble replicating the exact environment and results described in the readme.md and requirements.txt.
Specifically, running the commands one by one fails at pip install -r requirements.txt --upgrade. Completing the installation required the following changes:

  1. Change torchvision from 0.11.1 to 0.11.2.
  2. Remove the strict version pin on vina; there seems to be a bug in one of its __init__.py files.
  3. Install the Rust compiler; installing tokenizers fails without it.
  4. Pin protobuf to 3.20.x or lower, which running the examples shows is required.

Most of these issues appear to be specific to macOS (>= 13.5) on M1/ARM; linux64-based systems don't have them, and there torchvision==0.11.1 and vina can be installed as specified.
The protobuf error, however, also occurs on Linux (see below).
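The four changes above, combined with the platform note, could be scripted roughly as follows. This is a minimal sketch, not part of the repository: patch_requirements and patch_file are hypothetical helpers, the code assumes requirements.txt lists one requirement per line (e.g. name==version), and it uses protobuf<3.21 as one way to express "3.20.x or lower". Change 3 (the Rust compiler for tokenizers) is a system dependency and is not handled here.

```python
import platform
import re
from typing import List

# Hypothetical pin expressing "3.20.x or lower" (change 4, needed on macOS and Linux).
PROTOBUF_PIN = "protobuf<3.21"


def is_apple_silicon() -> bool:
    """True on macOS running natively on ARM (M1 and later)."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def patch_requirements(lines: List[str], apple_silicon: bool) -> List[str]:
    """Hypothetical helper: apply changes 1, 2 and 4 to requirement lines."""
    patched: List[str] = []
    saw_protobuf = False
    for line in lines:
        # Package name = text before the first specifier, extra, marker or space.
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0].lower()
        if apple_silicon and name == "torchvision":
            patched.append("torchvision==0.11.2")  # change 1 (macOS/ARM only)
        elif apple_silicon and name == "vina":
            patched.append("vina")  # change 2: drop the strict pin (macOS/ARM only)
        elif name == "protobuf":
            patched.append(PROTOBUF_PIN)  # change 4
            saw_protobuf = True
        else:
            patched.append(line)  # comments, blank lines, other packages
    if not saw_protobuf:
        patched.append(PROTOBUF_PIN)  # add the pin if the file lacks one
    return patched


def patch_file(path: str = "requirements.txt") -> None:
    """Rewrite a requirements file in place for the current platform."""
    with open(path) as f:
        lines = f.read().splitlines()
    with open(path, "w") as f:
        f.write("\n".join(patch_requirements(lines, is_apple_silicon())) + "\n")
```

Keeping the line rewriting in a pure function makes it easy to inspect the result on both platforms before touching the real file.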

Once the environment is set up, running the protein optimization task at commit 431b052 ("add LSBO comparison notebook"):

python scripts/black_box_opt.py optimizer=lambo optimizer.encoder_obj=mlm task=proxy_rfp tokenizer=protein surrogate=multi_task_exact_gp acquisition=ehvi trial_id=2

produces NaN values during the computation; see the error below (for completeness, I've attached the log of the complete run).

For the sake of replicability, we ran this on a Linux system with the setup as close to requirements.txt as possible (I've also attached the environment as linux_env.txt).

Protobuf

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
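Both workarounds in the message can be checked or applied from Python. A minimal sketch with hypothetical helper names: protobuf_version_ok tests a version string against the 3.20.x ceiling (workaround 1), and use_pure_python_protobuf applies workaround 2, which only takes effect if it runs before anything imports google.protobuf.

```python
import os
from importlib.metadata import PackageNotFoundError, version

# Ceiling from the error message: "3.20.x or lower".
PROTOBUF_CEILING = (3, 20)


def protobuf_version_ok(ver: str) -> bool:
    """Return True if a version string such as "3.20.3" is 3.20.x or lower."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) <= PROTOBUF_CEILING


def installed_protobuf_ok() -> bool:
    """Check the installed protobuf; False if it is too new or not installed."""
    try:
        return protobuf_version_ok(version("protobuf"))
    except PackageNotFoundError:
        return False


def use_pure_python_protobuf() -> None:
    """Workaround 2: must run before google.protobuf is first imported (slower parsing)."""
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
```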

Experiment Run

[2023-09-28 15:07:36,464][root][ERROR] - Expected parameter logits (Tensor of shape (16, 230, 26)) of distribution Categorical(logits: torch.Size([16, 230, 26])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         ...,
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]],

        [[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],

test_run.log
linux_env.txt

@RMichae1
Author

Given that this commit is well ahead of the original submission and paper, I checked out commit 22afec26da0b9ea1810e65f8a60ea7988c021cef. There, the algorithm stages (optimizing candidates and querying the objective function) appear to run correctly, without the NaN logits error from the Categorical distribution encountered previously.

Perhaps the way the (discrete) MT-GP posterior is sampled broke somewhere between this commit and the latest commit on main?
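One way to narrow this down would be git bisect between the two commits. A minimal sketch, assuming 22afec26 is an ancestor of 431b052, that the failing run logs the Categorical error shown above, and that the helper is saved as a hypothetical untracked file bisect_nan.py in the repository root. It could be driven with `git bisect start 431b052 22afec26da0b9ea1810e65f8a60ea7988c021cef` followed by `git bisect run python -c "import sys, bisect_nan; sys.exit(bisect_nan.main())"`. git bisect run treats exit code 0 as good, 125 as untestable (skip), and other codes from 1 to 127 as bad.

```python
import subprocess
import sys

# Command copied from the report; run from the repository root.
CMD = [
    sys.executable, "scripts/black_box_opt.py",
    "optimizer=lambo", "optimizer.encoder_obj=mlm", "task=proxy_rfp",
    "tokenizer=protein", "surrogate=multi_task_exact_gp",
    "acquisition=ehvi", "trial_id=2",
]
# Substring of the Categorical error logged in the failing run.
NAN_MARKER = "but found invalid values"
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`
TIMEOUT_S = 4 * 60 * 60  # hypothetical budget; tune to the real run length


def classify(output: str, returncode: int) -> int:
    """Map one run's console output and exit status to a bisect verdict."""
    if NAN_MARKER in output:
        return BAD
    if returncode != 0:
        # Crashed for some other reason (e.g. a dependency mismatch): untestable.
        return SKIP
    return GOOD


def main() -> int:
    try:
        result = subprocess.run(
            CMD,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # the error may be logged to either stream
            text=True,
            timeout=TIMEOUT_S,
        )
    except subprocess.TimeoutExpired:
        return SKIP  # no verdict within the budget
    return classify(result.stdout, result.returncode)
```

Treating unrelated crashes as "skip" rather than "bad" keeps dependency breakage at intermediate commits from being mistaken for the NaN regression.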

@samuelstanton
Owner

Thanks for the detailed issue. When I was writing the code for this paper, the MTGP features of GPyTorch and BoTorch were under active development, which is why the requirements file is pinned to a specific commit. I briefly tried removing the pin, but last I checked it seemed that a PR to one or both of those libraries would be needed. In LaMBO-2 I actually abandoned GPs in favor of partial deep ensembles, and I'm hoping to open-source that code sometime in the near future.

https://arxiv.org/abs/2305.20009

@samuelstanton
Owner

Hi @RMichae1, just wanted to follow up and let you know that the open-source alpha release of LaMBO-2 is live :)

https://github.com/prescient-design/cortex
