Replicating multi-GPU EfficientZero Atari results #223

Closed
selfsim opened this issue May 10, 2024 · 1 comment
Labels
dependencies Issue with Python package dependencies discussion Discussion of a typical issue or concept

Comments

selfsim commented May 10, 2024

Hello,

I want to replicate all of the EfficientZero (E0) Atari results so that I can compare my custom learner against them.

I have SSH access to 32 CPU core + 4 GPU nodes on a cluster. I managed to get the single-GPU atari_efficientzero_config.py to work, but I have been unable to run the multi-GPU atari_efficientzero_multigpu_ddp_config.py successfully. Can you confirm that the latter works on the current version? If so, can you provide the environment details (Python version, etc.) and the execution instructions used?

Also, do you have the hyperparameters for the E0 Atari trials? I'm assuming they are all default but it would be nice to have confirmation.

@puyuan1996 puyuan1996 added dependencies Issue with Python package dependencies discussion Discussion of a typical issue or concept labels May 11, 2024
Collaborator

puyuan1996 commented May 11, 2024

Hello, we have confirmed that atari_efficientzero_multigpu_ddp_config.py on the main branch runs correctly with the following command:

python -m torch.distributed.launch --nproc_per_node=2 zoo/atari/config/atari_efficientzero_multigpu_ddp_config.py
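The command above uses two processes, while the cluster nodes described in the question have four GPUs. As a minimal sketch, the same command line can be assembled for a different process count; the helper name `build_launch_cmd` is hypothetical and not part of the project, and whether the config behaves identically with four processes is an assumption that has not been confirmed in this thread.

```python
import sys

# Script path taken verbatim from the command quoted above.
DDP_CONFIG = "zoo/atari/config/atari_efficientzero_multigpu_ddp_config.py"


def build_launch_cmd(nproc_per_node, script=DDP_CONFIG):
    """Return the argv list for the launch command with a chosen process count.

    Hypothetical helper for illustration only; it just rebuilds the quoted
    command with a different --nproc_per_node value.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        sys.executable,
        "-m",
        "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        script,
    ]


if __name__ == "__main__":
    # Print the command for a 4-GPU node; run it from the repository root.
    print(" ".join(build_launch_cmd(4)))
```

The script only prints the command, so it can be inspected before anything is launched.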

The environment configuration is as follows:

  • CentOS Linux 7
  • torch==2.1.1+cu118
  • python==3.9.12
  • gym==0.25.1
  • ale-py==0.8.0
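To compare a local environment against the versions listed above, something like the following sketch may help. It relies only on the standard library (`importlib.metadata` and `platform`); the function names `installed_version` and `diff_versions` are hypothetical, and a version mismatch is not necessarily the cause of any particular failure.

```python
import platform
from importlib import metadata

# Versions copied from the environment list above.
EXPECTED_PYTHON = "3.9.12"
EXPECTED_PACKAGES = {
    "torch": "2.1.1+cu118",
    "gym": "0.25.1",
    "ale-py": "0.8.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def diff_versions(expected, lookup=installed_version):
    """Map each mismatched package name to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    if platform.python_version() != EXPECTED_PYTHON:
        print(f"python: wanted {EXPECTED_PYTHON}, found {platform.python_version()}")
    for name, (wanted, found) in diff_versions(EXPECTED_PACKAGES).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

Running it inside the environment used for training prints one line per difference and nothing when everything matches.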

Could you please share the error message you are encountering? For the EfficientZero hyperparameters, you can use the default settings. If necessary, you can also adjust parameters such as model_update_ratio to tune performance. More information on the multi-GPU setting can be found here. I hope this information is helpful to you!
