About Replicating SampledZero Performance in the Hopper-V3 Environment #210

hyLiu1994 · 2024-04-09T07:39:30Z

I attempted to replicate the SampledEfficientZero results shown for the Hopper-V3 environment in the README benchmark section, using the default configuration file (zoo/mujoco/config/mujoco_sampled_efficientzero_config.py). However, I ran into two main issues:

I was unable to achieve the results illustrated by the blue line in the following graph.

Additionally, I observed significant discrepancies between the results of two runs using the identical configuration file, as depicted in the graph below. Both the blue and gray lines represent outcomes obtained from the same configuration file.

Could you suggest possible reasons for these discrepancies and any solutions to achieve consistent results similar to those presented in the benchmark?

puyuan1996 · 2024-04-10T10:03:31Z

Hello, thank you for your feedback. Currently, our repository includes an open-source implementation similar to SampledMuZero, which is the only example available since the original authors did not release their source code. Consequently, our implementation may differ from the original in aspects such as network architecture, loss functions, hyperparameters, and training procedure. These differences could be one of the reasons for the suboptimal performance and training instability of our SampledEfficientZero in continuous action spaces such as MuJoCo. A robust and stable open-source implementation of SampledMuZero would be highly valuable to the community and warrants further investigation. We plan to look into this further and will post updates here. Thank you once again for your valuable input and patience.

hyLiu1994 · 2024-04-14T06:57:51Z

Thank you for the detailed response.

I will try to improve on this.

If I reach any conclusions, I will share them with you.
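For anyone comparing runs in the meantime, one way to separate seed-to-seed noise from a real gap to the benchmark is to aggregate several seeds before comparing curves. The following is a minimal sketch, assuming each run's evaluation returns have already been exported as equal-length arrays. The function names `summarize_runs` and `within_seed_noise` are hypothetical helpers, the numbers are made up for illustration, and nothing here uses the LightZero API.

```python
import numpy as np

def summarize_runs(returns_per_seed):
    """Return the per-evaluation mean and sample std across seeds.

    returns_per_seed: list of equal-length sequences, one per seed.
    """
    data = np.asarray(returns_per_seed, dtype=float)  # shape (n_seeds, n_evals)
    mean = data.mean(axis=0)
    # The sample std needs at least two seeds; fall back to zeros otherwise.
    std = data.std(axis=0, ddof=1) if data.shape[0] > 1 else np.zeros(data.shape[1])
    return mean, std

def within_seed_noise(curve, mean, std, k=2.0):
    """True where `curve` lies within k standard deviations of the seed mean."""
    return np.abs(np.asarray(curve, dtype=float) - mean) <= k * std

# Made-up returns purely for illustration, not results from this issue.
runs = [[100.0, 400.0, 900.0], [120.0, 300.0, 1500.0], [80.0, 500.0, 1200.0]]
mean, std = summarize_runs(runs)
```

If a reference curve falls outside the band at most evaluation points, the gap is unlikely to be explained by seed variance alone. The threshold `k` is an arbitrary choice here, and with only a handful of seeds the band itself is a rough estimate.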

hyLiu1994 changed the title ~~About reproducing sampledzero on Hopper-V3~~ About Replicating SampledZero Performance in the Hopper-V3 Environment Apr 9, 2024

puyuan1996 added the config, discussion, and enhancement labels Apr 10, 2024

puyuan1996 mentioned this issue Apr 11, 2024

Sampled MuZero google-deepmind/mctx#87

Closed

puyuan1996 mentioned this issue Apr 18, 2024

the sampled efficient zero portion of the code #218

Closed
