
Some problems when running tester.py #10

Open
CILabTaegwan opened this issue Jan 10, 2023 · 6 comments

@CILabTaegwan

Hi, could you please offer an example of how to run tester.py?
The code worked without a problem with trainer.py, but an error occurred with the tester. The error is as follows.

Traceback (most recent call last):
  File "tester.py", line 194, in <module>
    run_test(ego, env, args.total_episodes, args.render)
  File "tester.py", line 50, in run_test
    action = ego.get_action(obs, False)
  File "C:\Users\user\Desktop\pantheon\PantheonRL\pantheonrl\common\agents.py", line 72, in get_action
    actions, _, _ = action_from_policy(obs.obs, self.policy)
AttributeError: 'numpy.ndarray' object has no attribute 'obs'

This problem does not occur when using the FIXED agent in the trainer.
My torch version is 1.13.1, and my stable-baselines3 version is 1.6.2.

Thanks

@bsarkar321
Collaborator

Thanks for finding this issue! The latest version of pantheonrl should fix this bug (it was caused by a new observation type that is not supported by SB3 by default).

For reference, tester.py follows a similar syntax to trainer.py (except there are no presets). For example, if we want to run Liar's Dice, we can train two agents with:

python3 trainer.py LiarsDice-v0 PPO PPO --seed 10 --preset 1 -t 50000

And then we can test these two agents with:

python3 tester.py LiarsDice-v0 PPO PPO --seed 10 -t 5000 --ego-load models/LiarsDice-v0-PPO-ego-10.zip --alt-load models/LiarsDice-v0-PPO-alt-10.zip

@CILabTaegwan
Author

Thank you for the quick handling of this issue! Additionally, is it possible to set LOAD LOAD on the trainer? Loading is already part of the LOAD PPO setting, but I think I need to control the save/load process from the command line. What part should be changed to implement a LOAD LOAD setting?

@bsarkar321
Collaborator

Oh yeah, reloading both agents should be possible for fine-tuning. We can essentially create a separate version of the gen_fixed function, but we need to wrap it in the appropriate Agent type (OnPolicyAgent or AdapAgent). I can add this feature soon-ish, but if you need this functionality in the meantime, you can also write a script that loads the policies you want.

If you look at overcookedtraining.py within the examples folder, you can replace PPO('MlpPolicy', env, verbose=1) with PPO.load('your_file') for both the ego and partner agents.
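As a rough sketch of that idea, modeled on the pattern in examples/overcookedtraining.py (the env id, file paths, and the add_partner_agent call below are assumptions for illustration; check them against your copy of PantheonRL):

import gym
from stable_baselines3 import PPO
from pantheonrl.common.agents import OnPolicyAgent

env = gym.make('LiarsDice-v0')  # assumed env id, matching the commands above

# Load previously trained policies instead of constructing fresh ones
ego = PPO.load('models/LiarsDice-v0-PPO-ego-10.zip', env=env)
partner = OnPolicyAgent(PPO.load('models/LiarsDice-v0-PPO-alt-10.zip', env=env))

# Register the partner with the multi-agent environment
env.add_partner_agent(partner)

# Continue training (fine-tuning) the ego policy against the loaded partner
ego.learn(total_timesteps=50000)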

bsarkar321 reopened this on Jan 11, 2023
A comment from @CILabTaegwan was marked as off-topic.

@CILabTaegwan
Author

Thanks for your previous reply. I have one more question, about the logger output ("rollout/ep_len_mean", "time/fps", "train/loss", etc.) when running trainer.py.

While running trainer.py, the logger output is printed from ego.learn(). I thought this call refers to algos/modular/learn.py, but even when I changed the code there (for example, self.logger.record("time/fps", fps)), the logger output did not change.

Where can I control the contents of the logger output? Doesn't ego.learn() come from algos/modular/learn.py or algos/adap/adap_learn?

@bsarkar321
Collaborator

Great question! Based on the code you have given earlier, it seems like you are using the PPO policy from stable-baselines3, so all of the logger logic comes from there. If you would like to change the logger interface, you would probably need to define a separate PPO implementation that logs the information you want.

Alternatively, you could also use CleanRL's implementation of PPO (https://github.com/vwxyzjn/cleanrl), which is easier to understand and cleanly defines the logging behavior. However, it is not a drop-in replacement for SB3's PPO, so you may need to do some extra work to integrate pantheonrl with this different interface.
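If you only want to add extra values to the existing output (rather than change SB3's internals), one option that is plain stable-baselines3 usage, not anything specific to pantheonrl, is a custom callback that records values through the same logger. A minimal sketch, with a made-up metric just for illustration:

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback

class ExtraLoggingCallback(BaseCallback):
    def _on_step(self) -> bool:
        return True  # required override; returning False would stop training

    def _on_rollout_end(self) -> None:
        # self.logger is the same Logger that prints rollout/ep_len_mean, time/fps, ...
        self.logger.record("custom/num_timesteps", self.num_timesteps)

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10000, callback=ExtraLoggingCallback())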
