Environment hangs up when spawning from different processes #75

Open
GraphicsHunter opened this issue Mar 18, 2019 · 16 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@GraphicsHunter

Hello,

I'm trying to run the Obstacle Tower environment through the large-scale-curiosity project. However, it seems to hang when it tries to create the environment from its subprocesses. It prints that the CrashReporter is initialized, followed by the mono config paths, then does nothing for a while and hangs with the following image:

image

This is running on a 2018 13-inch MacBook Pro without a GPU. Is there any way to troubleshoot and debug this? I can't really do anything, as nothing seems to be logged from inside the environment.

@awjuliani awjuliani self-assigned this Mar 19, 2019
@awjuliani awjuliani added the help wanted Extra attention is needed label Mar 19, 2019
@awjuliani
Contributor

Hi @tianfanzhu

Can you confirm that you are running the latest version of Obstacle Tower (v1.2)? Also, does it work when using the basic usage python notebook we provide as an example?

@GraphicsHunter
Author

Hi @awjuliani ,

I am indeed running this on the latest version, v1.2. Also, I found out from the basic usage notebook that the screen is gray, as shown above, until the env is reset or stepped.

@binoalien

I have the same problem.
iMac end 2015
osx 10.14.3

@NancyFulda

Me too. Except I'm running this via the Unity Obstacle Tower Challenge run.py script, and at startup I see the game character appear and then fall off the blank screen into nothingness. After that, empty gray screen.

iMac running 10.14.4

@NancyFulda

However, when I click on the obstacletower.app file directly, it runs flawlessly.

@harperj
Contributor

harperj commented Apr 1, 2019

Hi all, it may be difficult to tell whether everyone is experiencing the same issue. A couple of important things to note:

  • When running multiple environments, the worker_id value (in the environment constructor) must be set to a different integer value for each environment. This is because the gym wrapper and the Unity executable communicate with one another via gRPC over a particular port, and each environment reserves its own port.
  • When running run.py in evaluation mode, the run.py script must be launched before the environment executable.

So with that in mind:
@tianfanzhu can you confirm whether your environment construction sets a different worker_id value for each environment?

@NancyFulda are you running in evaluation mode or just directly running the run.py script? If you're directly running the script, could you look for a file called UnitySDK.log in the same folder as the ObstacleTower.app file and share the contents?
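
The worker_id point above can be sketched as a small helper that hands out one distinct worker_id per environment and shows which port each one would reserve. This is a minimal sketch, not the project's own code: the helper names are hypothetical, and the base port of 5005 with a port of base_port + worker_id is an assumption based on ML-Agents defaults that should be checked against the installed version.

```python
from typing import Dict, List

# Assumption: ML-Agents' default base port. Verify for your installed version.
BASE_PORT = 5005


def assign_worker_ids(num_envs: int, first_worker_id: int = 0) -> List[int]:
    """Return one distinct worker_id per environment (hypothetical helper)."""
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    return [first_worker_id + rank for rank in range(num_envs)]


def ports_for(worker_ids: List[int], base_port: int = BASE_PORT) -> Dict[int, int]:
    """Map each worker_id to the port it is assumed to reserve."""
    if len(set(worker_ids)) != len(worker_ids):
        # Two environments sharing a worker_id would compete for one port.
        raise ValueError("worker_id values must be unique per environment")
    return {worker_id: base_port + worker_id for worker_id in worker_ids}


if __name__ == "__main__":
    ids = assign_worker_ids(4)
    for worker_id, port in ports_for(ids).items():
        print(f"worker_id={worker_id} -> port {port}")
```

Each subprocess would then pass its own value as worker_id when constructing the environment.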

@harperj harperj self-assigned this Apr 1, 2019
@NancyFulda

Hi @harperj, thanks for looking into this!

I'm directly executing run.py. Interestingly, the behavior this morning is different from what it was on Saturday (maybe I rebooted in between?). I still see grayness, but the game character no longer appears. On the other hand, the run.py script no longer hangs and instead prints out the reward for each episode.

Is this the expected behavior? It would be nice to be able to watch the agent's character navigate the world (to see where it's messing up), but since the environment seems to be executing at faster-than-real-time speed, maybe the grayed out screen is normal?

The UnitySDK.log contents are as follows:

4/1/2019 1:35:59 PM

Log
Academy resetting

Log
Seed: 52

Log
Seed: 47

Log
Academy resetting

Log
Seed: 26

Log
Seed: 91

Log
Academy resetting

Log
Seed: 65

Log
Seed: 17

Log
Academy resetting

Log
Seed: 44

Log
You reached floor: 1

Log
Seed: 64

Log
Academy resetting

Log
Seed: 34

Log
Seed: 58

Log
Academy resetting

Log
Seed: 85

@harperj
Contributor

harperj commented Apr 1, 2019

@NancyFulda This is the expected behavior. When training, the camera isn't turned on in order to improve performance. You can see the camera by turning on realtime mode in the environment (realtime_mode=True in the constructor).

@NancyFulda

@harperj Ah, that worked perfectly! Everything seems to be in order now. Thank you!

@stevenh-tw

Hi @harperj @awjuliani I also encountered the same issue:

I tried to use ML-Agents 0.8.1 by simply setting options['--env'] = 'ObstacleTower/ObstalceTower'
and options['--num-envs'] = 2.

After launching 2 envs, one env showed the agent spawning and falling down, while the other was just 'not responding'. CPU and GPU usage for the env with the falling agent was very high.

This issue occurs on my Windows machine (Windows 10), but the same setup works without problems on my Mac. I've also checked that I'm using ObstacleTower-v1.3.

Here's the reference video [https://youtu.be/u-J7mlwlmr0]

@Sohojoe

Sohojoe commented May 9, 2019

I was able to get large-scale-curiosity + Obstacle Challenge working up to about 32 agents

  • make sure worker_id is unique for each instance
  • timeout_wait=6000
  • add a sleep(2) between creating each instance (i.e. 2 seconds)
  • some worker_id values may clash with ports already in use on Windows; for me, I needed to add if rank >= 35: rank += 1
  • I copied the render module from OpenAI.Gym to visualize training (realtime_mode=True slows down training)
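
The checklist above can be sketched roughly as follows. This is a sketch under stated assumptions rather than the code actually used: `make_env` is a hypothetical factory standing in for whatever constructs the environment (for example a call to ObstacleTowerEnv), and `adjust_rank` and `spawn_staggered` are illustrative names. Only worker_id, timeout_wait, the 2-second pause, and the rank 35 skip come from the list.

```python
import time
from typing import Any, Callable, List


def adjust_rank(rank: int, blocked: int = 35) -> int:
    """Skip one worker_id that clashes on this machine (35 in the list above)."""
    return rank + 1 if rank >= blocked else rank


def spawn_staggered(
    make_env: Callable[..., Any],  # hypothetical factory, e.g. wraps ObstacleTowerEnv
    num_envs: int,
    delay_s: float = 2.0,
    timeout_wait: int = 6000,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Create environments one at a time with unique worker_ids and a pause between them."""
    envs = []
    for rank in range(num_envs):
        envs.append(make_env(worker_id=adjust_rank(rank), timeout_wait=timeout_wait))
        if rank < num_envs - 1:
            # Give each Unity process time to start before launching the next one.
            sleep(delay_s)
    return envs
```

Passing `sleep` as a parameter only makes the pause easy to replace in tests; in normal use the default `time.sleep` applies.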

@karta1297963 what you see in your video is what happens when the Unity environment does not sync with Python. Even with everything I did above, I still see this 1 in 5 times when starting off a run (even with different code bases)

@harperj
Contributor

harperj commented May 9, 2019

Like @Sohojoe said, this looks like an issue with the connection between Python and Obstacle Tower / Unity. It could be that the port is in use for something else, that the worker_id is not being set correctly, or that the environment takes longer than the timeout_wait to start up. You could potentially have your script fail gracefully and re-launch on timeout as well, or try a new worker_id if you have a reserved port that conflicts.
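
One way to sketch the "fail gracefully and try a new worker_id" idea is shown below. It is an illustration only: `make_env` is a hypothetical factory for the environment, the port formula base_port + worker_id with a base of 5005 is an assumption drawn from ML-Agents defaults, and the exception type raised on timeout depends on the ML-Agents version, so it is left as a parameter.

```python
import socket
from typing import Any, Callable, Iterable, Tuple, Type


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can currently be bound on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def launch_with_retry(
    make_env: Callable[..., Any],  # hypothetical factory for the environment
    worker_ids: Iterable[int],
    base_port: int = 5005,  # assumption: ML-Agents default base port
    is_free: Callable[[int], bool] = port_is_free,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Try candidate worker_ids in order, skipping busy ports and failed launches."""
    last_error = None
    for worker_id in worker_ids:
        if not is_free(base_port + worker_id):
            continue  # something else already holds this port
        try:
            return make_env(worker_id=worker_id)
        except retry_on as err:
            last_error = err  # e.g. a timeout waiting for Unity; try the next id
    raise RuntimeError("could not start the environment on any candidate worker_id") from last_error
```

Narrowing `retry_on` to the library's timeout exception would avoid hiding unrelated errors; the broad default is only there to keep the sketch version-independent.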

@stevenh-tw

@Sohojoe @harperj thanks for helping,
I've tried the solution @Sohojoe mentioned, but it didn't work. Later I tried to cross-validate the compatibility between mlagent-env v0.8 and a Unity instance built with mlagent v0.6 (like Obstacle Tower).

I built 2 instances of the mlagent default task, Pyramids, with SDK v0.6 and v0.8 respectively. It turns out the one built with v0.6 has the same sync issue, while the v0.8 instance doesn't.
Then I compared the git history, and it seems v0.8 added the ability to customize the gRPC communication message. I guess that's the reason Python and Unity don't sync (but somehow the issue doesn't occur with only 1 environment).

I guess the possible solutions are:

  1. Wait for Obstacle Tower to be updated to mlagent v0.8
  2. Use mlagent-env v0.6 and somehow make it work with the mlagent v0.8 SubprocessUnityEnvironment

@Sohojoe

Sohojoe commented May 22, 2019

@karta1297963 - what platform / OS are you using?

@stevenh-tw

@Sohojoe I'm using Windows 10.
I currently have a workaround using the SubprocVecEnv class from OpenAI baselines, and it works! However, with this approach the step function doesn't seem to be able to return both visual and vector observations at the same time.
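
A workaround along these lines might look like the sketch below. SubprocVecEnv takes a list of zero-argument functions, one per subprocess, so each function needs its own worker_id bound at creation time. The names `make_env` and `make_env_fns` are hypothetical, and this is not necessarily how the workaround above was written.

```python
from typing import Any, Callable, List


def make_env_fns(
    make_env: Callable[..., Any],  # hypothetical factory, e.g. wraps ObstacleTowerEnv
    num_envs: int,
    first_worker_id: int = 0,
) -> List[Callable[[], Any]]:
    """Build one zero-argument constructor per subprocess, each with its own worker_id."""

    def bind(worker_id: int) -> Callable[[], Any]:
        # Bind worker_id now. A bare lambda written inside the loop would
        # capture the loop variable and every function would see its last value.
        return lambda: make_env(worker_id=worker_id)

    return [bind(first_worker_id + rank) for rank in range(num_envs)]


# Usage, assuming OpenAI baselines is installed:
#   from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
#   vec_env = SubprocVecEnv(make_env_fns(make_env, num_envs=4))
```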

@Sohojoe

Sohojoe commented May 24, 2019

@karta1297963 - I created a simple repro that spawns many instances, as an example of how I do it: https://github.com/Sohojoe/many_towers

7 participants