Nondeterministic hang or race on posix systems because auto.py waits for wrong PID to exit #102

Open
saulrh opened this issue Jul 14, 2021 · 1 comment
Labels
backend issue for map generation Linux/Mac

Comments

saulrh commented Jul 14, 2021

From #37:

The problem is that on POSIX the PID of the main process is no longer the PID it had when it was created. I don't know how to solve this for now. One solution could be to just kill all factorio instances, but I want to avoid that.

If we can't kill by PID because the PID is inaccurate, then the loop on line 367 that waits for that PID to exit is unreliable too. It may complete before factorio has exited (if the PID is unused), hang for an unpredictable length of time (if the PID has been reallocated to an unrelated process), or deadlock (if the PID has been reallocated to one of the worker subprocesses that auto.py spins up for handling images).

FactorioMaps/auto.py

Lines 367 to 368 in 3479b2a

while psutil.pid_exists(pid):
    time.sleep(0.1)

This likely also explains #89. I'm not entirely sure what a fix would look like; will think about it.
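
One possible direction, as a minimal sketch rather than a proposed patch: wait on the process handle instead of polling a PID number. This assumes auto.py launches the game itself and can keep the handle around. `run_and_wait` is a hypothetical helper name, and the child here is a short-lived Python process standing in for factorio. It uses the standard library's `subprocess.Popen`; `psutil.Popen` exposes the same `wait()` interface.

```python
import subprocess
import sys


def run_and_wait(cmd, timeout=None):
    """Launch cmd and block until that exact child exits; return its exit code.

    Hypothetical helper for illustration, not code from auto.py.
    """
    proc = subprocess.Popen(cmd)
    # wait() is tied to the child we spawned, not to a PID number, so a PID
    # that gets recycled by another process cannot be mistaken for it.
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Stand-in for the game: a child that sleeps briefly and then exits.
    code = run_and_wait([sys.executable, "-c", "import time; time.sleep(0.2)"])
    print("child exited with", code)
```

This only helps if the handle refers to the real game process. If the launcher forks and exits (which may be what the quote from #37 describes), the wait would still return early.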

This problem is about half-theoretical. I'm currently trying to export an hourly timelapse of my Space Exploration victory, all 550 hours and 15 planets of it. I have the computational horsepower and disk space to actually pull it off, but I have a feeling I'm going to run into all sorts of low-occurrence stability problems while attempting it, and this is the first big one: I'm hitting a nondeterministic hang at this step that might be explained by this bit of code working the way I think it does.

saulrh commented Jul 14, 2021

Oh, you're actually in the forum topic about set_wait_for_screenshots_to_finish and it doesn't sound like you ever got an answer for your last question. Hrrrrrm. If you could force the game to process the screenshot queue all at once that'd be ideal, but otherwise ick.

saulrh added a commit to saulrh/FactorioMaps that referenced this issue Jul 15, 2021
Polling psutil.pid_exists, and manually invoking things like
`killall factorio`, can have some pretty nasty bugs if PIDs are reused
or factorio exits before we expect. psutil saves us from all of this. It
also means we don't have to write our own platform-specific code to
kill factorio.

Should fix L0laapk3#102.
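
For illustration only (this is not the code from the referenced commit), here is a sketch of stopping a child through its handle without platform-specific kill commands. `stop_process` and the `grace` parameter are hypothetical names. It is written against `subprocess.Popen`; `psutil.Popen` offers the same `terminate()`, `kill()` and `wait()` methods.

```python
import subprocess
import sys


def stop_process(proc, grace=5.0):
    """Ask proc to exit, escalating to kill() after `grace` seconds.

    Hypothetical helper for illustration. Returns the child's exit code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to signal
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for the game: a child that would otherwise run for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child, grace=2.0))
```

Because every call goes through the handle, there is no window in which a reused PID could cause the wrong process to be signalled.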
L0laapk3 added backend issue for map generation Linux/Mac labels Oct 19, 2021