New process architecture for benchexec in container mode #875

Open
PhilippWendler opened this issue Nov 7, 2022 · 0 comments
Labels
container related to container mode

@PhilippWendler
Member

Currently, the localexecution.py module of benchexec starts a number of worker threads (controlled by --numOfThreads), and each worker thread repeatedly executes runs using RunExecutor. In container mode, executing a run involves calling clone() to create a subprocess.
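
To make the shape of this architecture concrete, here is a minimal sketch of worker threads that each execute runs one after another. It is not the actual benchexec code: `execute_run`, `worker`, and `execute_all` are hypothetical names, and `subprocess.run` merely stands in for the clone()-based container creation done by RunExecutor.

```python
import queue
import subprocess
import sys
import threading


def execute_run(run_id):
    """Hypothetical stand-in for one run (not the RunExecutor API)."""
    # Preprocessing would happen here, in Python code under the GIL.
    proc = subprocess.run(
        [sys.executable, "-c", f"print({run_id} * 2)"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Postprocessing (e.g., log analysis) also runs in this thread.
    return int(proc.stdout.strip())


def worker(pending, results):
    """Take runs from the queue and execute them strictly one at a time."""
    while True:
        try:
            run_id = pending.get_nowait()
        except queue.Empty:
            return
        results[run_id] = execute_run(run_id)


def execute_all(run_ids, num_threads):
    """Start num_threads worker threads and wait until all runs are done."""
    pending = queue.Queue()
    for run_id in run_ids:
        pending.put(run_id)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(pending, results))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Note that in this sketch the subprocess is created from a worker thread of a multi-threaded process, which is exactly the situation discussed in the rest of this issue.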

This architecture has several problems:

  • Glibc's clone() is not safely usable in processes with more than one thread and can produce deadlocks (see #656, "BenchExec subprocess hangs in __malloc_fork_lock_parent"). We currently have a workaround for this, but it is not a full solution.
  • Since Python 3.8, the API documentation requests that subprocesses be created only from the main thread, not from worker threads as we do. We have not encountered problems related to this so far, but they could occur.
  • With a high (3-digit) number of threads, we have not seen the expected throughput. No profiling has been performed yet, but a plausible cause is that all pre- and postprocessing of runs (e.g., log analysis, writing results) is performed in Python threads, of which only one can execute at any time due to the GIL.

(Note that none of these problems affects users of runexec / containerexec, where the run execution is started from a single-threaded process.)

So in the long term it would probably be good to change this architecture. There are at least two potential solutions:

  • Switch from worker threads to worker processes, as the multiprocessing module does. Each worker process would be single-threaded.
  • Have one designated (single-threaded) subprocess that is created in the beginning and whose sole responsibility is to spawn all further subprocesses on request. (Android uses this and calls it the Zygote process.)
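
As an illustration of the first option, the following is a minimal sketch of single-threaded worker processes that pull runs from a shared queue. All names (`execute_run`, `worker`, `execute_all`) are hypothetical, `subprocess.run` again stands in for the real container creation, and the "fork" start method is an assumption made here so that the workers inherit the already loaded modules of the parent. Error handling is omitted: a failing run would terminate its worker and leave the parent waiting.

```python
import multiprocessing
import subprocess
import sys


def execute_run(run_id):
    """Hypothetical stand-in for one run (not the RunExecutor API)."""
    proc = subprocess.run(
        [sys.executable, "-c", f"print({run_id} * 2)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(proc.stdout.strip())


def worker(pending, done):
    """Single-threaded worker process: execute runs until the sentinel arrives."""
    for run_id in iter(pending.get, None):
        # Pre- and postprocessing would run here without competing for
        # another worker's GIL, because each worker is its own process.
        done.put((run_id, execute_run(run_id)))


def execute_all(run_ids, num_workers):
    """Start num_workers worker processes and collect the results of all runs."""
    # Assumption: "fork" is available (POSIX) and no threads exist yet.
    ctx = multiprocessing.get_context("fork")
    pending = ctx.Queue()
    done = ctx.Queue()
    run_ids = list(run_ids)
    for run_id in run_ids:
        pending.put(run_id)
    for _ in range(num_workers):
        pending.put(None)  # one shutdown sentinel per worker
    workers = [
        ctx.Process(target=worker, args=(pending, done))
        for _ in range(num_workers)
    ]
    for process in workers:
        process.start()
    # Collect results before joining so that no worker blocks on a full queue.
    results = dict(done.get() for _ in run_ids)
    for process in workers:
        process.join()
    return results
```

The second option would differ mainly in that the worker threads stay in place and only forward their spawn requests to the one helper process.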

Instead of clone(), one could also use unshare() and os.fork() to create a container, which should be safer. However, unshare() with a new PID namespace does not move the calling process into that namespace; only its subsequently created children are placed in it. This would therefore involve yet another process per run and would probably complicate process handling even more than either of the alternatives above.

Things to consider:

  • Whether and how this works when benchexec is not called as a command-line tool but executed as part of a larger Python program (which may have created threads before benchexec is even loaded).
  • The subprocess that we start for each run needs to be cloned from a process that already has all the required modules loaded, because this process is inside the container for the run and might not have access to the Python interpreter's files on disk.
  • How to communicate with the process that hosts the tool-info module if more than one worker process needs to do so, or whether each worker process should instead get its own separate process with an instance of the tool-info module.
  • It is by design that preprocessing, actual run execution, and postprocessing are serialized within each worker thread without overlap (i.e., the next run is not started while a previous run is still being postprocessed). Otherwise we would have to reserve some cores for the postprocessing threads, which would lead to asymmetric core assignments. However, it is not desired that the postprocessing steps of parallel threads compete for the GIL.
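
The consideration about already loaded modules can be illustrated with a small sketch: a child created with os.fork() inherits the parent's `sys.modules`, so it can use modules that were imported before the fork without reading the interpreter's files again. The helper name `child_has_module` is hypothetical, and plain os.fork() is used here instead of clone() purely for illustration.

```python
import json  # noqa: F401  (imported in the parent before forking)
import os
import sys


def child_has_module(name):
    """Fork and report whether `name` is already loaded in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: no import happens here, only a lookup of loaded modules.
        try:
            os.close(read_fd)
            os.write(write_fd, b"1" if name in sys.modules else b"0")
        finally:
            os._exit(0)  # skip the parent's cleanup handlers
    # Parent: read the child's answer and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        answer = pipe.read()
    os.waitpid(pid, 0)
    return answer == b"1"
```

A module that is first imported only inside the container would not have this guarantee, which is why the process from which runs are cloned needs everything loaded up front.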