Robust handling of worker and subworker crashes #39

Open
gavento opened this issue Apr 13, 2018 · 1 comment

@gavento
Contributor

gavento commented Apr 13, 2018

Currently, a subworker crash may crash its worker, and a worker crash may crash the server. We need to improve this, although infrastructure resiliency is not the goal for now: a subworker crash may still fail the task (and therefore the session), and a worker crash may still lose all of its objects and fail all involved sessions. The main goal is to keep the server running and deliver a graceful error.

Robust failure handling will open the way to retrying tasks (possibly on different workers) and, later, to worker crash resiliency.
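
To make the intended behaviour concrete, here is a minimal sketch of the idea, not this project's actual code: a supervisor runs a task in a child process and turns a crash of that child into a failed task result instead of crashing itself. The names `TaskResult` and `run_task_in_subworker` are hypothetical, and a plain Python child process stands in for a subworker.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one task (hypothetical type, for illustration only)."""
    ok: bool
    error: str = ""


def run_task_in_subworker(code: str, timeout: float = 30.0) -> TaskResult:
    """Run `code` in a child Python process standing in for a subworker.

    A crash or non-zero exit of the child is reported as a failed
    TaskResult; it never propagates as an exception to the caller.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return TaskResult(ok=False, error="subworker timed out")
    if proc.returncode != 0:
        # Keep the child's stderr so the error delivered to the user has logs.
        return TaskResult(
            ok=False,
            error=f"subworker exited with code {proc.returncode}: {proc.stderr.strip()}",
        )
    return TaskResult(ok=True)
```

In this sketch the task (and so its session) still fails when the child dies, but the supervising process keeps running and can report the error. Retrying on another worker could later be layered on top of the same result type.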

@spirali
Collaborator

spirali commented Jul 2, 2018

Executor (= subworker) crashes are now handled with a graceful error that includes logs. A governor (= worker) crash still results in an overall panic.
