Don't destroy the job unless the executor crashes #56

Open
calebwin opened this issue Oct 15, 2021 · 0 comments
Labels: banyan-jl (Concerning Banyan.jl), enhancement (New feature or request)


@calebwin (Contributor)

We should only end a running job if the program crashes on the executor or if the user explicitly calls destroy_job.

When scheduling fails

On a call to a writing function or to collect, the recorded lazy computation is scheduled and executed. If scheduling fails, we currently destroy the job. This is undesirable when using Banyan Julia from a notebook: a single failed cell forces you to restart the job, which can take 1-2 minutes. Instead, a call to a writing function or to collect should leave global state unchanged when it fails, rolling back any modifications it made.
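A minimal sketch of the rollback behavior described above, written in Python rather than Julia and assuming a simplified model: global state is a plain dict, and a scheduling failure is signalled by a hypothetical SchedulingError. Session, collect, and _schedule_and_run are illustrative names, not Banyan.jl's API. The call snapshots state, applies the recorded tasks, and restores the snapshot on failure instead of destroying the job:

```python
import copy

# Hypothetical names throughout; this is not the Banyan.jl API.


class SchedulingError(Exception):
    """Raised when a recorded task cannot be scheduled (assumed error type)."""


class Session:
    """Toy client session; `state` stands in for the client's global state."""

    def __init__(self):
        self.state = {}        # e.g. values already computed on the job
        self.job_alive = True  # never cleared on a scheduling failure

    def _schedule_and_run(self, tasks):
        # Stand-in scheduler: each task is (name, value); a value of None
        # models a task that cannot be scheduled.
        for name, value in tasks:
            if value is None:
                raise SchedulingError(f"cannot schedule {name!r}")
            self.state[name] = value  # global state is mutated as we go

    def collect(self, tasks):
        # Snapshot state so a failure can be undone without touching the job.
        snapshot = copy.deepcopy(self.state)
        try:
            self._schedule_and_run(tasks)
        except SchedulingError:
            self.state = snapshot  # roll back partial changes
            raise                  # surface the error to the failing cell
        return dict(self.state)


session = Session()
session.collect([("a", 1)])
try:
    # "b" is applied before "c" fails, so the rollback has real work to do.
    session.collect([("b", 2), ("c", None)])
except SchedulingError as err:
    print("cell failed:", err)
print(session.state, session.job_alive)  # {'a': 1} True
```

Copying the whole state with copy.deepcopy keeps the sketch short; a real implementation might instead record only the changes a call makes (an undo log), since deep-copying large client-side state on every call could be expensive.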

When an exception occurs on the cluster

If the job crashes in the backend, we have little choice but to destroy it. But if an ordinary exception is raised on the cluster, we should ideally propagate it back to the client and roll back in the same way as for a scheduling failure.
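The same rollback can cover exceptions raised on the cluster. In this Python sketch (again with hypothetical names: run_on_executor, Client, RemoteExecutionError, ExecutorCrashed), the executor side catches ordinary exceptions and returns them as an error message; the client rolls back and re-raises that error locally, and only ExecutorCrashed, standing in for the executor process dying, leads to destroy_job:

```python
import copy

# Hypothetical names throughout; this is not the Banyan.jl API.


class RemoteExecutionError(Exception):
    """Client-side error wrapping an exception raised on the cluster."""


class ExecutorCrashed(Exception):
    """Stands in for the executor process itself dying."""


def run_on_executor(code):
    # Cluster side: report ordinary exceptions as a message instead of
    # letting them take down the job.
    try:
        return {"status": "ok", "result": code()}
    except Exception as exc:
        return {"status": "error", "error": repr(exc)}


class Client:
    def __init__(self):
        self.state = {}
        self.job_alive = True

    def destroy_job(self):
        self.job_alive = False

    def run(self, name, code, executor=run_on_executor):
        snapshot = copy.deepcopy(self.state)
        self.state[name] = "pending"
        try:
            response = executor(code)
        except ExecutorCrashed:
            self.destroy_job()  # only a crash ends the job
            raise
        if response["status"] == "error":
            self.state = snapshot  # roll back, as for a scheduling failure
            raise RemoteExecutionError(response["error"])
        self.state[name] = response["result"]
        return response["result"]


def crashing_executor(code):
    # Simulates the executor process exiting mid-request.
    raise ExecutorCrashed("executor process exited")


client = Client()
client.run("x", lambda: 40 + 2)
try:
    client.run("y", lambda: 1 / 0)
except RemoteExecutionError as err:
    print("remote error:", err)
print(client.state, client.job_alive)  # {'x': 42} True

try:
    client.run("z", lambda: 0, executor=crashing_executor)
except ExecutorCrashed:
    pass
print(client.job_alive)  # False
```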

calebwin added the enhancement (New feature or request) and banyan-jl (Concerning Banyan.jl) labels on Oct 15, 2021.
calebwin self-assigned this on Oct 15, 2021.