Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor shutdown callback couldn't complete properly #311

Open
JiaminZhu opened this issue Jul 24, 2017 · 4 comments
Open

executor shutdown callback couldn't complete properly #311

JiaminZhu opened this issue Jul 24, 2017 · 4 comments

Comments

@JiaminZhu
Copy link

driver.stop() in shutdown cause executor exit immediately without waiting for executor shutdown callback complete.
https://github.com/mesos/mesos-go/blob/master/api/v0/executor/executor.go#L484-L501

@janisz
Copy link

janisz commented Jul 25, 2017

I'll put some context here:

Looking at the go driver code, looks like the driver does a stop() when it gets a shutdown message from the agent. This will likely cause the driver and the executor to exit immediately. This is different https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L428 from how the C++ driver responds to the shutdown message.

https://lists.apache.org/thread.html/e2d87b2e988a026cb497c3aabfc7b2c7d5632d8fdb5b09b503661f50@%3Cdev.mesos.apache.org%3E

@JiaminZhu
Copy link
Author

Thanks @janisz

@jdef
Copy link
Contributor

jdef commented Jul 27, 2017

it sounds like you're looking for semantics more like this?

...
  t := time.NewTimer(shutdownGracePeriod)
  ch := make(chan struct{})
  driver.withExecutor(func(e Executor) {
    defer close(ch)
    e.Shutdown(driver)
  })
  // only stop the driver once Shutdown callback has completed
  go func() {
    defer t.Stop()
    select {
      case <-t.C:
      case <-ch:
    }
    // driver.stop() will cause process to eventually stop.
    driver.stop()
  }()

@JiaminZhu
Copy link
Author

Thanks @jdef I'm thinking that we can keep driver.stop() in Shutdown callback, so that we won't have this issue. And you are right, we do need to keep a timer for shutdown grace period. Isn't executor_shutdown_grace_period one of mesos agent config? Why we need it here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants