
Plug.Cowboy.Drainer shut down abnormally #92

Open · dantswain opened this issue Feb 22, 2023 · 12 comments

@dantswain

Hi!

My app has been shutting down seemingly at random. This only happens on my local machine (not on other devs' machines, and not in production environments), so I suspect it has something to do with my setup.

There are no logs when my app shuts down, and it doesn't seem to correlate with any particular behavior. It's been happening for at least a couple of weeks of regular daily development.

I finally got a clue by enabling SASL logging:

[error] Child Plug.Cowboy.Drainer of Supervisor MyApp.Endpoint shut down abnormally
** (exit) killed
Pid: #PID<0.1011.0>
Start Call: Plug.Cowboy.Drainer.start_link([refs: [MyAppWeb.Endpoint.HTTP]])
Restart: :permanent
Shutdown: 5000
Type: :worker

[notice] Application my_app exited: shutdown

Based on looking at what Plug.Cowboy.Drainer does, it seems like this error could be a red herring. Any ideas how I could get more diagnostic information? I strongly suspect this is a memory limit, exhausted file descriptors, or something else specific to my environment, but I can't seem to get any more clues.

I'm using plug_cowboy 2.6.0 with Phoenix 1.7.0-rc.2 on Elixir 1.14.3 / OTP 25.

@josevalim (Member)

Can you try running with config :logger, handle_sasl_reports: true? My suspicion is that another child in your supervision tree is crashing, causing the tree to restart. During the restart, the supervisor has to shut down the current children. Cowboy cannot drain every connection within the 5000ms shutdown window (probably because a browser is holding one open), so the drainer is killed after those 5 seconds.
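For reference, that flag goes in the Logger application's config; a minimal sketch, assuming the project keeps dev settings in config/dev.exs:

```elixir
# config/dev.exs (file location is an assumption)
import Config

# Forward OTP/SASL reports (supervisor reports, crash reports) to Logger,
# so abnormal child shutdowns like the drainer's show up in the logs.
config :logger, handle_sasl_reports: true
```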

@dantswain (Author)

This is with config :logger, handle_sasl_reports: true 😬

iex(4)> Application.get_env(:logger, :handle_sasl_reports)
true

In the most recent crash, the log message just before this was [info] Sent 200 in 36ms. Open to trying anything else!

@josevalim (Member)

Hrm, that's very weird then! What could be killing it? Maybe you can play with Erlang's dbg or tracer and trace the messages between processes?
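A minimal sketch of what that might look like with OTP's :dbg, run from IEx; finding the drainer via the endpoint supervisor named in the error log is an assumption about this app's tree:

```elixir
# Locate the drainer among the endpoint supervisor's children
# (the error log above names it as a child of the endpoint supervisor).
{Plug.Cowboy.Drainer, drainer_pid, :worker, _mods} =
  MyAppWeb.Endpoint
  |> Supervisor.which_children()
  |> Enum.find(&match?({Plug.Cowboy.Drainer, _, _, _}, &1))

:dbg.tracer()                               # default tracer prints to the shell
:dbg.p(drainer_pid, [:m, :running, :procs]) # messages, scheduling, process events
```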

@dantswain (Author)

I agree!

My hunch is that I'm running out of file descriptors or something at a system level? I don't know how else it would be so quiet...

I've attached a tracer to my drainer process, so maybe that will show something. I don't think I can get away with tracing everything, since it can be hours between crashes and I need to be able to work in the meantime 😓
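For a trace that has to run for hours, writing events to a file instead of the shell may help; a sketch using :dbg's file trace port (the filename is illustrative, and drainer_pid is the pid found in the earlier sketch):

```elixir
# Send trace events to a binary trace file rather than the shell;
# the filename is illustrative.
:dbg.tracer(:port, :dbg.trace_port(:file, ~c"drainer.trace"))
:dbg.p(drainer_pid, [:m, :running, :procs])

# After a crash, the file can be replayed into the shell:
# :dbg.trace_client(:file, ~c"drainer.trace")
```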

@dantswain (Author)

OK that got a little more info!

(<0.763.0>) out {gen,do_call,4} (Timestamp: {1677,83170,904349})
(<0.763.0>) in {gen,do_call,4} (Timestamp: {1677,83170,904389})
(<0.763.0>) << {#Ref<0.23370919.4183031816.181956>, [{specs,1},{active,4},{supervisors,0},{workers,4}]} (Timestamp: {1677,83170,904398})
(<0.763.0>) out {timer,sleep,1} (Timestamp: {1677,83170,904448})
(<0.763.0>) in {timer,sleep,1} (Timestamp: {1677,83171,905187})
(<0.763.0>) << timeout (Timestamp: {1677,83171,905236})
(<0.763.0>) <0.661.0> ! {'$gen_call',{<0.763.0>,#Ref<0.23370919.4183031816.181957>},count_children} (Timestamp: {1677,83171,905324})
(<0.763.0>) out {gen,do_call,4} (Timestamp: {1677,83171,905357})
(<0.763.0>) in {gen,do_call,4} (Timestamp: {1677,83171,905403})
(<0.763.0>) << {#Ref<0.23370919.4183031816.181957>, [{specs,1},{active,4},{supervisors,0},{workers,4}]} (Timestamp: {1677,83171,905417})
(<0.763.0>) out {timer,sleep,1} (Timestamp: {1677,83171,905467})
(<0.763.0>) in {timer,sleep,1} (Timestamp: {1677,83172,906176})
(<0.763.0>) << timeout (Timestamp: {1677,83172,906224})
(<0.763.0>) <0.661.0> ! {'$gen_call',{<0.763.0>,#Ref<0.23370919.4183031816.181958>},count_children} (Timestamp: {1677,83172,906282})
(<0.763.0>) out {gen,do_call,4} (Timestamp: {1677,83172,906313})
(<0.763.0>) in {gen,do_call,4} (Timestamp: {1677,83172,906356})
(<0.763.0>) << {#Ref<0.23370919.4183031816.181958>, [{specs,1},{active,4},{supervisors,0},{workers,4}]} (Timestamp: {1677,83172,906368})
(<0.763.0>) out {timer,sleep,1} (Timestamp: {1677,83172,906414})
(<0.763.0>) in {timer,sleep,1} (Timestamp: {1677,83172,911083})
(<0.763.0>) exit killed (Timestamp: {1677,83172,911114})
(<0.763.0>) out_exited 0 (Timestamp: {1677,83172,911131})
[error] Child Plug.Cowboy.Drainer of Supervisor MyAppWeb.Endpoint shut down abnormally
** (exit) killed
Pid: #PID<0.763.0>
Start Call: Plug.Cowboy.Drainer.start_link([refs: [MyAppWeb.Endpoint.HTTP]])
Restart: :permanent
Shutdown: 5000
Type: :worker
[notice] Application my_app exited: shutdown

There's more before that, but the pattern repeats. I know pids aren't necessarily stable across restarts, but if I restart the application and the drainer gets the same pid (<0.763.0>), then the process on the other end of those calls, <0.661.0>, appears to be ranch_conns_sup.
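For anyone reproducing this, one way to check what a pid like <0.661.0> is from IEx (a sketch; pid/3 is an IEx helper, and proc_lib-started processes such as supervisors record their start module under :"$initial_call" in the process dictionary):

```elixir
# Build the pid from its printed parts and inspect it.
target = pid(0, 661, 0)

# :registered_name shows a name if the process has one; the process
# dictionary of proc_lib-based processes reveals the starting module.
Process.info(target, [:registered_name, :dictionary])
```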

@josevalim (Member)

Hrm, this is weird because ranch has no idea about this process unless we are shutting it down!

@dantswain (Author)

Am I reading the trace correctly that the drainer is sending {'$gen_call', {...}, count_children} to <0.661.0>, or is it the other way around? It looks like the drainer could be doing that here: https://github.com/elixir-plug/plug_cowboy/blob/master/lib/plug/cowboy/drainer.ex#L103-L105
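For context, the polling visible in the trace is consistent with a count-and-sleep loop like the one below; this is a paraphrase of the observed behavior, not the literal plug_cowboy/ranch source:

```elixir
defmodule DrainSketch do
  # Paraphrase of the polling behavior visible in the trace above.
  def wait_until_drained(conns_sup, check_interval) do
    # count_children is the '$gen_call' the trace shows going to <0.661.0>.
    %{active: active} = Supervisor.count_children(conns_sup)

    if active > 0 do
      # timer:sleep/1 is the out/in {timer,sleep,1} pair in the trace.
      Process.sleep(check_interval)
      wait_until_drained(conns_sup, check_interval)
    end
  end
end
```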

@josevalim (Member)

Right, but the point is that we should not be in that part of the code unless the drainer has started to terminate!

@dantswain (Author)

I'm trying to catch it in the act, but I think it might be getting triggered by or during the automatic rebuild cycle. Does that ring any bells?

Is the trace saying that the call is timing out? What might cause it to time out? That might hint at the root problem.

I'll keep digging here, but it's sort of hard because it's very distracting compared to the mild inconvenience of having to restart manually every so often 😂 Definitely open to ideas of things to try.

@dantswain (Author)

OK, I think I've actually tracked my problem down to another source: joken-elixir/joken_jwks#40

I'm wondering, though, if what I've seen here might hint at a problem in plug_cowboy? The traces make it look like a timeout is happening and then the drainer exits abnormally.

@josevalim (Member)

Right, that was my point from the beginning. This is definitely a secondary failure: something is making your app fail first, but I am not sure why you can't see that first failure! I'm also not sure there is a bug here; for all intents and purposes, the shutdown/kill is happening for correct reasons.

@einariii

I encountered this issue when a project didn't have the .env file used in the original dev environment.

NB: this happened immediately when running mix phx.server, unlike the OP, who seemed to see this after deployment.
