Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Varnish 7.1.1 panic - cache/cache_wrk.c:629 #3868

Open
Vormillion opened this issue Nov 25, 2022 · 3 comments
Open

Varnish 7.1.1 panic - cache/cache_wrk.c:629 #3868

Vormillion opened this issue Nov 25, 2022 · 3 comments

Comments

@Vormillion
Copy link

Vormillion commented Nov 25, 2022

Today during high load Varnish dropped all connections and panic.show displays:

Panic at: Tue, 22 Nov 2022 19:58:15 GMT
Wrong turn at cache/cache_wrk.c:629:
Worker Pool Queue does not move
version = varnish-7.1.1 revision 7cee1c581bead20e88d101ab3d72afb29f14d87a, vrt api = 15.0
ident = Linux,5.4.172-90.336.amzn2.x86_64,x86_64,-junix,-smalloc,-sdefault,-hcritbit,epoll
now = 1849122.075794 (mono), 1669147095.298170 (real)
Backtrace:
  ip=0x55da1c2b8d30 sp=0x7f0cb4433870 <PAN__DumpStruct+0x110b>
  ip=0x55da1c314ab7 sp=0x7f0cb44339b0 <VAS_Fail+0x4d>
  ip=0x55da1c2d3da2 sp=0x7f0cb44339f0 <pool_herder+0x66>
  ip=0x7f0cba16b1f5 sp=0x7f0cb4433ac0 <pthread_exit+0x280>
errno = 110 (Operation timed out)
argv = {
  [0] = \"varnishd\",
  [1] = \"-F\",
  [2] = \"-f\",
  [3] = \"/data/varnish.vcl\",
  [4] = \"-s\",
  [5] = \"malloc,4096m\",
  [6] = \"-p\",
  [7] = \"default_ttl=3600\",
  [8] = \"-p\",
  [9] = \"default_grace=3600\",
  [10] = \"-p\",
  [11] = \"feature=+esi_ignore_https\",
  [12] = \"-p\",
  [13] = \"feature=+esi_disable_xml_check\",
  [14] = \"-p\",
  [15] = \"http_resp_hdr_len=42000\",
}
pthread.self = 0x7f0cb4433b38
pthread.name = (pool_herder)
pthread.attr = {
  guard = 8192,
  stack_bottom = 0x7f0cb4413000,
  stack_top = 0x7f0cb4433b08,
  stack_size = 133896,
}
thr.req = NULL
thr.busyobj = NULL
thr.worker = NULL
vmods = {
  std = {0x7f0cb4e59e20, Varnish 7.1.1 7cee1c581bead20e88d101ab3d72afb29f14d87a, 0.0},
},
pools = {
  pool = 0x7f0cb9f9b560 {
    nidle = 1,
    nthr = 5000,
    lqueue = 2
  },
  pool = 0x7f0cb9f9b690 {
    nidle = 0,
    nthr = 5000,
    lqueue = 2
  },
},

I appreciate if someone can help me understand what can be potential source of this issue.

@nigoroll
Copy link
Member

nigoroll commented Nov 28, 2022

Firstly, the intention behind this panic is a good one - to improve your system's uptime by restarting the varnish worker process if something goes terribly wrong. The respective parameter is thread_pool_watchdog.
The potential reasons for the watchdog barking are numerous: Your varnish worker did not run for some time, system resources are exhausted (in particular, memory), your vm got migrated. We really can't know.

Other than that, this could be a bug in the watchdog code, but we have not seen anything like that for long, so at this point I would be tempted to point to "other reasons".

On a related issue, the stack trace points to a potential issue in our panic handler, but that is secondary and non-causal.

nigoroll added a commit that referenced this issue Nov 28, 2022
@nigoroll
Copy link
Member

@Vormillion can you please let us know which binaries you are using? If you built varnishd yourself, please make the binary available for download somewhere.
FWIW, the PAN__DumpStruct+0x110b offset looks like it was not actually happening in PAN__DumpStruct, because that function is just 0x2a0 bytes in length on my system.

@nigoroll
Copy link
Member

On a related issue, the stack trace points to a potential issue in our panic handler, but that is secondary and non-causal.

pan_pool() is last in pan_ic() and looks like it has completed, so the top stack frame could just be the abort() in pan_ic() .

bsdphk pushed a commit that referenced this issue Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants