Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assert error in mgt_vcl_discard(): Condition((vp->warm) #4049

Open
nigoroll opened this issue Jan 31, 2024 · 1 comment
Open

Assert error in mgt_vcl_discard(): Condition((vp->warm) #4049

nigoroll opened this issue Jan 31, 2024 · 1 comment

Comments

@nigoroll
Copy link
Member

nigoroll commented Jan 31, 2024

With #4048 applied, the test case from nigoroll/libvmod-dynamic#110 can easily trigger this assertion failure when a lookup thread takes longer than cli_timeout to finish (I first saw this by accident while träwelling on an ICE train):

***  v1    debug|Error: Child (501422) not responding to CLI, killed it.
***  v1    debug|Assert error in mgt_vcl_discard(), mgt/mgt_vcl.c line 701:
***  v1    debug|  Condition((vp->warm) == 0) not true.

This happens when the mgt_vcl_askchild() times out, because the worker process is processing the cold event for longer.

This hack avoids the issue:

diff --git a/bin/varnishd/mgt/mgt_vcl.c b/bin/varnishd/mgt/mgt_vcl.c
index 9c4e01e9c..e20ebf0c0 100644
--- a/bin/varnishd/mgt/mgt_vcl.c
+++ b/bin/varnishd/mgt/mgt_vcl.c
@@ -696,6 +696,10 @@ mgt_vcl_discard(struct cli *cli, struct vclprog *vp)
                vp->warm = 0;
        } else {
                (void)mgt_vcl_setstate(cli, vp, VCL_STATE_COLD);
+               while (vp->warm) {
+                       usleep(1000*1000);
+                       (void)mgt_vcl_setstate(cli, vp, VCL_STATE_COLD);
+               }
        }
        if (MCH_Running()) {
                AZ(vp->warm);

How should we fix this issue? Some ideas:

  • disable the timeout for a COLD event issued from mgt_vcl_discard()
  • re-schedule the discard for later like in auto mode.
@nigoroll
Copy link
Member Author

nigoroll commented Feb 5, 2024

bugwash: just dial up cli_timeout to some infinity-ish value. good point though: phk: slink, we may want to use a smaller timeout still for the MGT's ping/pong's...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant