
Backend connection queue #4030

Open
wants to merge 13 commits into base: master

Conversation

walid-git
Member

This patch allows a task to be queued when a backend reaches its max_connections. The task will queue on the backend and wait for a connection to become available, rather than immediately failing. This capability is off by default and must be enabled by setting both of the new parameters below.

The following parameters have been added:
backend_wait_timeout: the amount of time a task will wait (default 0.0).
backend_wait_limit: the maximum number of tasks that can wait (default 0).

The two parameters can also be overridden for individual backends in VCL:

    backend foo {
        .host = "bar.com";
        .wait_timeout = 3s;
        .wait_limit = 10;
    }
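
For completeness, both parameters can also be set globally, either with -p on the varnishd command line or at runtime through the CLI. A hedged example of the latter, with arbitrary values (the parameter names are the ones introduced by this PR; param.set is the standard varnishadm command for run-time parameters):

    varnishadm param.set backend_wait_timeout 10
    varnishadm param.set backend_wait_limit 100

Both have to be non-zero for tasks to actually queue, in line with the off-by-default behavior described above.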

Authored by: @drodden (with minor rearrangements)

@walid-git
Member Author

With this change, it seems that the default value for thread_pool_stack on 32-bit systems is no longer sufficient (which is why e00029.vtc is failing on ubuntu_bionic).

@bsdphk
Contributor

bsdphk commented Dec 18, 2023

As an initial matter I would prefer queue_length and queue_timeout which I think are almost self-explanatory.

I'm not sure it is a good idea, however, but default-off mitigates that.

@dridi
Member

dridi commented Dec 18, 2023

FWIW this is a partial implementation of what we agreed on for VIP 31.

https://github.com/varnishcache/varnish-cache/wiki/VIP31%3A-Backend-connection-queue

There are two loose ends not covered: disembarking and health status. Disembarking fetch tasks is a large project, one we can disregard or move to its own VIP. The saturation of both .max_connections and .wait_limit/backend_wait_limit is probably reasonable to address before closing VIP 31.

Member

@nigoroll nigoroll left a comment

In general, I am 👍🏽

@walid-git walid-git force-pushed the upstream_backend_queue branch 2 times, most recently from 478c1ef to 65bebcb on January 15, 2024 at 13:37
@nigoroll
Member

nigoroll commented Jan 15, 2024

Thank you, this looks good to me overall.

I still have some suggestions, but I would also be OK with polishing them after merge, if you agree (a rough sketch follows the list):

  • struct backend_cw should have a magic value which we check when it is accessed from other threads. While I did suggest moving it to the stack, I am well aware of the risk of smashing foreign stacks, and magic checks can help raise a flag if that happens.
  • the magic would be initialized with INIT_OBJ(), which would also initialize the list pointers for clarity. The code is still correct, IIUC, even with uninitialized list pointers, but I think it helps debugging a lot if they are zeroed.
  • cw_state should be moved into struct backend_cw. This allows for additional assertions (e.g. in vbe_connwait_signal_locked, we can assert that the state is CW_QUEUED).
  • when we dequeue, we should change the state to a new state, e.g. CW_DEQUEUED.
  • We should add a backend_cw_fini function which asserts that the state is != CW_QUEUED before destroying the condition variable.
  • Regarding the inlining of the dequeue, I think I would actually prefer a vbe_connwait_dequeue_locked for clarity, because that would now also change the state.
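
To make the list above more concrete, here is a rough sketch of how these pieces could fit together. It is not the PR's actual code: only the names mentioned in the review (struct backend_cw, INIT_OBJ(), cw_state, CW_QUEUED, CW_DEQUEUED, backend_cw_fini, vbe_connwait_dequeue_locked) come from this thread; the magic value, the extra enum state, the cw_head list in struct backend and the general layout are assumptions.

    /* sketch only -- assumes the usual varnishd includes (cache/cache.h etc.) */

    enum cw_state_e {
        CW_DO_CONNECT = 0,       /* not queued (name assumed) */
        CW_QUEUED,               /* waiting on the backend's queue */
        CW_DEQUEUED,             /* removed from the queue, signalled or timed out */
    };

    struct backend_cw {
        unsigned                    magic;
    #define BACKEND_CW_MAGIC            0x3d6a74c2  /* example value */
        enum cw_state_e             cw_state;
        VTAILQ_ENTRY(backend_cw)    cw_list;
        pthread_cond_t              cw_cond;
    };

    /* INIT_OBJ() zeroes the object (including the list pointers) and sets the magic */
    static void
    backend_cw_init(struct backend_cw *cw)
    {
        INIT_OBJ(cw, BACKEND_CW_MAGIC);
        AZ(pthread_cond_init(&cw->cw_cond, NULL));
    }

    /* caller holds bp->director->mtx; the cw_head member of struct backend is assumed */
    static void
    vbe_connwait_dequeue_locked(struct backend *bp, struct backend_cw *cw)
    {
        CHECK_OBJ_NOTNULL(cw, BACKEND_CW_MAGIC);
        assert(cw->cw_state == CW_QUEUED);
        VTAILQ_REMOVE(&bp->cw_head, cw, cw_list);
        cw->cw_state = CW_DEQUEUED;
    }

    /* asserts we are no longer queued before destroying the condition variable */
    static void
    backend_cw_fini(struct backend_cw *cw)
    {
        CHECK_OBJ_NOTNULL(cw, BACKEND_CW_MAGIC);
        assert(cw->cw_state != CW_QUEUED);
        AZ(pthread_cond_destroy(&cw->cw_cond));
    }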

Member

@nigoroll nigoroll left a comment

see top level comment

@walid-git walid-git force-pushed the upstream_backend_queue branch 2 times, most recently from 7fb8ce2 to f82cb60 on February 5, 2024 at 11:18
@walid-git
Member Author

Rebased and addressed all review comments. Ready for a (hopefully) last review.

@nigoroll nigoroll self-requested a review February 19, 2024 14:52
nigoroll added a commit to nigoroll/varnish-cache that referenced this pull request Feb 19, 2024
Member

@nigoroll nigoroll left a comment

I always feel bad when it looks like I was holding things up, so I would like to apologize for not having spotted some issues earlier.

@@ -149,6 +246,12 @@ vbe_dir_getfd(VRT_CTX, struct worker *wrk, VCL_BACKEND dir, struct backend *bp,
if (bo->htc == NULL) {
    VSLb(bo->vsl, SLT_FetchError, "out of workspace");
    /* XXX: counter ? */
    if (cw->cw_state == CW_QUEUED) {
        Lck_Lock(bp->director->mtx);
        vbe_connwait_dequeue_locked(bp, cw);
Member

Should we not move this whole htc alloc block to the top, even before the cw init?

Reason: the ws does not change during waiting, so if it overflows, it does so right from the start.

Member Author

Why would we allocate workspace before we are sure that we can get a backend connection?

Member

If we wouldn't be able to allocate the htc in the first place, why wait at all?

On the other hand, if we are effectively not connecting and workspace would have been too tight, then we fail for the wrong reason, and we don't visit vcl_backend_error at all.

Since the default values for the parameters make this an opt-in feature, may I suggest adding an XXX comment for now to take the proper time later to see how to best approach this? (snapshot/reset for certain paths for example)

Member Author

I have added an XXX comment as suggested.
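
For reference, a sketch of what the quoted hunk could look like with such a comment. Only the lines already shown in the diff above come from the PR; the wording of the XXX comment, the Lck_Unlock() call and the early return are illustrative assumptions:

    if (bo->htc == NULL) {
        VSLb(bo->vsl, SLT_FetchError, "out of workspace");
        /* XXX: counter ? */
        /* XXX: revisit whether the htc allocation should happen before
         * queueing: the workspace does not change while we wait, so an
         * overflow here was already certain before we started waiting
         * (snapshot/reset of the workspace may be an option). */
        if (cw->cw_state == CW_QUEUED) {
            Lck_Lock(bp->director->mtx);
            vbe_connwait_dequeue_locked(bp, cw);
            Lck_Unlock(bp->director->mtx);
        }
        return (NULL);
    }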

Member

On the other hand, if we are effectively not connecting and workspace would have been too tight, then we fail for the wrong reason, and we don't visit vcl_backend_error at all.

I disagree on "fail for the wrong reason". This code only runs because we do want to connect and having enough workspace is a precondition for the connect to succeed. Potentially running into the connection or wait limit does not make it a "wrong reason" to fail for insufficient workspace.

@walid-git
Member Author

I have addressed most of the last review items, and mentioned the potential drawbacks of this feature in the docs as requested during the last bugwash.

@@ -149,6 +246,12 @@ vbe_dir_getfd(VRT_CTX, struct worker *wrk, VCL_BACKEND dir, struct backend *bp,
if (bo->htc == NULL) {
    VSLb(bo->vsl, SLT_FetchError, "out of workspace");
    /* XXX: counter ? */
    if (cw->cw_state == CW_QUEUED) {
        Lck_Lock(bp->director->mtx);
        vbe_connwait_dequeue_locked(bp, cw);
Member

If we wouldn't be able to allocate the htc in the first place, why wait at all?

On the other hand, if we are effectively not connecting and workspace would have been too tight, then we fail for the wrong reason, and we don't visit vcl_backend_error at all.

Since the default values for the parameters make this an opt-in feature, may I suggest adding an XXX comment for now to take the proper time later to see how to best approach this? (snapshot/reset for certain paths for example)

Member

@nigoroll nigoroll left a comment

I disagree on one detail still, but I do not want to hold this PR up any longer.

@@ -149,6 +246,12 @@ vbe_dir_getfd(VRT_CTX, struct worker *wrk, VCL_BACKEND dir, struct backend *bp,
if (bo->htc == NULL) {
    VSLb(bo->vsl, SLT_FetchError, "out of workspace");
    /* XXX: counter ? */
    if (cw->cw_state == CW_QUEUED) {
        Lck_Lock(bp->director->mtx);
        vbe_connwait_dequeue_locked(bp, cw);
Member

On the other hand, if we are effectively not connecting and workspace would have been too tight, then we fail for the wrong reason, and we don't visit vcl_backend_error at all.

I disagree on "fail for the wrong reason". This code only runs because we do want to connect and having enough workspace is a precondition for the connect to succeed. Potentially running into the connection or wait limit does not make it a "wrong reason" to fail for insufficient workspace.

drodden and others added 11 commits February 29, 2024 10:32
This patch allows a task to be queued when a backend reaches its
max_connections.  The task will queue on the backend and wait for a
connection to become available, rather than immediately failing.

This initial commit just adds the basic functionality.  It temporarily
uses the connect_timeout as the queue wait time, until new parameters
are added in a follow-up effort.

The following parameters have been added:
    the amount of time a task will wait.
    the maximum number of tasks that can wait.

    - global parameters:
        backend_wait_timeout  (default 0.0)
        backend_wait_limit    (default 0)

    - those parameters can be overridden in the backend:
        backend foo {
            .host = "bar.com";
            .wait_timeout = 3s;
            .wait_limit = 10;
        }

The backend wait queue capability is off by default and must be
enabled by setting both of the new parameters defined above.

Note that this makes an ABI-breaking change.

These counters were added to main:
backend_wait - count of tasks that waited in queue for a connection.
backend_wait_fail - count of tasks that waited in queue but did not
    get a connection (timed out).

This makes sure that we won't abort a backend connection
attempt if the backend can take it. It covers for any
potential missing connwait_signal call.
@nigoroll
Member

nigoroll commented Mar 1, 2024

bugwash:

proposed (re)name:

global parameters:

  • backend_queue_limit
  • backend_queue_timeout

VCL:

    backend foo {
        .queue_limit = 42;
        .queue_timeout = 1m;
    }

    sub vcl_backend_fetch {
        set bereq.queue_limit = 42;
        set bereq.queue_timeout = 1m;
    }

@gquintard
Member

hi all! I'd really like this to get into the next release, and from what I'm reading, it's only a naming exercise from now on.

As a refresher, the current PR offers backend_wait_timeout/backend_wait_limit and the bugwash proposed backend_queue_timeout/backend_queue_limit. I feel like this isn't a big enough contention point to block a merge.

Any chance to get the original names in? I hate to bring it up and I'll probably get slapped for it, but we have customers using the feature and consistency is pretty important; I don't want to have to translate parameter names depending on the platform people are running.

6 participants