
expire: Force a cache MISS when req.ttl = 0s #4041

Open · wants to merge 2 commits into master

Conversation

@AlveElde (Contributor) commented Jan 18, 2024

This commit changes the undefined behavior of setting req.ttl = 0s. Previously, setting req.ttl = 0s caused req.ttl to have no effect, which is inconsistent with setting req.grace = 0s, which does have an effect.

The use case is to achieve hash_always_miss semantics, but with request coalescing (waiting list). (edit by @nigoroll during bugwash)

The new behavior of setting req.ttl = 0s is similar to setting req.hash_always_miss = true; the main difference is that req.ttl allows for request coalescing. This is useful for short-lived non-private objects.

I would like to define this behavior in the documentation somewhere, but didn't find a good place to put it.

Both req.ttl and req.grace are initialized to -1, which is why this change does not break everything. However, setting req.ttl/req.grace to -1s in VCL ends up setting them to 0s, because these two req attributes have extra logic in their respective definitions:

REQ_VAR_L(ttl, d_ttl, VCL_DURATION, if (!(arg>0.0)) arg = 0;)
REQ_VAR_R(ttl, d_ttl, VCL_DURATION)
REQ_VAR_L(grace, d_grace, VCL_DURATION, if (!(arg>0.0)) arg = 0;)
REQ_VAR_R(grace, d_grace, VCL_DURATION)

Since this commit effectively prevents VCL writers from "clearing" req.ttl by setting it to 0s, maybe we should consider allowing users to clear req.ttl/grace by setting them to -1s?

@dridi (Member) commented Jan 18, 2024

> Since this commit effectively prevents VCL writers from "clearing" req.ttl by setting it to 0s, maybe we should consider allowing users to clear req.ttl/grace by setting them to -1s?

Maybe this is for the best? What about this instead?

unset req.ttl;
unset req.grace;

@nigoroll (Member):

My take: integrate Dridi's suggestion and update the docs, then it's a good improvement!

@dridi (Member) commented Jan 25, 2024

A thought occurred to me: shouldn't we use NAN when req.{ttl,grace} are unset?

@hermunn (Member) commented Jan 31, 2024

The problem with this PR is that it does not do what the title says. When there is a waiting list with a queue of clients waiting for an object, a newly inserted object will be "fresh", even if the waiting clients all have req.ttl = 0s; and req.grace = 0s;, so these clients will get hits on the inserted object (the one they were waiting for). This should be documented in a test case. Of course, this might be desired or undesired depending on the user, which is a problem.

The question is whether we should special-case the situation req.ttl = 0s; and req.grace = 0s; to mean: I don't want a hit, grace or not, but do go on the waiting list if there is a busy object, and do grab the stale_oc if you find something. I don't like the idea of adding a new symbol for this behavior, the way we have hash_always_miss and hash_ignore_busy, but it might be the "most correct" solution.

@dridi (Member) commented Jan 31, 2024

I think it would be more accurate to say "consider all objects stale when req.ttl = 0s".

Aren't we creating a perpetual waiter if the client disregards both TTL and grace? That is, when there is a busy object, the outcome will always be to disembark the request. So until the request wins the miss race, it will compete with other "req.ttl = req.grace = 0s" requests.

This is waiting list serialization once again, except that we swapped beresp with req to trigger it. I can try combining this change with #4032 to check whether it would eventually be mitigated.

@dridi (Member) commented Jan 31, 2024

> Aren't we creating a perpetual waiter if the client disregards both TTL and grace?

We are not since we treat zero as the lack of TTL limit for the client. Let's revisit that specific point once unset is implemented and we treat NAN as the lack of limit instead.

@hermunn (Member) commented Jan 31, 2024

> This is waiting list serialization once again

Yeah, you are right, my comment does not add much. I think we should simply go with the PR as it is now, with all the timestamp confusion that exists.

@hermunn (Member) commented Jan 31, 2024

Sorry, strike that. I still think we need a test case demonstrating how you can get a hit even when the request sets both req.ttl and req.grace to 0s, because of the waiting list and timestamps.

@hermunn (Member) left a comment:

I have looked at the test case again, and I think it does what it should do. I think we can merge this.

@AlveElde (Contributor, Author):

If you, like me, were wondering why the test case provided does not return a grace hit, this is the reason: #2422. The object in cache is still fresh, and since we ignore req.ttl, we get no candidates for a grace hit. Whether this is the correct behavior is a subject for future discussion.

@dridi (Member) commented Jan 31, 2024

I think the trade-off from #2422 can be let go in favor of the trade-off from #4032. You should end up with cache hits again, for a different reason.

@nigoroll (Member) left a comment:

Overall, I am happy, but the one question I have might be relevant.

ctx->req->elem = val; \
}

REQ_VAR_R(backend_hint, director_hint, VCL_BACKEND)

REQ_VAR_L(ttl, d_ttl, VCL_DURATION, if (!(arg>0.0)) arg = 0;)
@nigoroll (Member):

do we still want to set negative arguments to zero? Could it be a better option to fail?

@AlveElde (Contributor, Author):

IMO it's reasonable to expect that req.ttl can be set to a negative duration; we even do so in a test case:

Failing on a negative duration has a good chance of breaking existing VCL.

@nigoroll (Member):

That is a coverage test.
What are the semantics of a negative duration for req.ttl and req.grace?

@AlveElde (Contributor, Author):

The same as beresp.ttl and beresp.grace:

if (a < 0.0) \
a = 0.0; \

@nigoroll (Member):

I understand that this "has always been like that", but also please remember that for quite some time, varnish had no way of properly failing at runtime (VRT_fail()).

The particular lines you are referring to are from 10 years ago and I am still wondering what a negative ttl/grace is supposed to mean. ttl: already expired before it even went into the cache? If those were the semantics, then we would not be allowed to take the respective objects as "just refreshed". grace: I would think a negative value should move an object into grace mode earlier than ttl.

So, as we come across these questions, we should answer them.

@AlveElde (Contributor, Author):

I totally agree that "has always been like that" is a weak argument, but I personally prefer to limit breakage if I can. It's not obvious to me what a negative TTL/grace should mean, and I think that question is bigger than the scope of this PR.

@nigoroll (Member):

Let's try to cover this quickly during the next bugwash. I am also fine with continuing to ignore negative values, but I'd like to at least ask for opinions.

This commit changes the undefined behavior of setting req.ttl = 0s.
Setting req.ttl = 0s would previously cause req.ttl to have no effect,
but this breaks with setting req.grace = 0s, which does have an effect.

The new behavior of setting req.ttl = 0s is similar to setting
req.hash_always_miss = true; the main difference is that req.ttl allows
for request coalescing. Useful for short-lived non-private objects.

This aligns the behavior of setting req.ttl = 0s with req.grace = 0s.
We continue to use a default value of -1 instead of changing the
default to NAN. These variables can be read from VCL, and VCL does not
want to deal with NAN.
@AlveElde (Contributor, Author):

I force pushed for two reasons:

  • Squash the squashme commit
  • Drop the commit changing the default value of req.ttl and req.grace to NAN

CCI alerted me to the fact that a VTC was failing due to a NAN being converted to string in the following line:

set resp.http.X-req-grace = req.grace;

Unless we want to convert NAN to some number when reading req.ttl and req.grace, I suggest we stick with -1 as the default value.

@nigoroll (Member):

> Unless we want to convert NAN to some number when reading req.ttl and req.grace, I suggest we stick with -1 as the default value.

I think we should print NAN.

@dridi (Member) commented Mar 5, 2024

This pull request didn't strictly need #4069 to move forward, but #4069 cemented the semantics for NAN in a vtim_dur. We should use that for req.ttl and req.grace now.


4 participants