Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add req.filters and bereq.filters #4035

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nigoroll
Copy link
Member

@nigoroll nigoroll commented Jan 1, 2024

Summary

These changes add support for request body filters on the client and backend side. The motivation for this feature will hopefully be obvious: To be able to transform or process "uploads" in varnish-cache without the need for a backend application.

VDPs and VFPs are used here "in reverse" to the existing use on response bodies: the client side fetches the request body, so it uses a VFPs, while the backend side delivers the request body to be backend side, so it uses VDPs.

Please read the individual commit messages. This description is intended so serve as an overview.

Preparation work

This PR is based on some minor changes which were pushed directly to trunk because they do not represent API changes, preparing VDP for use on the backend side, where we do not have a struct req:

  • 81d406e renames some arguments to improve greppability for vdp_ctxes
  • 3ca7b1d reduces use of struct req from VDPs to where absolutely necessary (client side only VDPs).
  • 13cf51e renames (struct req).filter_list for consistency
  • b5d8e57 & 8d0b37d make the http line VDP available to other code
  • 0835051 adds an OBJ_ITER_END flag to the request body objiterate_f call

Overview of commits

The first commit generalizes the VDP API to make it suitable for use on the backend side. The main change here is to pass, for generic VDPs (those which are suitable for backend side use), the "outgoing" headers and a pointer to a length field, rather than using struct req. As is, this change serves clarity and code compaction only, all information is already present in the VRT_CTX argument, but the cl argument becomes important later on.

The next two commits add VDPs for bereq.body without introducing custom VDPs yet. That is, only the http1 VDP is pushed, keeping functionality otherwise. As a consequence of this change, fetches can now also fail for out-of-workspace for VDP.

Prepare VCL_StackVDP for backend use changes the implementation to work either on a struct req or struct busyobj, but not both.

Once this groundwork is laid, adding the actual fields and VRT interface for req.filters and bereq.filters is rather trivial. We stack the filters like elsewhere and add a test case.

To be done after merge

changelog

@nigoroll
Copy link
Member Author

nigoroll commented Jan 2, 2024

If anyone is interested how the adjustments look like for external vmods:

@gquintard
Copy link
Member

thanks a bunch @nigoroll, I've been wanting that for rers for a while!

Copy link
Member

@dridi dridi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that instead of breaking so much of the API to enable it for *req.body we could instead enhance the vdp_ctx and add a bunch of fields (bo, hd and cl) and let processors check or use what they need from the context (similarly to vrt_ctx).

For example, the ESI processor could assert(vdc->hd == req->resp) after grabbing reqand we wouldn't need to change much more.

That should also mean changing this prototype:

void VDP_Init(struct vdp_ctx *, struct *req, struct *bo);

Considering how it currently operates strictly on req fields it would probably be an improvement anyway. And that would open the pipe case, that I didn't see addressed in this patch series (but this was not a thorough review so don't be mad if pipe is covered).

Now regarding the objcore, we could imagine something like this for VDPs that need it:

struct objcore * VDP_Objcore(struct vdp_ctx *);

It would check whether this is the first member of the stack and grab the right one depending on the calling context.

Overall I'm in favor but I get the feeling that this change unpacks too much of the API instead of keeping as much as relevant contained.

Comment on lines -26 to +38
client c1 {
# Start with enough workspace to receive a good result
txreq -hdr "WS: 320"
# Start with enough workspace to receive some good results
client c1 -repeat 5 {
txreq -hdr "Connection: close"
rxresp
expect resp.status == 200
} -run
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe submit that kind of change independently?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

@@ -1,6 +1,9 @@
varnishtest "sweep through tight backend workspace conditions"

# XXX almost the same as r01834.vtc - retire?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not if r1834 didn´t catch #2645 at the time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r1834 was not originally using the "sweep" idea. Now I have changed it to also use that, so now they are almost the same.

bin/varnishd/cache/cache.h Outdated Show resolved Hide resolved
bin/varnishd/http1/cache_http1_fetch.c Outdated Show resolved Hide resolved
Comment on lines -140 to +155
VDP_Push(VRT_CTX, struct vdp_ctx *vdc, struct ws *ws, const struct vdp *vdp,
void *priv)
VDP_Push(VRT_CTX, struct vdp_ctx *vdc, struct ws *ws,
const struct vdp *vdp, void *priv,
struct objcore *oc, struct req *req,
struct http *hd, intmax_t *cl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit pick, the priv argument tends to be first in callbacks and last otherwise (though not always).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did ponder this question for a bit: The priv argument is the priv member from struct vdp_entry, so it belongs to the vdp. Thus I kept it next to it.

@nigoroll
Copy link
Member Author

nigoroll commented Jan 8, 2024

I feel that instead of breaking so much of the API to enable it for *req.body we could instead enhance the vdp_ctx and add a bunch of fields (bo, hd and cl) and let processors check or use what they need from the context (similarly to vrt_ctx).

These questions are, in my mind, answered in the commit message of 0274ebc:

We already have (had, from the perspective of the patch) a req member in vdp_ctx which is only implicitly scoped to VDP init time by being NULLed here.

I think the API should make it clear that VDPs should not mess with a request, a busy object, headers or the content length after init time, thus I think that we should change the init function signature and specifically not add vdp_ctx fields.

Also please note the following from the commit message regarding the motivation for this change:

Note that none of the new arguments would be necessary, they could all (including the existing oc argument) be derived from the VRT_CTX. The reasons for having these arguments are code clarity and avoidance of code duplication, e.g. for repeatedly figuring out the correct header and length pointer on the client vs. backend side.

I did actually try this for VDPs and ended up with repetitive code - you basically have to repeat the additions to VCL_StackVDP() in each VDP init function.

The changed API makes it easier (and, in my mind, clearer) how to write a VDP which works everywhere: You get the headers and the content length, make modifications to both as needed and be done. See for example the vmod_re change, which mostly consists of boilerplate changes to CHECK_OBJ_* the arguments. The meat of the change is *cl = -1, and that's it.

For VDPs which need objcore access or only work on the client side, I believe, the API change also clarifies the situation: The respective init function arguments are present only if the respective access is allowed.

For example, the ESI processor could assert(vdc->hd == req->resp) after grabbing reqand we wouldn't need to change much more.

Please look at the commit again, it also does not change much more: Besides the CHECK_OBJ_* boilerplate, the only real change is to fail if there is no req.

And that would open the pipe case, that I didn't see addressed in this patch series

I think we should work on one question at a time, and if pipe mode requires another API change, then so be it. But does it?

@asadsa92
Copy link
Contributor

There is a leak reported by ASAN:

***  v1    debug|Info: Child (73917) said Child dies
**** dT    10.444
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said =================================================================
***  v1    debug|Info: Child (73917) said ==73917==ERROR: LeakSanitizer: detected memory leaks
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said Direct leak of 8 byte(s) in 1 object(s) allocated from:
***  v1    debug|Info: Child (73917) said     #0 0x7fc5d007e9cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
***  v1    debug|Info: Child (73917) said     #1 0x7fc5c1c9c262 in xyzzy_vfp_rot13_init ../../../vmod/vmod_debug.c:118
***  v1    debug|Info: Child (73917) said     #2 0x557abdd183c5 in VDP_Push ../../../../bin/varnishd/cache/cache_deliver_proc.c:185
***  v1    debug|Info: Child (73917) said     #3 0x557abde4931b in VCL_StackVDP ../../../../bin/varnishd/cache/cache_vrt_filter.c:298
***  v1    debug|Info: Child (73917) said     #4 0x557abded605c in V1F_SendReq ../../../../bin/varnishd/http1/cache_http1_fetch.c:101
***  v1    debug|Info: Child (73917) said     #5 0x557abdcd2ad5 in vbe_dir_gethdrs ../../../../bin/varnishd/cache/cache_backend.c:304
***  v1    debug|Info: Child (73917) said     #6 0x557abdd1bb64 in VDI_GetHdr ../../../../bin/varnishd/cache/cache_director.c:147
***  v1    debug|Info: Child (73917) said     #7 0x557abdd5dcc1 in vbf_stp_startfetch ../../../../bin/varnishd/cache/cache_fetch.c:432
***  v1    debug|Info: Child (73917) said     #8 0x557abdd6d248 in vbf_fetch_thread ../../../../bin/varnishd/cache/cache_fetch.c:1109
***  v1    debug|Info: Child (73917) said     #9 0x557abde9c2c0 in Pool_Work_Thread ../../../../bin/varnishd/cache/cache_wrk.c:496
***  v1    debug|Info: Child (73917) said     #10 0x557abde9552c in WRK_Thread ../../../../bin/varnishd/cache/cache_wrk.c:157
***  v1    debug|Info: Child (73917) said     #11 0x557abde9cdbb in pool_thread ../../../../bin/varnishd/cache/cache_wrk.c:529
***  v1    debug|Info: Child (73917) said     #12 0x7fc5cf39d043  (/lib/x86_64-linux-gnu/libc.so.6+0x89043)
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said -----------------------------------------------------
***  v1    debug|Info: Child (73917) said Suppressions used:
***  v1    debug|Info: Child (73917) said   count      bytes template
***  v1    debug|Info: Child (73917) said       2        136 HSH_config
***  v1    debug|Info: Child (73917) said       1      16384 VSL_Setup
***  v1    debug|Info: Child (73917) said       2         64 WRK_BgThread
***  v1    debug|Info: Child (73917) said -----------------------------------------------------
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).

dridi added a commit to dridi/varnish-cache that referenced this pull request Jan 29, 2024
This is not going through a VDP, but it eventually will.

Refs varnishcache#4035
@nigoroll
Copy link
Member Author

I have rebased, squashed the squashable commits and force-pushed .

This commit is to prepare for use of the VDP API also for the backend
side to filter bereq.body through bereq.filters:

The req member of struct vdp_ctx is removed, because it should only be
used for initialization and, consequently, is already nulled in
VDP_DeliverObj().

vdp_init_f() and its caller VDP_Push() gain arguments:

- req is only present if the VDP is used on the client side.

- hd is a pointer to the outgoing headers corresponding to the body
  being worked on: (struct req).resp on the client side and (struct
  busyobj).bereq on the backend side (tbd).

- cl is a pointer to the content-length. -1 indicates 'unknown'.

VDPs should aim for compatibility with both the client and backend use
case by using only hd and cl instead of req. If they depend on req,
they should return an error if req is NULL.

Note that none of the new arguments would be necessary, they could all
(including the existing oc argument) be derived from the VRT_CTX. The
reasons for having these arguments are code clarity and avoidance of
code duplication, e.g. for repeatedly figuring out the correct header
and length pointer on the client vs. backend side.
This change introduces additional failure points to V1F_SendReq() when
there is insufficient workspace to push the V1L VDP.

We adjust r01834.vtc to cover the respective 503 errors and simplify it
by example of r02645.vtc. The two test cases now resemble each other a
lot, but I would think, at least for now, their purpose is so important
that the overhead does not matter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants