WAF Proxy: defensive body parsing: consider validating & parsing payloads in the WAF #216

RomainMuller · 2023-09-22T09:33:36Z

The current agent implementation (e.g: DataDog/datadog-agent#19654) parses payload content when possible before passing it to the WAF, which may open it up for exploits of vulnerabilities in the various parsers (JSON, url-encoded, mime/multiplart, ...).

Since most of these vulnerabilities stem from careless processing of untrusted data, I believe it would be ideal if the WAF received the raw payload bytes + associated headers, so it could validate or sanitize the payloads before determining whether they are deemed "safe to parse", and eventually parsing them to inspect the contents.

One drawback obviously is that this requires the WAF to ship with the various required parsers (which may significantly inflate the image size), and it probably should favor highly defensive implementations the parsers (which often comes at a significant performance cost). In order to mitigate the first of these problems, an alternative could be for the WAF to expose a function that inspects the raw payload data + content-type headers, and returns a sanitized form of it, or a flag indicating whether that payload may be parsed or not -- this way the parsers can continue to reside in the agent (possibly being provided by the standard library) instead of being hoisted/duplicated into the WAF.

Julio-Guerra · 2023-09-22T10:18:23Z

Adding extra context: I think this question becomes especially important when integrating libddwaf into proxies, like we do for AWS Lambda, and like we think of doing for envoy and co, where we no longer have the possibility to rely on the lazy monitoring of what the app is actually doing (instrumenting parsers and so on). So a defensive parsing approach would be preferable.

Anilm3 · 2023-09-22T10:44:14Z

Performing raw body parsing in the WAF has been discussed over and over, although there hasn't been a need to do this and traditionally raw body parsing has been discouraged due to the inherent performance penalty, favouring instrumentation that provides access to the parsed body instead.

With the introduction of schema extraction (and now serverless), this is being considered and discussed, however, while I'm generally in favour of having common functionality in the WAF, there are other important considerations in this instance:

Since the body parsing needs to be performed before rule and filter evaluation, this could result in timeouts which inhibit the evaluation of rules, and thus provide an easy mechanism to bypass the WAF. A clever strategy involving the lazy generation of the raw body could potentially work, although the interaction between rule filters, input filters and rules would become much more complex. Similarly providing different levels of granularity for the timeout could also work.
Another concern is the potential limitations around the strings provided to the WAF, if the string is very large and the library has to perform a conversion + copy, it could also be quite costly. Alternatives such as truncation could make it difficult or impossible to properly parse the body.
The binary size increase could be unacceptable due to the size limitations in certain serverless environments.
While currently the WAF is the sole consumer of the parsed body, this might not be the case in the near future. In this case, the process of serialising the raw body, parsing it, copying it and deserialising it could be more expensive than parsing it in the library rather than the WAF.
Many libraries (e.g. java) are against introducing redundancies in the WAF, such as parsers, which are already available and better integrated to the language in question.

We could certainly restrict this feature to serverless, but that opens other issues unrelated to the WAF.

RomainMuller · 2023-09-22T12:09:50Z

While currently the WAF is the sole consumer of the parsed body, this might not be the case in the near future. In this case, the process of serialising the raw body, parsing it, copying it and deserialising it could be more expensive than parsing it in the library rather than the WAF.

Many libraries (e.g. java) are against introducing redundancies in the WAF, such as parsers, which are already available and better integrated to the language in question.

Yeah I fully agree with these... This is why I was wondering if a simple sanitization/quick-preflight could be possible: e.g, given a binary payload + content-type, quickly recognize known parser-exploiting-inputs & either sanitize or reject them... then still let the instrumentation or proxy perform the parsing as/if needed.

Julio-Guerra changed the title ~~Consider validating & parsing payloads in the WAF~~ WAF Proxy: defensive body parsing: consider validating & parsing payloads in the WAF Sep 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WAF Proxy: defensive body parsing: consider validating & parsing payloads in the WAF #216

WAF Proxy: defensive body parsing: consider validating & parsing payloads in the WAF #216

RomainMuller commented Sep 22, 2023

Julio-Guerra commented Sep 22, 2023 •

edited

Anilm3 commented Sep 22, 2023

RomainMuller commented Sep 22, 2023 •

edited

WAF Proxy: defensive body parsing: consider validating & parsing payloads in the WAF #216

WAF Proxy: defensive body parsing: consider validating & parsing payloads in the WAF #216

Comments

RomainMuller commented Sep 22, 2023

Julio-Guerra commented Sep 22, 2023 • edited

Anilm3 commented Sep 22, 2023

RomainMuller commented Sep 22, 2023 • edited

Julio-Guerra commented Sep 22, 2023 •

edited

RomainMuller commented Sep 22, 2023 •

edited