Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Werkzeug not conforming to RFC2616 #940

Closed
Nessphoro opened this issue May 26, 2016 · 6 comments
Closed

Werkzeug not conforming to RFC2616 #940

Nessphoro opened this issue May 26, 2016 · 6 comments

Comments

@Nessphoro
Copy link

Nessphoro commented May 26, 2016

In particular, internally Werkzeug converts headers of type "Transfer-Encoding" or "X-PoweredBy" into "HTTP_TRANSFER_ENCODING" and "X_POWERED_BY" respectively. However, if two headers "App-Header-1" and "App_Header_1" were both present in the request -- or even just "App_Header_1" -- both will be stored in the same way internally, moreover Werkzeug does not distinguish between the two. This behavior is not conforming to the RFC2616 HTTP 1.1 Spec, where the Header name take can be any valid token without control or separator characters. This has many issues, as if a request came in with "App_Header_1" it will then be converted to "HTTP_APP_HEADER_1", now frameworks like Flask will interpret it as "App-Header-1" which is not the same, and passing those values on to systems that do depend on the correct behavior will fail as "App_Header_1" != "App-Header-1" under any sane comparison strategy.

@untitaker
Copy link
Contributor

This behavior is caused by the WSGI spec itself, and it is impossible for Werkzeug to make a distinction between those two headers while still adhering to the WSGI spec. I don't know of any situation where this has been an issue in practice.

RFC 2616 is deprecated by the way, but the newer RFCs don't change anything in that regard anyway.

@untitaker
Copy link
Contributor

untitaker commented May 26, 2016

Also Werkzeug is not responsible for parsing or writing HTTP requests or responses on a syntactical level. This is done by whichever WSGI server you choose. Werkzeug does have a built-in server, but that only extends the WSGI server from the standard library.

@Nessphoro
Copy link
Author

Nessphoro commented May 26, 2016

The behavior is not caused by the WSGI spec, if you are referring to PEP 3333.
PEP 3333 states that:
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_" ). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.

There is nothing in the Spec about treating underscores differently. In fact, you are not conforming to PEP 3333 either because I cannot know which header is present or absent, i.e. if "App_H" or "App-H" is present.

@untitaker
Copy link
Contributor

untitaker commented May 26, 2016

WSGI is a Python-specific port of CGI. PEP 0333 references the CGI spec for the definition of "CGI environment variables", which states:

The HTTP header field name is converted to upper case, has all occurrences of "-" replaced with "_" and has `HTTP_' prepended to give the meta-variable name.

See https://tools.ietf.org/html/rfc3875#section-4.1.18

All of this is irrelevant as Werkzeug, for the most part, does not parse raw HTTP requests and does not produce the WSGI environment. It parses the WSGI environment. I don't see why you're filing this issue against this particular implementation. Don't shoot the messenger.

Basically there's a lot of legacy from CGI here, which is inherited by WSGI, which is what Werkzeug is trying to implement.

@Nessphoro
Copy link
Author

I really don't mean to start a fight; But it is a problem with security implications, especially for web servers that sit behind "legacy"-ish application gateways that pass in sensitive / application specific information in headers containing underscores without filtering other headers (I have concrete examples, that is, this is actually an issue in practice, which you have dismissed by your initial comment). CGI is over 20 years old, hence WSGI and HTTP specs should take precedence.

As for raw parsing:

for key, value in self.headers.items():

@untitaker
Copy link
Contributor

untitaker commented May 26, 2016

(I have concrete examples, that is, this is actually an issue in practice, which you have dismissed by your initial comment)

Then all I can say is that I'm sorry you have a server like that. Werkzeug can't do anything about it. The snippet you showed was precisely why I said "for the most part". Werkzeug also includes a simple testserver that can be used in development (to avoid having to set up Gunicorn + Nginx/Apache, for example). But if you're exposing that server to the internet, the potential issue you mention will be the least of your (security) problems.

WSGI was designed to be compatible with CGI to the most extent possible. WSGI does not override or supersede CGI. WSGI is CGI. There is very little work involved in making a WSGI application a valid CGI application. os.environ of a Python CGI script looks mostly like the equivalent WSGI environment. There is a lot of legacy to consider when changing anything about this. Werkzeug as a project doesn't have a say in this, and especially not me.

Again, this issue for this project is pointless. Werkzeug is not alone in this behavior. Last time I checked nginx blocks HTTP headers with underscores by default, due to the whole ambiguity issue. Most other tools probably just conflate them. If most tools can't tell the difference between a dash and a underscore, it does not matter anymore when the standard says there is one. Everybody is forced to play along, and eventually the tool emitting headers with underscores is to blame.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants