Header "Transfer-Encoding: chunked" set even if Content-Length is provided which causes body to not actually get chunked #1648
I agree, we should pick one of those two options. =)
So commenting here as I should have done >6 hours ago:
You're implying that in this instance we didn't actually chunk the data. Do you have actual evidence of that? The output alone is not nearly enough to demonstrate that. Furthermore, running your example above in 2.0.0 does not work: I get this traceback:

This isn't even an actual issue as far as I can tell. At least not on 2.0.0, which is the only currently supported version of requests around.
What gave you the impression that this would be the case?
What gave me the impression was the fact that Requests did not chunk the data (even though it set the header) once I set Content-Length. The example works with Python 3.3.
Trying it on Python 3.3 it does not blow up; on the other hand, it did chunk the content. The chunking is not platform dependent, so I am thoroughly confused by that.

Secondly, I'd still like to know what gave you the impression that setting the Content-Length header would change the chunking behaviour.
@sigmavirus24 You're not using Requests 2.0.0 on your Python 3.3 installation.
@ysangkok good point. :-) With 2.0 it's definitely broken. Regardless, my opinion is definitely in the court of us removing the Content-Length header.
You're probably right @sigmavirus24, we should probably strip the Content-Length header.
Hm, I'm not entirely convinced of my own argument anymore. I went looking for the old discussions surrounding users setting the Content-Length header themselves. I'm still of the opinion that users should really not be setting those headers themselves and that we should remove them; however, I think this issue points out a far more important issue, which is the vastly different behaviour of requests on two different versions of Python. On 2.7 (as I demonstrated) setting the header does not stop the body from being chunked.

One easy way to fix this is to remove the header when using a generator (or always, to provide a consistent behaviour); the flaw with this is that it is backwards incompatible behaviour. The other easy way is to break the consistency of the API by not always using chunked transfer encoding with a generator.

@Lukasa this definitely needs some deeper thought since I can't find the old conversations where users were admonished for setting that header themselves. To be honest though, I would never expect that setting a header would change the behaviour of using a generator. This certainly is an extremely sticky situation.
Bleh. I'll ponder.
Also, this isn't a breaking API change that I would hesitate to make, because I have to wonder how many people are actually relying on this behaviour.
I've run into a situation where the length of a generator (for example, a streamed file from another server, or a really large file on disk) is known. For example:

```python
response = requests.get('http://example.com/mybig.iso', stream=True)
length = response.headers.get('Content-Length')

def exhaust(response):
    while True:
        out = response.raw.read(1024*1024)
        if not out:
            break
        yield out

response = requests.post('http://example.com/upload',
                         data=exhaust(response),
                         headers={'Content-Length': length})
```

That might be a valid reason to keep Content-Length around. The only workaround right now (that I know of) is to exhaust the generator in memory and pass the bytes directly in.
@bryanhelmig does the server you're uploading to require you to send the Content-Length header?
@bryanhelmig did you see the comments in the linked pull request?

Anyway, I don't understand why chunked Transfer-Encoding isn't just a flag. No need to delete any headers (or do any other kind of hand-holding); it was never a question of the Content-Length being right or wrong. The actual issue is that the presence of Content-Length semi-disables Transfer-Encoding, which makes no sense at all. But just making Requests ignore Content-Length is not solving the real problem, which is that Requests uses chunked Transfer-Encoding when it feels like it (sounds like that is supposed to be when reading from a generator), even though many web servers don't even support it.

Ignoring Content-Length will confuse people who supply it. If you (@sigmavirus24) insist on not transmitting it, why not just throw an exception? As you said, this functionality is probably not used widely. In the pull request, you said "The use case in the issue is just an example of the behaviour. There is no justification there for why you're doing what you're doing." I disagree: I think the original code in this issue is perfectly normal behaviour, and in fact I think that streaming of POST data is a huge use case, and that it's ridiculous if one is forced to use chunked Transfer-Encoding or resort to lower-level libraries when streaming/using generators.

So to summarize: chunked Transfer-Encoding should be a flag, illegal parameter combinations should provoke exceptions, and user-supplied flags should be sent when possible. And of course, it shouldn't be possible to semi-disable Transfer-Encoding.
Stop stop stop. Everyone take a breather.

@ysangkok You can do streaming uploads without generators just fine. Provide Requests a file-like object in the data parameter and that will work. Yes, it's not as simple as using a generator, but that's ok because it's still not very hard.

In the meantime, Requests should not suggest that it is chunking data when it does not. We are all agreed on that. The question is what we should do in your specific case: namely, providing a generator and a Content-Length header.

@ysangkok You've said that "Ignoring Content-Length will confuse people who supply it." @sigmavirus24 contends that ignoring the very clear documentation when provided with a generator will confuse people who do that. You are both right. (As a side note, the fact that many servers don't understand chunked Transfer-Encoding is unfortunate, but beside the point.)

One way or another we're going to have to pick what we do here. It's possible that the correct decision is to throw an exception when both a generator and a Content-Length header are provided. I'm naturally inclined to sit on the fence here and throw a warning instead.
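The file-like-object versus generator distinction above can be probed locally at request-preparation time, with no network traffic. A minimal sketch, assuming the requests library is installed (the URL is illustrative, and exact behaviour may vary slightly between requests versions):

```python
# Probe how requests chooses between Content-Length and chunked
# transfer encoding when preparing a request (no network I/O).
import io

import requests

# A body whose length requests can measure: Content-Length is set.
with_length = requests.Request(
    'POST', 'http://example.com/upload', data=io.BytesIO(b'x' * 1024)
).prepare()
print(with_length.headers.get('Content-Length'))     # prints 1024
print(with_length.headers.get('Transfer-Encoding'))  # prints None

# A generator has no measurable length: requests falls back to chunking.
chunked = requests.Request(
    'POST', 'http://example.com/upload', data=(c for c in [b'aa', b'bb'])
).prepare()
print(chunked.headers.get('Transfer-Encoding'))      # prints chunked
```

This is why providing a sized file-like object sidesteps the whole debate: requests can measure it, so it never reaches for chunked encoding.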
@sigmavirus24 it does. :-( 411 for attempts without a Content-Length. Quite annoying IMO.
@ysangkok I did.
@Lukasa That was my original edit actually, but I'm not sure what that nets us in this case besides a chance to link to the docs. Unless I am mistaken, a file object isn't guaranteed to give you a length; generators are just guaranteed to not give a length. I noticed in testing that small files didn't get chunked; a generator forced it.

Thanks for all the thorough replies everyone. Really though, it is fine if users in exotic situations have to swap to a different lib for one out of a 100 requests. Life is much easier for the 99 other cases.
I'd prefer Requests Make Easy Things Easy & Hard Things Possible :) |
@piotr-dobrogost Agreed, but if making a hard thing possible requires making an easy thing harder, we'd rather leave the easy thing easy. =)
A few things:

- As far as I'm concerned, this has been (and will continue to be, until we come to a decision) undefined behaviour.
- There's a difference between servers not understanding a chunked Transfer-Encoding and servers not wanting to respect it. I suspect the latter is the case in @bryanhelmig's case. Alternatively, the application could have been written by someone who doesn't understand or know about Transfer-Encoding and so requires a Content-Length.
- Exceptions might be too extreme in this case. We don't generally raise exceptions for anything other than invalid URLs. The way we process the data and files parameters can raise exceptions, but we don't special-case anything there. That said, we've been talking about how poor an idea it is when users specify their own Content-Length and Host headers (among others which I'm probably forgetting). Since these aren't technically invalid practices, but rather practices we advise against, I suggest that we instead trigger a warning and then do the right thing in certain well documented situations.
Using warnings is a more gentle way to inform the user that what they're doing is not advisable. It also covers the fact that so many of our users do not seem to bother to read the documentation, and so the library becomes self-documenting of sorts. This also gives us the ability to change the behaviour in the future. And even better, if users want to disable the warnings, they can, because Python provides a way of silencing warnings.
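The warn-then-do-the-right-thing approach can be sketched in a few lines. This is a hypothetical illustration, not requests' actual code; the function and parameter names (`prepare_headers`, `body_is_generator`) are invented:

```python
import warnings

def prepare_headers(user_headers, body_is_generator):
    """Hypothetical sketch: warn about, then drop, a user-supplied
    Content-Length when the body is a length-less generator (which
    forces chunked transfer encoding)."""
    headers = dict(user_headers)
    if body_is_generator and 'Content-Length' in headers:
        # Warn instead of raising: the user learns why the header is
        # being ignored, but the request still goes through.
        warnings.warn(
            'Content-Length is ignored for generator bodies; '
            'chunked transfer encoding will be used instead.',
            UserWarning,
        )
        del headers['Content-Length']
    return headers

# Users who have read the warning once can silence it with the stdlib:
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    cleaned = prepare_headers({'Content-Length': '1024'}, True)
print(cleaned)  # prints {}
```

The appeal of this shape is exactly what is described above: the library stays self-documenting, the behaviour can change later, and the warning machinery for silencing it already ships with Python.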
I actually think this is very likely the case in most of my examples. We (@zapier) attempt to upload files to over a dozen different APIs and the few of them that require Content-Length (seem to) timeout with chunked Transfer-Encoding.
I could put together a test suite of how various services respond to Content-Length/Transfer-Encoding, but I feel like even if it is incorrectly implemented APIs, that shouldn't really inform the design of python-requests. Easier still, I could just name names based on my experience fighting this for the last week, but again, if they are API/server bugs, what is the use of such information to python-requests?
Agree on default behavior, but sometimes reality trumps the "right thing" (especially when you have no control over a potentially broken server). It might be nice to document a technique to override (even if it advocates for the user to do a lot of work, like writing a custom Adapter).
Just to clarify: by "the right thing" I do not necessarily mean the RFC thing. (In case that's what you think I meant)
I agree that misbehaving servers shouldn't inform our design.
We already know how broken the web is. Regardless, knowing whether we should delete the header after warning the user or leave it would be where that data could be useful.
Bleh. This makes me sad. Ok, I think @sigmavirus24's plan is the best one here, at least for now.
A few things that came up are distracting me from getting you guys some detailed data, but here is a brain dump of what I've seen: The biggest offender I've seen are API endpoints that require multipart upload: they usually need a Content-Length or they freak out on Transfer-Encoding chunked (I'm not sure which). That isn't the case everywhere, but here are the bits I can pull out of our source with a quick grep:
The reason this is fresh in my mind is we've built a custom multipart body generator that works with "lazy" files that can also calculate the length; this happened to bubble up a lot of the issues we're talking about here. Right now we're just cheating and doing exactly that: exhausting the generator in memory.

The other examples are harder to grep for, but the file companies (Box, Dropbox, SugarSync, etc...) all get it and work perfectly fine with chunked encoding, so no worries there. Hopefully this shines a little more light on our real world use case. I wish I could get you more information to confirm which headers commonly cause which errors (they are usually timeouts).
This should be easier to test now that we have a few APIs that we can reproduce against, namely Twitter. |
I have the exact same use case: uploading data to a server that does not support chunked encoding. The data length is known in advance, but the data comes from a generator, not a file.
@netheosgithub I'd argue that changing this behaviour absolutely constitutes a change in the API. Consider the current state of the API, from the documentation:
Note that the documentation doesn't say "To send a chunk-encoded request, simply provide a generator and do not set the Content-Length header." This makes this an API change in my book. Not just any API change: a bad one. The idea that setting a header changes the body of the request makes me feel deeply uncomfortable. Consider if someone asked us for a similar feature whereby setting some other header changed the body we send.

I think we should try to address the root of the problem: why are people using generators in this situation?
Currently, providing a generator and specifying a Content-Length is an error (it generates an invalid HTTP request), so this use case should not be relied on by anybody. That's why I was thinking it would not break your users' programs.
If only your 'invalid HTTP request' argument were valid; I wish I lived in that world. However, many, many servers will happily accept such a combination. RFC 2616 provides this illuminating section on determining the message length:
To clarify: we are in agreement that Requests is doing the wrong thing. However, we are not in agreement that setting a Content-Length header should disable chunked transfer encoding.
I should also point out that any server that fails with chunked encoding is also in violation of RFC 2616:
This is not an attempt to say that Requests shouldn't fix this problem, it's simply to point out that everyone involved is violating RFC 2616.
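For reference, the wire format the RFC discussion above revolves around: a chunked body prefixes each chunk with its size in hexadecimal and terminates with a zero-length chunk. A small sketch (not requests' actual implementation):

```python
def encode_chunked(chunks):
    """Encode an iterable of byte chunks per RFC 7230 section 4.1:
    '<hex size>\\r\\n<data>\\r\\n' per chunk, '0\\r\\n\\r\\n' to end."""
    body = b''
    for chunk in chunks:
        body += b'%x\r\n%s\r\n' % (len(chunk), chunk)
    return body + b'0\r\n\r\n'

print(encode_chunked([b'hello', b'world']))
# prints b'5\r\nhello\r\n5\r\nworld\r\n0\r\n\r\n'
```

A server that can't parse this framing (or that insists on a Content-Length alongside it) is the kind of peer being complained about in this thread.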
In issue #1895 I encountered this issue even with a file-like object. The request was a PUT, and the file-like object happened to be sys.stdin, which is in fact so file-like that it triggered chunked encoding like it does for a regular file. Providing a Content-Length (which I learned via outside means) in this situation caused HTTPAdapter.send to not do chunking like one might expect, but PreparedRequest.prepare_body still noticed that the file-like object was iterable and added a Transfer-Encoding header anyway. The server (Amazon S3) choked on that header even though the request would have otherwise worked.
@gholms Nope.
The general expectation is that users will never explicitly set the Content-Length header, because Requests may end up messing with how the request body is set up. Finally, the idea that 'one might expect' that providing a Content-Length header will disable chunking isn't true either: I'd expect us to remove the header. =)
Anyway, this discussion has gone on long enough. When provided with a lengthless iterator and a user-supplied Content-Length header, we have these options:
Any of these three options brings us into compliance with the RFC. It is clear that everyone raising this issue prefers (3). @sigmavirus24?
I've always been of the opinion that we should blow away the header. In my experience the user thinks they know what they're doing but rarely knows enough to actually know what they're doing. I generally like to let the user shoot themselves in the foot, and we've done that until now and it hasn't gotten us much. In fact, we had the opportunity to blow away that header before and specifically didn't do that. That has led to exactly this issue.

Let's be logical about this: our API and documentation have always maintained that passing an iterable with no way of measuring length means we will use chunked transfer encoding. By the spec, in this case we should be blowing away the Content-Length header, because some servers do not obey it. It should be ignored. We're not doing anything wrong by sending it regardless of the encoding; the server is doing the wrong thing by not ignoring it. To protect against bad servers like this we should be deleting it.

If in 3.0.0 we decide to change the API such that providing the header does not chunk, that's a different beast. We're not considering 3.0.0 though, and that's the real issue here: what can we do to solve this issue now? What we can do is follow the spec.
I'm inclined to agree with you @sigmavirus24. I'll see if I can get a few minutes with Kenneth at some point soon to talk this over with him.
I'm in the same boat as @bryanhelmig and @netheosgithub: I have a generator where I know in advance what size the combined chunks will have, and a server that does not support chunked uploads (a WSGI app; WSGI, according to my research, does not support chunked encoding at all). The data from the generator is too big to fit into RAM, so combining the chunks beforehand and passing them to requests is out of the question.
@jbaiter then what you need is to pass an object that behaves like a file, with a `__len__` and a `read` method:

```python
class JBaiterStreamer(object):
    def __init__(self, size, iterator):
        self.size = size
        self.iterator = iterator

    def __len__(self):
        return self.size

    def read(self, *args):
        try:
            return next(self.iterator)
        except StopIteration:
            return b''
```

Granted, I haven't tried to see if that code will work, but something along those lines should.
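A self-contained sketch of how such a wrapper behaves (re-declaring a minimal version of the idea above, with an illustrative name, so the snippet stands alone). Because the object reports a length via `len()` and feeds data via `read()`, requests would send a Content-Length instead of chunking:

```python
class SizedIteratorReader(object):
    """Minimal stand-in for the wrapper idea above (name is invented)."""

    def __init__(self, size, iterator):
        self.size = size
        self.iterator = iterator

    def __len__(self):
        # Lets the caller (e.g. requests) measure the body up front.
        return self.size

    def read(self, *args):
        # Serve one chunk per call; empty bytes signals end-of-stream.
        try:
            return next(self.iterator)
        except StopIteration:
            return b''

chunks = [b'hello ', b'world']
reader = SizedIteratorReader(sum(len(c) for c in chunks), iter(chunks))
print(len(reader))    # prints 11
print(reader.read())  # prints b'hello '
print(reader.read())  # prints b'world'
print(reader.read())  # prints b''
```

The design point is simply that requests only chunks when it cannot measure the body; giving it a measurable, readable object restores Content-Length behaviour without buffering everything in memory.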
Thanks @sigmavirus24 👍, I did precisely that. I was just wondering if by now there was a more elegant way to go about it.
Perhaps there's a good way of providing this via the toolbelt, eh @Lukasa?
@sigmavirus24 Absolutely. =)
@Lukasa any feedback on https://gitlab.com/sigmavirus24/toolbelt/merge_requests/2
Streaming Iterator

See https://github.com/kennethreitz/requests/issues/1648 for some more detailed reasoning. In short, if you have an iterator that you know the size of, and a server that doesn't like or accept chunked transfer-encoding, you want to use this class. It will wrap the iterator and provide a sufficiently file-like object for requests to use to stream the data. For example:

```python
from requests_toolbelt.streaming_iterator import StreamingIterator

import requests

r = requests.post('https://httpbin.org/post',
                  data=StreamingIterator(length, iterator))
```
@sigmavirus24 "We're not doing anything wrong by sending it regardless of the encoding, the server is doing the wrong thing by not ignoring it." is actually now wrong: RFC 7230: "A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field." |
@ztane this is a different case than what that stack overflow question is referring to. In the future, please closely read a discussion before generating notifications for all of its participants.
@sigmavirus24 that was caused by "Transfer-Encoding: chunked plus Content-Length set => body not chunked." And RFC 7230 forbids setting Content-Length with Transfer-Encoding.
@ztane thank you for continuing to not read this discussion. This particular bug is about someone setting a Content-Length header by hand and expecting requests to not use a chunked upload when they provide a generator (which always triggers a chunked upload with requests). The stackoverflow link you provided and what you're talking about is a different bug that is being handled elsewhere.

You're muddying this discussion with irrelevant details because we allow users to shoot themselves in the feet. For example, we allow users to specify an incorrect content-length header or invalid host header, or basically whatever they please. We don't police everything, nor will we. Please focus your conversation on the correct issue in the future.
I stumbled into this when using boto3 to do a PUT from a stream against AWS S3 (boto/botocore#911). In my case, the size of the stream is known -- it is an object from another provider. When using S3 version 2 signatures, the "Transfer-Encoding: chunked" is not supported and S3 returns the 501 error. The boto3 documentation seems to imply that setting the Content-Length would get around this issue, as they state the following:
While boto3 should do two things -- fix their documentation and ensure that "Transfer-Encoding: chunked" is not set -- for the indirect consumer of requests this behavior is hard to debug. If the Content-Length header is ignored, raising an exception would at least make it obvious what is going on. As is, it was obfuscated by the fact that S3 only returns a 501 error with an explanation that one of the headers is not supported (but without specifying the header).

The suggested workaround to wrap it and provide the length works, but seems ugly. Would expanding the API to allow for toggling chunked encoding (while keeping the current behavior as default) be a palatable way forward (as opposed to using the Content-Length header as the flag)?

[1] http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
@timuralp I remain opposed to adding a flag for this. It's really just unacceptable to have an HTTP/1.1 implementation that cannot handle chunked transfer encoding in 2016. It's been a specification requirement for so long that the first specification that required it is nearly old enough to vote in the United States of America: I don't think we can keep cutting entities slack for not doing it.

From my perspective the bug here remains that we may incorrectly emit both Content-Length and Transfer-Encoding. Of course, my perspective is non-binding. ;)
Further, if boto3 wants to forcibly override the Content-Length header, they should utilize the prepared request flow so they can remove the automatically computed header and supply their own.
@Lukasa @sigmavirus24 fair enough -- thanks for the prompt reply. I'll continue to look to fix the boto issue in that project.
The way I've gotten around this inability to control whether something is chunked is to control whether my POST uses the "data" or "files" piece of the method call. Seems to be working fine, where the only difference between the two calls is which of those parameters carries the payload.
With #3897, Requests 3.0.0 will raise an exception in this case, preventing the sending of both headers. Users who wish to send their own Content-Length header will be able to modify the headers using the PreparedRequest flow. Note that Content-Length will still not work for data passed as a generator. If the user MUST specify the Content-Length, generators will need to be consumed and passed in something with a string/file-like representation. In most cases these headers should simply be left alone for automatic handling by Requests.
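The PreparedRequest flow referred to throughout this thread can be sketched as follows, assuming the requests library is installed (the URL and the extra header are illustrative, and the actual send is omitted to avoid network I/O):

```python
# Prepare a request, then adjust its headers before sending.
import requests

session = requests.Session()
request = requests.Request('PUT', 'http://example.com/upload',
                           data=b'payload')
prepared = session.prepare_request(request)

# requests has computed Content-Length for the 7-byte body; users who
# need different headers can edit prepared.headers at this point.
print(prepared.headers['Content-Length'])  # prints 7
prepared.headers['X-Example'] = 'value'    # illustrative tweak
# response = session.send(prepared)        # actual network call, omitted
```

This is the escape hatch for users who genuinely need header control: they get to see what requests computed and override it explicitly, rather than having a user-supplied header silently interact with body preparation.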
Test script
Actual result
Received on the server:
Expected result
Did not expect "Transfer-Encoding: chunked" since I provided the Content-Length. If requests insists on doing chunked transfer encoding, it should disregard the content length and actually chunk the content (as it does if no Content-Length header is given).