New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Place Content-encoding header into response's flags is desired #5290
base: master
Are you sure you want to change the base?
Conversation
# Conflicts: # scrapy/downloadermiddlewares/httpcompression.py # tests/test_downloadermiddleware_httpcompression.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviving these changes!
Most of my feedback is probably about code you might not even have written yourself, and feedback that I failed to provide to the previous author, so apologies beforehand for that.
- HttpCompressionMiddleware constructor deprecation warning - refactor process_response - put COMPRESSION_KEEP_ENCODING_HEADER in settings.py.tmpl - test HttpCompressionMiddleware init - test COMPRESSION_KEEP_ENCODING_HEADERS presence - ensure b'decoded' flag is set if decoded
in addition to the previous comments:
|
Please note that the tests failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is similar to the one I had before. In this case, the "decoded" flag does not actually say which was the value of the original header.
Since I wrote the above comment, #1877 was solved, so I think now an alternative would be to add a response metadata attribute to store this value (or at a minimum, store content-encoding=<value>
as a flag).
Furthermore, I see no reason for the flag to be bytes
, existing flags are all str
.
Codecov Report
@@ Coverage Diff @@
## master #5290 +/- ##
==========================================
+ Coverage 88.56% 88.58% +0.02%
==========================================
Files 163 163
Lines 10635 10648 +13
Branches 1812 1815 +3
==========================================
+ Hits 9419 9433 +14
Misses 939 939
+ Partials 277 276 -1
|
…anges in test_engine.py
- amend tests & logic to follow the new default of COMPRESSION_KEEP_ENCODING_HEADER=True - remove redundant condition
I couldn't figure out the previous build error
I've amended my changes to touch less files, while in there I spotted the mutation on the original response content encoding array and decided to remove it. I think if the following build fails in a similar fashion I'll need a hand |
|
@Gallaecio thank you, will keep an eye out for when that's merged |
Closing and reopening for a fresh CI run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@T0shik Thanks again for your work. I am very sorry it took me so long to get back to this.
My concern is similar to the one I had before. In this case, the "decoded" flag does not actually say which was the value of the original header. Since I wrote the above comment, #1877 was solved, so I think now an alternative would be to add a response metadata attribute to store this value (or at a minimum, store
content-encoding=<value>
as a flag).
I don’t think we need this, since after this change it is possible to access the original header.
Furthermore, I see no reason for the flag to be
bytes
, existing flags are allstr
.
While I agree, I think we should keep it as bytes for backward compatibility. We could add it as both, and deprecate somehow the bytes
version (wrapping the set to detect access by key?), but I think that would be a non-trivial change and I think it may be best to keep it out of this pull request.
if not stats: | ||
warnings.warn( | ||
"HttpCompressionMiddleware now accepts a 'stats' parameter which should be specified.", | ||
ScrapyDeprecationWarning, | ||
) | ||
if settings: | ||
self.keep_encoding_header = settings.getbool('COMPRESSION_KEEP_ENCODING_HEADER') | ||
if not self.keep_encoding_header: | ||
warnings.warn( | ||
"Setting COMPRESSION_KEEP_ENCODING_HEADER=False is deprecated", | ||
ScrapyDeprecationWarning, | ||
) | ||
else: | ||
self.keep_encoding_header = False | ||
warnings.warn( | ||
"HttpCompressionMiddleware now accepts a 'settings' parameter which should be specified.", | ||
ScrapyDeprecationWarning, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add stacklevel=2
to the first and last warn()
calls, so that the warning becomes about the super().__init__()
line that is missing the corresponding parameters.
Also, I don’t agree with what @elacuesta suggested earlier about changing the warning message in the middle. Precisely because of what he explains between parenthesis (“when not passing settings at all also stops being supported, I guess”), I find the current message confusing, because a user that did not set the setting to any value will now be told not to set it to False
, which the user is already not doing. I think the previous message was better. Maybe we could give it additional context for clarification instead:
warnings.warn(
"The default value of COMPRESSION_KEEP_ENCODING_HEADER, "
"False, is deprecated, and will stop working and stop "
"being its default value in a future version of Scrapy. "
"Set COMPRESSION_KEEP_ENCODING_HEADER=True in your "
"settings to remove this warning.",
ScrapyDeprecationWarning,
stacklevel=2,
)
@@ -37,6 +37,7 @@ | |||
COMMANDS_MODULE = '' | |||
|
|||
COMPRESSION_ENABLED = True | |||
COMPRESSION_KEEP_ENCODING_HEADER = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be False
for backward compatibility. It should only be set to True
in settings.py.tmpl
.
Accordingly, we modify self.assertIn(b'gzip', newresponse.headers['Content-Encoding'])
and similar in the non-setting tests to use assertNotIn
instead and remove the follow-up line that checks the header value.
Fixes #1988
Continuation of #4025 due to abandonment.
additional changes:
'tests/ignores.txt'
inconftest.py
to rely onPath
unittest
assertionscreate_spider_mw
and make config test specific (can change this if needed)Resolves #4025