Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httpcompression middleware: warn of unexpected and prevent manual unupported encodings #6145

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

cakemd
Copy link
Contributor

@cakemd cakemd commented Nov 13, 2023

This change is made for solving issue #4697.

PR #4697 is used in terms of fast understanding where the code base must be changed. Please, inform me if this PR must be to that branch first and not directly to master of scrapy/scrapy.

Please, not that in initial version of this PR, the body becomes empty bytes in case of b"br" is used in response but not supported by executing machine. Please, inform me if it's not correct approach.

Will be happy to get review.

Fixes #4697, closes #5480

Copy link

codecov bot commented Nov 13, 2023

Codecov Report

Merging #6145 (872468a) into master (b4acf5c) will decrease coverage by 0.02%.
Report is 1 commits behind head on master.
The diff coverage is 81.81%.

❗ Current head 872468a differs from pull request most recent head fc2b410. Consider uploading reports for the commit fc2b410 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6145      +/-   ##
==========================================
- Coverage   88.59%   88.57%   -0.02%     
==========================================
  Files         159      159              
  Lines       11582    11599      +17     
  Branches     1885     1891       +6     
==========================================
+ Hits        10261    10274      +13     
- Misses        994      996       +2     
- Partials      327      329       +2     
Files Coverage Δ
scrapy/downloadermiddlewares/httpcompression.py 87.05% <81.81%> (-2.65%) ⬇️

@cakemd cakemd force-pushed the feature/unsupported_middleware_encoding_behaviour branch from 7dc4408 to 62f4c2d Compare November 13, 2023 14:13
@cakemd cakemd force-pushed the feature/unsupported_middleware_encoding_behaviour branch from 62f4c2d to fcbc7fa Compare November 13, 2023 14:19
@@ -17,6 +18,9 @@
from typing_extensions import Self

ACCEPTED_ENCODINGS: List[bytes] = [b"gzip", b"deflate"]
logger = logging.getLogger(__name__)

ACCEPTED_ENCODINGS = [b"gzip", b"deflate"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a merge error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I'm working on some more changes. Many tests got broken, fixed that already. Polishing next commit now.

@@ -85,6 +91,19 @@ def process_response(

return response

def _raise_unsupported_compressors(self, request: Request):
encodings = request.headers.getlist("Accept-Encoding")
unsupported = [key for key in encodings if key not in ACCEPTED_ENCODINGS]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to split the header value.

if b"br" in ACCEPTED_ENCODINGS:
body = brotli.decompress(body)
else:
body = bytes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
body = bytes()
body = b''

Alternatively, keep it compressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, was still thinking what to return. Will change.

@wRAR
Copy link
Member

wRAR commented Nov 13, 2023

So, this raises exceptions for requests with unsupported encodings declared in Accept-Encoding. I'm not sure if this is the best course of action. In the previous issues/PRs there were also such options suggested as logging a warning and removing the unsupported encodings from the header.

In any case, the warning/error logged/raised should clearly explain what is the problem in trhe case of brotli and zstd: that the user should install a module to get it supported.

@wRAR wRAR added the discuss label Nov 13, 2023
@cakemd
Copy link
Contributor Author

cakemd commented Nov 14, 2023

@wRAR we can change raising an error to logging error message, not a problem. I just followed @Gallaecio 's issue description.

I've changed PR already, so it fires exception only if it's b"br" manually added by user and not supported by machine.
(Initially, I added raising error for any not supported encoding, but I faced failed tests caused by some encodings that are not present in ACCEPTED _ENCODINGS but are in use).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn about br handling if brotlipy is not installed
3 participants