Improve how combining DEFAULT_REQUEST_HEADERS with the Referer middleware is handled #6184

Open
Gallaecio opened this issue Dec 22, 2023 · 2 comments

@Gallaecio
Member

Currently, setting the Referer header through the DEFAULT_REQUEST_HEADERS setting has no effect, because the referer spider middleware effectively prevents the default headers downloader middleware from redefining the header: both use setdefault, and the referer spider middleware runs first.
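
To illustrate why the first writer wins, here is a minimal sketch that uses a plain dict as a stand-in for the request headers. It is not Scrapy code: the two function names and the example URLs are made up for illustration, and the only thing it models is that both steps call setdefault in this order.

```python
# Plain-dict stand-in for a request's headers; this is NOT Scrapy code.
# Function names and URLs are hypothetical, used only for illustration.

def apply_referer_step(headers, referrer):
    """Mimics the referer spider middleware: set Referer only if absent."""
    headers.setdefault("Referer", referrer)


def apply_default_headers_step(headers, defaults):
    """Mimics the default headers downloader middleware: fill in only missing headers."""
    for name, value in defaults.items():
        headers.setdefault(name, value)


headers = {}

# The spider middleware step runs first and claims the Referer slot.
apply_referer_step(headers, "https://example.com/listing")

# The downloader middleware step runs later; setdefault leaves Referer untouched.
apply_default_headers_step(
    headers,
    {"Referer": "https://www.google.com/", "Accept-Language": "en"},
)

print(headers)
```

The Referer from the first step survives, while Accept-Language, which nothing set earlier, is filled in from the defaults.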

I kind of expected the opposite to happen, for DEFAULT_REQUEST_HEADERS to take priority.

I think we should either:

  • Let DEFAULT_REQUEST_HEADERS take priority.
  • Log a warning if Referer is defined in DEFAULT_REQUEST_HEADERS without also disabling the referer spider middleware.
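
For the second option, the check could look roughly like the sketch below. The function name, the logger, and the message wording are hypothetical; the sketch assumes a dict-like settings object with string header names, and that REFERER_ENABLED (the setting that toggles the referer middleware) defaults to enabled.

```python
import logging

logger = logging.getLogger(__name__)


def warn_if_referer_shadowed(settings):
    """Hypothetical helper: warn when a default Referer would be ignored.

    `settings` is assumed to be dict-like. Returns True if a warning was logged.
    """
    default_headers = settings.get("DEFAULT_REQUEST_HEADERS") or {}
    # Header names are case-insensitive, so compare in lowercase.
    has_referer = any(name.lower() == "referer" for name in default_headers)
    # Assumption: the referer middleware counts as enabled unless explicitly disabled.
    referer_enabled = bool(settings.get("REFERER_ENABLED", True))

    if has_referer and referer_enabled:
        logger.warning(
            "Referer is defined in DEFAULT_REQUEST_HEADERS, but the referer "
            "spider middleware is enabled and may set the header first."
        )
        return True
    return False
```

Where such a check would run (for example, when the default headers middleware is instantiated) is left open here.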
@kmike
Member

kmike commented Dec 25, 2023

What's the use case for setting Referer through DEFAULT_REQUEST_HEADERS?

@Gallaecio
Member Author

Gallaecio commented Dec 26, 2023

I think that on some websites, setting a Referer header with a static value, such as a Google URL, can improve your success rate. It is a good question, though: Referer usually makes more sense on a per-request basis.
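
For reference, a static Referer can be made to work today by also disabling the referer spider middleware, roughly as in this settings sketch. The URL and the Accept-Language value are placeholders, and note that defining DEFAULT_REQUEST_HEADERS replaces the whole default dict rather than merging into it.

```python
# settings.py (sketch; header values are placeholders)

# Turn off the referer spider middleware so it cannot set Referer first.
REFERER_ENABLED = False

# With the middleware disabled, the static value below is applied to
# requests that do not already carry a Referer header.
DEFAULT_REQUEST_HEADERS = {
    "Accept-Language": "en",
    "Referer": "https://www.google.com/",
}
```

The trade-off is that requests no longer get a Referer derived from the response that produced them.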
