Improve cookie handling #5431
Comments
I am looking forward to participating in GSoC 22. Can I work on this issue, @Gallaecio?
Sounds great! Make sure you check out the links in #5408.
Ok
Hey @Gallaecio, is it okay if I do part of it (not for GSoC)? I would like to work on providing a user-friendly API to interact with cookiejars.
@OrestisKan Sounds great.
@Gallaecio Hello! A colleague and I are interested in working on allowing cookie storage, but we need a bit more clarification on what this entails. Specifically, the documentation at https://github.com/scrapedia/scrapy-cookies/blob/master/docs/topics/settings.rst#cookies_storage suggests cookie storage is already available, so we were wondering what our contribution could look like.
@atatabitovska The documentation link you provide is not for Scrapy itself, but for a third-party plugin; Scrapy itself does not support cookie storage. I would usually not suggest reimplementing in Scrapy something that can be handled with a plugin. However, the plugin seems unmaintained, and I think basic cookie storage capabilities may be worth maintaining in upstream Scrapy. I am also slightly worried that improvements to cookie handling may break the plugin directly or indirectly.
Hi @Gallaecio as part of GSOC I'm going to be implementing the following:
I will email you my proposal document tomorrow. I know I am late, but I am determined to contribute. If the GSoC deadline passes, I can do it on my own. I have made a trivial contribution before: #5442
It is enough to meet that requirement, yes (and, in any case, getting the pre-application pull request merged is not a requirement).
There are different aspects of cookie handling in Scrapy that we should improve. This issue aims to centralize a set of improvements that could be addressed as part of a Google Summer of Code project.
Implement the latest standard of cookies, the one web browsers use.
This cannot be done with the Python standard library, as its cookie implementation does not comply with the latest standards of cookies. We should look for Python libraries that do or, if none fit the bill, build our own Python library for modern cookie handling.
As part of this implementation, we should build a comprehensive set of tests covering all aspects of cookie handling. There is a draft pull request of initial work on this front.
Related standards:
Let the `Cookie` header of a request be processed just the same as the `cookies` parameter of `Request`.
Reported at Cookies from the Cookie request header are not processed #1992, which we fixed, but we had to revert the fix due to undesired side effects; there is however a new draft pull request to address the issue for good.
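To illustrate the intended equivalence, here is a minimal sketch that turns a `Cookie` header value into the kind of dictionary accepted by the `cookies` parameter. The helper name `cookie_header_to_dict` is hypothetical, and it relies on the standard library's `http.cookies.SimpleCookie` only for illustration; since the standard library does not follow the latest cookie standards, a real fix would likely parse the header differently.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value: str) -> dict:
    """Parse a Cookie request header value into a name -> value mapping.

    Hypothetical helper, not part of Scrapy.
    """
    parsed = SimpleCookie()
    parsed.load(header_value)
    return {name: morsel.value for name, morsel in parsed.items()}


# Same cookies, expressed first as a header value and then as a dict.
cookies = cookie_header_to_dict("currency=USD; country=UY")
```

With such a mapping, `Request(url, headers={"Cookie": "currency=USD; country=UY"})` could in principle be handled the same way as `Request(url, cookies={"currency": "USD", "country": "UY"})`.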
Allow users to decide, for a request and its response, any combination of the following:
`cookies` parameter of `Request` or the `Cookie` header) should be included in the request cookiejar.
Some of the combinations are already possible, e.g. by the use of the `dont_merge_cookies` and `cookiejar` request metadata keys. We should extend support to the rest of scenarios, and make sure we document all scenarios properly.
Related issues: Cookies not set when dont_merge_cookies is True #2124, Setting a cookie for a different domain does not work #5841
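For reference, the two existing meta keys can be combined roughly as in the following sketch. The helper `build_request_kwargs` and its parameters are hypothetical; it only builds the keyword arguments that would be passed to `scrapy.Request`, so it runs without Scrapy installed.

```python
def build_request_kwargs(url, cookiejar=None, merge_cookies=True):
    """Build keyword arguments for scrapy.Request (hypothetical helper)."""
    meta = {}
    if cookiejar is not None:
        # "cookiejar" selects which cookie session (jar) the request uses.
        meta["cookiejar"] = cookiejar
    if not merge_cookies:
        # "dont_merge_cookies" asks the cookies middleware to leave the
        # stored cookies out of this request and its response.
        meta["dont_merge_cookies"] = True
    return {"url": url, "meta": meta}


# A request that bypasses stored cookies entirely.
isolated = build_request_kwargs("https://example.com", merge_cookies=False)
# A request tied to a named cookie session.
session_a = build_request_kwargs("https://example.com", cookiejar="a")
```

In a spider, these would be used as `scrapy.Request(**isolated)` or `scrapy.Request(**session_a)`. The remaining combinations are the ones that currently have no equivalent keys.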
Provide a user-friendly API to interact with cookiejars.
Related issues: Allow copying existing cookiejar for request.meta['cookiejar'] #1448, Expose cookiejars #1878, WIP: CookiesMiddleware: add "reset_cookies" meta to clear the jar #2986
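As a starting point for discussion, the operations requested in the related issues (copying a jar, exposing jars, resetting a jar) could look roughly like the sketch below. Everything here is an assumption: `CookieJarRegistry`, its method names, and the `make_cookie` helper are hypothetical, and the sketch uses the standard library's `http.cookiejar.CookieJar` rather than Scrapy's own classes.

```python
import copy
from collections import defaultdict
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a minimal session cookie for the example (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


class CookieJarRegistry:
    """Hypothetical registry of cookiejars keyed by the "cookiejar" meta value."""

    def __init__(self):
        self._jars = defaultdict(CookieJar)

    def get(self, key=None):
        """Expose the jar for a key, creating it on first access."""
        return self._jars[key]

    def copy(self, source, target):
        """Copy every cookie from one jar into a fresh jar under another key."""
        new_jar = CookieJar()
        for cookie in self._jars[source]:
            new_jar.set_cookie(copy.copy(cookie))
        self._jars[target] = new_jar
        return new_jar

    def reset(self, key=None):
        """Remove all cookies from the jar for a key."""
        self._jars[key].clear()


registry = CookieJarRegistry()
registry.get("a").set_cookie(make_cookie("session", "abc", "example.com"))
registry.copy("a", "b")
registry.reset("a")
```

After these calls, jar `"a"` is empty while jar `"b"` still holds the copied `session` cookie, which is the behavior the copy and reset requests seem to ask for.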
Allow cookie storage
A single setting to define the storage path may be enough.
Related plugin: https://github.com/scrapedia/scrapy-cookies
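A minimal sketch of what path-based storage could involve, assuming the standard library's `http.cookiejar.LWPCookieJar` file format is acceptable. The `save_cookies`, `load_cookies`, and `make_cookie` helpers are hypothetical, and `storage_path` stands in for the value of a setting that does not exist yet.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain):
    """Build a minimal session cookie for the example (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_cookies(jar, storage_path):
    """Write all cookies, including session cookies, to storage_path."""
    jar.save(storage_path, ignore_discard=True, ignore_expires=True)


def load_cookies(storage_path):
    """Return a jar with the cookies stored at storage_path, if any."""
    jar = LWPCookieJar()
    if os.path.exists(storage_path):
        jar.load(storage_path, ignore_discard=True, ignore_expires=True)
    return jar


# storage_path stands in for the value of a hypothetical storage setting.
storage_path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
jar = LWPCookieJar()
jar.set_cookie(make_cookie("session", "abc123", "example.com"))
save_cookies(jar, storage_path)
restored = load_cookies(storage_path)
```

Whether session cookies should be persisted (`ignore_discard=True` here) is a design choice that the setting's documentation would need to spell out.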
Provide a separate documentation page or section about cookies, to make it easier for users to learn how to make use of the different cookie-related Scrapy features that are currently only documented through separate API reference segments (meta keys, settings, Request input parameters).
Note: A Google Summer of Code project around this idea does not need to cover everything here. Choose a subset of the work that fits the time you plan to spend on Google Summer of Code and your estimations for the work. It is better to overestimate the time it will take you to complete tasks, and to have stretch goals to spend the extra time after project completion, than to underestimate and fail to achieve your goals on time.