Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cookiejars exposed #6218

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

GeorgeA92
Copy link
Contributor

Aimed to fix #1878
based on suggestion from #1878 (comment)

Copy link

codecov bot commented Feb 9, 2024

Codecov Report

Merging #6218 (b8f8960) into master (6f73dc0) will increase coverage by 0.18%.
Report is 88 commits behind head on master.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6218      +/-   ##
==========================================
+ Coverage   88.48%   88.67%   +0.18%     
==========================================
  Files         160      161       +1     
  Lines       11607    11792     +185     
  Branches     1883     1912      +29     
==========================================
+ Hits        10271    10457     +186     
+ Misses       1009     1007       -2     
- Partials      327      328       +1     
Files Coverage Δ
scrapy/downloadermiddlewares/cookies.py 96.33% <100.00%> (+0.13%) ⬆️

... and 14 files with indirect coverage changes

@wRAR
Copy link
Member

wRAR commented Feb 9, 2024

I haven't read the original ticket recently, but why is this feature optional?

@GeorgeA92
Copy link
Contributor Author

GeorgeA92 commented Feb 23, 2024

This is how it works on current version of PR

script_sample.py
import scrapy
from scrapy.crawler import CrawlerProcess

class Quotes(scrapy.Spider):
    name = "quotes"; custom_settings = {"DOWNLOAD_DELAY": 1}

    def start_requests(self):
        yield scrapy.Request(url='https://quotes.toscrape.com/login', callback=self.login)

    def login(self, response):
        self.logger.info(self.cookie_jars[None]) # scrapy.http.cookies.CookieJar object
        self.logger.info(self.cookie_jars[None].jar) # http.cookiejar object

        locale_cookie = self.cookie_jars[None]._cookies["quotes.toscrape.com"]["/"].get("session")
        locale_cookie.value = locale_cookie.value.upper()
        self.logger.info(self.cookie_jars[None].jar)

if __name__ == "__main__":
    p = CrawlerProcess(); p.crawl(Quotes); p.start()
log_output (fragment)
2024-02-23 10:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/login> (referer: None)
2024-02-23 10:51:27 [quotes] INFO: <scrapy.http.cookies.CookieJar object at 0x00000217DB719B40>
2024-02-23 10:51:27 [quotes] INFO: <CookieJar[<Cookie session=eyJjc3JmX3Rva2VuIjoiSnFQQU9GTGt1amZzZ3J3UVdHeGV6WFR2UnBpY0Job1NWS3liWmxhblVISXROREVtQ2RZTSJ9.Zdhqng.8uQzjuvDfOcNJHV7luY5Na6C1N0 for quotes.toscrape.com/>]>
2024-02-23 10:51:27 [quotes] INFO: <CookieJar[<Cookie session=EYJJC3JMX3RVA2VUIJOISNFQQU9GTGT1AMZZZ3J3UVDHEGV6WFR2UNBPY0JOB1NWS3LIWMXHBLVISXROREVTQ2RZTSJ9.ZDHQNG.8UQZJUVDFOCNJHV7LUY5NA6C1N0 for quotes.toscrape.com/>]>
2024-02-23 10:51:27 [scrapy.core.engine] INFO: Closing spider (finished)

@Gallaecio Gallaecio requested a review from kmike February 23, 2024 17:14
@Gallaecio
Copy link
Member

I‘m slightly hesitant about setting a spider attribute from a middleware, and I wonder if maybe it should be set from a different place or in a different place (e.g. the crawler), bun in general I’m find with the approach.

@kmike Any thoughts on the general approach? Should @GeorgeA92 go on with tests and docs?

@kmike
Copy link
Member

kmike commented Mar 17, 2024

Hey! My main worry is the obscure API, which we'd need to document & support in the future. It'd require good documentation to explain a line like

self.cookie_jars[None]._cookies["quotes.toscrape.com"]["/"].get("session")

It also need an access to a private property (._cookies).

@GeorgeA92
Copy link
Contributor Author

Hey! My main worry is the obscure API, which we'd need to document & support in the future. It'd require good documentation to explain a line like

self.cookie_jars[None]._cookies["quotes.toscrape.com"]["/"].get("session")

It also need an access to a private property (._cookies).

Another option is to update scrapy.http.Cookies.CookieJar class to add.. more convenient way to interact with Cookiejar

class CookieJar:
def __init__(self, policy=None, check_expired_frequency=10000):
self.policy = policy or DefaultCookiePolicy()
self.jar = _CookieJar(self.policy)
self.jar._cookies_lock = _DummyLock()
self.check_expired_frequency = check_expired_frequency
self.processed = 0
def extract_cookies(self, response, request):
wreq = WrappedRequest(request)
wrsp = WrappedResponse(response)
return self.jar.extract_cookies(wrsp, wreq)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose cookiejars
4 participants