
[MRG+1] Callback kwargs #3563

Merged
merged 23 commits into from Jun 26, 2019

Conversation

@elacuesta (Member) commented Jan 3, 2019

Fixes #1138

This is just a first approach. It's currently lacking docs and tests; I'll add those if the implementation looks good. Update: added tests and docs.

I see (at least) two points for discussion:

  1. Should we also pass the same arguments to the errbacks, or maybe add a different parameter (errback_kwargs or something like that)? On the other hand, the request object is available in the failure received by the errback, and failure.request.cb_kwargs gives access to the arguments, so I think a separate parameter shouldn't be necessary (see the sketch after this list).
  2. I'm not a fan of the kwargs name; I think it could easily be confused with Python's own "kwargs" naming convention, i.e. people could understand that any remaining keyword argument passed to the Request constructor will be passed to the callbacks. Is "callback_kwargs" too verbose? Maybe it's not compatible with the previous point. Update: renamed to cb_kwargs.
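
As a minimal sketch of the first point (hypothetical spider, not part of this PR), assuming a download-level error where Scrapy attaches the failed request to the failure:

import scrapy

class ErrbackKwargsSpider(scrapy.Spider):
    name = 'errback_kwargs_sketch'

    def start_requests(self):
        yield scrapy.Request(
            'https://example.org/missing',
            callback=self.parse_page,
            errback=self.handle_error,
            cb_kwargs={'source': 'start_requests'},
        )

    def parse_page(self, response, source):
        yield {'url': response.url, 'source': source}

    def handle_error(self, failure):
        # The failed request travels with the failure, so its cb_kwargs
        # stay reachable without a separate errback_kwargs parameter.
        source = failure.request.cb_kwargs['source']
        self.logger.error('Request from %s failed: %s', source, failure)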

Sample spider:

import scrapy

class TestCallbackKwargsSpider(scrapy.Spider):
    name = 'callback_kwargs'

    def start_requests(self):
        data = {'a': 123, 'b': 456}
        yield scrapy.Request('https://example.org', cb_kwargs=data)

    def parse(self, response, a, b):
        yield {'url': response.url, 'a': a, 'b': b}
        yield response.follow(
            response.css('a::attr(href)').get(),
            self.parse_other,
            cb_kwargs={'source': response.url})

    def parse_other(self, response, source):
        yield {'url': response.url, 'source': source}
        yield response.follow(
            response.css('a::attr(href)').get(),
            self.parse_regular)

    def parse_regular(self, response):
        yield {'url': response.url}

Output:

(...)
2019-01-03 17:40:38 [scrapy.core.engine] INFO: Spider opened
2019-01-03 17:40:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-01-03 17:40:38 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-01-03 17:40:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.org> (referer: None)
2019-01-03 17:40:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://example.org>
{'url': 'https://example.org', 'a': 123, 'b': 456}
2019-01-03 17:40:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.iana.org/domains/reserved> from <GET http://www.iana.org/domains/example>
2019-01-03 17:40:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.iana.org/domains/reserved> (referer: None)
2019-01-03 17:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.iana.org/domains/reserved>
{'url': 'https://www.iana.org/domains/reserved', 'source': 'https://example.org'}
2019-01-03 17:40:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.iana.org/> (referer: https://www.iana.org/domains/reserved)
2019-01-03 17:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.iana.org/>
{'url': 'https://www.iana.org/'}
2019-01-03 17:40:40 [scrapy.core.engine] INFO: Closing spider (finished)
(...)

Tasks

  • Request changes
  • Response.follow changes
  • Scraper changes
  • Request serialization
  • Tests
  • Docs

/cc @kmike @dangra

codecov bot commented Jan 3, 2019

Codecov Report

Merging #3563 into master will decrease coverage by 0.05%.
The diff coverage is 34.78%.

@@            Coverage Diff            @@
##           master   #3563      +/-   ##
=========================================
- Coverage   85.46%   85.4%   -0.06%     
=========================================
  Files         169     169              
  Lines        9666    9682      +16     
  Branches     1440    1443       +3     
=========================================
+ Hits         8261    8269       +8     
- Misses       1157    1166       +9     
+ Partials      248     247       -1
Impacted Files                     Coverage Δ
scrapy/utils/reqser.py             94.11% <ø> (ø) ⬆️
scrapy/http/response/__init__.py   93.44% <ø> (ø) ⬆️
scrapy/http/response/text.py       97.84% <ø> (ø) ⬆️
scrapy/core/scraper.py             88.51% <100%> (ø) ⬆️
scrapy/http/request/__init__.py    100% <100%> (ø) ⬆️
scrapy/commands/parse.py           20.32% <11.76%> (-0.73%) ⬇️
scrapy/core/spidermw.py            100% <0%> (+2.46%) ⬆️

@elacuesta changed the title from [WIP] Callback kwargs to Callback kwargs on Jan 15, 2019
@elacuesta (Member, Author) commented:

@dangra @kmike I'm sorry to bother you guys, but do you have any comments on this?

@kmike added this to the v1.7 milestone on Jan 29, 2019
@ejulio (Contributor) left a comment

@elacuesta great work here 👏
Looks good to me, but I left some comments.

Besides those:

  • kwargs does not sound good; it seems to be the Request's kwargs, not the callback's
  • I like callback_kwargs
  • The same kwargs passed to the callback should also be passed to the errback (to keep it consistent with meta)

Maybe the best solution would be using partial functions, but IMHO they'd look verbose and not Pythonic. Unless you know some library to help with partial functions 😄

@elacuesta (Member, Author) commented:

Yeah, I like callback_kwargs more than just kwargs; I'd better change it now to save time before we get close to 1.7 :-)
There were some comments about partial in the original issue, but it didn't seem to work. I think using the Deferred's API for this is the most natural choice: passing keyword arguments to a callback is precisely what we need.
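
For illustration (a standalone snippet, not code from this PR): Twisted's Deferred.addCallback forwards any extra positional and keyword arguments to the callback along with the result, which is the mechanism cb_kwargs builds on:

from twisted.internet.defer import Deferred

def handle(result, source):
    print(result, source)

d = Deferred()
# Extra keyword arguments given to addCallback are stored and passed
# to the callback after the result when the Deferred fires.
d.addCallback(handle, source='example')
d.callback('response-body')  # prints: response-body example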

@elacuesta (Member, Author) commented:

I decided to go with cb_kwargs instead of callback_kwargs. It's less verbose, and it's also already in use within Scrapy (https://docs.scrapy.org/en/latest/topics/spiders.html?highlight=cb_kwargs#scrapy.spiders.Rule)
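
For reference, a minimal CrawlSpider sketch (illustrative names, not from this PR) using the pre-existing cb_kwargs parameter of Rule that the link above documents:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class RuleKwargsSpider(CrawlSpider):
    name = 'rule_kwargs_sketch'
    start_urls = ['https://example.org']
    rules = (
        # cb_kwargs on a Rule is passed as keyword arguments
        # to the callback for every matched link.
        Rule(LinkExtractor(), callback='parse_item',
             cb_kwargs={'section': 'main'}),
    )

    def parse_item(self, response, section):
        yield {'url': response.url, 'section': section}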

tests/spiders.py
@@ -28,6 +28,45 @@ def closed(self, reason):
self.meta['close_reason'] = reason


class KeywordArgumentsSpider(MockServerSpider):
@kmike (Member) commented Mar 27, 2019

Could you please add a test which checks how Scrapy behaves when there is a mismatch between the parameters parse accepts and the parameters passed via callback kwargs, e.g. a required argument is missing, or an extra argument is passed? Is an exception raised? Does the traceback look good (no need to write a test for it)? Is the errback called?

Another case which would be nice to check explicitly is how default values are handled.

@elacuesta (Member, Author) replied:

Added tests for defaults and argument mismatch.
The errback is not called with the current implementation; it is called if we add the callback and the errback to the Deferred in two separate steps, i.e.

dfd.addCallback(request.callback or spider.parse, **request.cb_kwargs)
dfd.addErrback(request.errback)

instead of

dfd.addCallbacks(
    callback=request.callback or spider.parse,
    errback=request.errback,
    callbackKeywords=request.cb_kwargs)

In any case, the Request object is not bound to the Failure received by the errback. Personally, I don't think calling the errback would be appropriate here, since it's not an error with the request/response itself, but with the code that handles it. The logged error is very descriptive, and similar to the one that currently appears when, for instance, a callback does not take a second positional argument (TypeError: parse() takes 1 positional argument but 2 were given).
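
A tiny standalone Twisted snippet (not from this PR) showing that distinction: an errback registered in the same addCallbacks call does not see an exception raised by its own callback, while an errback added in a separate step does.

from twisted.internet.defer import Deferred

def boom(result):
    raise ValueError('callback failed')

def on_error(failure):
    print('errback called:', failure.value)

# Same-call errback: does NOT handle errors raised by its own callback.
d1 = Deferred()
d1.addCallbacks(callback=boom, errback=on_error)
d1.addErrback(lambda f: print('handled one level later:', f.value))
d1.callback('ok')  # prints: handled one level later: callback failed

# Two-step registration: the errback added afterwards DOES catch it.
d2 = Deferred()
d2.addCallback(boom)
d2.addErrback(on_error)
d2.callback('ok')  # prints: errback called: callback failed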

@kmike (Member) commented Mar 27, 2019

By searching our docs for "meta" it is possible to find a few other places where the docs may need an update.

I could have missed something; it'd be great to check all the places where "meta" is mentioned.

@kmike (Member) commented Mar 27, 2019

All other things being equal, I'd prefer either kwargs or callback_kwargs over cb_kwargs, but the fact that we already use this name in Rule is a good argument to go with cb_kwargs.

@elacuesta (Member, Author) commented:

Updated/replaced several occurrences of Request.meta across the docs.
I'm not sure why the Codecov check is failing; I added a test for the --cb_kwargs option in the parse command 🤔

@kmike (Member) commented Jun 24, 2019

Thanks for the work @elacuesta, and thanks for the careful review @Gallaecio and @ejulio! The PR looks great. I think we can merge it without errback kwargs support, and discuss that feature separately.

May I ask for one additional test, though? It would be awesome to check that a middleware can change request.cb_kwargs and that the changes take effect; it is not clear whether this is supported.

This feature would enable some cool use cases, mostly related to dependency injection, similar to how pytest fixtures work. For example, it'd be possible to implement this:

class MySpider(scrapy.Spider):
    # ...
    def parse(self, response):
        ...  # business as usual

    def parse_page(self, response, cookiejar: scrapy.CookieJar):
        # We ask for the current cookiejar object; the cookie
        # middleware inspects `callback.__annotations__`, figures out
        # that the user wants to read/write current cookies, and
        # provides (injects) the cookiejar object.
        ...

or similar features - e.g. it may be useful to integrate more tightly with browsers (ask for a browser in addition to the response), etc. It is not clear that such an API is the way to go, but it'd be awesome to make sure it is possible to implement, so that we're not closing the door on it.
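
As a minimal sketch of what the requested test exercises (a hypothetical downloader middleware; the class and key names are illustrative, not part of Scrapy or this PR):

class InjectCbKwargsMiddleware:
    # Enable via the DOWNLOADER_MIDDLEWARES setting.
    def process_request(self, request, spider):
        # Mutating request.cb_kwargs here should be visible to the
        # callback once the response comes back, which is what the
        # additional test needs to verify.
        request.cb_kwargs.setdefault('injected', 'from-middleware')
        return None  # continue normal downloading

A callback would then declare the extra parameter, e.g. def parse_page(self, response, injected).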

@elacuesta (Member, Author) commented Jun 26, 2019

@kmike Added tests for downloader/spider middlewares.
That CookieJar injection idea is awesome, and not too hard to implement (https://gist.github.com/elacuesta/edfb297fdb0eaa0e5e415835c148c564), but it does require overriding the default cookies middleware, because it is the only one that knows about cookies. Would you consider a PR to add the contents of the above gist to Scrapy itself? It seems like that would be a good way to address #1878. There should be no version-specific problems if we use getattr to get the annotations.
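
A rough sketch of the annotation check the gist relies on (the helper name is hypothetical; getattr and __annotations__ are standard Python):

from scrapy.http.cookies import CookieJar

def callback_wants_cookiejar(callback):
    # getattr keeps this safe on callables that have no
    # __annotations__ attribute (e.g. functools.partial objects).
    annotations = getattr(callback, '__annotations__', {})
    return any(ann is CookieJar for ann in annotations.values())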

@kmike (Member) commented Jun 26, 2019

That's awesome that it works, @elacuesta; thanks for checking it and adding more tests!

Regarding actually implementing the cookiejar feature, it may need a bit of thought. For example, @pawelmhm raised a good point in #1878 (comment): the cookiejar API is not adequate; it is hard to use. So if we make this feature built-in, we probably need some wrapper to make working with cookies straightforward. We may also consider whether the same API could fix some other issues: e.g. how to "fork" a cookiejar (copy the current cookies, but update the cookiejars separately afterwards).

It seems this all needs a proposal and a separate discussion. A starter implementation can aid the discussion, but overall it looks more complex than making a PR out of the gist.

In the meantime, it'd probably be easiest to release this middleware as a gist snippet or a small Python package, so that people can start using it while waiting for a solution in Scrapy itself.

@kmike changed the title from Callback kwargs to [MRG+1] Callback kwargs on Jun 26, 2019
@dangra merged commit 3adf09b into scrapy:master on Jun 26, 2019
@elacuesta (Member, Author) commented:

Thanks! I thought about making a small package with that gist; I'll do that 🚀

@elacuesta deleted the callback_kwargs branch on June 26, 2019 17:14
@mauliadi1990 left a comment

Duplicate of #

@kmike mentioned this pull request on Jul 10, 2019
@elacuesta mentioned this pull request on May 27, 2020
Successfully merging this pull request may close these issues:

  • Alternative way to pass arguments to callback