
Optimizing wrap_loader_context() #49

Open
wRAR opened this issue Oct 27, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

wRAR (Member) commented Oct 27, 2021

While working on a loader-heavy project I found that a lot of inspect calls (and a lot of function calls in general) are made for every field of every loader. wrap_loader_context() calls get_func_args() for each processor, which in turn does most of the aforementioned work.

My tests show that a simple @lru_cache(1024) on get_func_args() is enough, and that in this case the cache will contain one or several entries per processor used by the spider (so 1024 should suffice; if it doesn't, the cache won't deliver the performance improvement anyway, so the exact size is debatable).
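The caching idea above can be sketched as follows. This is a minimal illustration, not the actual itemloaders code: the real get_func_args() and wrap_loader_context() have extra handling (partials, bound methods, etc.) that is omitted here.

```python
import inspect
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_func_args(func):
    """Return a callable's argument names, caching the inspect work.

    Simplified stand-in for itemloaders' get_func_args(); the real helper
    has extra handling for partials and bound methods.
    """
    return tuple(inspect.signature(func).parameters)

def wrap_loader_context(function, context):
    """Bind loader_context into a processor if it accepts one (sketch)."""
    if 'loader_context' in get_func_args(function):
        return lambda x: function(x, loader_context=context)
    return function

def strip(value, loader_context):
    return value.strip()

wrapped = wrap_loader_context(strip, {'spider': None})
print(wrapped('  hello  '))  # prints hello; inspect ran once, now cached
```

Because the cache key is the processor callable itself, repeated calls for the same processor skip inspect.signature() entirely, which is where the per-field overhead comes from.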

Another option would be to change get_value() and get_output_value() so that wrap_loader_context() isn't called this often, but get_value() takes the processors as input arguments, which complicates that.
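That alternative could look roughly like the sketch below: wrap each processor once per loader and reuse the context-bound wrapper, instead of re-wrapping on every get_value() call. The Loader class and wrap() helper here are hypothetical stand-ins, not the itemloaders API.

```python
import inspect

def wrap(func, context):
    """Stand-in for wrap_loader_context(): bind loader_context if accepted."""
    if 'loader_context' in inspect.signature(func).parameters:
        return lambda x: func(x, loader_context=context)
    return func

class Loader:
    """Hypothetical loader that wraps each processor only once.

    get_value() reuses the context-bound wrapper instead of calling
    wrap() (i.e. wrap_loader_context()) on every invocation.
    """

    def __init__(self, context):
        self.context = context
        self._wrapped = {}  # processor -> context-bound wrapper

    def _wrap_once(self, proc):
        if proc not in self._wrapped:
            self._wrapped[proc] = wrap(proc, self.context)
        return self._wrapped[proc]

    def get_value(self, value, *processors):
        for proc in processors:
            value = self._wrap_once(proc)(value)
        return value

loader = Loader({'spider': None})
print(loader.get_value('  hello  ', str.strip, str.upper))  # prints HELLO
```

The trade-off is that the per-loader dict only pays off if the same processors are passed to get_value() repeatedly, which is the common case for field processing but not guaranteed by the API.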

@wRAR added the enhancement label on Oct 27, 2021
wRAR (Member, Author) commented Jan 21, 2022

Related: scrapy/scrapy#2889

soundofspace commented

Any reason not to merge an lru_cache implementation? I came here after noticing a huge performance decrease for some spiders after updating from v1.0.6 to v1.1.0.

Some metrics for the most affected spider I could find:

| Version | items/second |
| --- | --- |
| v1.0.6 | 350 |
| v1.1.0 | 120 |
| v1.1.0 with lru | 550 |
