Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling CloseSpider exception if it has been raised in start_requests() #6148

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

cakemd
Copy link
Contributor

@cakemd cakemd commented Nov 16, 2023

Proposal for issue #3463.

I assume it's not the cleanest approach, so I'm opened for change requests after review.

I was trying to modify the ExecutionEngine.open_spider in a way that the start_requests's exception would be called inside it. But it looked that it would take too much changes, so a lot of stuff can be broken.

Still, I'm opened to any ideas of how to make this change better.

Copy link

codecov bot commented Nov 16, 2023

Codecov Report

Merging #6148 (c5ff304) into master (42b3a3a) will increase coverage by 0.07%.
The diff coverage is 80.00%.

❗ Current head c5ff304 differs from pull request most recent head 7db2755. Consider uploading reports for the commit 7db2755 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6148      +/-   ##
==========================================
+ Coverage   88.55%   88.63%   +0.07%     
==========================================
  Files         159      159              
  Lines       11582    11593      +11     
  Branches     1885     1887       +2     
==========================================
+ Hits        10257    10276      +19     
+ Misses        996      990       -6     
+ Partials      329      327       -2     
Files Coverage Δ
scrapy/crawler.py 88.09% <100.00%> (+0.17%) ⬆️
scrapy/core/engine.py 85.34% <62.50%> (-0.32%) ⬇️

... and 3 files with indirect coverage changes

@cakemd
Copy link
Contributor Author

cakemd commented Nov 16, 2023

Still working on it, got failed tests after some changes.

@cakemd
Copy link
Contributor Author

cakemd commented Nov 16, 2023

@Gallaecio @wRAR could you please take a look if have time?

if self.slot is not None:
raise RuntimeError("Engine slot is already assigned. Use self.close_spider")

self.start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look correct because start() returns a Deferred, but also is it OK to call it here at all if we are closing anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wRAR
I didn't place it at beginning. But to common close_spider method has several checks for engine being started. Not sure we need to have use it though, just think that the code inside it is useful.

I've tried to avoid it, but neither of approaches I've considered allowed to get the proper result. The only other option I see so far is to simply set close reason to spider and leave it in piece.

How do you think to do it better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case if you call it you need to wait until it completes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants