Apply sitemap_filter only if sitemap type is urlset #6033

Open
wants to merge 1 commit into base: master


majid-vaghari

Here's the sitemap_filter documentation:

This is a filter function that could be overridden to select sitemap entries based on their attributes.

I wanted to write a sitemap_filter to limit the number of requests made, but the filter is also applied to the URLs listed in sitemap index files.

I believe this contradicts the documentation of sitemap_filter; the function should only be applied to sitemaps of type urlset.

I fixed the issue and opened this pull request. Hope it helps.

@Gallaecio
Member

Gallaecio commented Aug 31, 2023

This would break existing code, and you can already filter by type within the filtering function. So, if the problem is that the documentation is not clear, I think that is what needs fixing.
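To illustrate filtering by type inside the filtering function, here is a minimal sketch. It assumes the object passed to sitemap_filter exposes a `type` attribute whose value is "sitemapindex" or "urlset", and that each entry is a dict with a "loc" key. `StubSitemap`, `UrlsetOnlyFilterMixin`, and the "/blog/" rule are hypothetical names used only so the example runs without Scrapy installed.

```python
class StubSitemap:
    """Hypothetical stand-in for the sitemap object handed to sitemap_filter."""

    def __init__(self, type_, entries):
        self.type = type_  # assumed to be "sitemapindex" or "urlset"
        self._entries = entries

    def __iter__(self):
        return iter(self._entries)


class UrlsetOnlyFilterMixin:
    """Sketch: filter page entries, pass sitemap index entries through untouched."""

    def sitemap_filter(self, entries):
        # Assumption: `entries.type` tells us what kind of sitemap this is.
        is_index = getattr(entries, "type", None) == "sitemapindex"
        for entry in entries:
            # Index entries point at other sitemaps, so they are always kept;
            # only urlset entries go through the (illustrative) "/blog/" rule.
            if is_index or "/blog/" in entry["loc"]:
                yield entry


index = StubSitemap("sitemapindex", [{"loc": "https://example.com/sitemap-a.xml"}])
urlset = StubSitemap(
    "urlset",
    [{"loc": "https://example.com/blog/1"}, {"loc": "https://example.com/shop/2"}],
)
spider = UrlsetOnlyFilterMixin()
kept_index = [e["loc"] for e in spider.sitemap_filter(index)]
kept_pages = [e["loc"] for e in spider.sitemap_filter(urlset)]
```

In a real spider the same method body would presumably live on a SitemapSpider subclass; the mixin form is used here only to keep the sketch self-contained.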

@majid-vaghari
Author

I wanted to use this function to limit the number of requests. I first tried raising the CloseSpider exception, but the spider keeps downloading pages (which doesn't make sense in my opinion). Then I tried this function, and its behavior doesn't make sense to me either. I don't know what to do 😂

@Gallaecio
Member

Gallaecio commented Sep 4, 2023

If your filter method does not yield a URL, that URL cannot get followed. There is no need to raise CloseSpider; returning early from the filtering function is enough, right? I feel like I am missing something.
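A rough sketch of the "return early" idea, applied to capping the number of page URLs: it assumes, as above, that the object passed to sitemap_filter has a `type` attribute ("sitemapindex" or "urlset"). `StubSitemap`, `CappedFilterMixin`, `max_pages`, and `_pages_yielded` are hypothetical names, not part of Scrapy.

```python
class StubSitemap:
    """Hypothetical stand-in for the sitemap object handed to sitemap_filter."""

    def __init__(self, type_, entries):
        self.type = type_  # assumed to be "sitemapindex" or "urlset"
        self._entries = entries

    def __iter__(self):
        return iter(self._entries)


class CappedFilterMixin:
    """Sketch: yield at most `max_pages` urlset entries across all sitemaps."""

    max_pages = 3  # hypothetical cap on page URLs

    def sitemap_filter(self, entries):
        if getattr(entries, "type", None) == "sitemapindex":
            # Keep every child sitemap; the cap only counts page entries.
            yield from entries
            return
        for entry in entries:
            yielded = getattr(self, "_pages_yielded", 0)
            if yielded >= self.max_pages:
                return  # returning early: nothing further is yielded
            self._pages_yielded = yielded + 1
            yield entry


spider = CappedFilterMixin()
index = StubSitemap("sitemapindex", [{"loc": "a.xml"}, {"loc": "b.xml"}])
first = StubSitemap("urlset", [{"loc": "p1"}, {"loc": "p2"}])
second = StubSitemap("urlset", [{"loc": "p3"}, {"loc": "p4"}, {"loc": "p5"}])

children = [e["loc"] for e in spider.sitemap_filter(index)]
pages = [e["loc"] for e in spider.sitemap_filter(first)]
pages += [e["loc"] for e in spider.sitemap_filter(second)]
```

With this approach the child sitemap files listed in an index would still be requested; only the page URLs taken from urlset sitemaps are capped.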
