fix issue with incorrect number of total pages if any of the seeds is a redirect #1649

Merged
merged 1 commit into from Apr 4, 2024

Conversation

ikreymer
Member

@ikreymer ikreymer commented Apr 4, 2024

Following the changes in webrecorder/browsertrix-crawler#475 and webrecorder/browsertrix-crawler#509, the crawler adds a redirected seed to the seen list. To account for this, each redirected seed needs to be subtracted from the seen count to get the correct total page count.

To test, run a crawl with a page limit (e.g. 3) and a seed that redirects (e.g. www.webrecorder.net).
Before this PR, a limit of 3 results in 3/4 pages; with this fix, it should say 3/3.
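
The arithmetic behind the 3/4 vs. 3/3 example can be sketched as follows. This is only an illustration under assumptions, not the code changed in this PR: the names `total_pages` and `progress_label` are hypothetical, and clamping the result at zero is an added safeguard that the description does not mention.

```python
def total_pages(seen_count: int, redirected_seeds: int) -> int:
    """Hypothetical helper: total pages after discounting redirected seeds.

    Assumes each redirected seed contributes exactly one extra entry
    to the crawler's seen list, as described above.
    """
    # Clamp at zero (an assumption) so the total never goes negative.
    return max(seen_count - redirected_seeds, 0)


def progress_label(done: int, seen_count: int, redirected_seeds: int) -> str:
    """Format progress as 'done/total' (hypothetical display format)."""
    return f"{done}/{total_pages(seen_count, redirected_seeds)}"


# Page limit of 3 with one redirecting seed: the seen list holds 4 entries.
print(progress_label(3, 4, 0))  # redirect not subtracted: 3/4
print(progress_label(3, 4, 1))  # redirect subtracted: 3/3
```

With the redirected seed subtracted, the total matches the page limit in the test scenario above.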

… a redirect

following changes in webrecorder/browsertrix-crawler#475, webrecorder/browsertrix-crawler#509, the crawler adds a redirected seed
to the seen list. To account for this, it needs to be subtracted to get the actual page count.
@ikreymer ikreymer requested a review from tw4l April 4, 2024 22:22
Contributor

@tw4l tw4l left a comment

Tested, working well!

@ikreymer ikreymer merged commit 5c08c96 into main Apr 4, 2024
4 checks passed
@ikreymer ikreymer deleted the redirect-seeds-page-count branch April 4, 2024 22:55