A follow-up to #475: extra seeds added for redirects are currently not subtracted from the limit, so one fewer page is crawled if one of the seeds redirects. E.g., with 1 seed and a limit of 10, if that seed redirects, only 9 pages total will be crawled.
The solution is to subtract the number of 'extra seeds' from the size of the seen list when computing the limit.
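As a rough illustration of the arithmetic described above, here is a minimal sketch. The names `CrawlCounts`, `effectiveSeen`, and `isAtLimit` are hypothetical and not taken from the crawler's code; the sketch also assumes that a limit of 0 or less means "no limit".

```typescript
// Hypothetical sketch: these names are illustrative, not the crawler's actual API.
interface CrawlCounts {
  seenCount: number; // entries in the seen list, including redirect seeds
  extraSeeds: number; // seeds added because an original seed redirected
}

// Number of seen entries that should count against the page limit.
function effectiveSeen(counts: CrawlCounts): number {
  return Math.max(0, counts.seenCount - counts.extraSeeds);
}

// Assumption: a limit of 0 or less means "no limit".
function isAtLimit(counts: CrawlCounts, limit: number): boolean {
  if (limit <= 0) {
    return false;
  }
  return effectiveSeen(counts) >= limit;
}

// 1 seed that redirects, limit of 10: the seen list holds the original seed,
// the redirect seed, and 8 discovered pages (10 entries, but only 9 pages).
const counts: CrawlCounts = { seenCount: 10, extraSeeds: 1 };
console.log(isAtLimit(counts, 10)); // one more page may still be queued
```

Without the subtraction, the 10 seen entries would already hit the limit and the crawl would stop at 9 pages.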
- subtract extraSeeds when computing limit
- don't include redirect seeds in seen list when serializing
- tests: adjust saved-state-test to also check total pages when crawl is done
fixes #508
(for 1.0.3 release)
sitemap improvements: gz support + application/xml + extraHops fix #511
- follow up to #496
- support parsing sitemap urls that end in .gz with gzip decompression
- support both `application/xml` and `text/xml` as valid sitemap content-types (add test for both)
- ignore extraHops for sitemap found URLs by setting to past extraHops limit (otherwise, all sitemap URLs would be treated as links from seed page)
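The sitemap handling listed above could look roughly like the following sketch. It is not the crawler's implementation: `isSitemapContentType`, `isGzippedSitemapUrl`, and `decodeSitemapBody` are hypothetical helper names, and the sketch assumes the response body is already fully buffered. Only Node's built-in `node:zlib` module is used.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Hypothetical helpers for illustration only, not the crawler's actual API.

// Accept both content types named in the commit message, ignoring parameters
// such as "; charset=utf-8".
function isSitemapContentType(contentType: string): boolean {
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mime === "application/xml" || mime === "text/xml";
}

// A sitemap URL is treated as gzipped when its path ends in ".gz".
function isGzippedSitemapUrl(url: string): boolean {
  return new URL(url).pathname.toLowerCase().endsWith(".gz");
}

// Return the sitemap XML as text, decompressing first for ".gz" URLs.
function decodeSitemapBody(url: string, body: Buffer): string {
  const raw = isGzippedSitemapUrl(url) ? gunzipSync(body) : body;
  return raw.toString("utf8");
}

// Usage: round-trip a tiny sitemap through gzip to exercise the ".gz" path.
const sampleXml = "<urlset></urlset>";
const compressed = gzipSync(Buffer.from(sampleXml, "utf8"));
const decoded = decodeSitemapBody("https://example.com/sitemap.xml.gz", compressed);
console.log(decoded === sampleXml);
```

Checking the URL path rather than the full URL string keeps a query string such as `?v=2` from hiding the `.gz` suffix.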
fixes redirected seed (from #476) being counted against page limit: #509
- subtract extraSeeds when computing limit
- don't include redirect seeds in seen list when serializing
- tests: adjust saved-state-test to also check total pages when crawl is done
fixes #508