SAX-based sitemap parser #497
Commits on Mar 16, 2024
- 5075cbe
Commits on Mar 17, 2024
- daff9ef
- 9553d42 support sitemap index parsing, nested sitemap parsing
  better error handling
- aa02067
- f98f338 refactor to use single queue for nested sitemaps
  continue fetching sitemaps async
  include nested sitemaps queued count in logging
  store in redis whether sitemap parsing was finished, include in save/load, don't reparse if fully parsed
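The single-queue approach described in f98f338 can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the PR: the `SitemapDoc` shape, the `walkSitemaps` name, and the injected `fetchDoc` callback are assumptions, and the loop is synchronous for brevity where the commit describes async fetching.

```typescript
// Hypothetical parsed form of one sitemap: nested sitemap URLs plus page URLs.
type SitemapDoc = { sitemaps: string[]; pages: string[] };

// Walk a sitemap index using one shared FIFO queue for every nesting level.
// fetchDoc is an assumed callback; it returns undefined on fetch or parse failure.
function walkSitemaps(
  rootUrl: string,
  fetchDoc: (url: string) => SitemapDoc | undefined
): { pages: string[]; visited: string[] } {
  const queue: string[] = [rootUrl];
  const seen: { [url: string]: boolean } = {};
  seen[rootUrl] = true;
  const pages: string[] = [];
  const visited: string[] = [];

  while (queue.length > 0) {
    const url = queue.shift() as string;
    visited.push(url);
    const doc = fetchDoc(url);
    if (!doc) {
      continue; // a failed sitemap is skipped; the rest of the queue still runs
    }
    for (const child of doc.sitemaps) {
      if (!seen[child]) {
        seen[child] = true; // never queue the same nested sitemap twice
        queue.push(child);
      }
    }
    for (const page of doc.pages) {
      pages.push(page);
    }
  }
  return { pages, visited };
}
```

In this sketch `queue.length` is the number that a "nested sitemaps queued" log line could report, and an empty queue is the natural point at which to record that parsing has finished.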
Commits on Mar 18, 2024
- af6e65d support passing in pageLimit, interrupting additional parsing when limit is hit.
  when at limit, don't report any errors, close xml stream and pass end event to the root
- 3b61f7b
- 8470687
- e2024f5
- c39f4bd
- 7e9fe50 support parsing sitemap from robots.txt - if sitemap url ends in /robots.txt, parse as text
- 17d4a65
  Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- 2fe9e19
  Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- 621f620 if just --sitemap/--useSitemap given, then:
  - first try parsing <seed>/robots.txt
  - then try parsing <seed>/sitemap.xml
  if sitemap url specified, then:
  - fetch and detect content type, and parse as either xml or robots.txt based on extension and content-type
- a21979d
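The detection rules described in 7e9fe50 and 621f620 can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: the function names are hypothetical, the seed is treated as a plain base URL, and the precedence (extension first, then content type) is a guess, since the commit message only says both are consulted.

```typescript
type SitemapFormat = "xml" | "robots";

// Candidate URLs to try, in order, when only --sitemap/--useSitemap is given.
// Assumption: the seed is a base URL; trailing slashes are stripped before joining.
function defaultSitemapCandidates(seedUrl: string): string[] {
  const base = seedUrl.replace(/\/+$/, "");
  return [`${base}/robots.txt`, `${base}/sitemap.xml`];
}

// Decide how to parse an explicitly given sitemap URL.
// Assumed precedence: URL extension first, then the Content-Type header.
function detectSitemapFormat(url: string, contentType: string | null): SitemapFormat | null {
  const path = url.split(/[?#]/)[0].toLowerCase();
  const type = (contentType || "").split(";")[0].trim().toLowerCase();
  if (/\/robots\.txt$/.test(path)) return "robots";
  if (/\.xml$/.test(path)) return "xml";
  if (type === "application/xml" || type === "text/xml") return "xml";
  if (type === "text/plain") return "robots";
  return null; // unknown: the caller decides whether to skip or report an error
}

// Pull "Sitemap: <url>" entries out of robots.txt text (field name is case-insensitive).
function extractSitemapUrls(robotsTxt: string): string[] {
  const urls: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const match = /^sitemap\s*:\s*(\S+)/i.exec(rawLine.trim());
    if (match) {
      urls.push(match[1]); // \S+ stops at whitespace, so trailing comments are dropped
    }
  }
  return urls;
}
```

Each URL returned by `extractSitemapUrls` would then be handled like any other sitemap URL, so a robots.txt file effectively acts as one more level of sitemap index.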
Commits on Mar 19, 2024
- 199fd4e
- cb18d64 tests: add sitemap-parse-text for testing auto-detection, with limits and specific URL
- 2497189
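The limit behaviour from af6e65d, which the cb18d64 tests exercise "with limits", might look something like the following sketch. The `LimitedCollector` class and its method names are hypothetical, and the sketch assumes `pageLimit` is a positive integer; it only models the bookkeeping, not the actual XML stream or the end event passed to the root.

```typescript
// Hypothetical collector that stops accepting URLs once pageLimit is reached.
class LimitedCollector {
  readonly urls: string[] = [];
  readonly errors: string[] = [];
  private closed = false;

  // Assumption: pageLimit >= 1.
  constructor(private readonly pageLimit: number) {}

  // Returns false once the limit is hit, which is the caller's cue to
  // close its XML stream and stop parsing further sitemaps.
  addUrl(url: string): boolean {
    if (this.closed) {
      return false;
    }
    this.urls.push(url);
    if (this.urls.length >= this.pageLimit) {
      this.closed = true;
    }
    return !this.closed;
  }

  // After the limit, errors caused by aborted streams are expected and dropped.
  reportError(message: string): void {
    if (this.closed) {
      return;
    }
    this.errors.push(message);
  }

  isAtLimit(): boolean {
    return this.closed;
  }
}
```

Dropping errors after the limit mirrors the commit's "don't report any errors" note: once parsing is interrupted on purpose, a truncated stream is not a failure.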