Browsertrix Crawler 1.1.0 Beta 5

Pre-release
@ikreymer released this 15 Apr 21:53
· 24 commits to main since this release
efebc33

What's Changed

  • Write pages separately to pages.jsonl + extraPages.jsonl for use with new py-wacz by @ikreymer in #535
  • Adblock support by @ikreymer in #534
  • Remove no longer needed invalid Brave update URLs by @tw4l in #539
  • Better logging of all queue WARCWriter operations by @ikreymer in #536
  • qa: filter out non-html pages by @ikreymer in #541
  • Fix for --rolloverSize for individual WARCs in 1.x by @ikreymer in #542
  • Set mime type for html pages by @tw4l in #545

Full Changelog: v1.1.0-beta.4...v1.1.0-beta.5