Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler v1.1.1
What's Changed
- Avoid crashes when editing / creating profile and navigation is interrupted
- profiles: ensure all page.goto() promises have at least catch block/a… by @ikreymer in #559
- profiles: ensure initial page.load() is awaited by @ikreymer in #561
Full Changelog: v1.1.0...v1.1.1
Browsertrix Crawler v1.1.0
Major Features
Support for QA Crawling (https://crawler.docs.browsertrix.com/user-guide/qa/)
What's Changed
- QA Crawl Support (Beta) by @ikreymer in #469
- Use RFC2606 invalid domain names by @vnznznz in #514
- Fixes from 1.0.3 release -> main by @ikreymer in #517
- Unify WARC writing + CDXJ indexing into single class by @ikreymer in #507
- upgrade puppeteer-core to 22.6.1 by @ikreymer in #516
- avoid Cloudflare detection of Puppeteer when using browser profiles by @ikreymer in #518
- add an extra --postLoadDelay param to specify how many seconds to wait after page-load by @ikreymer in #520
- Gracefully handle non-absolute path for create-login-profile --filename by @tw4l in #521
- Make /app world-readable to better support non-root usage by @vnznznz in #523
- merge V1.0.4 change -> main by @ikreymer in #527
- Revert "Make /app world-readable to better support non-root usage" by @ikreymer in #529
- ensure all warcwriter write operations go through a queue. by @ikreymer in #528
- qa/replay crawl loading improvements by @ikreymer in #526
- Separate writing pages to pages.jsonl + extraPages.jsonl to use with new py-wacz by @ikreymer in #535
- Adblock support by @ikreymer in #534
- Remove no longer needed invalid Brave update URLs by @tw4l in #539
- Better logging of all queue WARCWriter operations by @ikreymer in #536
- qa: filter out non-html pages by @ikreymer in #541
- Fix for --rolloverSize for individual WARCs in 1.x by @ikreymer in #542
- Set mime type for html pages by @tw4l in #545
- allow minio to connect to other regions by @mguella in #543
- replay counts: don't filter out URLs with __wb_method to avoid dispar… by @ikreymer in #552
- Add crawler QA docs by @tw4l in #551
- Support site-specific wait via browsertrix-behaviors by @ikreymer in #555
- warcinfo: fix version to 1.1 to avoid confusion (part of #553) by @ikreymer in #557
Full Changelog: v1.0.4...v1.1.0
Browsertrix Crawler 1.1.0 Beta 5
What's Changed
- Separate writing pages to pages.jsonl + extraPages.jsonl to use with new py-wacz by @ikreymer in #535
- Adblock support by @ikreymer in #534
- Remove no longer needed invalid Brave update URLs by @tw4l in #539
- Better logging of all queue WARCWriter operations by @ikreymer in #536
- qa: filter out non-html pages by @ikreymer in #541
- Fix for --rolloverSize for individual WARCs in 1.x by @ikreymer in #542
- Set mime type for html pages by @tw4l in #545
Full Changelog: v1.1.0-beta.4...v1.1.0-beta.5
v1.1.0-beta.4
What's Changed
- Gracefully handle non-absolute path for create-login-profile --filename by @tw4l in #521
- refactor handling of max size for html/js/css by @ikreymer in #525
- merge V1.0.4 change -> main by @ikreymer in #527
- ensure all warcwriter write operations go through a queue. by @ikreymer in #528
- qa/replay crawl loading improvements by @ikreymer in #526
Full Changelog: v1.1.0-beta.3...v1.1.0-beta.4
Browsertrix Crawler v1.0.4
What's Changed
- refactor handling of max size for html/js/css by @ikreymer in #525
Fix for #522: issues loading pages with large streaming JS/CSS.
Full Changelog: v1.0.3...v1.0.4
Browsertrix Crawler 1.1.0 Beta 3 (QA Support)
What's Changed
- Use RFC2606 invalid domain names by @vnznznz in #514
- Fixes from 1.0.3 release -> main by @ikreymer in #517
- Unify WARC writing + CDXJ indexing into single class by @ikreymer in #507
- upgrade puppeteer-core to 22.6.1 by @ikreymer in #516
- avoid Cloudflare detection of Puppeteer when using browser profiles by @ikreymer in #518
- add an extra --postLoadDelay param to specify how many seconds to wait after page-load by @ikreymer in #520
Full Changelog: v1.1.0-beta.2...v1.1.0-beta.3
Browsertrix Crawler 1.0.3
Browsertrix Crawler 1.1.0 Beta 2 (QA Crawl Support Beta)
What's Changed
- Docs: Minor fixes to edit link & clarifications by @Shrinks99 in #501
- Improved support for running as non-root by @ikreymer in #503
- improvements to 'non-graceful' interrupt to ensure WARCs are still closed gracefully by @ikreymer in #504
- service worker capture fix: disable by default for now by @ikreymer in #506
- QA Crawl Support (Beta) by @ikreymer in #469
New Contributors
- @Shrinks99 made their first contribution in #501
Full Changelog: v1.1.0-beta.1...v1.1.0-beta.2
Browsertrix Crawler 1.0.2
What's Changed
- service worker capture fix: disable service workers by default for now, add cli option by @ikreymer in #506
Full Changelog: v1.0.1...v1.0.2
Browsertrix Crawler 1.0.1
What's Changed
- Docs: Minor fixes to edit link & clarifications by @Shrinks99 in #501
- Improved support for running as non-root by @ikreymer in #503
- improvements to 'non-graceful' interrupt to ensure WARCs are still closed gracefully by @ikreymer in #504
New Contributors
- @Shrinks99 made their first contribution in #501
Full Changelog: v1.0.0...v1.0.1