Releases · webrecorder/browsertrix-crawler

16 Feb 23:20

ikreymer

v1.0.0-beta.3

46eb02d

Browsertrix Crawler 1.0.0 Beta 3

Pre-release

What's Changed

Add arg to write pages to Redis by @tw4l in #464
Page Resources: Include Cached Resources by @ikreymer in #465

Full Changelog: v1.0.0-beta.2...v1.0.0-beta.3

Contributors

ikreymer and tw4l


17 Jan 22:48

ikreymer

v1.0.0-beta.2

298deac

Browsertrix Crawler 1.0.0 Beta 2

Pre-release

What's Changed

Bump puppeteer-core to ^20.8.2 to patch vulnerability by @tw4l in #459
Generate urn:pageinfo: records by @ikreymer in #458
skipping resources: ensure HEAD, OPTIONS, 204, 206, and 304 response/request pairs are not written to WARC by @ikreymer in #460
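
The skip rule from #460 can be expressed as a small predicate. This is a minimal sketch, assuming the decision depends only on the request method and the response status code; the function name shouldSkipRecord is hypothetical and is not part of the crawler's API.

```typescript
// Hypothetical helper illustrating the #460 rule: request/response pairs for
// HEAD or OPTIONS requests, or with a 204, 206, or 304 status, are not written
// to the WARC. A sketch only, not the crawler's actual implementation.
const SKIPPED_METHODS = new Set<string>(["HEAD", "OPTIONS"]);
const SKIPPED_STATUSES = new Set<number>([204, 206, 304]);

function shouldSkipRecord(method: string, status: number): boolean {
  // Method names are compared case-insensitively.
  if (SKIPPED_METHODS.has(method.toUpperCase())) {
    return true;
  }
  return SKIPPED_STATUSES.has(status);
}

console.log(shouldSkipRecord("GET", 200)); // false
console.log(shouldSkipRecord("GET", 304)); // true
console.log(shouldSkipRecord("options", 200)); // true
```

Using two sets keeps the rule declarative: adding another skipped status is a one-value change.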

Full Changelog: 1.0.0-beta.1...v1.0.0-beta.2

Contributors

ikreymer and tw4l


17 Jan 22:13

ikreymer

v0.12.4

cd3a1b0

Browsertrix Crawler 0.12.4

What's Changed

Bump puppeteer-core to ^20.8.2 to patch vulnerability by @tw4l in #459

Full Changelog: v0.12.3...v0.12.4

Contributors

tw4l


03 Jan 09:01

ikreymer

1.0.0-beta.1

db2dbe0

Browsertrix Crawler 1.0.0 Beta 1

Pre-release

What's Changed

logging: don't log filtered out direct fetch attempt as error by @ikreymer in #432
Fix potential for pending list never being processed by @ikreymer in #433
more specific types additions by @ikreymer in #434
Backport pending list never being reprocessed by @ikreymer in #438
Add types + validation for log context options by @ikreymer in #435
Bump sharp from 0.32.1 to 0.32.6 by @dependabot in #443
add timeout to final awaitPendingClear() by @ikreymer in #442
WARC filename prefix + rollover size + improved 'livestream' / truncated response support. by @ikreymer in #440
detect invalid custom behaviors on load: by @ikreymer in #450
Merge 0.12.3 into 1.0.0 by @ikreymer in #455
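
The rollover-size idea from #440 can be illustrated with a small tracker. This is a hypothetical sketch, assuming a file is rolled over when the next record would push it past the size limit; the class name RolloverTracker and the "<prefix>-<n>.warc.gz" naming scheme are illustrative assumptions, not the crawler's actual code or filenames.

```typescript
// Hypothetical sketch of size-based WARC rollover. Records go to the current
// file until the next record would push it past rolloverSize; then a new
// file with the same prefix is started. File naming is illustrative only.
class RolloverTracker {
  private prefix: string;
  private rolloverSize: number;
  private index = 0;
  private currentSize = 0;
  readonly files: string[] = [];

  constructor(prefix: string, rolloverSize: number) {
    this.prefix = prefix;
    this.rolloverSize = rolloverSize;
    this.files.push(this.fileName());
  }

  private fileName(): string {
    return `${this.prefix}-${this.index}.warc.gz`;
  }

  // Returns the name of the file a record of `size` bytes is written to.
  // A record larger than rolloverSize still goes into a fresh, empty file.
  addRecord(size: number): string {
    if (this.currentSize > 0 && this.currentSize + size > this.rolloverSize) {
      this.index += 1;
      this.currentSize = 0;
      this.files.push(this.fileName());
    }
    this.currentSize += size;
    return this.fileName();
  }
}

const tracker = new RolloverTracker("crawl", 1000);
tracker.addRecord(600); // crawl-0.warc.gz
tracker.addRecord(300); // crawl-0.warc.gz (900 bytes so far)
tracker.addRecord(200); // crawl-1.warc.gz (1100 would exceed 1000)
console.log(tracker.files); // [ 'crawl-0.warc.gz', 'crawl-1.warc.gz' ]
```

Rolling over before a write, rather than after, keeps every file at or under the limit unless a single record is itself larger than the limit.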

New Contributors

@dependabot made their first contribution in #443

Full Changelog: 1.0.0-beta.0...1.0.0-beta.1

Contributors

ikreymer and dependabot


17 Nov 07:27

ikreymer

v0.12.3

c3b98e5

Browsertrix Crawler 0.12.3

Bug Fix Release: Ensure the crawl doesn't get stuck indefinitely on pending requests at the end of the crawl.

What's Changed

Bump sharp from 0.32.1 to 0.32.6 by @dependabot in #443
add timeout to final awaitPendingClear() by @ikreymer in #442
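
The timeout added in #442 follows a common pattern: wait for pending work to clear, but stop waiting after a deadline. The sketch below is a minimal, hypothetical illustration of that pattern using polling; waitUntilClearOrTimeout and its parameters are assumptions, not the crawler's awaitPendingClear() implementation.

```typescript
// Hypothetical sketch of a bounded wait: poll until isClear() reports no
// pending work, but give up once timeoutMs has elapsed so the end of the
// crawl cannot hang forever. Names are illustrative, not the crawler's code.
async function waitUntilClearOrTimeout(
  isClear: () => boolean,
  timeoutMs: number,
  pollMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (!isClear()) {
    if (Date.now() >= deadline) {
      return false; // timed out: the caller can log this and finish anyway
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  return true; // everything cleared before the deadline
}

// Usage: a condition that never clears resolves to false after ~200 ms.
waitUntilClearOrTimeout(() => false, 200).then((cleared) => {
  console.log(cleared ? "pending cleared" : "timed out, finishing crawl");
});
```

Returning a boolean instead of throwing lets the caller decide whether a timeout is an error or just something to log before finishing.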

Full Changelog: v0.12.2...v0.12.3

Contributors

ikreymer and dependabot


15 Nov 02:19

ikreymer

v0.12.2

9ba0b9e

Browsertrix Crawler 0.12.2

What's Changed

Fix for pending list never being reprocessed in some situations by @ikreymer in #438

Full Changelog: v0.12.1...v0.12.2

Contributors

ikreymer


10 Nov 07:55

ikreymer

1.0.0-beta.0

ab0f66a

Browsertrix Crawler 1.0.0 Beta 0

Pre-release

Major Changes

New recording/capture mechanism that captures browser network traffic via the Chrome DevTools Protocol (CDP), instead of a proxy
TypeScript conversion

What's Changed

Use new browser-based archiving mechanism instead of pywb proxy by @ikreymer in #424
TypeScript Conversion by @ikreymer in #425
Add Prettier to the repo, and format all the files! by @emma-sg in #428
follow-up to #428: update ignore files by @ikreymer in #431
Raise size limit for large HTML pages by @ikreymer in #430

Full Changelog: v0.12.1...1.0.0-beta.0

Contributors

ikreymer and emma-sg


03 Nov 22:18

ikreymer

v0.12.1

dd7b926

Browsertrix Crawler 0.12.1

Fixes

Optimize exclusion removal, follow-up to #408
Fix regression where --text false was rejected when used with Browsertrix Cloud (see: webrecorder/browsertrix#1334)

What's Changed

Exclusion Filtering Optimizations: check exclusion before loading new page + additional improvements by @ikreymer in #423
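
Checking exclusions right before a page loads, and filtering the queue when an exclusion is added, can be sketched as follows. This is a hypothetical illustration assuming exclusions are regular expressions matched against page URLs; isExcluded and filterQueue are illustrative names, not the crawler's functions.

```typescript
// Hypothetical sketch: exclusions are regular expressions tested against
// each URL. Use patterns without the "g" flag, since RegExp.test() with "g"
// keeps state in lastIndex between calls.
function isExcluded(url: string, exclusions: RegExp[]): boolean {
  return exclusions.some((rx) => rx.test(url));
}

// Hypothetical queue filter: when an exclusion is added mid-crawl, drop
// already-queued URLs that match it and keep the rest in order.
function filterQueue(queue: string[], newExclusion: RegExp): string[] {
  return queue.filter((url) => !newExclusion.test(url));
}

// Checking right before loading a page also catches URLs that were queued
// before the exclusion existed.
const exclusions = [/\/calendar\//];
const queue = ["https://example.com/about", "https://example.com/calendar/2023"];
console.log(filterQueue(queue, exclusions[0])); // [ 'https://example.com/about' ]
console.log(isExcluded("https://example.com/calendar/2024", exclusions)); // true
```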

Full Changelog: v0.12.0...v0.12.1

Contributors

ikreymer


02 Nov 18:55

ikreymer

v0.12.0

15661eb

Browsertrix Crawler 0.12.0

Major Changes

Use the same version of Brave for the base image on both amd64 and arm64, instead of slightly different Chrome (amd64) and Chromium (arm64) builds
Support for faster cancelation of crawl via Redis key + signal
Include CRC32 in storage webhook for nested WACZ support
Dynamic addition and removal of exclusions, plus filtering of already-queued URLs, via a Redis message queue
Extracted text stored in WARC records (from both the initial page and the final page after behaviors run), controlled by the new --text options
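
CRC32 is the checksum the ZIP format stores for each entry, and a WACZ file is itself a ZIP, which is presumably why reporting it in the storage webhook helps when a WACZ is nested inside another archive. The sketch below is a minimal table-driven CRC-32 (IEEE polynomial, reflected form 0xEDB88320) that can be updated chunk by chunk while a file is streamed; crc32Update is a hypothetical name, and the crawler may well use a library instead.

```typescript
// Minimal table-driven CRC-32 (IEEE, reflected polynomial 0xEDB88320),
// the same checksum ZIP uses. Illustrative sketch only.
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Fold another chunk into a running CRC (start from 0), so a large file can
// be checksummed incrementally while it is being uploaded.
function crc32Update(crc: number, chunk: Uint8Array): number {
  let c = crc ^ 0xffffffff;
  for (let i = 0; i < chunk.length; i++) {
    c = CRC_TABLE[(c ^ chunk[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// "123456789" is the standard CRC-32 check input.
const sample = Uint8Array.from("123456789", (ch) => ch.charCodeAt(0));
console.log(crc32Update(0, sample).toString(16)); // cbf43926
```

Because the CRC can be carried across chunks, feeding the data in pieces gives the same result as checksumming it in one pass, so the whole file never needs to be held in memory.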

What's Changed

Switch to Brave Base Image by @ikreymer in #400
Store crawler start and end times in Redis lists by @tw4l in #397
additional failure logic: by @ikreymer in #402
tests: disable ad-block tests: seeing inconsistent ci behavior by @ikreymer in #407
Fast cancelation + remove time counter by @ikreymer in #406
disable component updates by setting --component-updater to invalid URL by @ikreymer in #413
storage: also compute crc32 as part of storage webhook when uploading… by @ikreymer in #414
Support adding/removing exclusions without restarting the crawler by @ikreymer in #408
load saved state fixes + redis tests by @ikreymer in #415
Return User-Agent on all code path to set headers appropriately by @benoit74 in #420
improved text extraction: (addresses #403) by @ikreymer in #404
More flexible multi value arg parsing + README update for 0.12.0 by @ikreymer in #422

Full Changelog: v0.11.2...v0.12.0

Contributors

ikreymer, tw4l, and benoit74


28 Oct 01:36

ikreymer

v0.12.0-beta.2

064db52

Browsertrix Crawler 0.12.0 Beta 2

Pre-release

What's Changed

disable component updates by setting --component-updater to invalid URL by @ikreymer in #413
storage: also compute crc32 as part of storage webhook when uploading… by @ikreymer in #414
Support adding/removing exclusions without restarting the crawler by @ikreymer in #408
load saved state fixes + redis tests by @ikreymer in #415
Return User-Agent on all code path to set headers appropriately by @benoit74 in #420

Full Changelog: v0.12.0-beta.1...v0.12.0-beta.2

Contributors

ikreymer and benoit74

