QA Crawl Support (Beta) #469
Commits on Feb 20, 2024
- e1e7743: convert driver to a class that supports crawlPage, setupPage and teardownPage, instead of a single crawlPage function
  - setupPage / teardownPage are called when a page is created / destroyed
- 2c0617c
- 6827788
- a00176b
- 7cc741a: replace driver with ReplayCrawler subclass
  - keep track of page resources
- 540efeb: load WACZ page list directly (via wabac.js ZipRangeReader)
  - crawler: add overridable _addInitialSeeds() function
  - crawler: store archivesDir
  - reload RWP frame if not loaded in SW after 10 secs
  - support max replay pages via --limit
  - store 'pageinfo' records in info.warc.gz
- db491fc: types: fix types for WARCResourceWriter / textextract / screenshots
  - make skipping the first N text docs configurable: set to 2 for replaycrawler, 0 by default
  - tests: fix tests due to missing text
- a8869f7: resources pageinfo, include redirects
  - reload timeout: track per page
- cefdf52
- 7787d8a: add qa option to parseArgs; requires --replaySource but not --seeds
  - add 'qa' entrypoint to crawler which enables qa mode
- d833e2a: diff work: add screenshot, text, and resource comparisons!
  - (not yet storing)
- 7b8ab4b
- 222ef1d
- 1791f16: experiment with reloading page after initial load (disabled), add deepLink to allow reloading
  - resources: filter out POST requests
  - loading: add check for WACZ loading if resources is not available
- e15d25d
- 59382a3: rename --replaySource -> --qaSource
  - add --qaDebugImageDiff to enable per-page crawl.png / replay.png / diff.png output
  - support qaSource from the file system (via blob), as well as from a URL
- aca1a64
Commits on Feb 21, 2024
- bad67a0: replayserver: support serving sw.js directly, make RWP version configurable, using CDN version
  - replayserver: if a local file path is specified, support serving the local file under /source.{wacz,json}, support range requests
- 3617bb6: replay: install RWP files directly into the image on build, instead of loading them from the CDN at crawl time
Commits on Feb 29, 2024
- fb9de39
- 0e0d74e
Commits on Mar 5, 2024
- c987424
Commits on Mar 7, 2024
- 2d85f2d
Commits on Mar 8, 2024
- c4231e5
  - ensure the original pageid is used for qa'd pages
  - use the standard ':qa' key as the destination for qa comparison data when --qaWriteToRedis is set
  - print crawl stats in qa
  - include title + favicons in qa
- 5c42549
- 4f4f7a1: qa: consolidate comparison data into the pages data added to redis
  - add pageEntryForRedis(), overridable in replaycrawler, to add 'comparison' data
  - add a separate type for ComparisonData
  - add comparison data for processPageInfo, if pagestate is available
  - additional type fixes
  - remove --qaWriteToRedis, now included with page data
- 5a1b2a9: tests: add qa comparison test:
  - run crawl with 3 pages, text/screenshots enabled
  - run qa crawl using the resulting WACZ
  - enable writing pages to redis
  - verify comparison data is included in the page data added to the redis ':pages' key while the crawl is running
- 0a1018a
- 0abfaac
- 3a9ffd8
Commits on Mar 11, 2024
- d7d6558: support loading multi-wacz .json files locally
  - support parsing out the query string when detecting file type
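Commit d7d6558 strips the query string before detecting the source file type, so that a signed or parameterized URL is still recognized as a WACZ or multi-WACZ .json. A minimal TypeScript sketch of that idea, assuming a hypothetical helper `detectSourceType()` and a hypothetical `SourceType` union; this is an illustration, not the crawler's actual code.

```typescript
// Hypothetical result type: a single WACZ or a multi-WACZ .json manifest.
type SourceType = "wacz" | "json";

// Hypothetical helper: classify a QA source (URL or local path) by extension,
// ignoring any "?query" or "#fragment" suffix.
function detectSourceType(source: string): SourceType | null {
  // Keep only the part before the first "?" or "#".
  const path = source.split(/[?#]/, 1)[0].toLowerCase();
  if (path.endsWith(".wacz")) {
    return "wacz";
  }
  if (path.endsWith(".json")) {
    return "json";
  }
  // Unknown extension: let the caller decide how to handle it.
  return null;
}

console.log(detectSourceType("https://example.com/archive.wacz?sig=abc"));
console.log(detectSourceType("/crawls/collection.json#pages"));
```

Returning `null` for an unrecognized extension leaves the fallback behavior to the caller instead of silently assuming one format.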
Commits on Mar 12, 2024
- aa4ecd5: qa crawl init: support loading pages from a json file if the 'pages' key is specified, otherwise load from 'resources'
- 8d0f411: disable CORS for replaycrawler (for now) to allow loading any existing WACZ from 'localhost' for replay QA
Commits on Mar 13, 2024
- ceffad9
Commits on Mar 16, 2024
- 251e1b3
Commits on Mar 19, 2024
- e4d8388
- cb435f6
Commits on Mar 20, 2024
- 52f80d0
- aee5af5
- b18148b
Commits on Mar 21, 2024
- ce2ffca
- f6a7dab
Commits on Mar 22, 2024
- 387e269: tests: fix non-root user tests
  - disable redis retryStrategy, remove disconnect for redis to avoid unclosed handles
  - Dockerfile: fix permissions on downloaded files
  - add qa_compare test to non-root test as well
  - update jest to latest
- cc5e130
- 4979d86
- c8dc60d
- 3c4f552
- ae9fdbe
Commits on Mar 23, 2024
- cdab557
- a4ef485