Previously, there was the main WARCWriter as well as a utility
WARCResourceWriter, which was used for screenshots, text, and pageinfo and
generated only resource records. This separate WARC writing path did not
generate CDX; it used appendFile() to append new WARC records to an
existing WARC.

This change removes WARCResourceWriter and ensures all WARC writing is done
through a single WARCWriter, which uses a writable stream to append records
and can also generate CDX on the fly. This change is a prerequisite for the
js-wacz conversion (#484), since all WARCs need to have generated CDX.
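
To illustrate why a single stream-based writer makes on-the-fly indexing straightforward, here is a minimal sketch. It is not the crawler's actual WARCWriter: `SketchWriter`, `MemorySink`, and `IndexEntry` are hypothetical names, the payloads are placeholder bytes rather than real WARC records, and the index entries are plain objects rather than real CDX lines. The point is only that a writer that owns the stream knows each record's offset and length at write time, so no later reindexing pass over the file is needed.

```typescript
import { Writable } from "node:stream";

// Hypothetical index entry: where one record sits inside the output.
interface IndexEntry {
  url: string;
  offset: number;
  length: number;
}

// In-memory sink so the sketch runs without touching disk.
// A real writer would use a file stream instead (assumption).
class MemorySink extends Writable {
  readonly chunks: Buffer[] = [];

  _write(
    chunk: Buffer,
    _encoding: BufferEncoding,
    done: (error?: Error | null) => void,
  ): void {
    this.chunks.push(chunk);
    done();
  }
}

// Hypothetical writer: every record goes through the same stream,
// and the running byte offset is recorded as each record is appended.
class SketchWriter {
  private out: Writable;
  private offset = 0;
  readonly index: IndexEntry[] = [];

  constructor(out: Writable) {
    this.out = out;
  }

  writeRecord(url: string, payload: Buffer): void {
    this.out.write(payload);
    this.index.push({ url, offset: this.offset, length: payload.byteLength });
    this.offset += payload.byteLength;
  }
}

const sink = new MemorySink();
const writer = new SketchWriter(sink);

// Page responses and derived resources (text, screenshots, ...) share one path.
writer.writeRecord("https://example.com/", Buffer.from("response-record"));
writer.writeRecord("urn:text:https://example.com/", Buffer.from("text"));
```

After these two calls, `writer.index` holds one entry per record with its offset and length, which is the information an index line needs. A path that appends to the file independently of the writer would not update this bookkeeping, which is the problem the single-writer design avoids.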
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Improvements for 1.0.0 branch of crawler:
- /tmp-cdx rather than reindexing from WARCs
- --generateCDX from temp-cdx/ rather than having to reindex from the WARCs
- /tmp-cdx after no longer needed