In Browsertrix Crawler, we already generate a CDXJ index for each WARC, so it would be faster to reuse these existing indices than to index the WARCs again. We propose adding a `--cdxj` CLI argument that takes a directory of existing CDXJ files, similar to how `--pages` already works.
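For illustration only, here is a minimal Python sketch of why reusing the existing indices is cheaper: if each per-WARC CDXJ file is already sorted, a combined index can be produced with a streaming k-way merge, without re-reading any WARC records. The helper name `merge_cdxj_dir` and the `*.cdxj` file-name pattern are hypothetical assumptions; this is not the implementation in the PR.

```python
import glob
import heapq
import os
from contextlib import ExitStack
from typing import Iterator


def merge_cdxj_dir(cdxj_dir: str) -> Iterator[str]:
    """Yield the lines of every *.cdxj file in cdxj_dir in sorted order.

    Hypothetical helper. Assumes each per-WARC CDXJ file is already sorted
    in plain string order ("<surt key> <timestamp> <json>"), so a k-way
    merge is enough and no WARC needs to be re-indexed.
    """
    paths = sorted(glob.glob(os.path.join(cdxj_dir, "*.cdxj")))
    with ExitStack() as stack:
        # Keep every input file open until the merge is fully consumed.
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        # heapq.merge lazily interleaves already-sorted iterables.
        for line in heapq.merge(*files):
            if not line.strip():
                continue  # skip blank lines
            # Normalize a missing trailing newline on a file's last line.
            yield line if line.endswith("\n") else line + "\n"
```

Because `heapq.merge` is lazy, only one pending line per input file is held in the heap at a time, which keeps memory flat for large crawls. If the per-WARC files were not already sorted, they would need sorting first and this shortcut would not apply.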
I have a PR in progress; it just needs a bit more testing, and I will submit it shortly. Thanks!
Related to webrecorder/browsertrix-crawler#484