Andy Jackson edited this page Nov 27, 2020 · 20 revisions

The primary goal of this project is to provide full-text search for our web archives. To achieve this, the warc-indexer component parses the (W)ARC files and, for each resource, posts a record to one or more Apache Solr servers. We then use client-facing tools that allow researchers to query the Solr index and explore the collections.
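
As a rough illustration of the final step, the sketch below shows how a client might send a full-text query to a Solr index through Solr's standard `/select` handler. It is not one of the project's own tools: the base URL, the core name `discovery`, and the helper names `build_select_url` and `search` are assumptions made for the example, and the fields in the returned documents depend on the schema your index uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base, core, terms, rows=10):
    """Build a Solr /select URL for a simple full-text query.

    solr_base and core are placeholders for your own deployment.
    """
    params = {
        "q": terms,    # query text, matched against the handler's default field
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{solr_base.rstrip('/')}/{core}/select?{urlencode(params)}"


def search(solr_base, core, terms, rows=10):
    """Run the query and return the list of matching documents."""
    with urlopen(build_select_url(solr_base, core, terms, rows)) as resp:
        return json.load(resp)["response"]["docs"]


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    print(build_select_url("http://localhost:8983/solr", "discovery", "open data", rows=5))
```

Calling `search(...)` requires a running Solr server that has already been populated by the indexer; `build_select_url` can be used on its own to inspect the request that would be sent.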