Streaming WARC/ARC library for fast web archive IO
Updated May 27, 2024 - Python
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Core Python Web Archiving Toolkit for replay and recording of web archives
Create "perfect" snapshots of web pages
Browse emulated browsers connected to old web sites in your browser!
Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr
Minimalistic crawler
Send records from an EPrints server to the Internet Archive and other web archives
Parse and create Web ARChive (WARC) files with Node.js
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.
Add-On for Google Sheets to help those working with web archives.
Create Robust Links from within Zotero
A Python utility for publishing a social media story built from archived web pages to multiple services.
A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (mementos).
A collection of the scripts and notebooks I wrote as part of my Data Science Bootcamp capstone project
宁波凯思奥教育科技有限公司 (Ningbo Kaisiao Education Technology Co., Ltd.)
This module builds our Waybacks in the various different configurations we require.