Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Updated May 21, 2024 - TypeScript
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Makes saving pages in bulk to the Wayback Machine much easier
A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine. It can also archive HTTP POST requests and responses, as well as most other HTTP-level data, and follows the "archive everything now, figure out what to do with it later" philosophy.
The repository and website hosting the peer review process for new Programming Historian lessons
ODU Web Science and Digital Libraries Research Group (WS-DL) home page.
Run a high-fidelity browser-based crawler in a single Docker container
🗄️ A simple CLI for converting WARC to Parquet.
Really hacky proof-of-concept HTTP archival using mitmproxy
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
Streaming WARC/ARC library for fast web archive IO
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
A Memento Aggregator CLI and Server in Go
Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.
Core Python Web Archiving Toolkit for replay and recording of web archives