web-archive

Here are 33 public repositories matching this topic...

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

kubernetes cloud archiving warc web-archiving webrecorder web-archive wacz

Updated May 24, 2024
TypeScript

meadowingc / waybacker

Periodically crawl a set of websites and ensure that all of their pages are archived on the Wayback Machine. Mirror of https://codeberg.org/meadowingc/waybacker

blogging web-archive

Updated May 22, 2024
Go

jskherman / web-clips

An archive site of some webpages on the Internet created with the help of the SingleFile extension.

html jekyll jekyll-site static-site archive web-archive singlefile

Updated May 18, 2024
CSS

thiagolopes / alexandria

Backup and save websites

web-archive

Updated May 17, 2024
Python

s5-dev / archiver

Tool for archiving websites and other Internet content to the content-addressed S5 Network

git http youtube twitch web selfhosted archive archiver content-addressed web-archive bluesky atproto

Updated May 2, 2024
Dart

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

service-worker warc web-archiving wayback-machine web-archive replay-web-page web-replay wacz

Updated May 24, 2024
TypeScript

jskherman / SingleFile-Archives

Pages saved with the SingleFile browser extension.

html archive web-archive singlefile

Updated Apr 23, 2024
HTML

dosyago / DownloadNet

💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

disk internet archive archiver web-browsing web-archive

Updated Apr 22, 2024
JavaScript

wayback-if-down / wayback-if-down.github.io

Redirect to a live website or an archived version if it's down.

redirect wayback-machine web-archive

Updated Apr 16, 2024
HTML

webis-de / archive-query-log

📜 The Archive Query Log.

information-retrieval internet-archive serp wayback-machine web-archive query-log search-engine-result-page information-retrieval-history

Updated May 21, 2024
Jupyter Notebook

KaineRecycler / YouTube-Content-Archive

YouTube Content Archive Database

youtube youtube-api web-archive

Updated Mar 18, 2024
Python

q-m / replayweb.page-docker

Docker image for ReplayWeb.page

web-archiving web-archive replay-web-page web-replay

Updated Mar 14, 2024
Dockerfile

ArtificialOSS / WebCrawl

Crawls the web to generate a huge dataset for training

crawler ai artificial-intelligence dataset-generation commoncrawl web-archive

Updated Jan 24, 2024
Python

minch-dev / DownTheMoon

A continuation of the legacy XUL version of DownThemAll! ✔️ preserves web.archive.org timestamps, ✔️ adds advanced filters for remote directory tree mirroring, ✔️ tweaks the UI for better UX