# web-archives

Here are 25 public repositories matching this topic.

k12stemaker / k12stemaker.github.io

Ningbo Kaisi'ao Education Technology Co., Ltd. (宁波凯思奥教育科技有限公司)

scratch maker stem web-archives kidscoding k12education ningbocorporation

Updated Apr 19, 2019
HTML

oduwsdl / offtopic-goldstandard-data

Data for testing the Offtopic detection software

dataset memento web-archiving web-archives offtopic

Updated Mar 6, 2018
Python

tigercosmos / web-archives

Web Archives Collection System

Updated May 22, 2019
Python

caltechlibrary / eprints2archives

Send records from an EPrints server to the Internet Archive and other web archives

python terminal archiving internet-archive memento web-archiving preservation web-archives eprints

Updated May 15, 2023
Python

wsdookadr / femtocrawl

minimalistic crawler

firefox crawler offline scraping http-archive warc zim web-archives

Updated Oct 15, 2023
Python

helgeho / Tempas2ArchiveSpark

ArchiveSpark DataSpec to analyze the Internet Archive's Web archive through temporal search results returned by Tempas (v2)

information-retrieval web-archiving temporal archivespark web-archives

Updated Dec 12, 2017
Scala

web-archive-group / wadl2017

WADL2017 Web Archive Group team papers

latex wadl web-archives jcdl

Updated Jun 23, 2017
TeX

N0taN3rd / node-cdxj

Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with Node.js

webarchive web-archives webarchiving cdxj

Updated Jul 20, 2017
JavaScript
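node-cdxj parses CDXJ in Node.js. To show what such a parser deals with, here is a minimal Python sketch (not the node-cdxj API). It assumes each data line is a lookup key, a 14-digit timestamp and a JSON object separated by single spaces, and that lines beginning with `!` carry metadata; this is our reading of the ORS wiki page linked above. The names `CdxjRecord`, `parse_cdxj_line` and `parse_cdxj` are hypothetical.

```python
import json
from typing import Any, Dict, List, NamedTuple, Optional


class CdxjRecord(NamedTuple):
    """One parsed CDXJ line: lookup key, 14-digit timestamp, and JSON fields."""
    key: str
    timestamp: str
    fields: Dict[str, Any]


def parse_cdxj_line(line: str) -> Optional[CdxjRecord]:
    """Parse one CDXJ line; return None for blank lines and '!' metadata lines.

    Assumption: '!'-prefixed lines are metadata, as in the ORS examples.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    # Split on the first two spaces only: the trailing JSON block may contain spaces.
    key, timestamp, block = line.split(" ", 2)
    return CdxjRecord(key, timestamp, json.loads(block))


def parse_cdxj(text: str) -> List[CdxjRecord]:
    """Parse a whole CDXJ document, skipping metadata and blank lines."""
    records = []
    for line in text.splitlines():
        record = parse_cdxj_line(line)
        if record is not None:
            records.append(record)
    return records
```

Splitting with `maxsplit=2` keeps the JSON block intact even when its values contain spaces, which a plain `split()` would break apart.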

nchylak / capstone-project

A collection of the scripts and notebooks I wrote as part of my Data Science Bootcamp capstone project

predictive-analytics web-archives credit-risk-assessment

Updated Aug 7, 2019
Jupyter Notebook

sebastian-nagel / warc-crawler

Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr

elasticsearch solr apache-storm warc web-archives warc-files stormcrawler

Updated Nov 24, 2023
FLUX
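warc-crawler reads WARC files through StormCrawler. For readers new to the format, the following Python sketch (not the project's code) walks the records of an uncompressed WARC stream: a `WARC/` version line, `Name: value` header lines ending in a blank line, then exactly `Content-Length` bytes of content. The function name `iter_warc_records` is hypothetical. It does not handle gzip-compressed `.warc.gz` files, which are common in practice; a library such as warcio is a better fit for real data.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content) for each record of an UNCOMPRESSED WARC stream.

    Hypothetical helper: no gzip support and no header-name case folding.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # skip the blank CRLF lines that separate records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            # Split on the first colon only, so URI values keep their own colons.
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes of content follow.
        content = stream.read(int(headers["Content-Length"]))
        yield headers, content
```

Reading by `Content-Length` rather than scanning for a delimiter matters because record content (HTML, images) may itself contain blank lines.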

hrbrmstr / cdx

🕸 Query Web Archive Crawl Indexes ('CDX')

r rstats cdx web-archives r-cyber

Updated Sep 1, 2018
R
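cdx is an R package. As a language-neutral illustration, this Python sketch builds a query against the Internet Archive's Wayback CDX server and turns its `output=json` response (a header row followed by one row per capture) into dictionaries. The query parameters `url`, `output`, `from`, `to` and `limit` follow the CDX server's public documentation; the function names are hypothetical, and the HTTP request itself is left out (for example, `urllib.request.urlopen` could fetch the built URL).

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Public Wayback CDX server endpoint of the Internet Archive.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, start: Optional[str] = None,
                    end: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Build a CDX query URL; start/end are timestamp prefixes such as '2010'."""
    params: Dict[str, str] = {"url": url, "output": "json"}
    if start is not None:
        params["from"] = start
    if end is not None:
        params["to"] = end
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def cdx_json_to_dicts(body: str) -> List[Dict[str, str]]:
    """Convert an output=json response (header row, then data rows) into dicts."""
    if not body.strip():
        return []  # treat an empty body as "no captures"
    rows = json.loads(body)
    if not rows:
        return []  # treat an empty array as "no captures"
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

Using the header row as dictionary keys avoids hard-coding column positions, so the same helper still works if a `fl=` field list changes which columns come back.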

bhouston1982 / staticPages-webArchives

Python scripts to generate static navigation pages from a collection list and insert Web Archives records using the Archive-It CDX

ead web-archives

Updated Apr 20, 2017
Python

ukwa / waybacks

This module builds our Waybacks in the various configurations we require.

warc web-archiving webarchive web-archives

Updated Jun 30, 2018
Java

oduwsdl / MementoEmbed

A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (mementos).

docker flask thumbnails embed memento web-archives surrogate social-cards

Updated Nov 15, 2021
HTML
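MementoEmbed and raintale both work from mementos. One common way to list the mementos of a page is a Memento TimeMap (RFC 7089), a list of links in `application/link-format`. The Python sketch below is a hypothetical, regex-based reader, not MementoEmbed's code; it assumes no `<` characters appear inside attribute values and keeps every link whose `rel` includes `memento` (so `first memento` and `last memento` count too).

```python
import re
from typing import Dict, List, Tuple

# "<uri>" followed by everything up to the next "<" (that link's attributes).
# Assumption: attribute values never contain "<".
LINK_RE = re.compile(r"<([^>]+)>([^<]*)")
# ';name="quoted value"' or ';name=bare-value' inside an attribute list.
ATTR_RE = re.compile(r';\s*([^=;\s]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_format(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse application/link-format text into (uri, attributes) pairs."""
    links = []
    for uri, attr_text in LINK_RE.findall(text):
        attrs = {name.lower(): quoted or bare
                 for name, quoted, bare in ATTR_RE.findall(attr_text)}
        links.append((uri, attrs))
    return links


def list_mementos(timemap: str) -> List[Tuple[str, str]]:
    """Return (memento URI, datetime) for each link whose rel includes 'memento'."""
    return [(uri, attrs.get("datetime", ""))
            for uri, attrs in parse_link_format(timemap)
            if "memento" in attrs.get("rel", "").split()]
```

Quoted attribute values are matched as a unit, which matters because Memento datetimes such as `Tue, 20 Jun 2000 18:02:59 GMT` contain commas, the same character that separates links.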

lanl / Zotero-Robust-Links-Extension

Create Robust Links from within Zotero

memento references web-archives link-rot reference-rot

Updated May 10, 2022
JavaScript
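A Robust Link decorates an ordinary HTML anchor with `data-versionurl`, `data-originalurl` and `data-versiondate` attributes so that readers can still reach an archived copy after link rot. The Python sketch below builds such an anchor. The attribute pairing (version URL when `href` is the live page, original URL when `href` is the memento) reflects our reading of the Robust Links specification, and `robust_link` is a hypothetical helper, not part of the Zotero extension.

```python
from html import escape
from typing import List, Tuple


def robust_link(original_url: str, memento_url: str, version_date: str,
                text: str, link_to_memento: bool = False) -> str:
    """Build a Robust Link anchor (hypothetical helper; attribute pairing assumed)."""
    if link_to_memento:
        # href points at the archived copy; record where it came from.
        attrs: List[Tuple[str, str]] = [("href", memento_url),
                                        ("data-originalurl", original_url)]
    else:
        # href points at the live page; record an archived copy as fallback.
        attrs = [("href", original_url), ("data-versionurl", memento_url)]
    # The date the link was made (or the snapshot was taken), e.g. "2022-05-10".
    attrs.append(("data-versiondate", version_date))
    rendered = " ".join(f'{name}="{escape(value, quote=True)}"' for name, value in attrs)
    return f"<a {rendered}>{escape(text)}</a>"
```

Every attribute value and the link text pass through `html.escape`, so URLs containing `&` or quotes cannot break the generated markup.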

oduwsdl / raintale

A Python utility for publishing a social media story built from archived web pages to multiple services.

social-media storytelling web-archives webarchives surrogates mementos

Updated Dec 15, 2021
Python

zytedata / web-snap

Create "perfect" snapshots of web pages

javascript web-archiving web-archives playwright capture-page

Updated Jan 17, 2024
JavaScript

ukwa / ukwa-gsheets-utils

Add-On for Google Sheets to help those working with web archives.

google-sheets web-archives webarchives google-sheets-addon

Updated Oct 18, 2022
JavaScript

archivesunleashed / notebooks

Example notebooks for working with web archives using the Archives Unleashed Toolkit, and with derivatives generated by the Archives Unleashed Toolkit.

spark python3 notebooks web-archives pyspark-notebook juypter-notebook

Updated Dec 5, 2022
Jupyter Notebook

ukwa / ukwa-ui

A new user interface for the UK Web Archive

web-archiving web-archives

Updated Apr 10, 2024
Java
