Scrappy

A Python script that scrapes URLs from a webpage and archives them to the Wayback Machine. It uses Beautiful Soup to parse the page for anchor tags, then saves each linked URL through the Archive.org API. The name is meant to be read as "scrap.py", a play on "scraping" plus "Python".
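The link-collection step can be sketched as follows. This is a minimal illustration, not the code in index.py: it swaps in the standard library's HTMLParser for Beautiful Soup so that it runs without extra dependencies, and the names AnchorCollector and extract_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects absolute http(s) URLs from the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchors without an href have nothing to archive
        # Resolve relative links against the URL of the page being scraped.
        absolute = urljoin(self.base_url, href)
        # Keep only web URLs (drops mailto:, javascript:, etc.) and skip duplicates.
        if absolute.startswith(("http://", "https://")) and absolute not in self.links:
            self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique absolute links found in `html`, in document order."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    return collector.links
```

With Beautiful Soup, the equivalent lookup would typically go through soup.find_all("a") and read each tag's href attribute; resolving relative links and filtering out non-web schemes would still be needed before archiving.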

Usage

Clone this repo or download it as a ZIP. Install the modules listed in requirements.txt by running pip install -r requirements.txt. Then run index.py in your terminal and follow the on-screen instructions.

IMPORTANT: time.sleep(5) delays the archival of each URL by 5 seconds. This avoids flooding the API with requests, which can cause the server to refuse the connection. A reasonable gap between requests prevents that.
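The throttling described above might look roughly like this. It is a sketch under stated assumptions rather than the script's actual loop: it assumes the Wayback Machine's public https://web.archive.org/save/ URL prefix is used to request a capture, it pauses between requests rather than after every one, and the names save_to_wayback and archive_all are hypothetical.

```python
import time
import urllib.request

# Assumption: captures are requested by fetching this prefix followed by the target URL.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_to_wayback(url, timeout=60):
    """Ask the Wayback Machine to capture `url`; return the HTTP status code."""
    with urllib.request.urlopen(SAVE_ENDPOINT + url, timeout=timeout) as response:
        return response.status


def archive_all(urls, delay=5, save=save_to_wayback, sleep=time.sleep):
    """Archive each URL in turn, waiting `delay` seconds between requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # the gap that keeps the server from refusing connections
        try:
            results[url] = save(url)
        except OSError as exc:  # URLError and refused connections are OSError subclasses
            results[url] = exc  # record the failure and move on to the next URL
    return results
```

The save and sleep parameters are injectable so the loop can be exercised without network access or real waiting; in normal use they are left at their defaults.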

4rnv/Scrappy
