webScraping

Web scraping scripts written for a friend's project.

alphabetSoup.py contains the main crawler.

The spider starts by visiting the A universes, collecting all of the links there, and storing them in an array. It then visits every link in that array. If a visited page lists even more series, a second array is created and those links are stored in it.
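
The two-pass link gathering described above could be sketched roughly as follows. This is a minimal illustration, not the code in alphabetSoup.py: the names `LinkCollector`, `extract_links`, and `gather_series_links` are hypothetical, the `fetch` argument stands in for whatever downloads a page, and the sketch keeps every link it finds, whereas the real crawler presumably filters for series links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all links found in one page, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def gather_series_links(start_url, fetch):
    """First pass: links on the start page. Second pass: links on those pages.

    `fetch` is assumed to be a callable that takes a URL and returns HTML text.
    """
    first_array = extract_links(fetch(start_url), start_url)
    second_array = []
    for url in first_array:
        for link in extract_links(fetch(url), url):
            if link not in second_array:  # avoid storing duplicates
                second_array.append(link)
    return first_array, second_array
```

Passing `fetch` in as a parameter keeps the link logic separate from the network code, so it can be tried against saved HTML before pointing it at a live site.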

Once all of the links that contain series data are gathered, each page is visited and each item's related text is written to an Excel file. While this is happening, the robot parser is checked and waits are called.
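
As an illustration of checking the robot parser and waiting between visits, here is a small sketch that uses the standard library's `urllib.robotparser`. It is an assumption that the scripts rely on this module; the function names, the `fetch` callable, and the one-second default delay are all hypothetical. Writing the collected rows to the Excel file is left out.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def visit_politely(urls, robots, fetch, user_agent="*",
                   default_delay=1.0, sleep=time.sleep):
    """Fetch each allowed URL and pause after every request.

    `fetch` is assumed to take a URL and return the page text.
    Returns a list of (url, text) rows ready to be written out.
    """
    # Use the site's Crawl-delay if it declares one, else a default wait.
    delay = robots.crawl_delay(user_agent) or default_delay
    rows = []
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, so skip it
        rows.append((url, fetch(url)))
        sleep(delay)
    return rows
```

The `sleep` parameter exists only so the wait can be swapped out when experimenting; in normal use the default `time.sleep` applies.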
