twfy-scraper

Description

A Python web scraper that connects to the TheyWorkForYou scraped .xml index for Hansard parliamentary debates. The web scraper collates a list of debates after a given year and proceeds to download and save each of the .xml files in the list.

Development of this web scraper was necessary after numerous unsuccessful attempts to utilise the TheyWorkForYou API to gather parliamentary debate data. Further attempts using ajparsons's Python binding (twfy-python, https://github.com/ajparsons/twfy-python) were also unsuccessful - while the binding could return data using the .getMPs function call, the .getDebates function call failed to garner a response from the server.

Usage

Modify line 13's variable according to limit (by year) the number of files you want to download from the website.

Roadmap

Currently, the web scraper must compile the list of .xml files each time it is run. Ideally, I would like this list to be output to a .txt file - if the first result of the query is the same as the first file in the .txt file, compilation of the list can be skipped.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.gitignore		.gitignore
README.md		README.md
twfy_scraper.py		twfy_scraper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

README.md

README.md

twfy_scraper.py

twfy_scraper.py

Repository files navigation

twfy-scraper

Description

Usage

Roadmap

About

Releases

Packages

Languages

Abaqueues/twfy-scraper

Folders and files

Latest commit

History

Repository files navigation

twfy-scraper

Description

Usage

Roadmap

About

Topics

Resources

Stars

Watchers

Forks

Languages