Skip to content
This repository has been archived by the owner on Apr 4, 2018. It is now read-only.

alphagov/govuk_spider

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GOV.UK Spider

This is a web spider which will crawl all of the pages on www.gov.uk that it can find and expose information relating to status responses and referrers.

Requirements

You will need:

  • Python 2.6 (or greater)
  • Scrapy v0.14.4

To install the project dependencies you can use pip with the following command:

pip install -r requirements.txt

Running the spider

Once you have the relevant dependencies you can view a list of the available spiders by running:

scrapy list

And then run a specific spider with:

scrapy crawl insert_spider_name

A (local) JSON output can be created using the following options:

scrapy crawl insert_spider_name -o items.json -t jsonlines

Releases

No releases published

Packages

No packages published