event_scraper

Web Scraper to scrape Gig & Event Data

This crawler was built with Scrapy, a Python web scraping framework.

Dependencies:

Scrapy has a number of dependencies, so I recommend using a virtual environment.

You can set this up by creating a directory, cd-ing into that directory, and running:

$ virtualenv songkickenv
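Activate the environment before installing anything (this path assumes the default virtualenv layout on macOS/Linux):

$ source songkickenv/bin/activate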

Then move the unzipped file bundle that includes these files into the newly created directory.

From the root of the newly created directory, install all of the required dependencies into the virtual environment using the requirements.txt stored in the zip:

$ pip install -r requirements.txt 
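If the install succeeded, the Scrapy command-line tool should now be available inside the environment; you can confirm this with:

$ scrapy version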

You are now ready to start using the web crawler!

Data Output:

The data scraped from the given web page can be exported in either CSV or JSON format. (All of the following commands can be run from anywhere inside the Scrapy project repository.)

To retrieve the output in json format run the following command:

$ scrapy crawl events -o events.json

To retrieve the output in csv format run the following command:

$ scrapy crawl events -o events.csv
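For reference, the crawl is driven by a spider named events, and Scrapy infers the output format from the extension of the file passed to -o. The sketch below shows roughly what such a spider looks like; it is illustrative only, and the start URL, CSS selectors, and field names are hypothetical stand-ins, not the ones used in this project.

import scrapy

class EventsSpider(scrapy.Spider):
    # The name must match the argument given to "scrapy crawl".
    name = 'events'
    # Hypothetical start URL, for illustration only.
    start_urls = ['https://www.songkick.com/']

    def parse(self, response):
        # Hypothetical selectors and field names; each yielded dict
        # becomes one row/object in the exported CSV or JSON file.
        for listing in response.css('li.event-listing'):
            yield {
                'artist': listing.css('p.artists strong::text').get(),
                'venue': listing.css('p.location a::text').get(),
                'date': listing.css('time::attr(datetime)').get(),
            }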

Notes:

The following files are auto-generated by Scrapy when the project is created and can be ignored by anyone marking this test:

  • middlewares.py
  • pipelines.py
  • any __init__.py files
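For context, these files are part of the standard scaffolding Scrapy lays down when a project is generated, e.g. with:

$ scrapy startproject event_scraper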
