Web Scraper for Gig & Event Data
This crawler was built with Scrapy, a Python web scraping framework.
Scrapy has a number of dependencies, so I recommend using a virtual environment. To set one up, create a directory, cd into it, and run:
$ virtualenv songkickenv
$ source songkickenv/bin/activate
Then move the unzipped file bundle that includes these files into the newly created directory.
From the root of that directory, install the required dependencies into the virtual environment using the requirements.txt included in the zip:
$ pip install -r requirements.txt
You are now ready to start using the web crawler!
The data scraped from the target web page can be exported in either CSV or JSON format. (All of the following commands can be run from anywhere inside the Scrapy project directory.)
To retrieve the output in JSON format, run:
$ scrapy crawl events -o events.json
To retrieve the output in CSV format, run:
$ scrapy crawl events -o events.csv
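Once a crawl has finished, either export can be consumed with Python's standard library alone. The snippet below is a minimal sketch: the field names (`artist`, `venue`, `date`) are assumptions made for illustration, and the sample records stand in for real scraped output, so substitute whatever fields the spider actually yields.

```python
import csv
import json

# Example records standing in for real scraped output; the field names
# (artist, venue, date) are assumptions -- check the spider's items.
events = [
    {"artist": "Example Band", "venue": "Example Hall", "date": "2024-06-01"},
    {"artist": "Another Act", "venue": "Another Room", "date": "2024-06-02"},
]

# The JSON export (scrapy crawl events -o events.json) is a list of objects.
with open("events.json", "w") as f:
    json.dump(events, f)
with open("events.json") as f:
    loaded = json.load(f)
print(len(loaded))  # prints 2

# The CSV export (scrapy crawl events -o events.csv) starts with a header row.
with open("events.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["artist", "venue", "date"])
    writer.writeheader()
    writer.writerows(events)
with open("events.csv", newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["venue"])  # prints Example Hall
```

Both formats hold the same records; CSV is convenient for spreadsheets, while JSON preserves the item structure exactly.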
The following files are auto-generated by Scrapy when the project is created and can be ignored by anyone marking this test:
- middlewares.py
- pipelines.py
- any __init__.py files