Web Scraper for Gig & Event Data
This crawler was built with Scrapy, a Python web scraping framework.
Scrapy has a number of dependencies, so I recommend using a virtual environment. To set one up, create a directory, cd into it, and run:
$ virtualenv songkickenv
$ source songkickenv/bin/activate
Then move the unzipped file bundle that includes these files into the newly created directory.
From the root of that directory, install the required dependencies into the virtual environment using the requirements.txt included in the zip:
$ pip install -r requirements.txt
You are now ready to start using the web crawler!
The data scraped from the target web page can be exported in either CSV or JSON format. (All of the following commands can be run from anywhere inside the Scrapy project directory.)
To retrieve the output in JSON format, run:
$ scrapy crawl events -o events.json
To retrieve the output in CSV format, run:
$ scrapy crawl events -o events.csv
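Once a crawl has finished, either export can be consumed with Python's standard library alone. The snippet below is a minimal sketch: the field names (`artist`, `venue`, `date`) are assumptions made for illustration, and the sample records stand in for real scraped output, so substitute whatever fields the spider actually yields.

```python
import csv
import json

# Example records standing in for real scraped output; the field names
# (artist, venue, date) are assumptions -- check the spider's items.
events = [
    {"artist": "Example Band", "venue": "Example Hall", "date": "2024-06-01"},
    {"artist": "Another Act", "venue": "Another Room", "date": "2024-06-02"},
]

# The JSON export (scrapy crawl events -o events.json) is a list of objects.
with open("events.json", "w") as f:
    json.dump(events, f)
with open("events.json") as f:
    loaded = json.load(f)
print(len(loaded))  # prints 2

# The CSV export (scrapy crawl events -o events.csv) starts with a header row.
with open("events.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["artist", "venue", "date"])
    writer.writeheader()
    writer.writerows(events)
with open("events.csv", newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["venue"])  # prints Example Hall
```

Both formats hold the same records; CSV is convenient for spreadsheets, while JSON preserves the item structure exactly.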
The following files are auto-generated by Scrapy when the project is created and can be ignored by anyone marking this test:
- middlewares.py
- pipelines.py
- any __init__.py files