This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

Setting up a scraping server

Ed Finkler edited this page Apr 6, 2013 · 3 revisions

Setting up a scrapyd server

Super duper quick notes on getting a scrapyd server running for the Open Recipes project.

This was performed on Ubuntu Server 12.10.

On the server (SERVERNAME)

  1. Add the Scrapy apt repo to sources.list:

     sudo nano /etc/apt/sources.list
    

    Add the following line to the end of the file:

    deb http://archive.scrapy.org/ubuntu quantal main
    

    Save and exit.

  2. Add the GPG key for the Scrapy apt repo, and install scrapyd:

     curl -s http://archive.scrapy.org/ubuntu/archive.key | sudo apt-key add -
     sudo aptitude update
     sudo aptitude install scrapyd-0.16
    
  3. Open port 6800 in the firewall if it is not already open (scrapyd's web console and JSON API listen on this port).

  4. Install pip and the bleach library for Python:

     sudo apt-get install python-pip
     sudo pip install bleach
    

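The four server-side steps above can be collected into a single provisioning script. This is a sketch, not a tested installer: it assumes Ubuntu Server 12.10 with sudo available, and it assumes ufw as the firewall (adjust step 3 for whatever firewall the server actually runs).

```shell
#!/bin/sh
# Sketch of the server setup steps above; run as a user with sudo rights.
set -e

# 1. Add the Scrapy apt repo (non-interactive alternative to editing
#    sources.list in nano by hand)
echo 'deb http://archive.scrapy.org/ubuntu quantal main' | sudo tee -a /etc/apt/sources.list

# 2. Trust the repo's GPG key and install scrapyd
curl -s http://archive.scrapy.org/ubuntu/archive.key | sudo apt-key add -
sudo aptitude update
sudo aptitude -y install scrapyd-0.16

# 3. Open port 6800 (assumes ufw; adjust for your firewall)
sudo ufw allow 6800/tcp

# 4. Install pip and the bleach library
sudo apt-get install -y python-pip
sudo pip install bleach
```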
On client/deploy machine

  1. Visit http://SERVERNAME:6800/ to check that scrapyd is running.

  2. Create or edit the file ~/.scrapy.cfg and add the following deploy target:

    [deploy:openrecipestest]
    url = http://SERVERNAME:6800/
    
  3. From within openrecipes/scrapy_proj, run

     scrapy deploy openrecipestest -p openrecipes
    
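Once the deploy succeeds, you can confirm the server knows about the project via scrapyd's listprojects.json endpoint. The sketch below uses a canned response so it is self-contained; in practice you would populate `response` with the curl call shown in the comment, and the exact JSON shape is an assumption based on scrapyd's documented API.

```shell
# In practice:
#   response=$(curl -s http://SERVERNAME:6800/listprojects.json)
# Canned example of the response shape, for illustration:
response='{"status": "ok", "projects": ["openrecipes"]}'

# Check that the project list includes openrecipes
if printf '%s' "$response" | grep -q '"openrecipes"'; then
    echo "openrecipes is deployed"
else
    echo "openrecipes is NOT deployed" >&2
fi
```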

Kicking Off Jobs

Starting a Single Job

curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider=thepioneerwoman.feed
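After scheduling a job, scrapyd's listjobs.json endpoint reports pending, running, and finished jobs for a project. Another self-contained sketch with a canned response; the real call is in the comment, and the job id and response shape here are illustrative assumptions.

```shell
# In practice:
#   response=$(curl -s "http://SERVERNAME:6800/listjobs.json?project=openrecipes")
# Canned example of the response shape, for illustration:
response='{"status": "ok", "pending": [], "running": [{"id": "abc123", "spider": "thepioneerwoman.feed"}], "finished": []}'

# Pull out the spider names of jobs currently in flight
printf '%s\n' "$response" | grep -o '"spider": "[^"]*"'
```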

Running All Feed Jobs

scrapy list | grep '\.feed' | xargs -I {} -p curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider={}

(xargs -p prompts for confirmation before each request; drop it to run everything unattended.)

Running All Non-Feed Jobs

scrapy list | grep -v '\.feed' | xargs -I {} -p curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider={}
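The `.feed` filter in the two pipelines above splits the spider list by name suffix. A self-contained sketch of that filtering with hypothetical spider names (the dot is escaped so it matches literally rather than any character):

```shell
# Hypothetical output of `scrapy list`
spiders='thepioneerwoman.feed
smittenkitchen.feed
allrecipes'

# Feed spiders: unescaped '.feed' would also match names like 'xfeed'
printf '%s\n' "$spiders" | grep '\.feed'

# Non-feed spiders
printf '%s\n' "$spiders" | grep -v '\.feed'
```

Each surviving line is then handed to xargs, which substitutes it for `{}` in the curl command, scheduling one job per spider.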