Scrapybara

A Ruby library providing a DSL for describing custom web scrapers.

Dependencies

PhantomJS: for installation, please refer to the PhantomJS download page.

Installation

git clone https://github.com/OPLZZ/scrapybara.git
cd scrapybara

Then install the required gems:

bundle install

Usage

Take a look at the examples directory for a few annotated examples.

Each definition can be executed with the scrapybara binary from the bin directory.

For example:

bundle exec ./bin/scrapybara examples/job-it.rb

If you run a definition in interactive mode, code execution is paused in a Pry session at each #fetch and #extract part of the definition. While the code is paused, you can inspect and debug anything you could in a regular Pry session.

To continue, press CTRL+D.

To exit, type exit! and press enter.

bundle exec ./bin/scrapybara examples/job-it.rb --interactive

You can run each definition in a browser by overriding the Capybara driver from the command line.

bundle exec ./bin/scrapybara examples/job-it.rb --driver selenium

You can combine the previous options, for example to use interactive mode with the selenium driver.

bundle exec ./bin/scrapybara examples/job-it.rb --driver selenium --interactive

You can enable debug in interactive mode to add even more breakpoints while the definition is executed.

bundle exec ./bin/scrapybara examples/job-it.rb --debug --interactive
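To make the combination of flags above more concrete, here is a minimal sketch of how such a command line could be parsed with Ruby's standard OptionParser. This is an assumption for illustration only: the helper name parse_scrapybara_args and the shape of the returned hash are hypothetical and are not taken from the scrapybara source, which may handle its arguments differently.

```ruby
require "optparse"

# Hypothetical helper (not part of scrapybara): parses the flags described above.
def parse_scrapybara_args(argv)
  # Defaults: no driver override, no interactive mode, no extra breakpoints.
  options = { driver: nil, interactive: false, debug: false }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: scrapybara DEFINITION [options]"

    opts.on("--driver NAME", "Capybara driver to use, e.g. selenium") do |name|
      options[:driver] = name.to_sym
    end

    opts.on("--interactive", "Pause in a Pry session at each step") do
      options[:interactive] = true
    end

    opts.on("--debug", "Add extra breakpoints in interactive mode") do
      options[:debug] = true
    end
  end

  # parse returns the remaining non-option arguments;
  # the first one is treated as the definition file.
  rest = parser.parse(argv)
  options[:definition] = rest.first
  options
end

opts = parse_scrapybara_args(%w[examples/job-it.rb --driver selenium --interactive])
puts opts.inspect
```

In this sketch the flags are independent of each other, which is why they can be combined freely on one command line, as in the examples above.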

Funding

Project of Operational Programme Human Resources and Employment No. CZ.1.04/5.1.01/77.00440. The project No. CZ.1.04/5.1.01/77.00440 was funded from the European Social Fund through the Operational Programme Human Resources and Employment and the state budget of the Czech Republic.
