Scrapybara

A Ruby library providing a DSL for describing custom web scrapers.

Dependencies

PhantomJS: for installation, please refer to the PhantomJS download page.
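
Before running a definition, it can help to confirm that PhantomJS is actually reachable. The following is a minimal sketch, assuming the executable is named phantomjs and is expected on your PATH; find_executable is a hypothetical helper written for this illustration and is not part of Scrapybara.

```ruby
# Hypothetical helper (not part of Scrapybara): search the directories in
# `path` for an executable file called `name` and return its full path,
# or nil when nothing is found.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumption: the PhantomJS binary is installed under the name "phantomjs".
phantomjs = find_executable("phantomjs")
if phantomjs
  puts "PhantomJS found at #{phantomjs}"
else
  puts "PhantomJS not found on PATH; install it before running a definition."
end
```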

Installation

git clone https://github.com/OPLZZ/scrapybara.git
cd scrapybara

Then install the required gems:

bundle install

Usage

Take a look at the examples directory for a few annotated examples:

job-it.rb

This definition crawls the http://www.job-it.cz portal, which implements the RDFa markup described in "Recipe for marking-up job posting in RDFa", converts the identified RDFa markup into JSON-LD, and prints each job posting's title and employment type.

idnes.rb

This definition prints the first two pages of comments for articles listed on the main page of http://zpravy.idnes.cz.
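
To make the last step of the job-it.rb example more concrete, here is a rough sketch of reading a title and an employment type from a JSON-LD document. It assumes the markup follows the schema.org JobPosting vocabulary (with its title and employmentType properties). The sample document and the summarize_posting helper are made up for illustration; they are not Scrapybara's API or real output from the portal.

```ruby
require "json"

# Hypothetical JSON-LD document. The values are invented for illustration
# and are not taken from job-it.cz.
json_ld = <<~JSON
  {
    "@context": "http://schema.org",
    "@type": "JobPosting",
    "title": "Ruby Developer",
    "employmentType": "full-time"
  }
JSON

# Hypothetical helper: format the two fields that the job-it.rb example
# is described as printing.
def summarize_posting(posting)
  "#{posting["title"]} (#{posting["employmentType"]})"
end

posting = JSON.parse(json_ld)
puts summarize_posting(posting)
```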

Each definition can be executed with the scrapybara binary from the bin directory.

For example:

bundle exec ./bin/scrapybara examples/job-it.rb

If you run a definition in interactive mode, execution pauses in a Pry session at each #fetch and #extract part of the definition. While execution is paused, you can inspect and debug anything you normally could in a Pry session.

To continue, press CTRL+D.

To exit, type exit! and press enter.

bundle exec ./bin/scrapybara examples/job-it.rb --interactive

You can run each definition in a real browser by overriding the Capybara driver from the command line.

bundle exec ./bin/scrapybara examples/job-it.rb --driver selenium

You can combine the previous options, for example to use interactive mode with the selenium driver.

bundle exec ./bin/scrapybara examples/job-it.rb --driver selenium --interactive

You can enable debug in interactive mode to add even more breakpoints while the definition is executed.

bundle exec ./bin/scrapybara examples/job-it.rb --debug --interactive
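
If you launch definitions from another Ruby script, the option combinations above can be assembled programmatically. This is only a sketch: scrapybara_command is a hypothetical helper, not part of Scrapybara, and it uses nothing beyond the --driver, --interactive, and --debug flags shown above.

```ruby
# Hypothetical helper (not part of Scrapybara): build the argument list for
# running a definition, using only the command-line flags documented above.
def scrapybara_command(definition, driver: nil, interactive: false, debug: false)
  cmd = ["bundle", "exec", "./bin/scrapybara", definition]
  cmd += ["--driver", driver] if driver
  cmd << "--debug" if debug
  cmd << "--interactive" if interactive
  cmd
end

cmd = scrapybara_command("examples/job-it.rb", driver: "selenium", interactive: true)
puts cmd.join(" ")

# To actually run it from the repository root, pass the array to system:
#   system(*cmd)
```

Building the command as an array rather than a single string avoids shell quoting issues when a definition path contains spaces.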

Funding

The project No. CZ.1.04/5.1.01/77.00440 was funded by the European Social Fund through the Operational Programme Human Resources and Employment and the state budget of the Czech Republic.
