felipecsl edited this page Jul 31, 2012 · 1 revision

Wombat 2.0.0 added a new property type called :follow. This is a special type of property that is able to click through links and navigate to new pages.

For example, again using GitHub: the crawler below follows every link in the "GitHub" section of the homepage footer and grabs the header of each page those links point to:

require 'wombat'

Wombat.crawl do
  base_url "https://www.github.com"
  path "/"

  the_company 'xpath=//ul[@class="footer_nav"][1]//a', :follow do
    heading 'css=h1'
  end
end

Outputs:

{
  "the_company"=>[
    {"heading"=>"GitHub helps people build software together."}, 
    {"heading"=>nil}, 
    {"heading"=>"Features"}, 
    {"heading"=>"Contact GitHub"}, 
    {"heading"=>"GitHub Training — Git Training from the Experts"},
    {"heading"=>"GitHub on Your Servers"}, 
    {"heading"=>"Loading..."}
  ]
}
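Since the result is an ordinary Ruby Hash with string keys, the headings can be pulled out with plain Enumerable methods. The snippet below is a minimal sketch that works on a literal copied from the first few entries of the output above rather than on a live crawl; the variable names `result` and `headings` are illustrative only, and reading `nil` as "no h1 matched on that page" is an assumption.

```ruby
# First few entries of the output shown above, copied as a literal
# (nothing is fetched here).
result = {
  "the_company" => [
    { "heading" => "GitHub helps people build software together." },
    { "heading" => nil },
    { "heading" => "Features" },
    { "heading" => "Contact GitHub" }
  ]
}

# Collect the headings and drop the nil entries
# (assumed to be pages where the heading selector matched nothing).
headings = result["the_company"].map { |page| page["heading"] }.compact

puts headings
```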

By default, Wombat follows links only one level deep. This means that even if a page reached through a followed link contains links that match the follow selector, those links will not be followed. The next version of Wombat will add a custom depth option that specifies how many levels deep it should keep crawling.
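The one-level rule can be illustrated without Wombat at all. The following is a rough sketch over a made-up in-memory site: `PAGES`, `follow`, and the `depth:` keyword are all hypothetical names invented for this illustration, not part of Wombat's API, and the depth handling only approximates what a future option might do.

```ruby
# Hypothetical in-memory "site": each path maps to a heading and its outgoing links.
PAGES = {
  "/"         => { heading: "Home",     links: ["/about", "/features"] },
  "/about"    => { heading: "About",    links: ["/team"] },
  "/features" => { heading: "Features", links: [] },
  "/team"     => { heading: "Team",     links: [] }
}.freeze

# Follow every link on `path` and collect the heading of each target page.
# `depth` (a hypothetical option) is how many levels of links may still be followed.
def follow(path, depth: 1)
  return [] if depth.zero?

  PAGES.fetch(path)[:links].map do |link|
    result = { "heading" => PAGES.fetch(link)[:heading] }
    # Links found on the target page are followed only if depth allows it.
    nested = follow(link, depth: depth - 1)
    result["links"] = nested unless nested.empty?
    result
  end
end

p follow("/")
```

With the default depth of 1, the "/team" page is never visited even though "/about" links to it, which mirrors the behavior described above. Calling `follow("/", depth: 2)` would nest the "Team" heading under the "About" entry.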