method to extract sitemaps #5

MothOnMars · 2018-02-20T02:13:17Z

It would be very useful to have a method to extract the sitemaps listed in a robots.txt file, per the sitemaps specification: https://www.sitemaps.org/protocol.html#submit_robots

Example usage, given these entries in http://www.nytimes.com/robots.txt:

Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz
Sitemap: http://www.nytimes.com/sitemaps/sitemap_news/sitemap.xml.gz
Sitemap: http://spiderbites.nytimes.com/sitemaps/sitemap_video/sitemap.xml.gz
Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com_realestate/sitemap.xml.gz
Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/2016_election_sitemap.xml.gz

> robotex = Robotex.new
> robotex.sitemaps('http://www.nytimes.com')
=> ["http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz",
 "http://www.nytimes.com/sitemaps/sitemap_news/sitemap.xml.gz",
 "http://spiderbites.nytimes.com/sitemaps/sitemap_video/sitemap.xml.gz",
 "http://spiderbites.nytimes.com/sitemaps/www.nytimes.com_realestate/sitemap.xml.gz",
 "http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/2016_election_sitemap.xml.gz"]
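For illustration, the parsing step behind such a method could look roughly like the sketch below. This is not Robotex's code: `extract_sitemaps` is a hypothetical standalone helper that takes the already-fetched text of a robots.txt file, and it assumes each sitemap is declared on its own `Sitemap:` line with the field name matched case-insensitively. The `User-agent` and `Disallow` lines in the sample input are made up to show that other directives are skipped.

```ruby
# Hypothetical helper (not part of Robotex): returns the URLs declared on
# "Sitemap:" lines in the text of a robots.txt file, in file order.
def extract_sitemaps(robots_txt)
  robots_txt.each_line.map { |line|
    # Match the field name case-insensitively and capture the URL token
    # that follows it; lines for other directives yield nil.
    match = line.strip.match(/\Asitemap:\s*(\S+)/i)
    match && match[1]
  }.compact
end

# Sample input: the two Sitemap lines come from the example above,
# the other directives are illustrative only.
body = <<~ROBOTS
  User-agent: *
  Disallow: /private
  Sitemap: http://www.nytimes.com/sitemaps/sitemap_news/sitemap.xml.gz
  sitemap: http://spiderbites.nytimes.com/sitemaps/sitemap_video/sitemap.xml.gz
ROBOTS

p extract_sitemaps(body)
```

Sitemap entries are independent of any user-agent group, so the sketch scans every line instead of tracking which group it is in.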

MothOnMars mentioned this issue Feb 20, 2018

add #sitemaps method #6

Open
