
Support robots.txt Sitemaps (plural!) discovery #112

Open
Abdull opened this issue Oct 10, 2022 · 0 comments

The robots.txt standard allows declaring the locations of sitemaps (plural!), e.g. https://www.nytimes.com/robots.txt:

# ....
User-Agent: omgili
Disallow: /

User-agent: ia_archiver
Disallow: /

Sitemap: https://www.nytimes.com/sitemaps/new/news.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/sitemap.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/collections.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/video.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/cooking.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/recipe-collects.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/regions.xml
Sitemap: https://www.nytimes.com/sitemaps/new/best-sellers.xml
Sitemap: https://www.nytimes.com/sitemaps/www.nytimes.com/2016_election_sitemap.xml.gz
Sitemap: https://www.nytimes.com/elections/2018/sitemap
Sitemap: https://www.nytimes.com/wirecutter/sitemapindex.xml

It would be great if sitemapper could process robots.txt URLs in order to transitively return all Sitemap URLs.
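
For illustration, here is a minimal sketch of what such a feature could look like, not sitemapper's actual API: fetch a robots.txt URL and collect every `Sitemap:` directive, matching the field name case-insensitively and ignoring `#` comments. It assumes a global `fetch` (Node 18+ or a browser), and `getSitemapsFromRobots` is a hypothetical helper name.

```typescript
// Hypothetical helper (not part of sitemapper): fetch a robots.txt URL
// and return all URLs declared via "Sitemap:" directives.
async function getSitemapsFromRobots(robotsUrl: string): Promise<string[]> {
  const response = await fetch(robotsUrl);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${robotsUrl}: ${response.status}`);
  }
  const body = await response.text();

  const sitemaps: string[] = [];
  for (const line of body.split(/\r?\n/)) {
    // Strip "# ..." comments and surrounding whitespace before matching.
    const cleaned = line.replace(/#.*$/, "").trim();
    // The field name is matched case-insensitively ("Sitemap:", "sitemap:", ...).
    const match = cleaned.match(/^sitemap:\s*(\S+)/i);
    if (match) {
      sitemaps.push(match[1]);
    }
  }
  return sitemaps;
}

// Example usage: each returned URL could then be passed to sitemapper's
// existing fetch() to crawl the individual sitemaps.
getSitemapsFromRobots("https://www.nytimes.com/robots.txt")
  .then((urls) => console.log(urls));
```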
