This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

crawlr

A web spider, written in Go, that crawls your site and reports pages that are missing particular string patterns.

Configuration

The configuration file is passed to the spider as its first command-line argument. The file should contain a JSON object with the following keys:

  • startUrl
  • dropHttps
  • allowedDomains
  • rewriteDomains
  • filteredUrls
  • droppedParameters
  • requiredPatterns

An example JSON configuration file is included in the repo.
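For illustration, a configuration object might look like the sketch below. The key names come from the list above, but the value shapes and all concrete values are assumptions for this example, not taken from the bundled file:

```json
{
  "startUrl": "https://www.example.com/",
  "dropHttps": false,
  "allowedDomains": ["example.com", "www.example.com"],
  "rewriteDomains": {"staging.example.com": "www.example.com"},
  "filteredUrls": ["/logout", "/admin"],
  "droppedParameters": ["utm_source", "utm_medium"],
  "requiredPatterns": ["analytics.js"]
}
```

Consult the example file shipped in the repository for the authoritative format.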

Options

You can also adjust certain operational details of the spider through command-line parameters:

  • -h : print the list of accepted parameters
  • -d : how deep to spider the site (default: 2)
  • -n : maximum number of concurrent requests (default: 1)
  • -s : show live stats (default: false)
  • -v : verbose output (default: false)
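Putting the flags and the configuration file together, an invocation might look like this (the binary name and config filename are assumptions):

```
./crawlr -d 3 -n 4 -s config.json
```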

Output

As the spider progresses, it prints the URLs of pages that didn't contain all of the required patterns, along with the list of patterns that weren't found on each page.
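The per-page check behind this output can be sketched as a simple substring scan. This is an illustrative sketch, not crawlr's actual implementation; the function name and I/O shape are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// missingPatterns returns the required patterns that do not occur
// anywhere in the fetched page body.
func missingPatterns(body string, required []string) []string {
	var missing []string
	for _, p := range required {
		if !strings.Contains(body, p) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	body := `<html><head><script src="analytics.js"></script></head></html>`
	required := []string{"analytics.js", "gtag("}
	// "gtag(" is absent from the body above, so it is reported as missing.
	fmt.Println(missingPatterns(body, required))
}
```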
