# Crawler

A web crawler written in Rust.

This crawler creates a web graph by exploring all URLs that it finds.
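The graph itself can be as simple as an adjacency list keyed by URL. The sketch below is a minimal hypothetical version; the `WebGraph` type and its methods are illustrative, not taken from this repository:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical adjacency-list web graph: each crawled URL maps to
/// the set of URLs discovered on that page.
#[derive(Default)]
struct WebGraph {
    edges: HashMap<String, HashSet<String>>,
}

impl WebGraph {
    /// Record that the page at `from` links to `to`.
    fn add_edge(&mut self, from: &str, to: &str) {
        self.edges
            .entry(from.to_string())
            .or_default()
            .insert(to.to_string());
    }

    /// True if `url` has already been crawled, so it isn't fetched twice.
    fn visited(&self, url: &str) -> bool {
        self.edges.contains_key(url)
    }
}

fn main() {
    let mut graph = WebGraph::default();
    graph.add_edge("https://example.com", "https://example.com/about");
    assert!(graph.visited("https://example.com"));
    assert!(!graph.visited("https://example.com/about"));
}
```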

## Design

The crawler is split into two parts:

  1. The connection pool
  2. The parser pool

The crawler spins up as many connection and parser workers as you specify.

The connection pool handles all HTTP requests, while the parser pool handles all HTML parsing.
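A minimal sketch of this two-pool layout, using standard-library threads and channels (the actual crawler may use async tasks instead; the message types and pool sizes here are illustrative assumptions):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical messages exchanged between the pools.
struct FetchJob(String);                       // URL to download
struct ParseJob { url: String, html: String }  // fetched page to parse

fn main() {
    // Pool sizes; in the real crawler these are user-specified.
    const CONNECTIONS: usize = 4;
    const PARSERS: usize = 2;

    let (fetch_tx, fetch_rx) = mpsc::channel::<FetchJob>();
    let (parse_tx, parse_rx) = mpsc::channel::<ParseJob>();
    // std mpsc receivers are single-consumer, so workers share them via a mutex.
    let fetch_rx = Arc::new(Mutex::new(fetch_rx));
    let parse_rx = Arc::new(Mutex::new(parse_rx));

    // Connection pool: each worker pulls a URL and hands the body to the parsers.
    for _ in 0..CONNECTIONS {
        let rx = Arc::clone(&fetch_rx);
        let tx = parse_tx.clone();
        thread::spawn(move || loop {
            let FetchJob(url) = match rx.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break, // channel closed: shut the worker down
            };
            // A real worker would issue an HTTP GET here.
            let html = format!("<html>contents of {url}</html>");
            if tx.send(ParseJob { url, html }).is_err() {
                break;
            }
        });
    }

    // Parser pool: each worker extracts links and would enqueue new FetchJobs.
    for _ in 0..PARSERS {
        let rx = Arc::clone(&parse_rx);
        thread::spawn(move || loop {
            let job = match rx.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break,
            };
            // A real worker would parse `job.html` for <a href="..."> links.
            println!("parsed {}", job.url);
        });
    }

    fetch_tx.send(FetchJob("https://example.com".into())).unwrap();
    thread::sleep(Duration::from_millis(100)); // let the demo workers run
}
```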

Requests to the same domain are rate-limited to avoid being blocked by the server.
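One common way to implement per-domain rate limiting is to record the time of the last request to each domain and sleep until a minimum delay has elapsed. The sketch below is a hypothetical standalone version, not the crawler's actual implementation:

```rust
use std::collections::HashMap;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-domain rate limiter: enforces a minimum delay
/// between consecutive requests to the same domain.
struct RateLimiter {
    min_delay: Duration,
    last_request: HashMap<String, Instant>,
}

impl RateLimiter {
    fn new(min_delay: Duration) -> Self {
        Self { min_delay, last_request: HashMap::new() }
    }

    /// Block until it is safe to contact `domain` again, then record the request.
    fn wait_for(&mut self, domain: &str) {
        if let Some(last) = self.last_request.get(domain) {
            let elapsed = last.elapsed();
            if elapsed < self.min_delay {
                thread::sleep(self.min_delay - elapsed);
            }
        }
        self.last_request.insert(domain.to_string(), Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_millis(500));
    for _ in 0..3 {
        limiter.wait_for("example.com");
        println!("request to example.com at {:?}", Instant::now());
    }
}
```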

The URL mapping is stored in an index, which can be flushed to disk during shutdown.
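For example, the index could be flushed as a plain-text edge list on shutdown. The on-disk format and function name below are illustrative assumptions; the real crawler may serialize differently:

```rust
use std::collections::{HashMap, HashSet};
use std::fs::File;
use std::io::{BufWriter, Write};

/// Sketch of flushing the URL index to disk: one `source -> target`
/// line per edge in the web graph (a hypothetical format).
fn write_index(index: &HashMap<String, HashSet<String>>, path: &str) -> std::io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    for (source, targets) in index {
        for target in targets {
            writeln!(out, "{source} -> {target}")?;
        }
    }
    out.flush()
}

fn main() -> std::io::Result<()> {
    let mut index: HashMap<String, HashSet<String>> = HashMap::new();
    index
        .entry("https://example.com".to_string())
        .or_default()
        .insert("https://example.com/about".to_string());
    write_index(&index, "index.txt")
}
```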

## Resources