wxning1107/Crawler

Golang_crawler

Crawler is a distributed web crawler written in Go, built without any crawler framework.

It is a personal project 😄 that builds a distributed crawler system from scratch in plain Go.

The main purpose is to gain a deep understanding of Go's concurrency mechanisms and of distributed system design.

Introduction

Single-task crawler
Concurrent crawler

Features

  • A breadth-first crawling framework, with embedded data fetching and information extraction, implements the basic crawler task.

  • Go's built-in concurrency support is used to distribute and schedule crawler tasks so that they run concurrently.

  • RPC separates the concurrent tasks of the single-process version into independent services to implement the distributed crawler.

  • Docker and Elasticsearch provide the data storage backend, and the Go template library renders the stored data for display.
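
The first two features can be illustrated with a minimal sketch of a breadth-first crawl whose pages are fetched concurrently, one depth level at a time. This is not the project's actual code: the names `Fetcher` and `Crawl`, the `maxDepth` and `workers` parameters, and the injected fetch function are all assumptions made for illustration. A real fetcher would issue an HTTP request and extract links from the response body.

```go
package main

import "sync"

// Fetcher returns the links found on a page. It is a hypothetical stand-in
// for the real download-and-parse step, injected so the sketch runs offline.
type Fetcher func(url string) []string

// Crawl visits pages breadth-first starting from seed, down to maxDepth
// levels below it. All pages of one level are fetched concurrently, with at
// most `workers` fetches in flight. It returns the visited URLs level by level.
func Crawl(seed string, maxDepth, workers int, fetch Fetcher) []string {
	if workers < 1 {
		workers = 1 // avoid blocking forever on an unbuffered semaphore
	}
	seen := map[string]bool{seed: true}
	visited := []string{}
	frontier := []string{seed}

	for depth := 0; depth <= maxDepth && len(frontier) > 0; depth++ {
		// One result slot per page keeps the output order deterministic.
		results := make([][]string, len(frontier))
		sem := make(chan struct{}, workers) // counting semaphore
		var wg sync.WaitGroup

		for i, u := range frontier {
			wg.Add(1)
			sem <- struct{}{} // wait for a free worker slot
			go func(i int, u string) {
				defer wg.Done()
				defer func() { <-sem }()
				results[i] = fetch(u)
			}(i, u)
		}
		wg.Wait()
		visited = append(visited, frontier...)

		// Build the next level sequentially, so deduplication needs no lock.
		var next []string
		for _, links := range results {
			for _, link := range links {
				if !seen[link] {
					seen[link] = true
					next = append(next, link)
				}
			}
		}
		frontier = next
	}
	return visited
}
```

To try it, pass a fetcher backed by an in-memory map of page to links, for example `Crawl("a", 2, 2, func(u string) []string { return graph[u] })`. Fetching level by level keeps the breadth-first order explicit and confines shared state to the sequential deduplication step. A scheduler that feeds a long-lived worker pool through channels is an equally valid design.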

About

Building a distributed web crawler from scratch in Go
