xchengyu/Web_Crawler

Web_Crawler

Simple web crawler

1. Objective

I use a simple web crawler to measure aspects of a crawl, study its characteristics, download the web pages it visits, and gather metadata about those pages. All crawling is restricted to pre-selected news websites.
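To make "measuring aspects of a crawl" concrete, here is a minimal sketch of how per-URL metadata might be tallied. It uses only the Java standard library and does not touch crawler4j; the class `CrawlStats`, its methods, and the `news.example.com` URLs are all hypothetical names chosen for illustration, not code from this repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that tallies crawl results; not part of crawler4j. */
final class CrawlStats {

    /** One fetch attempt: URL, HTTP status, content type, and body size in bytes. */
    static final class Fetch {
        final String url;
        final int status;
        final String contentType;
        final long sizeBytes;

        Fetch(String url, int status, String contentType, long sizeBytes) {
            this.url = url;
            this.status = status;
            this.contentType = contentType;
            this.sizeBytes = sizeBytes;
        }
    }

    private final List<Fetch> fetches = new ArrayList<>();

    /** Records the outcome of one fetch attempt. */
    void record(String url, int status, String contentType, long sizeBytes) {
        fetches.add(new Fetch(url, status, contentType, sizeBytes));
    }

    /** Number of fetch attempts recorded so far. */
    int attempted() {
        return fetches.size();
    }

    /** Number of attempts that returned a 2xx status. */
    int succeeded() {
        int count = 0;
        for (Fetch f : fetches) {
            if (f.status >= 200 && f.status < 300) {
                count++;
            }
        }
        return count;
    }

    /** Attempt counts keyed by HTTP status code, in ascending order. */
    Map<Integer, Integer> countsByStatus() {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Fetch f : fetches) {
            counts.merge(f.status, 1, Integer::sum);
        }
        return counts;
    }

    /** Counts of successful (2xx) fetches keyed by content type. */
    Map<String, Integer> countsByContentType() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Fetch f : fetches) {
            if (f.status >= 200 && f.status < 300) {
                counts.merge(f.contentType, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        CrawlStats stats = new CrawlStats();
        // Made-up sample data standing in for real crawl results.
        stats.record("https://news.example.com/", 200, "text/html", 48213);
        stats.record("https://news.example.com/logo.png", 200, "image/png", 9120);
        stats.record("https://news.example.com/missing", 404, "text/html", 0);

        System.out.println("Attempted: " + stats.attempted());
        System.out.println("Succeeded: " + stats.succeeded());
        System.out.println("By status: " + stats.countsByStatus());
        System.out.println("By content type: " + stats.countsByContentType());
    }
}
```

In an actual crawl, `record` would presumably be called from wherever the crawler reports each fetch result; how that hooks into crawler4j is left out here on purpose.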

2. Preliminaries

To begin, I use crawler4j, an existing open-source Java web crawler library hosted on GitHub. For complete details on downloading and compiling it, see https://github.com/yasserg/crawler4j. For help installing Eclipse and crawler4j, see http://www-scf.usc.edu/~csci572/2017Spring/hw2/Crawler4jinstallation.pdf
