aecio/gp4c

GP4C: Genetic Programming for Crawling

The quality of a web search engine depends on several factors, including the coverage and freshness of the content gathered by its web crawler. For freshness, one key challenge is estimating the likelihood that a previously crawled webpage has been modified. Such estimates define the order in which those pages should be revisited, and can therefore reduce the cost of monitoring crawled webpages to keep their stored copies up to date.

GP4C is a genetic-programming-based framework that generates score functions. These functions rank pages by their probability of having been modified.
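To make the idea of a score function concrete, here is a minimal Python sketch of how one candidate function could be represented as an expression tree and used to rank pages. It is an illustration only and is not taken from the GP4C code base: the feature names (`visits`, `changes`, `age`), the example URLs, the tuple-based tree encoding, and the helpers `evaluate` and `rank_pages` are all hypothetical. The candidate tree is written by hand here, whereas a genetic programming run would evolve it.

```python
import operator

# Hypothetical per-page features (assumptions, not GP4C's actual inputs):
#   visits  - number of times the page was crawled
#   changes - number of visits on which a change was detected
#   age     - time units elapsed since the last crawl
PAGES = {
    "a.example/news": {"visits": 10, "changes": 8, "age": 2.0},
    "b.example/about": {"visits": 10, "changes": 1, "age": 5.0},
    "c.example/blog": {"visits": 4, "changes": 2, "age": 3.0},
}

# Binary operators allowed in a tree. Division is "protected": it returns
# 0.0 instead of raising when the denominator is zero.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 0.0,
}


def evaluate(tree, features):
    """Evaluate an expression tree against one page's feature values."""
    if isinstance(tree, tuple):      # internal node: (op, left, right)
        op, left, right = tree
        return OPS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):        # leaf naming a feature
        return float(features[tree])
    return float(tree)               # leaf holding a numeric constant


def rank_pages(tree, pages):
    """Return page URLs sorted from highest to lowest score."""
    return sorted(pages, key=lambda url: evaluate(tree, pages[url]), reverse=True)


# One hand-written candidate: (changes / visits) * age
CANDIDATE = ("*", ("/", "changes", "visits"), "age")

if __name__ == "__main__":
    for url in rank_pages(CANDIDATE, PAGES):
        print(url, evaluate(CANDIDATE, PAGES[url]))
```

In this sketch a higher score means the page should be revisited sooner. A genetic programming search would generate many such trees, evaluate how well each ranking matches the changes actually observed, and keep the best-performing ones.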

This code was used for the experiments in the following publications:
