The task is to predict whether an image is an advertisement ("ad") or not ("nonad").
The dataset represents a set of possible advertisements on Internet pages (2821 nonads, 458 ads). The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and alt text, the anchor text, and words occurring near the anchor text.
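Because the two classes are unevenly represented, it can help to know the accuracy of a trivial classifier that always predicts the majority class before judging any model. The sketch below works this out from the class counts given above; the variable names are illustrative and nothing is loaded from the dataset itself.

```python
# Class counts as stated in the dataset description.
counts = {"nonad": 2821, "ad": 458}

total = sum(counts.values())

# A trivial classifier that always predicts the most frequent class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

# Share of positive ("ad") examples, a rough measure of class imbalance.
ad_fraction = counts["ad"] / total

print(f"instances: {total}")
print(f"majority class: {majority_label}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
print(f"fraction of ads: {ad_fraction:.3f}")
```

A classifier is only informative on this task if it beats the majority-class baseline, so plain accuracy is best read alongside per-class measures such as precision and recall on the "ad" class.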
A previous study using this dataset: http://www.sc.ehu.es/ccwbayes/docencia/mmcc/docs/lecturas-clasificacion/abstracts-resumir/kushmerick99learning.pdf