Releases: scrapinghub/product-extraction-benchmark

1.0.0

26 Apr 16:01

We compare the quality of product extraction across Zyte Automatic Extraction (Zyte), Diffbot, and open-source tools (extruct) for the following attributes: price, availability (whether the product is in stock or out of stock), and SKU.

Attached are:

  • snapshots in WARC format, which can be served by pywb: dataset-warc.zip
  • screenshots of pages (before snapshot creation): dataset-jpeg.zip
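
As a rough illustration of preparing the snapshots for replay, the sketch below unpacks only the WARC files from dataset-warc.zip using the Python standard library. The function name `extract_warc_files`, the target directory `warc-snapshots`, and the assumption that the snapshots use a `.warc` or `.warc.gz` suffix are all hypothetical; the actual archive layout may differ.

```python
import zipfile
from pathlib import Path


def extract_warc_files(archive_path, target_dir):
    """Extract only WARC members from a zip archive and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in sorted(archive.namelist()):
            # Assumption: snapshots use the conventional .warc or .warc.gz suffix.
            if name.endswith((".warc", ".warc.gz")):
                extracted.append(Path(archive.extract(name, path=target)))
    return extracted


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the release asset was saved.
    for path in extract_warc_files("dataset-warc.zip", "warc-snapshots"):
        print(path)
```

With pywb installed, the extracted files could then be added to a collection with `wb-manager init` and `wb-manager add` and served with `wayback`; the README remains the authoritative source for the exact procedure.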

See README for more details.