Releases: Conal-Tuohy/APIHarvester
Exponential back-off
In this release, when a request returns an error and is retried (potentially several times, depending on the value of the retries parameter), the pause before each retry of the same URI doubles. The first retry is made after a 5s wait, the second after a 10s wait, the third after a 20s wait, and so on.
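The doubling schedule described above can be sketched as follows. This is a minimal illustration, not APIHarvester's actual source: the class name BackoffSketch and its methods are hypothetical, and only the 5s starting delay and the doubling rule are taken from the release note.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of APIHarvester's source.
class BackoffSketch {
    // The release note states that the first retry waits 5 seconds.
    static final long INITIAL_DELAY_SECONDS = 5;

    // Wait before the n-th retry (1-based) of the same URI: 5s, 10s, 20s, ...
    static long delaySeconds(int retryNumber) {
        if (retryNumber < 1) {
            throw new IllegalArgumentException("retryNumber must be at least 1");
        }
        // Doubling per retry is a left shift by (retryNumber - 1) bits.
        return INITIAL_DELAY_SECONDS << (retryNumber - 1);
    }

    // The full list of waits for a given value of the retries parameter.
    static List<Long> schedule(int retries) {
        List<Long> delays = new ArrayList<>();
        for (int n = 1; n <= retries; n++) {
            delays.add(delaySeconds(n));
        }
        return delays;
    }

    public static void main(String[] args) {
        System.out.println(schedule(4)); // prints [5, 10, 20, 40]
    }
}
```

With four retries the waits add up to 75 seconds for a single URI, so a large retries value can slow a harvest considerably when an API is persistently failing.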
APIHarvester with filtering and enhanced resumption control
This release incorporates the resume-when-xpath and discard-xpath parameters. The first is a boolean expression which controls whether APIHarvester will resume harvesting after receiving a response from the API. The second is an XPath expression which matches elements in the response which the harvester can ignore.
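To illustrate how two such expressions might be applied to a response, here is a sketch using the standard javax.xml.xpath API. The class FilterSketch, its method names, and the sample expressions in the usage note are assumptions made for illustration. In particular, the sketch interprets "ignore" as removing the matched elements from the parsed response, which may not be how APIHarvester itself handles them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of APIHarvester's source.
class FilterSketch {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates a boolean expression against the response, in the manner of resume-when-xpath.
    static boolean shouldResume(Document response, String resumeWhenXPath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(resumeWhenXPath, response, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + resumeWhenXPath, e);
        }
    }

    // Removes every element matched by the expression, in the manner of discard-xpath.
    // Returns the number of matched nodes.
    static int discard(Document response, String discardXPath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList matches = (NodeList) xpath.evaluate(discardXPath, response, XPathConstants.NODESET);
            int count = matches.getLength();
            for (int i = 0; i < count; i++) {
                Node node = matches.item(i);
                if (node.getParentNode() != null) {
                    node.getParentNode().removeChild(node);
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + discardXPath, e);
        }
    }
}
```

For example, given a response with a hypothetical `<next>` element, the expression `/response/next` evaluates to true while that element is present, and `//record[@deleted='true']` would match records flagged as deleted.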
V1.2
This version includes the ability to throttle requests by specifying a delay, in seconds, between each request.
This version also allows the resumptionXPath expression to return a string rather than just a node-set. This means the resumption URL can be assembled from parts, rather than simply read out of the harvested document, which allows APIHarvester to harvest from less RESTful APIs such as OAI-PMH.
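As an illustration of assembling a resumption URL from parts, the sketch below evaluates an XPath expression to a string using the standard javax.xml.xpath API. The class ResumptionSketch, the base URL https://example.org/oai, and the sample expression are all assumptions chosen for the example. Only the idea of a string-valued resumption expression comes from the release note.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of APIHarvester's source.
class ResumptionSketch {
    // Evaluates the expression against the harvested response and returns its string value.
    static String nextUrl(String responseXml, String resumptionXPath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The two-argument evaluate() converts the result to a String,
            // so functions such as concat() can build the URL.
            return xpath.evaluate(resumptionXPath, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + resumptionXPath, e);
        }
    }
}
```

An OAI-PMH response carries a resumptionToken element rather than a ready-made link, so an expression along these lines could rebuild the next request URL:

    concat('https://example.org/oai?verb=ListRecords&resumptionToken=', //*[local-name()='resumptionToken'])

The local-name() test sidesteps namespace prefixes. The expression does not URL-encode the token, which may matter for repositories whose tokens contain reserved characters.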
namespace-aware version
This version adds the ability to bind XML Namespace prefixes to URIs, and to use those prefixes to refer to namespaces in the XPath expressions used to control APIHarvester.
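Binding prefixes to namespace URIs for XPath evaluation can be sketched with the standard javax.xml.namespace.NamespaceContext interface. The class NamespaceSketch and the prefix oai used in the usage note are assumptions for illustration; how APIHarvester accepts these bindings on the command line is not shown here.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of APIHarvester's source.
class NamespaceSketch {
    // Wraps a prefix-to-URI map in the interface that the XPath API expects.
    static NamespaceContext contextFor(final Map<String, String> bindings) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                String uri = bindings.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceUri) {
                for (Map.Entry<String, String> entry : bindings.entrySet()) {
                    if (entry.getValue().equals(namespaceUri)) {
                        return entry.getKey();
                    }
                }
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceUri) {
                String prefix = getPrefix(namespaceUri);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singleton(prefix).iterator();
            }
        };
    }

    // Evaluates an XPath expression that uses the bound prefixes and returns its string value.
    static String evaluate(String xml, String expression, Map<String, String> bindings) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, prefixed names never match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextFor(bindings));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }
}
```

With oai bound to http://www.openarchives.org/OAI/2.0/, an expression such as //oai:resumptionToken selects the token element in an OAI-PMH response, whereas the unprefixed //resumptionToken would match nothing because the element is in a namespace.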
There is an additional option to indent harvested XML.
First release
The initial release of APIHarvester is a command-line application written in Java, which allows for easy bulk harvesting of XML records from web APIs such as those of the National Library of Australia's Trove service and the National Library of New Zealand's Digital NZ.