Download API
The download method of the VertNet API provides a simple way to download VertNet data as tab-delimited text files. This method is oriented towards getting large amounts of VertNet data efficiently, although users can download record sets of any size.
Getting large datasets can be a time-consuming task, so we decided to make downloads asynchronous in contrast to searches. This means the request will return immediately with information about the download process, while the download itself is enqueued in the backend. The user will receive a notification via email when the results are ready for download.
See the documentation home page for information on the current URL to access the service.
Requests are made by building a query object (see below) and passing it as a parameter in the API method's URL. This query object specifies the query parameters and provides some extra arguments needed to properly build the download.
As a simple example, the following request will prepare a download of all Noturus placidus records (the threatened Neosho madtom catfish) in a file called noturusplacidus.txt (well, not exactly, see below) and will send an email notification to you@example.com:
[http://api-module.vertnet-portal.appspot.com/api/v1/download?q={"q": "noturus placidus", "n": "noturusplacidus", "e": "you@example.com"}](http://api-module.vertnet-portal.appspot.com/api/v1/download?q={"q":"noturus placidus", "n": "noturusplacidus", "e": "you@example.com"})
The query object is a JSON object that contains the parameters that define the query to be performed, the email to send notification to, the output file name and other result features. This object is added at the end of the download method URL as the value of the q method argument, with a question mark ? separating both entities. The above URL is an example of how to build a basic download. In that URL, the query object is everything between (and including) the curly braces:
{"q": "noturus placidus", "n": "noturusplacidus", "e": "you@example.com"}
IMPORTANT NOTE: As you can see in the URL above, there are two different q values. The first one, ?q=, is the query object itself, whereas the second one, {"q":, is the query string element of the query object (see below). This distinction is important, since the query object should never be quoted and the query string should always be quoted.
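The steps above can be sketched in Python. This is a minimal illustration rather than an official client: the function name build_download_url is hypothetical, the base URL is copied from the example above and may not be current, and percent-encoding the JSON is an assumption about what a well-behaved client should do (the example URL above shows the raw JSON instead).

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL copied from the example above; check the documentation
# home page for the current one (assumption: it may have changed).
BASE_URL = "http://api-module.vertnet-portal.appspot.com/api/v1/download"


def build_download_url(query, name, email, base_url=BASE_URL):
    """Build a download URL (hypothetical helper, not part of the API)."""
    # Inner "q": the query string element. json.dumps adds the quotes.
    query_object = {"q": query, "n": name, "e": email}
    # Outer "q": the method argument whose value is the whole query
    # object, serialized as JSON and percent-encoded for use in a URL.
    return base_url + "?" + urlencode({"q": json.dumps(query_object)})


url = build_download_url("noturus placidus", "noturusplacidus", "you@example.com")

# Round trip: decoding the URL gives back the original query object.
decoded = json.loads(parse_qs(urlsplit(url).query)["q"][0])
print(url)
print(decoded)
```

Letting json.dumps produce the object keeps the inner query string quoted and the outer query object unquoted, which is the distinction the note above describes.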
The query object can have the following elements.
The q element defines the query terms, that is, the records you want to retrieve. It is the most important element in the query object and is therefore mandatory: without it, the API call will fail. The query terms must be present as a single, quoted string. See the "Query string" wiki page for more information on how to properly build this element and on the options available for more complex queries.
Example:
{"q": "noturus placidus"}
The n element is mandatory and specifies the name of the file that will be created. More precisely, it specifies only the beginning of the file name, since the download method appends a unique ID string and the .txt extension. For example, a name specified as noturusplacidus will turn into something like noturusplacidus-59dc056fd8f44bd2bf398761e536f479.txt
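If you need to recognize a generated file name later, a loose pattern match is one option. The sketch below assumes the shape suggested by the single example above (requested name, a hyphen, an ID made of letters and digits, then .txt). The API does not document that format, and matches_download_name is a hypothetical helper.

```python
import re


def matches_download_name(file_name, requested_name):
    """Return True if file_name looks like '<requested_name>-<id>.txt'.

    Assumption: the unique ID is joined with a hyphen and contains only
    word characters, as in the single example in this page.
    """
    pattern = re.escape(requested_name) + r"-\w+\.txt"
    return re.fullmatch(pattern, file_name) is not None


ok = matches_download_name(
    "noturusplacidus-59dc056fd8f44bd2bf398761e536f479.txt",
    "noturusplacidus",
)
print(ok)
```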
Example:
{"q": "noturus placidus", "n": "noturusplacidus"}
The e element is mandatory. Because downloads are asynchronous and do not return any records immediately, users must specify an email address to which the notification will be sent when the download is ready.
NOTE: the API does not validate the email address, so a query with no email or with an invalid email address can still be launched. In these cases, the results will be generated and stored in the backend data warehouse, but no notification will be sent. Since the last part of the download file name is random, the user cannot guess the link to the file and would have to ask us for it.
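Because the API does not check the email address, it can help to check the query object on the client side before sending it. The following is a minimal sketch: check_query_object is a hypothetical helper, and the email pattern is deliberately loose. It catches obvious mistakes but is not a complete address validator.

```python
import re

MANDATORY_ELEMENTS = ("q", "n", "e")

# Loose pattern: something@something.tld with no whitespace.
# Not a full address validator; it only catches obvious mistakes.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_query_object(query_object):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for key in MANDATORY_ELEMENTS:
        value = query_object.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty element: {key}")
    email = query_object.get("e")
    if isinstance(email, str) and email.strip() and not EMAIL_PATTERN.match(email):
        problems.append(f"email does not look valid: {email}")
    return problems


print(check_query_object({"q": "noturus placidus", "n": "noturusplacidus"}))
```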
Example:
{"q": "noturus placidus", "n": "noturusplacidus", "e": "you@example.com"}
A few seconds after launching the download query, the API will return a JSON object with a message indicating the success or failure of the call. On success, the response will have this structure:
{
"result": "success",
"file_name": "<the value of the 'n' parameter>",
"email": "<the value of the 'e' parameter>",
"query": "<the value of the 'q' parameter>",
"api_version": "<a string with specific details on the API version, for feedback purposes>",
"source": "<a string indicating the application that made the call. Usually this will be 'DownloadAPI'>"
}
Note that a "success" in the result element does not mean the download is ready, nor does it guarantee that the download will return any records. It applies only to the download call and means that the download was properly queued. Whether the query matches any records is a separate matter.
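A client can check the result element of the response as sketched below. The sample response is built by hand to match the structure above, download_was_queued is a hypothetical helper, and the api_version value is a placeholder. The shape of a failure response is not documented here, so the sketch treats anything other than "success" as not queued.

```python
import json


def download_was_queued(response_text):
    """Return True if the API reported that the download was queued.

    This says nothing about whether the download will contain records.
    """
    response = json.loads(response_text)
    return response.get("result") == "success"


# Hand-built sample following the documented structure (placeholder values).
sample_response = json.dumps({
    "result": "success",
    "file_name": "noturusplacidus",
    "email": "you@example.com",
    "query": "noturus placidus",
    "api_version": "<api version string>",
    "source": "DownloadAPI",
})

queued = download_was_queued(sample_response)
print(queued)
```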
If everything went well with the download itself, after some time (between seconds and hours, depending on the volume of the download) you will receive a notification at the specified email address with the query terms you used, the number of records that matched those terms, and the times of the request and of the download fulfillment. The notification will also include a link to the file itself.
Even though downloads will be kept for a long time, we recommend that you download the file immediately rather than rely on the data remaining on the backend server. We might need to clean up the data warehouse at any moment without notice. We will do our best, though, to keep download files for at least 24 hours.
Asynchronous tasks are a bit tricky in that the user cannot know whether the process is running smoothly or something went wrong. We acknowledge that, in the past, we have seen "stale" downloads that had been running for hundreds of days. While retrieving really large datasets (like the whole MVZ collection) will take a long time, it will definitely not take days.
If, for some reason, you don't receive a notification that your download is ready after 24 hours, please create a new issue in the code repository at http://www.github.com/vertnet/api/issues and/or send us an email at vertnetinfo@vertnet.org