Add class explanation and cache control details
jmarshall committed Jun 12, 2018
1 parent 3eaeabd commit edcea92
Showing 1 changed file with 29 additions and 2 deletions.
31 changes: 29 additions & 2 deletions htsget.md
@@ -281,11 +281,23 @@ _optional object_
For HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if `headers` is `{"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}`, then the client must supply the headers `Range: bytes=0-1023` and `Authorization: Bearer xxxx` with the HTTPS request to the URL.
</td></tr>
<tr markdown="block"><td>
-`class`
-_string_
+`class`
+_optional string_
</td><td>
For file formats whose specification describes a header and a body, the class indicates which of the two will be retrieved when querying this URL. The allowed values are `header` and `body`.
</td></tr>
<tr markdown="block"><td>
`ETag`
_optional string_
</td><td>
The _entity-tag_ that would be returned to a request for the URL.
</td></tr>
<tr markdown="block"><td>
`Last-Modified`
_optional string_
</td><td>
The last modification _HTTP-date_ that would be returned to a request for the URL.
</td></tr>
</table>

</td></tr>
@@ -346,6 +358,7 @@ An example of a JSON response is:
4. Client concatenates data blocks to produce local blob.

Although the blocks must ultimately be concatenated in the given order, the client may fetch them in parallel.
If the ticket returned by the server contains `class` fields or cache control fields, the client may use them to avoid fetching already-downloaded headers as described below.
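
As an illustration of fetching in parallel while still concatenating in ticket order, here is a minimal Python sketch. The function name `fetch_blocks_in_order` and the injected `fetch` callable are hypothetical and not part of the specification; `fetch` stands in for whatever the client uses to retrieve one `urls` entry.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_blocks_in_order(
    urls: List[Dict],
    fetch: Callable[[Dict], bytes],
    max_workers: int = 4,
) -> bytes:
    """Fetch every ticket entry concurrently and join the results in ticket order.

    `fetch` is a caller-supplied (hypothetical) function that retrieves the
    data block for one entry of the ticket's `urls` array.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of its inputs,
        # regardless of which download happens to finish first.
        blocks = list(pool.map(fetch, urls))
    return b"".join(blocks)
```

Because the ordering comes from `Executor.map` rather than from completion time, a slow first block does not change the layout of the final blob.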

### HTTPS data block URLs

@@ -370,6 +383,20 @@ The client obtains the data block by decoding the embedded base64 payload.

Note: the base64 text should not be additionally percent encoded.
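
A minimal Python sketch of the decoding step, assuming the data URL has the form `data:<media type>;base64,<payload>`. The helper name `decode_data_url` and the media type in the usage line are illustrative assumptions.

```python
import base64


def decode_data_url(url: str) -> bytes:
    """Return the bytes embedded in a base64 `data:` URL.

    The payload is passed straight to the base64 decoder: per the note
    above, it is not expected to be percent-encoded.
    """
    prefix, sep, payload = url.partition(",")
    if not sep or not prefix.startswith("data:") or not prefix.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)
```

For instance, `decode_data_url("data:application/octet-stream;base64,QkFNAQ==")` yields the four bytes `b"BAM\x01"`.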

### Avoiding re-fetching ticket array URLs

Clients may use `class` fields and the usual HTTP cache control mechanisms to avoid re-fetching URLs in the ticket array whose contents the client has already downloaded.
For example, when making multiple requests to fetch reads (or variants) within different regions of the same `<id>` resource, usually the SAM/CRAM (or VCF) headers will not change between requests.
When the headers are large and the requested regions are small, the headers will constitute most of the downloaded data and it will be advantageous to avoid re-fetching this unchanged data.

If classes are specified in the ticket, zero or more of the entries at the start of the `urls` array will have class `header`.
When the client has previously downloaded the resource's SAM/VCF headers, it may reuse these known headers rather than re-fetching the `header`-class URLs.
(The boundary between the contents of the final `header` URL and the first `body` URL must be at the start of the first data record, as described in FIXME FOOTNOTE FOR BAM/CRAM/VCF/BCF.
If the resource is BGZF-compressed, the end of the contents of the final `header` URL must be the end of a BGZF block.)
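
One possible way for a client to act on this is sketched below in Python. The function name `urls_to_fetch` and the `have_headers` flag are hypothetical; the sketch assumes only that `header`-class entries, when present, form a prefix of the `urls` array as described above.

```python
from typing import Dict, List


def urls_to_fetch(urls: List[Dict], have_headers: bool) -> List[Dict]:
    """Return the ticket entries that still need to be downloaded.

    When the client already holds the resource's headers, the leading run
    of `header`-class entries is skipped. Entries without a `class` field
    are always kept, so tickets that do not specify classes are fetched
    in full.
    """
    if not have_headers:
        return list(urls)
    first_needed = 0
    while first_needed < len(urls) and urls[first_needed].get("class") == "header":
        first_needed += 1
    return urls[first_needed:]
```

The client would then prepend its previously downloaded header bytes to the concatenation of the remaining blocks.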

Clients SHOULD use the usual HTTP caching facilities (`Cache-Control`; `ETag`/`If-None-Match` and/or `Last-Modified`/`If-Modified-Since`) to ensure that reused cached data is still valid.
If the server has provided `ETag` or `Last-Modified` ticket fields for a particular URL, the client can use them to avoid making even a request/304 round trip for that URL.
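
The decision described above can be sketched as follows. This is an illustrative Python outline, not a prescribed algorithm: `plan_request`, the `cached` dictionary layout, and the action names are assumptions, and validators are compared by plain string equality, which ignores the distinction between weak and strong entity-tags.

```python
from typing import Dict, Optional, Tuple


def plan_request(entry: Dict, cached: Optional[Dict]) -> Tuple[str, Dict[str, str]]:
    """Decide how to obtain one ticket URL given a previously cached copy.

    `entry` is one element of the ticket's `urls` array. `cached`
    (hypothetical layout) holds the `ETag` and/or `Last-Modified` values
    stored alongside earlier downloaded data, or is None.
    Returns (action, request_headers), where action is "reuse", "fetch",
    or "revalidate".
    """
    headers = dict(entry.get("headers", {}))
    if not cached:
        return "fetch", headers

    # Validators supplied in the ticket let the client decide without
    # contacting the data URL at all.
    for field in ("ETag", "Last-Modified"):
        ticket_value = entry.get(field)
        if ticket_value is not None:
            if ticket_value == cached.get(field):
                return "reuse", {}
            return "fetch", headers

    # No ticket validators: fall back to an ordinary conditional request,
    # which the server may answer with 304 Not Modified.
    if cached.get("ETag"):
        headers["If-None-Match"] = cached["ETag"]
    if cached.get("Last-Modified"):
        headers["If-Modified-Since"] = cached["Last-Modified"]
    if "If-None-Match" in headers or "If-Modified-Since" in headers:
        return "revalidate", headers
    return "fetch", headers
```

Any `Cache-Control` directives received with the original response would still govern how long the cached copy may be kept; that part is left out of the sketch.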

### Reliability & performance considerations

To provide robustness to sporadic transfer failures, servers should divide large payloads into multiple data blocks in the `urls` array. Then if the transfer of any one block fails, the client can retry that block and carry on, instead of starting all over. Clients may also fetch blocks in parallel, which can improve throughput.
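
A minimal Python sketch of retrying a single block rather than restarting the whole transfer. The helper `fetch_with_retry`, its parameters, and the choice to treat `OSError` as a transient failure are assumptions made for illustration; `fetch` is again a caller-supplied function that downloads one `urls` entry.

```python
import time
from typing import Callable, Dict


def fetch_with_retry(
    fetch: Callable[[Dict], bytes],
    entry: Dict,
    attempts: int = 3,
    backoff: float = 0.5,
) -> bytes:
    """Fetch one data block, retrying only that block on transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(entry)
        except OSError:
            # Connection resets and timeouts are OSError subclasses in Python.
            if attempt == attempts - 1:
                raise
            # Wait a little longer before each successive retry.
            time.sleep(backoff * (2 ** attempt))
    raise AssertionError("unreachable")
```

Blocks that have already been downloaded are unaffected by the retry, so a failure costs at most one block's worth of repeated transfer.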