Compressed data & chunk size fails fetch #895

Open
voidware opened this issue Sep 18, 2023 · 7 comments
@voidware

I'm having a problem with emscripten sokol_fetch when fetching compressed data with a non-zero chunk size.

Sokol issues a HEAD request and gets the compressed content length:

HTTP/1.1 200 OK
Date: Mon, 18 Sep 2023 12:41:06 GMT
Connection: Keep-Alive
ETag: "1695039706"
Cache-Control: max-age=86400
Content-Encoding: gzip
Content-Length: 18042
Content-Type: application/json
Last-Modified: Mon, 18 Sep 2023 12:21:46 GMT
Accept-Ranges: bytes
Vary: Origin

Sokol then issues a ranged GET and receives uncompressed data:

HTTP/1.1 206 Partial Content
Date: Mon, 18 Sep 2023 12:58:19 GMT
Connection: Keep-Alive
ETag: "1695039706"
Cache-Control: max-age=86400
Content-Length: 1024
Content-Range: bytes 0-1023/52494
Content-Type: application/json
Last-Modified: Mon, 18 Sep 2023 12:21:46 GMT
Accept-Ranges: bytes
Vary: Origin

And the server does not compress the range response (no Content-Encoding field); the requested range is interpreted against the uncompressed data.

So here we get the first 1K of 52K.

But Sokol stops fetching after 18042 bytes of uncompressed data (the compressed length reported by the HEAD request), so the download is incomplete.

I don't know if this is a server problem or a Sokol problem, but it seems the server always has the option to send the data uncompressed anyway, and that is what it is doing here.

Also, would it ever be the case that ranges are compressed? For example, does the server have the option to compress each range separately, and therefore return a Content-Length completely different both from the requested range and from any HEAD request?

And if a range within a file is requested and served uncompressed, how could the buffer ever receive more than the requested number of bytes? So I don't think the fetch buffer ever needs to be bigger than the chunk size, except for chunk_size=0.

@floooh
Owner

floooh commented Sep 18, 2023

Hmm, I'm somewhat sure that I received compressed chunks when experimenting with streaming downloads, otherwise I wouldn't have gone to great lengths describing that scenario here:

sokol/sokol_fetch.h

Lines 600 to 638 in 751fc4c

CHUNK SIZE AND HTTP COMPRESSION
===============================
TL;DR: for streaming scenarios, the provided chunk-size must be smaller
than the provided buffer-size because the web server may decide to
serve the data compressed and the chunk-size must be given in 'compressed
bytes' while the buffer receives 'uncompressed bytes'. It's not possible
in HTTP to query the uncompressed size for a compressed download until
that download has finished.
With vanilla HTTP, it is not possible to query the actual size of a file
without downloading the entire file first (the Content-Length response
header only provides the compressed size). Furthermore, for HTTP
range-requests, the range is given on the compressed data, not the
uncompressed data. So if the web server decides to serve the data
compressed, the content-length and range-request parameters don't
correspond to the uncompressed data that's arriving in the sokol-fetch
buffers, and there's no way from JS or WASM to either force uncompressed
downloads (e.g. by setting the Accept-Encoding field), or access the
compressed data.
This has some implications for sokol_fetch.h, most notably that buffers
can't be provided in the exactly right size, because that size can't
be queried from HTTP before the data is actually downloaded.
When downloading whole files at once, it is basically expected that you
know the maximum files size upfront through other means (for instance
through a separate meta-data-file which contains the file sizes and
other meta-data for each file that needs to be loaded).
For streaming downloads the situation is a bit more complicated. These
use HTTP range-requests, and those ranges are defined on the (potentially)
compressed data which the JS/WASM side doesn't have access to. However,
the JS/WASM side only ever sees the uncompressed data, and it's not possible
to query the uncompressed size of a range request before that range request
has finished.
If the provided buffer is too small to contain the uncompressed data,
the request will fail with error code SFETCH_ERROR_BUFFER_TOO_SMALL.

If the server's HEAD response announces that the data will be sent compressed, but the chunks then arrive uncompressed, then currently sokol_fetch.h indeed cannot know when the download has finished.

The streaming sample here doesn't seem to use compression (e.g. the HEAD request returns with the actual uncompressed data size, probably because compression is deactivated for MPEG files):

https://floooh.github.io/sokol-html5/plmpeg-sapp.html

If it's only about detecting when the streamed download is complete, then I can probably look at the Content-Range response header:

Content-Range: bytes 0-1023/52494

...the part after the slash is the overall size, so it's possible to just check each chunk's Content-Range header for completion.

That sounds like a plan. I need to look into sokol_fetch.h again soonish anyway because of #882.

@voidware
Author

Thanks for looking at this.

It appears that when a HEAD is issued, the Content-Length reflects whether compression is acceptable, since I think the value from HEAD is meant to be the same as the value a GET would return, all things being consistent.

So

curl -I <url> -H "Accept-Encoding: gzip"

Will contain:

Content-Encoding: gzip
Content-Length: 18042

curl -I <url>

Will contain:

Content-Length: 52494

In such cases the Content-Length will then be consistent with a subsequent GET in the non-range case.

For ranges, I'm thinking the server can opt out of compression. I think identity is always implied. I tried to stop it with:

curl <url> -i -H "Accept-Encoding: gzip,identity;q=0" -H "Range: bytes=0-1023"

But it still returned uncompressed data.

@voidware
Author

BTW, if you're going to be looking at fetch sometime, can you take a quick look at the case where a buffer is not pre-assigned? I tried allocating the buffer in the dispatch callback, but my response callback was never invoked. I could only get the pre-allocated buffer method to work.

BTW2, for the short term I have a workaround for the range problem. It turns out I only need chunks for streaming media, which is already compressed. Fetching small text files doesn't need chunks, as they always fit in my buffer anyhow, so for now I just set chunk_size to zero for those files.

BTW3, it would be nice to know whether ranges can indeed be compressed, and whether the server can opt to compress each range separately; I read somewhere that some CDNs do this. I have had a look around and can't find anything definite in this area. It seems to be a bit of a hole in the specifications.

Thanks.

@floooh
Owner

floooh commented Sep 20, 2023

@voidware
Author

Thanks for checking this. I tried it again. Yes, the problem only occurs with a non-zero chunk_size: it blows an assert complaining the buffer is too small for the chunk, because there is no buffer yet!

@voidware
Author

Also, am I right in thinking that assigning the buffer in dispatch will cause an additional frame delay? If so, I'll probably preassign the buffer anyhow.

@floooh
Owner

floooh commented Sep 21, 2023

Also, am I right in thinking that assigning the buffer in dispatch will cause an additional frame delay?

It actually shouldn't: the dispatch callback is 'short-circuited' and invoked as soon as a lane is assigned to the request, before it is enqueued for processing, so there's no extra roundtrip involved. (The channel and lane indices let you pick a buffer which will only be written to by this specific request, because it's guaranteed that no other request is in flight with the same channel/lane combination.)

sokol/sokol_fetch.h

Lines 2485 to 2491 in b803c9a

item->state = _SFETCH_STATE_DISPATCHED;
item->lane = _sfetch_ring_dequeue(&chn->free_lanes);
// if no buffer provided yet, invoke response callback to do so
if (0 == item->buffer.ptr) {
_sfetch_invoke_response_callback(item);
}
_sfetch_ring_enqueue(&chn->user_incoming, slot_id);
