Kiwix-Serve does not support Multipart-range HTTP requests #855

Open
kelson42 opened this issue Dec 7, 2022 · 8 comments

@kelson42
Collaborator

kelson42 commented Dec 7, 2022

If such a request is made against the latest version (3.4.0), an error is returned:

$ curl https://library.kiwix.org/content/micmaths_fr_all_2022-10/videos/IbV0UoXXcOY/video.webm -i -H "Range: bytes=0-50, 10-150"
HTTP/2 416 
date: Wed, 07 Dec 2022 16:50:55 GMT
content-type: video/webm
content-length: 0
access-control-allow-origin: *
etag: "da17b3bc-69ba-bbf3-5b9d-e34363056d44/Z"
cache-control: max-age=3600, must-revalidate
x-varnish: 4722013 4394821
age: 12885
via: 1.1 varnish (Varnish/7.1)
accept-ranges: bytes
content-range: bytes */63655850
strict-transport-security: max-age=15724800; includeSubDomains
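The failing request carries two ranges in a single `Range` header. As a minimal sketch (Python; the helper name `parse_range_header` is hypothetical and not part of kiwix-serve), parsing such a header according to the RFC 7233 byte-range syntax could look like this. Positions are inclusive, as in the `Range` header itself.

```python
import re

# One byte-range-spec: "first-last", "first-" (to the end) or "-N" (last N bytes).
_RANGE_SPEC = re.compile(r"^(\d*)-(\d*)$")


def parse_range_header(header, size):
    """Parse a Range header value such as "bytes=0-50, 10-150".

    Returns a list of inclusive (first, last) byte positions that overlap a
    resource of `size` bytes, in request order. Returns None when the header
    is syntactically invalid or uses another unit; a server then ignores it
    and answers 200 with the full content. An empty list means no range is
    satisfiable, which is the case that calls for 416.
    """
    unit, sep, spec_list = header.partition("=")
    if not sep or unit.strip().lower() != "bytes":
        return None
    ranges = []
    for spec in spec_list.split(","):
        m = _RANGE_SPEC.match(spec.strip())
        if not m or (not m.group(1) and not m.group(2)):
            return None
        first, last = m.group(1), m.group(2)
        if not first:
            # Suffix range "-N": the final N bytes of the resource.
            n = int(last)
            if n == 0 or size == 0:
                continue  # unsatisfiable
            ranges.append((max(size - n, 0), size - 1))
            continue
        first_i = int(first)
        if last and int(last) < first_i:
            return None  # last-byte-pos < first-byte-pos is invalid syntax
        if first_i >= size:
            continue  # starts beyond the end: unsatisfiable
        last_i = min(int(last), size - 1) if last else size - 1
        ranges.append((first_i, last_i))
    return ranges
```

For the request in this issue, `parse_range_header("bytes=0-50, 10-150", 63655850)` yields `[(0, 50), (10, 150)]`: neither range lies outside the 63655850-byte resource.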

Unfortunately, analysis of the library.kiwix.org logs has shown that some legitimate clients (Chrome on Android) generate such requests.

Therefore, this part of the specification, as explained here, should be supported:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests#multipart_ranges
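For reference, a response that satisfies a multi-range request is a `206 Partial Content` whose body has the type `multipart/byteranges`, with one part per range, each carrying its own `Content-Type` and `Content-Range` headers. A minimal Python sketch of assembling such a body, assuming the ranges have already been parsed and validated and, for simplicity, that the whole resource is held in memory (the function name is hypothetical):

```python
import secrets


def build_multipart_byteranges(data, ranges, content_type):
    """Build a multipart/byteranges body for validated inclusive ranges.

    Returns (content_type_header, body). Each part starts with a boundary
    delimiter, then its Content-Type and Content-Range headers, a blank line
    and the selected bytes; a closing delimiter ends the body.
    """
    # Random boundary, so a collision with the payload is very unlikely.
    boundary = secrets.token_hex(16)
    size = len(data)
    body = bytearray()
    for first, last in ranges:
        body += f"--{boundary}\r\n".encode("ascii")
        body += f"Content-Type: {content_type}\r\n".encode("ascii")
        body += f"Content-Range: bytes {first}-{last}/{size}\r\n\r\n".encode("ascii")
        body += data[first:last + 1]
        body += b"\r\n"  # the CRLF before the next delimiter belongs to it
    body += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/byteranges; boundary={boundary}", bytes(body)
```

The returned header value goes into the response's `Content-Type`, and `Content-Length` must be the length of the whole multipart body, not of the resource.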

This is a follow-up of sorts to #363.

@veloman-yunkan
Collaborator

This limitation was documented in #360:

This PR enables handling of partial content requests with a single byte-range. Requests for two or more byte ranges (even if they effectively constitute a single continuous range) are rejected with a 416 (Range Not Satisfiable) error response. Such behaviour complies with a somewhat liberal interpretation of the spec:

The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the request's Range header field (Section 3.1) overlap the current extent of the selected resource or that the set of ranges requested has been rejected due to invalid ranges or an excessive request of small or overlapping ranges.
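A more permissive alternative to the #360 behaviour could reserve 416 for the case where no requested range overlaps the resource. A hypothetical sketch of that decision step (Python; the function name is invented for illustration): given the satisfiable ranges, it answers 416 only when the list is empty, and collapses the ranges into one when, once sorted, they leave no gap. RFC 7233 lets a server coalesce overlapping ranges; since it also asks that multipart parts keep the request order, the sketch returns non-contiguous ranges unchanged.

```python
def plan_range_response(ranges):
    """Choose a status for a range request, given satisfiable inclusive ranges.

    Returns (status, ranges): 416 with no ranges if nothing is satisfiable,
    otherwise 206 with either one coalesced range (single-part response) or
    the original ranges in request order (multipart/byteranges response).
    """
    if not ranges:
        return 416, []
    # Try to merge everything into a single contiguous range.
    ordered = sorted(ranges)
    first, last = ordered[0]
    for start, end in ordered[1:]:
        if start > last + 1:
            # A gap remains: keep the ranges exactly as requested.
            return 206, list(ranges)
        last = max(last, end)
    return 206, [(first, last)]
```

With the request from this issue, `[(0, 50), (10, 150)]` becomes the single range `(0, 150)`, which, as we read the RFC, can be served as an ordinary single-part 206 response with `Content-Range: bytes 0-150/63655850`.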

@kelson42 Do you know how clients react when we answer such a multi-part range request with a 416 response?

  1. Does the client come back with a set of new separate single-range requests?
  2. Or does it request the entire item instead?

I believe that for scenario 1, we shouldn't waste any effort implementing this enhancement.

@kelson42
Collaborator Author

kelson42 commented Dec 8, 2022

I don't know how browsers react to this. They probably just stop, because the spec is not fully implemented, which is not an allowed scenario (either you support byte ranges or you don't).

kelson42 modified the milestones: 12.1.0, 12.2.0 Dec 9, 2022
@veloman-yunkan
Collaborator

I don't know how browsers react to this. They probably just stop, because the spec is not fully implemented, which is not an allowed scenario (either you support byte ranges or you don't).

@kelson42 Can't we find a fact-based answer in the library.kiwix.org logs?

@kelson42
Collaborator Author

@rgaudin?

@rgaudin
Member

rgaudin commented Dec 16, 2022

That sounds difficult, but these were the 416 requests:

library.kiwix.org 18.212.255.64 - - [13/Nov/2022:15:33:25 +0000] "GET http://library.kiwix.org/catalog/v2/categories HTTP/1.1" 416 0 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
library.kiwix.org xxx.xxx.xxx.xxx - - [15/Nov/2022:17:26:50 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org 37.120.157.86 - - [16/Nov/2022:06:20:18 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org 37.120.157.86 - - [16/Nov/2022:06:21:27 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"

I also have about 800 lines of logs spread across 2 IPs. I removed the IPs, and there are apparently no suggest or content search requests, so I guess it's fine to share them here.

416-user2.log
416-user.log

@veloman-yunkan
Collaborator

416-user2.log doesn't contain any 416 responses.

Looking at 416-user.log, I see that a request to http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 is first satisfied with a 200 status code, and a couple of seconds later another request for the same URL is rejected with a 416 status code:

library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:20:16 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 200 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:20:18 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"

This pattern repeats once more:

library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:21:25 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 200 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:21:27 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"

However, it is not clear whether these 416 responses were caused by multi-part range requests (they could instead be, for example, out-of-bounds single-range requests). It is also strange that a web client sends a range request for an illustration resource.
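The 200-then-416 pattern in these logs can be spotted mechanically across the full files. A rough Python sketch, assuming the access-log format of the lines quoted in this thread and time-ordered input (the function name and the 10-second window are arbitrary, illustrative choices). It cannot settle the question by itself: that log format does not record the request's `Range` header, so a multi-range request and an out-of-bounds single range look identical in it.

```python
import re
from datetime import datetime, timedelta

# Fields of the access-log lines quoted in this thread:
# vhost client - - [timestamp] "METHOD URL PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def find_200_then_416(lines, window=timedelta(seconds=10)):
    """Yield (client, url, t200, t416) whenever a 416 follows a 200 for the
    same client and URL within `window`. Lines must be in time order."""
    last_ok = {}  # (client, url) -> time of the most recent 200 response
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in any other format
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (m["client"], m["url"])
        if m["status"] == "200":
            last_ok[key] = ts
        elif m["status"] == "416" and key in last_ok and ts - last_ok[key] <= window:
            yield m["client"], m["url"], last_ok[key], ts
```

Run over the four lines quoted above, it reports two pairs, each with a 2-second gap.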

@kelson42 What made you think that the 416 responses from library.kiwix.org are caused by multi-part range requests?

kelson42 modified the milestones: 12.2.0, 12.3.0 Feb 12, 2023
kelson42 removed this from the 12.3.0 milestone Mar 27, 2023
@kelson42
Collaborator Author

@kelson42 What made you think that the 416 responses from library.kiwix.org are caused by multi-part range requests?

Concretely, nothing I can remember, but what other scenario would be plausible?
