
.tar.gz files downloaded with Chrome get gunzipped #22

Open
RohanTalip opened this issue Aug 5, 2019 · 6 comments

Comments

@RohanTalip

[I realise this is not an issue with the less software, but it is somewhat related]

Downloading .tar.gz files from http://www.greenwoodsoftware.com/less/download.html with Chrome causes them to be saved uncompressed, e.g.:

$ ls -l less-530.tar.gz
1272 -rw-r--r--@ 1 rohantalip  somegroup  1300480 Aug  5 12:55 less-530.tar.gz

$ file less-530.tar.gz
less-530.tar.gz: POSIX tar archive (GNU)

This obviously breaks verification of the .tar.gz files against their GnuPG signatures, which are made over the compressed files.
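Before running GnuPG, a quick way to tell whether a download was silently decompressed is to check its leading bytes, similar to what `file` reported above. A minimal Python sketch, assuming the standard magic numbers: a gzip stream starts with the bytes 0x1f 0x8b, and POSIX and GNU tar headers have "ustar" at byte offset 257. The helper names `classify_download` and `classify_file` are hypothetical.

```python
import sys

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip member
TAR_MAGIC_OFFSET = 257     # offset of the "magic" field in a tar header
TAR_MAGIC = b"ustar"       # POSIX ("ustar\0") and GNU ("ustar  \0") both start with this


def classify_download(data: bytes) -> str:
    """Return 'gzip', 'tar' or 'unknown' based on the leading magic bytes."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    if data[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + len(TAR_MAGIC)] == TAR_MAGIC:
        return "tar"
    return "unknown"


def classify_file(path: str) -> str:
    # One 512-byte tar header block is enough to see either magic number.
    with open(path, "rb") as f:
        return classify_download(f.read(512))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        kind = classify_file(path)
        note = "" if kind == "gzip" else "  (not gzip: signature check will fail)"
        print(f"{path}: {kind}{note}")
```

Run as `python3 check_download.py less-530.tar.gz`, this would report `tar` for the file shown above, since `file` identified it as a GNU tar archive. Its size also fits: 1300480 bytes is exactly 127 × 10240, the default tar record size.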

It seems that the www.greenwoodsoftware.com webserver might be misconfigured with regard to the Content-Encoding header:
https://superuser.com/questions/940605/chromium-prevent-unpacking-tar-gz

You might want to fix that.

Using wget or curl to download the files works as expected.

@keithbowes

Some good advice is not to use Google's spyware as a browser. Other than that, I've been using wget to download things for ages (a download manager on Windows before moving to Linux), due to how browsers' download functionality tends to be broken. It is annoying though when a download site doesn't give you a direct link to copy and paste into wget.

Anyway, you're right about the site's configuration, though downloading it works fine in my browser:

$ lynx -head -dump http://www.greenwoodsoftware.com/less/less-551.tar.gz
HTTP/1.1 200 OK
Date: Sat, 01 Feb 2020 15:15:21 GMT
Server: Apache/2
Upgrade: h2,h2c
Connection: Upgrade, close
Last-Modified: Tue, 11 Jun 2019 18:12:25 GMT
ETag: "54b7f-58b103d2402bc"
Accept-Ranges: bytes
Content-Length: 347007
Vary: User-Agent
Content-Type: application/x-gzip
Content-Encoding: x-gzip
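One reading of the problem (the one in the SuperUser link) is that Content-Encoding describes a coding applied on top of the media type, so a client that honours it gunzips the body and saves a plain tar under the .tar.gz name. Below is a minimal Python sketch, assuming header dumps in the lynx/curl formats used in this thread, that flags exactly that combination. The function names are hypothetical, and it does not model Chrome's own download logic.

```python
from typing import Dict

# Media types whose payload is itself a gzip file.
GZIP_TYPES = {"application/gzip", "application/x-gzip"}
# Content codings that tell the client to gunzip the body before use.
GZIP_CODINGS = {"gzip", "x-gzip"}


def parse_header_dump(text: str) -> Dict[str, str]:
    """Parse response 'Name: value' lines into a dict keyed by lowercase name."""
    headers: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("<"):  # tolerate curl --verbose lines ("< name: value")
            line = line[1:].strip()
        name, sep, value = line.partition(":")
        if not sep or not name:   # status line or blank line
            continue
        headers[name.strip().lower()] = value.strip()
    return headers


def gzip_coding_on_gzip_type(headers: Dict[str, str]) -> bool:
    """True when a gzip file is additionally labelled with a gzip content coding."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    codings = {c.strip().lower()
               for c in headers.get("content-encoding", "").split(",") if c.strip()}
    return media_type in GZIP_TYPES and bool(codings & GZIP_CODINGS)


if __name__ == "__main__":
    greenwood = parse_header_dump(
        "HTTP/1.1 200 OK\nContent-Type: application/x-gzip\nContent-Encoding: x-gzip\n")
    github = parse_header_dump(
        "< HTTP/2 200 \n< content-type: application/x-gzip\n")
    print(gzip_coding_on_gzip_type(greenwood))  # True
    print(gzip_coding_on_gzip_type(github))     # False
```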


@gwsw
Owner

gwsw commented Nov 11, 2022

Noting #215 is a duplicate of this issue.

@ferdnyc

ferdnyc commented Jun 28, 2023

@gwsw

I guess I'll leave this here, since this is the open issue, although there's somewhat more useful data in the report at #215.

Possibly not the issue: Content-Type/Content-Encoding

I don't think the issue is necessarily the Content-Type or Content-Encoding header at all. GitHub's download servers use the exact same Content-Type: application/x-gzip in their responses to download requests for less-636.tar.gz. And while it's true they don't include a Content-Encoding, they do include a different header that I think is central to this issue.

curl --verbose https://codeload.github.com/gwsw/less/tar.gz/refs/tags/v636 --output less.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 140.82.112.9:443...
* Connected to codeload.github.com (140.82.112.9) port 443 (#0)
[...]
*  SSL certificate verify ok.
} [5 bytes data]
[...]
} [5 bytes data]
> GET /gwsw/less/tar.gz/refs/tags/v636 HTTP/2
> Host: codeload.github.com
> user-agent: curl/8.0.1
> accept: */*
> 
{ [5 bytes data]
[...]
< HTTP/2 200 
< access-control-allow-origin: https://render.githubusercontent.com
< content-disposition: attachment; filename=less-636.tar.gz
< content-security-policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
< content-type: application/x-gzip
< cross-origin-resource-policy: cross-origin
[...]
< x-content-type-options: nosniff
< date: Wed, 28 Jun 2023 08:25:44 GMT
< 
{ [838 bytes data]
100  812k    0  812k    0     0  2181k      0 --:--:-- --:--:-- --:--:-- 2184k
* Connection #0 to host codeload.github.com left intact

The problem: no Content-Disposition

The real problem, I think, is the lack of a Content-Disposition header indicating that the file is to be downloaded as-is. The GitHub server's response (see above) includes one, which makes all the difference:

content-disposition: attachment; filename=less-636.tar.gz
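To compare the two header forms, the standard library's `email.message.Message` (a MIME header model, not an HTTP client) can build and read Content-Disposition values. A minimal Python sketch with hypothetical helper names: it builds a header equivalent to GitHub's, with the filename quoted (which RFC 6266 also permits), and shows that a bare `attachment` value carries no filename at all.

```python
from email.message import Message
from typing import Optional, Tuple


def build_disposition(filename: str) -> str:
    """Return an 'attachment' Content-Disposition value with a quoted filename parameter."""
    msg = Message()
    msg.add_header("Content-Disposition", "attachment", filename=filename)
    return msg["Content-Disposition"]


def read_disposition(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (disposition type, filename) parsed from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_content_disposition(), msg.get_filename()


if __name__ == "__main__":
    print(build_disposition("less-636.tar.gz"))
    print(read_disposition("attachment; filename=less-636.tar.gz"))
    print(read_disposition("attachment"))
```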

Setting Content-Disposition

This StackOverflow answer indicates you can use Apache's mod_headers to include a Content-Disposition header with files selected using any of the standard selectors; the example given uses a FilesMatch regex to serve mp3 files as attachments:

<IfModule mod_headers.c>
    <FilesMatch "\.(mp3|MP3)$">
        ForceType audio/mpeg
        Header set Content-Disposition "attachment"
        Allow from all
    </FilesMatch>
</IfModule>

Including the filename

The Apache docs don't indicate that the filename= parameter is appended automatically, but since you're inside a FilesMatch you can use its named-group regex captures, something like:

<IfModule mod_headers.c>
 <FilesMatch "^(?<file>[^/]+\.tar\.gz)$">
  Header set Content-Disposition "attachment; filename=%{env:MATCH_FILE}"
  Allow from all
 </FilesMatch>
</IfModule>
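To see what the named group should produce, here is a minimal Python model of the matching step. It assumes, per the Apache `<FilesMatch>` documentation, that named groups are exported as environment variables prefixed with `MATCH_` and upper-cased, and that this happens only from Apache 2.4.8 onward, which is worth checking on an older server. It models only that naming rule, not Apache's regex engine or how mod_headers interpolates the variable; `match_env` is a hypothetical name.

```python
import re
from typing import Dict

# The <FilesMatch> pattern above, rewritten with Python's (?P<name>...) named-group syntax.
FILES_MATCH = re.compile(r"^(?P<file>[^/]+\.tar\.gz)$")


def match_env(basename: str) -> Dict[str, str]:
    """Model of the MATCH_* variables exported for a named-group <FilesMatch>."""
    m = FILES_MATCH.match(basename)
    if m is None:
        return {}  # section does not apply, so no header would be set
    # Each group name is prefixed with MATCH_ and upper-cased.
    return {"MATCH_" + name.upper(): value for name, value in m.groupdict().items()}


if __name__ == "__main__":
    for name in ("less-636.tar.gz", "less-636.zip", "less-636.tar.gz.sig"):
        print(name, match_env(name))
```

Note that the pattern is anchored at both ends, so the `.sig` files next to the tarballs would not pick up the header.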

(Edit: Fixed my FilesMatch regex, since the docs say it only matches against the basename of the file, not the full path.)

(Edit2: Also, if the issue is the Content-Encoding header, while you're in that same FilesMatch, you can use mod_headers to remove it as well...)

<IfModule mod_headers.c>
 <FilesMatch "^(?<file>[^/]+\.tar\.gz)$">
  Header set Content-Disposition "attachment; filename=%{env:MATCH_FILE}"
  Header unset Content-Encoding
  Allow from all
 </FilesMatch>
</IfModule>

@gwsw
Owner

gwsw commented Jul 1, 2023

I tried this but had a couple of problems. Using your script verbatim, when I try to download a tar.gz file, Apache gives a 500 error. I changed the first Header line to just

    Header set Content-Disposition "attachment"

and that seems to work. Possibly my server is using an old version of Apache.

With the above change I can see the Content-Disposition header in the HTTP response. The unset line doesn't seem to work however. I still see the Content-Encoding: x-gzip header.

I can't really tell whether this has any effect on the issue, because I don't have Chrome. Edge does not seem to uncompress the file regardless of whether the Content-Disposition header is present or not. Perhaps someone with Chrome who does see this issue can confirm whether it's still happening with the Content-Disposition header present.

@RohanTalip
Author

RohanTalip commented Jul 1, 2023

> Perhaps someone with Chrome who does see this issue can confirm whether it's still happening with the Content-Disposition header present.

I just tried to download http://www.greenwoodsoftware.com/less/less-639-beta.tar.gz via the latest version of Chrome and still got the uncompressed version.

These were the response headers:

HTTP/1.1 200 OK
Date: ...
Server: Apache/2
Upgrade: h2,h2c
Connection: Upgrade, Keep-Alive
Last-Modified: Thu, 29 Jun 2023 19:43:18 GMT
ETag: "5c092-5ff49e97a1813"
Accept-Ranges: bytes
Content-Length: 376978
Content-Disposition: attachment
Vary: User-Agent
Keep-Alive: timeout=2, max=100
Content-Type: application/x-gzip
Content-Encoding: gzip
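These headers still carry `Content-Encoding: gzip`, so even with `Content-Disposition: attachment` a client that applies the content coding will hand the user decoded bytes. A minimal sketch of that step in Python, assuming the client simply gunzips when the coding is gzip or x-gzip; `body_as_saved` is a hypothetical name, and real browsers apply extra heuristics not modelled here.

```python
import gzip
import io
import tarfile
from typing import Optional


def body_as_saved(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Model a client that strips a gzip content coding before saving the body."""
    if content_encoding and content_encoding.strip().lower() in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    # Build a tiny .tar.gz in memory to stand in for the file on the server.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        payload = b"hello\n"
        info = tarfile.TarInfo("README")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    on_server = gzip.compress(buf.getvalue())

    saved = body_as_saved(on_server, "gzip")  # Content-Encoding as in the response above
    print(saved[:2] == b"\x1f\x8b")           # False: the gzip layer is gone
    print(saved[257:262] == b"ustar")         # True: what is left is a tar archive
```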

I think @ferdnyc may have the right approach of either adding the filename in the Content-Disposition header or removing (or not setting) the Content-Encoding: gzip header for files that are already compressed. I don't currently run Apache, so I can't comment on how to configure it to do that, and it might be different for your configuration anyway.

@gwsw
Owner

gwsw commented Jul 2, 2023

Unfortunately I have not found a way to remove the Content-Encoding header. I've spent several hours trying various suggestions but nothing seems to work. I don't know much about the internals of Apache, but since the Content-Encoding header appears after the Content-Disposition header that I recently added with mod_headers, I suspect it's being added by Apache after the .htaccess file is processed.

4 participants