Incorrect filename in Content-Disposition header #360

Open
tongwang opened this issue Nov 18, 2021 · 1 comment

@tongwang
Contributor

The fixes for #167, #124, #225 and #285 only mask the error; none of them generates the correct Content-Disposition header.

With those fixes in place:
When rfc6266 is installed, we get a TypeError, as reported in #274.
When rfc6266 is not installed, we get an incorrect filename in the Content-Disposition header. For example, if the filename is hello.c, then instead of Content-Disposition: attachment; filename=hello.c, we get Content-Disposition: attachment; filename=b'hello.c'. This may explain #333.
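
The b'hello.c' value is what Python 3 produces when a bytes object is interpolated into a str. The snippet below is only a minimal reproduction of that behaviour, not the library's actual code: the path, the variable names and the assumption that the file name arrives as bytes after an encode() call are all hypothetical.

```python
import os

# Hypothetical input path, used only for illustration.
path = "/tmp/hello.c"

# Assumption: somewhere before the header is built, the name has been
# encoded to bytes (for example via .encode("utf-8")).
file_name = os.path.basename(path).encode("utf-8")

# Interpolating bytes into a str inserts the repr of the bytes object,
# including the b'...' wrapper.
bad_header = "attachment; filename=%s" % file_name

# Decoding back to str first yields the intended header value.
good_header = "attachment; filename=%s" % file_name.decode("utf-8")

print(bad_header)   # attachment; filename=b'hello.c'
print(good_header)  # attachment; filename=hello.c
```

If this is indeed the cause, decoding the name (or never encoding it) before formatting the header would avoid the b'...' wrapper.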

With an incorrect filename, Tika's content detection may return a different file type. Taking the same hello.c as an example: with Content-Disposition: attachment; filename=hello.c, Tika's content detection returns text/x-csrc, whereas with Content-Disposition: attachment; filename=b'hello.c', it returns text/plain, because Tika takes the file name to be b'hello.c'.
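
To illustrate why the mangled name can change a name-based result, the sketch below extracts the extension from both names using the Python standard library. It does not reproduce Tika's own detection logic, and extension_of is a hypothetical helper introduced only for this example.

```python
import os.path


def extension_of(file_name: str) -> str:
    """Return the extension a name-based check would see (illustrative helper)."""
    return os.path.splitext(file_name)[1]


correct = extension_of("hello.c")
mangled = extension_of("b'hello.c'")

print(correct)  # .c
print(mangled)  # .c'
```

The trailing quote means the mangled name no longer ends in .c, so any lookup keyed on that extension would miss.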

@chrismattmann chrismattmann added this to the tika-next milestone Dec 31, 2022
@chrismattmann
Owner

Interesting. Please propose a patch to fix this if you have time. Thanks, @tongwang. If you submit a PR, I will take a look at it for the next release.
