txt detect as application/octet-stream #336

Open
MrSwed opened this issue Oct 21, 2022 · 6 comments

@MrSwed

MrSwed commented Oct 21, 2022

The example text file txt.txt is detected as application/octet-stream.

@gabriel-vasile
Owner

I tested your file using the latest release of mimetype and it is correctly detected as text/plain.

@MrSwed
Author

MrSwed commented Nov 1, 2022

The problem occurs with a large read buffer, which is needed to detect Office files (see #230).

@gabriel-vasile
Owner

Oh, I see. The problem appears when you call mimetype.SetLimit(1024*1024) before detection. Let me look into what causes it and how to fix it.

@gabriel-vasile gabriel-vasile reopened this Nov 1, 2022
@gabriel-vasile
Owner

The problem is that the test file contains the 0x0B (vertical tab) character at index 4478.

Computer files can be either binary files or text files.
mimetype uses the MIME sniffing standard to detect text files.
This standard says: If the resource header contains no binary data bytes, the computed MIME type is "text/plain".

0x0B is considered a binary data byte in this standard, so the file is not detected as text/plain.

If you want these types of files to be detected as text/plain you can extend mimetype to detect other formats.

@gabriel-vasile
Owner

There are different approaches to what is considered a text file.
One is the MIME sniffing standard I linked above.
The Linux file utility, for example, treats any file that does not contain the null byte 0x00 as a text file. For this reason, file detects your file as text/plain.
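The looser rule can be sketched in a few lines of Go. This only models the rule as stated in this comment; the real file utility performs further checks, and the function name is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikeTextNullOnly reports whether data contains no 0x00 byte.
// Hypothetical helper modelling the "no null byte means text" rule.
func looksLikeTextNullOnly(data []byte) bool {
	return bytes.IndexByte(data, 0x00) == -1
}

func main() {
	withVT := []byte("line one\x0bline two\n")  // contains a vertical tab
	withNUL := []byte("line one\x00line two\n") // contains a null byte

	fmt.Println(looksLikeTextNullOnly(withVT))  // true
	fmt.Println(looksLikeTextNullOnly(withNUL)) // false
}
```

Under this rule a vertical tab does not disqualify the input, which is consistent with the two tools disagreeing on the same file.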

@zdiff

zdiff commented May 22, 2024

Do you think it would benefit this project to allow users to choose the binary file detection method: MIME sniffing standard or the looser Linux file standard?


3 participants