Add some file types from the Go version #41

Open: wants to merge 6 commits into base: master


fzzylogic

Added some low-hanging-fruit file types from the Go version. Put Dcm under archive (as on the Go side). I didn't improve Matroska detection or add docx, xlsx or pptx. Nice lib ^^.
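For context, DICOM (Dcm) detection is a simple offset check: a DICOM Part 10 file starts with a 128-byte preamble followed by the ASCII prefix `DICM`. The following is a minimal standalone sketch of such a matcher; `is_dcm` is a hypothetical name and this is not the code from this PR.

```python
# Hypothetical standalone matcher; not the implementation in this PR.
DICOM_PREFIX_OFFSET = 128  # DICOM Part 10 files begin with a 128-byte preamble
DICOM_PREFIX = b"DICM"     # followed by this 4-byte magic prefix


def is_dcm(buf: bytes) -> bool:
    """Return True if buf looks like the start of a DICOM Part 10 file."""
    end = DICOM_PREFIX_OFFSET + len(DICOM_PREFIX)
    return len(buf) >= end and buf[DICOM_PREFIX_OFFSET:end] == DICOM_PREFIX


sample = bytes(128) + b"DICM" + bytes(16)
print(is_dcm(sample))   # True
print(is_dcm(b"DICM"))  # False: the prefix must come after the preamble
```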

@codefo

codefo commented Nov 27, 2019

Hey @h2non, what do we need to do to merge this PR? Maybe I can help 🙂

@ixna

ixna commented Nov 28, 2019

https://github.com/h2non/filetype.py/pull/41/files#diff-ad453f8a0e9dcc5a7320fb8fa6e98de5R96-R99

All doc, xls and ppt files have the same file signature, so whichever one is checked, it will always be detected as the doc type, because doc is evaluated first.
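A minimal sketch of the ordering problem, assuming a simple ordered list of matchers. The names `ole_matcher`, `MATCHERS` and `guess` are hypothetical and not filetype.py's API. All three legacy formats begin with the same 8-byte OLE2 Compound File header (`D0 CF 11 E0 A1 B1 1A E1`), so the first matcher in the list always wins:

```python
# OLE2 / Compound File Binary header shared by legacy .doc, .xls and .ppt
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def ole_matcher(name):
    # Hypothetical matcher factory: every type checks the same 8-byte header.
    def match(buf: bytes):
        return name if buf[:8] == OLE2_MAGIC else None
    return match


# Matchers are evaluated in list order, so "doc" shadows the other two.
MATCHERS = [ole_matcher("doc"), ole_matcher("xls"), ole_matcher("ppt")]


def guess(buf: bytes):
    for match in MATCHERS:
        result = match(buf)
        if result is not None:
            return result
    return None


xls_header = OLE2_MAGIC + bytes(504)  # header of a (pretend) .xls file
print(guess(xls_header))  # doc
```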

Edit: the same applies to docx, pptx and xlsx, which the current repository detects as the zip archive type.

So I am unsure how to implement detection for the MS Office document types.
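One possible approach for the OOXML formats, sketched under assumptions: since docx, xlsx and pptx are ZIP containers, a matcher could open the archive and look for the conventional part names Office writes (`word/document.xml`, `xl/workbook.xml`, `ppt/presentation.xml`). This is a heuristic rather than a full parse of `[Content_Types].xml`, the function name `guess_ooxml` is hypothetical, and it needs the whole file (ZIP's central directory sits at the end), not just the leading header bytes.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature

# Conventional part names written by Office; a heuristic, not a full
# parse of [Content_Types].xml.
OOXML_PARTS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def guess_ooxml(buf: bytes):
    """Return 'docx', 'xlsx', 'pptx', 'zip', or None if buf is not a ZIP."""
    if not buf.startswith(ZIP_MAGIC):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(buf)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return "zip"  # signature matched, but the archive is truncated or corrupt
    for part, ext in OOXML_PARTS.items():
        if part in names:
            return ext
    return "zip"


def make_zip(*names):
    # Build a small in-memory ZIP containing empty entries with the given names.
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return out.getvalue()


print(guess_ooxml(make_zip("word/document.xml")))  # docx
print(guess_ooxml(make_zip("readme.txt")))         # zip
```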

Update: I think it's better to group the shared doc file signature (magic number) under a single type, application/x-ole-storage, and determine the specific type from the filename extension.
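A minimal sketch of that grouping idea, assuming the caller may or may not know the filename. `guess_ole_mime` is a hypothetical helper, not part of filetype.py. When only raw bytes are available there is no extension, so the result stays at the generic group type; unknown extensions also fall back to it, since other files such as Outlook `.msg` use the same OLE2 container.

```python
import os

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # shared OLE2 header
OLE_STORAGE_MIME = "application/x-ole-storage"  # proposed group type

# Extension -> specific MIME type for the pre-2007 Office formats.
OLE_EXTENSION_MIME = {
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
    ".ppt": "application/vnd.ms-powerpoint",
}


def guess_ole_mime(buf: bytes, filename=None):
    """Return a MIME type for an OLE2 file, or None if buf is not OLE2."""
    if buf[:8] != OLE2_MAGIC:
        return None
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        return OLE_EXTENSION_MIME.get(ext, OLE_STORAGE_MIME)
    return OLE_STORAGE_MIME  # no filename: the signature alone cannot disambiguate


header = OLE2_MAGIC + bytes(504)
print(guess_ole_mime(header, "Report.XLS"))  # application/vnd.ms-excel
print(guess_ole_mime(header))                # application/x-ole-storage
```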

@@ -134,6 +148,13 @@ Font
- **ttf** - ``application/font-sfnt``
- **otf** - ``application/font-sfnt``

Document

Group pre-2007 Microsoft Office documents as application/x-ole-storage, since these three types share the same file signature. We can determine the specific file type from the filename extension.


# Supported application types
DOCUMENT = (
document.Doc(),

The Doc type will also match xls and ppt because they share the same file signature, so there is no need to define the others.
