PPT detected as 'application/x-ole-storage' #388

Open
MrSwed opened this issue May 10, 2023 · 8 comments

@MrSwed

MrSwed commented May 10, 2023

The file for which the detection is inaccurate
2.zip

Expected MIME type
application/vnd.ms-powerpoint

Returned MIME type
application/x-ole-storage

Version of the library you are using
v1.4.1

Output of go version
go version go1.16.15 linux/amd64

Additional context
Both https://www.htmlstrip.com/mime-file-type-checker and the console command file --mime-type 2.ppt return the correct type.

@eqinox76

I see the same with an .xls file. It seems to be a common issue in other MIME detection libraries as well, for example in Ruby.

@gabriel-vasile
Owner

@eqinox76 I'm looking into how file --mime 2.ppt does its detection, in order to make it work in Go.
If there's no privacy concern, please upload your .xls so I can use it for tests.

@MrSwed
Author

MrSwed commented May 30, 2023

Also check the version of 'file'; different versions may give different results :(

file -v; file --mime-type -b -E 2.ppt 
file-5.41
magic file from /etc/magic:/usr/share/misc/magic
application/vnd.ms-powerpoint

@eqinox76

Sadly, I cannot share the file. When I remove the proprietary information and save it, the resulting file is correctly recognised as "application/vnd.ms-excel".

file version 5.41 also reports the correct MIME type for the original file:

file --mime 1.xls 
1.xls: application/vnd.ms-excel; charset=binary

I debugged this a bit further and see the following state in matchOleClsid (ms_office.go:224, github.com/gabriel-vasile/mimetype v1.4.2):

- in[26:28]
[]uint8 len: 2, cap: 2, [3,0]
- clsidOffset
1616
- firstSecID
2
- in[clsidOffset:]
[]uint8 len: 1456, cap: 1456, [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,112,224,41,135,124,141,217,1,254,255,255,255,0,0,0,0,0,0,0,0,87,0,111,0,114,0,107,0,98,0,111,0,111,0,107,0,...+1392 more]

I hope this information helps a bit.
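
One detail in the dump above may be worth spelling out. The first 16 bytes at clsidOffset (where the CLSID is expected) are all zero, and the last 16 bytes shown look like UTF-16LE text. The following minimal sketch decodes those 16 bytes using only the standard library; the helper name decodeUTF16LE is hypothetical and is not part of mimetype.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical helper that decodes little-endian
// UTF-16 bytes into a Go string. A trailing odd byte is ignored.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])|uint16(b[i+1])<<8)
	}
	return string(utf16.Decode(units))
}

func main() {
	// The last 16 bytes shown in the in[clsidOffset:] dump above.
	tail := []byte{87, 0, 111, 0, 114, 0, 107, 0, 98, 0, 111, 0, 111, 0, 107, 0}
	fmt.Println(decodeUTF16LE(tail)) // prints "Workbook"
}
```

So the bytes following the zeroed fields spell "Workbook", which suggests the file does carry an Excel-related name even though no CLSID is set at that offset.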
Many thanks for looking into it!

@gabriel-vasile
Owner

gabriel-vasile commented Jun 1, 2023

@eqinox76 I suspect your problem happens because the Excel signature is at the end of the file.
Try disabling the limit on the number of bytes used for detection with:

mimetype.SetLimit(0) // Default limit is 3072. Setting the limit to 0 makes mimetype use the whole file.
mtype, err := mimetype.DetectFile("your_file.xls")

More details are in the FAQ
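
To illustrate why a read limit can hide a signature that sits late in a file, here is a minimal standard-library sketch. It does not reproduce mimetype's internals: containsWithin and the marker bytes are hypothetical, and only the limit value 3072 and the "0 means no limit" convention come from the comment above.

```go
package main

import (
	"bytes"
	"fmt"
)

// containsWithin is a hypothetical helper that reports whether sig occurs
// in the first limit bytes of data. A limit of 0 means "no limit",
// mirroring the SetLimit(0) convention described above.
func containsWithin(data, sig []byte, limit int) bool {
	if limit > 0 && limit < len(data) {
		data = data[:limit]
	}
	return bytes.Contains(data, sig)
}

func main() {
	// Hypothetical marker placed 5000 bytes into the file,
	// i.e. past the default 3072-byte window.
	sig := []byte("HYPOTHETICAL-MARKER")
	data := append(make([]byte, 5000), sig...)

	fmt.Println(containsWithin(data, sig, 3072)) // false: marker lies beyond the window
	fmt.Println(containsWithin(data, sig, 0))    // true: whole input is searched
}
```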

@eqinox76

eqinox76 commented Jun 1, 2023

Thanks for the tip. Sadly, this file seems to behave differently.
When I set the limit to unlimited, neither the subheaders nor the magic bytes checked at the end of func Xls are found, and the library still returns application/x-ole-storage.

I cannot figure out which rule in the file command recognises this .xls file. The most output I can get is:

file -d 1.xls
[try zmagic 0]
[try tar 0]
[try json 0]
[try csv 0]
[try cdf 1]
1.xls: CDFV2 Microsoft Excel

Let me know if you have another idea of what information I could share without sharing the whole file.

@gabriel-vasile
Owner

@MrSwed Your file is not detected because the signature is at the end of the file. Use SetLimit as explained in the FAQ and it will be detected correctly.

@eqinox76 Your case seems more complicated. mimetype uses CLSIDs to detect OLE files.
It would be helpful to know the CLSID of that .xls file. This program outputs the offset where the CLSID is expected, followed by the CLSID itself.

package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"os"
)

func main() {
	d, err := os.ReadFile("1.xls")
	if err != nil {
		panic(err)
	}
	fmt.Println(getOleClsid(d))
}
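
// getOleClsid returns the expected offset of the root storage CLSID and
// the 16 CLSID bytes at that offset, hex-encoded.
func getOleClsid(in []byte) (int, string) {
	// The header fields read below end at byte 52.
	if len(in) < 52 {
		return 0, ""
	}

	// Bytes 26..27 hold the major version: version 4 files use
	// 4096-byte sectors, everything else is treated as 512-byte sectors.
	sectorLength := 512
	if in[26] == 0x04 && in[27] == 0x00 {
		sectorLength = 4096
	}

	// SecID of first sector of the directory stream.
	firstSecID := int(binary.LittleEndian.Uint32(in[48:52]))

	// Expected offset of CLSID for root storage object: skip the header
	// sector, then firstSecID sectors, then 80 bytes into the root entry.
	clsidOffset := sectorLength*(1+firstSecID) + 80
	if clsidOffset < 0 || clsidOffset+16 > len(in) {
		return clsidOffset, ""
	}
	return clsidOffset, hex.EncodeToString(in[clsidOffset : clsidOffset+16])
}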

@eqinox76

Glad to do so! The output is

1616 00000000000000000000000000000000

Some more information that might be handy:

in[26] = 3
firstSecID = 2
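
An all-zero CLSID at offset 1616 (512*(1+2)+80) means the root storage entry carries no CLSID to match against. As a possible next diagnostic, the sketch below lists the names of the entries in the first directory sector, since those names can be shared without sharing the file contents. It relies on the compound file layout of 128-byte directory entries whose name is stored as UTF-16LE in the first 64 bytes, with the name length in bytes at offset 64. The function dirEntryNames is hypothetical and not part of mimetype, and whether the file command relies on these names for this file is an assumption that has not been verified here.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"unicode/utf16"
)

const dirEntrySize = 128 // size of one directory entry in bytes

// dirEntryNames is a hypothetical helper that returns the names of the
// directory entries stored in the first directory sector of a compound
// file. It does not follow the sector chain, so directories that span
// more than one sector are only partially listed.
func dirEntryNames(in []byte) []string {
	if len(in) < 52 {
		return nil
	}
	sectorLength := 512
	if in[26] == 0x04 && in[27] == 0x00 {
		sectorLength = 4096
	}
	firstSecID := int(binary.LittleEndian.Uint32(in[48:52]))
	start := sectorLength * (1 + firstSecID)
	if start < 0 || start >= len(in) {
		return nil
	}

	var names []string
	for off := start; off < start+sectorLength && off+dirEntrySize <= len(in); off += dirEntrySize {
		entry := in[off : off+dirEntrySize]
		// Name length in bytes, including the terminating UTF-16 NUL.
		n := int(binary.LittleEndian.Uint16(entry[64:66]))
		if n <= 2 || n > 64 {
			continue // unused or malformed entry
		}
		units := make([]uint16, 0, n/2)
		for i := 0; i+1 < n-2; i += 2 {
			units = append(units, binary.LittleEndian.Uint16(entry[i:i+2]))
		}
		names = append(names, string(utf16.Decode(units)))
	}
	return names
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: dirnames <file>")
		return
	}
	d, err := os.ReadFile(os.Args[1])
	if err != nil {
		panic(err)
	}
	for _, name := range dirEntryNames(d) {
		fmt.Printf("%q\n", name)
	}
}
```

Run it with the path of the problematic file as the only argument; the printed names could then be posted here in place of the file itself.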
