I think it's because LibreOffice (OpenOffice) puts the files into the docx archive in a different order than MS Word.
So the file has a different signature and is detected as a plain zip archive.
That's correct because it is a zip file. For me to detect it as something else means that I have to open it up and process the file contents which I'm less interested in doing. I'm specifically worried about processing a large zip file just to see if it is a doc file. But maybe I can read in the first X bytes of the zip file and look for key files....
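The "look for key files" idea could be sketched roughly as below. This is only an illustration, not the library's code: the class `DocxSniffer`, the method `refineZipType`, and the `maxEntries` cap are hypothetical names, and it assumes the caller has already decided the input is a zip. It walks the local entry headers in stream order with `java.util.zip.ZipInputStream` and stops at `word/document.xml`, so the position of that entry in the archive does not matter. Note that skipping an entry still reads through its compressed data, so the cap limits the number of entries examined, not the number of bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxSniffer {
    static final String DOCX_MIME =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String ZIP_MIME = "application/zip";

    /**
     * Hypothetical helper: inspects at most maxEntries entries of a stream
     * already known to be a zip and reports DOCX if the main document part
     * is among them, regardless of where it sits in the archive.
     */
    static String refineZipType(InputStream in, int maxEntries) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (int seen = 0; seen < maxEntries; seen++) {
                ZipEntry entry = zip.getNextEntry();
                if (entry == null) {
                    break; // no more entries
                }
                // Assumption: this one part is treated as enough to call it a DOCX.
                if ("word/document.xml".equals(entry.getName())) {
                    return DOCX_MIME;
                }
            }
        }
        return ZIP_MIME; // nothing DOCX-specific seen within the cap
    }
}
```

A caller might run this only after the normal signature check has reported `application/zip`, for example `DocxSniffer.refineZipType(new FileInputStream(file), 32)`, so that non-zip files never pay the cost.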
@j256 Did you add something to detect this case as a DOCX file? I ran into a similar problem and tried several ways to detect the file as DOCX, but only the "Tika core" library detected it correctly.
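import org.apache.tika.Tika;

Tika tika = new Tika();
// Note: if filePath is a String, Tika.detect(String) goes by the file name only;
// pass a java.io.File or java.nio.file.Path to have the content inspected.
String mimeType = tika.detect(filePath);
// output mime type: application/vnd.openxmlformats-officedocument.wordprocessingml.document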
I think it's because LibreOffice (OpenOffice) puts the files into the docx archive in a different order than MS Word.
So the file has a different signature and is detected as a plain zip archive.
Example file is attached.
DocxByLibreOffice.docx