It seems the Apache Tika command line interface doesn't support passing in the MIME type of the document (or any additional metadata for that matter).
Tika's Detector Interface would consider such metadata, but the metadata argument seems to be only exposed in the Tika API, not the command line interface.
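As far as I can tell, detection first looks at the document content itself (magic bytes),
then refines this detection result using the file extension (if available),
and then refines it again using the content type from the supplied metadata (which we can't set from the command line).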
The command line interface help describes a switch:
-d or --detect Detect document type
Detection seems to be enabled by default (otherwise, converting a temporary file with no extension wouldn't have worked). Still, we should probably enable this switch to be sure content type detection is always performed.
Currently a temporary file without file extension is used to store the original document passed to Tika.
We probably should
TransformEngine's convertTo method on to Tika
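
Since the command line interface can't take the MIME type directly, one possible workaround is to give the temporary file an extension derived from the MIME type, so that the extension-based refinement step has something to work with. The following is only a minimal sketch under assumptions: the class `TempFileNaming`, its methods, the `tika-input-` prefix and the small MIME-to-extension table are all hypothetical and not taken from the existing TransformEngine code. It uses only the Java standard library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the existing code base.
class TempFileNaming {

    // Deliberately tiny, illustrative mapping; a real one would need many more entries.
    private static final Map<String, String> SUFFIXES = Map.of(
            "application/pdf", ".pdf",
            "text/plain", ".txt",
            "text/html", ".html");

    // Returns a file suffix for the given MIME type, or "" if it is unknown.
    static String suffixFor(String mimeType) {
        if (mimeType == null) {
            return "";
        }
        // Drop parameters such as "; charset=utf-8" before the lookup.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return SUFFIXES.getOrDefault(base, "");
    }

    // Creates the temporary file with an extension when the MIME type is known.
    // An empty suffix yields a file name without extension (the current behaviour).
    static Path createTempFileFor(String mimeType) throws IOException {
        return Files.createTempFile("tika-input-", suffixFor(mimeType));
    }

    public static void main(String[] args) throws IOException {
        Path tempFile = createTempFileFor("application/pdf; charset=binary");
        System.out.println(tempFile.getFileName());
        Files.delete(tempFile);
    }
}
```

For unknown MIME types this falls back to a file without extension, so Tika would still rely on content-based detection alone in that case.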