Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thumbnail generation for mis-detected file types #769

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

glubsy
Copy link

@glubsy glubsy commented Nov 28, 2020

  • If a file failed to be detected via filename extension by the javascript frontend, try to detect its type to generate a thumbnail.
  • Use mime type detection to determine the actual file type (if FileInfo module is indeed loaded, otherwise skip detection).
  • Don't write capture file to hard drive disk anymore, but keep it in memory until final thumbnails have been generated.
  • Fix SWF files not having thumbnails generated when they could (usually they can be handled by ffmpeg).
  • Various performance improvements by reducing superfluous or redundant checks.

Fixes #770.

@glubsy
Copy link
Author

glubsy commented Nov 28, 2020

Note:

  • This is based off of the base branch in Fix short videos thumbnail generation #765, this is a continuation/extension of that work.
  • I don't see the point in keeping capture files on disk, since they are only used once or twice, and then just sit there wasting disk space forever. Instead, I chose to keep the capture data in memory until the final thumbnails have been generated, after which it will be discarded.
  • I tried to make the code as clear as possible, had to refactor things to that end.
  • Even if the thumbnail is generated for a file, the javascript front-end still doesn't know the actual file type. This might need an update to the javascript code for "previews".
    It should not be too hard to send back the detected file type in the response to the request and use the proper player.

Keep the default assignment in case there are more types.
* Use ffprobe/avprobe to get the total duration of each video file first.
* Compute a desired optimal timestamp of 15% of the total duration to seek for and get the thumbnail from.
* Fix capture of ffmpeg's output (from stderr) in exec_cmdv() by using an optional redirection to stdout.
* Capture of sub-process output is optional (possible slight performance gain).
Overwrite already existing thumbnails if the source file's mtime is more recent
* Return stdout as expected from ffprobe to get duration.
* Avoid throwing an exception in favour of a default value for potential
malformed total duration got from ffprobe.
* Conflict with the newly added boolean arguments if they are
explicitely passed by the caller.
* cmdv has to be an array.
* Allows users to specify which percentage of the total video duration
to seek in instead of hardcoding 15%.
* Now defaults to 50%.
* This fixes a denial-of-service exploit that would allow the client to
generate an infinite number of thumbnails and fill up the storage completely.
Since the client had control over the requested thumbnail sizes, it could make arbitrary requests for thumbnails, and every time the backend did not find an already generated thumbnail with the specified sizes, it would happily generate a new one.
* Remove the ability of the client to decide thumbnail dimensions and
only let the back-end do this by reading the configuration.
* Limit the number of generated thumbnails per file to only one, with
"landscape" dimensions (4/3).
* Use CSS "object-fit" property to adjust displaying of landscape thumbnails into squares. Ref: https://developer.mozilla.org/en-US/docs/Web/CSS/object-fit
* Generate thumbnail despite wrong detected filetype

When a file has the wrong extension, fall back to other types
available until one works.
For unsupported files, get the mime type to determine if we truly cannot
support thumbnail generation.
Cache the type detected for further requests with different sizes.

* Test whether server has Fileinfo extension active

Only check mime types with fileinfo extension if it is active.
If not, avoid brute forcing type detection by rolling over the various
thumbnail generation methods and simply return no thumbnail.

* Keep capture file in memory instead of writing to disk

Keep the capture data in the Image class if the capture data is valid,
otherwise destroy the Image object.
Image doesn't read files from disk directly anymore.
* This fixes a denial-of-service exploit that would allow the client to
generate an infinite number of thumbnails and fill up the storage completely.
Since the client had control over the requested thumbnail sizes, it could make arbitrary requests for thumbnails, and every time the backend did not find an already generated thumbnail with the specified sizes, it would happily generate a new one.
* Remove the ability of the client to decide thumbnail dimensions and
only let the back-end do this by reading the configuration.
* Limit the number of generated thumbnails per file to only one, with
"landscape" dimensions (4/3).
* Use CSS "object-fit" property to adjust displaying of landscape thumbnails into squares. Ref: https://developer.mozilla.org/en-US/docs/Web/CSS/object-fit
* We now use only one thumbnail file src for both square and rational
thumbnails so remove superfluous requests.
* Set the same src for both square and landscape dom classes.
The href is converted to path.
Only compute thumbnail configured dimensions on thumbnail API requests.
* In order to handle file types that we don't want to generate
thumbnails for, be explicit about it in the options.
We cannot rely on just removing the unwanted file-types from the
arrays of strings in the options, as the backend will generate
thumbnails for files it detects the actual type for anyway.
When a value is removed from a default array of strings in the
options, the removed value should be considered explicitly blocked by
the user: thumbnails should not be generated, actual underlying file type should
not be checked for. Thus the blocklist should be updated with the
missing string.
* Down-sampling should only be requested by the backend.
* For now remove the client-side logic entirely since it is both
superfluous and vulnerable. This can be implemented again on the
back-end if necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant