Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Asset Sanitizer Queue & Preview Check #14998

Closed
brusch opened this issue Apr 21, 2023 · 4 comments · Fixed by #16053
Closed

Implement Asset Sanitizer Queue & Preview Check #14998

brusch opened this issue Apr 21, 2023 · 4 comments · Fixed by #16053
Assignees
Milestone

Comments

@brusch
Copy link
Member

brusch commented Apr 21, 2023

Improvement description

Uploaded assets should be sanitized in a message queue, we should check existing things there, like max pixels but also additional things like Javascript in PDFs or SVGs and other things.

self::checkMaxPixels($mimeTypeGuessData, $data);

The previews should be only available after the assets were sanitized, in the meantime we don't display any preview of an asset in the admin UI.

Maybe we can combine that with the existing AssetUpdateTasksHandler or AssetPreviewImageHandler is probably even better to avoid race conditions? 🤔

@heigpa
Copy link

heigpa commented Sep 27, 2023

Would this be a solution to sanitize the pdf files? javascript - How to remove Active Content from uploaded PDF documents? - Stack Overflow

@brusch
Copy link
Member Author

brusch commented Oct 2, 2023

Yes, that would work, but we should try to get a PHP solution, otherwise we'd need Ghostscript as a requirement.
Maybe we can just use an RegEx to find out whether the PDF includes a script or not, if we we don't display the preview - that would be sufficient for our case.

@kingjia90
Copy link
Contributor

kingjia90 commented Apr 18, 2024

Considering #16955 which hints that scanning could be triggering false positive, and excluding that we want to actually sanitize the files, suggesting to consider ways to block JS on rendering (client-side).

What about using https://github.com/mozilla/pdf.js ?
It has javascript blocked by default https://github.com/mozilla/pdf.js/blob/7290faf840ab5a3828d12bbbb460bd8ad49f6ffa/extensions/chromium/preferences_schema.json#L65-L70

Also tried Adobe PDF Embed API and works fine, the JS got blocked, it's unlimited and free (but needs a token and i am not sure about licensing).

One potential advantage would be replacing the native pdf viewer that comes with the browser WHICH differs depending on the browser (making it homogeneous) with something that Devs could have more control on how it should appear and the tools it should provide (highlighter, search in page button etc..) than the basic chrome one which only allows to zoom, download and print.

@brusch
Copy link
Member Author

brusch commented May 2, 2024

We were using pdf.js before, but we removed it to reduce dependencies and maintenance work.
I'd stick to using the browsers native PDF viewer and also the scanning, but provide a button to display the PDF anyway (in a new window -> now XSS 😉 ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants