[Bug]: Fix pdf scanner #16956

Draft
wants to merge 3 commits into
base: 11.2
15 changes: 13 additions & 2 deletions lib/Messenger/Handler/AssetUpdateTasksHandler.php
@@ -19,6 +19,7 @@
 use Pimcore\Helper\LongRunningHelper;
 use Pimcore\Messenger\AssetUpdateTasksMessage;
 use Pimcore\Model\Asset;
+use Pimcore\Model\Asset\Enum\PdfScanStatus;
 use Pimcore\Model\Version;
 use Psr\Log\LoggerInterface;

@@ -62,8 +63,18 @@ private function saveAsset(Asset $asset): void

     private function processDocument(Asset\Document $asset): void
     {
-        if ($asset->getMimeType() === 'application/pdf' && $asset->checkIfPdfContainsJS()) {
-            $asset->save(['versionNote' => 'PDF scan result']);
+        if ($asset->isPdfScanningEnabled() && $asset->getMimeType() === 'application/pdf' && !$asset->getScanStatus()) {
+            $asset->setCustomSetting($asset::CUSTOM_SETTING_PDF_SCAN_STATUS, PdfScanStatus::IN_PROGRESS->value);
+            $this->saveAsset($asset);
+
+            if ($asset->checkIfPdfContainsJS()) {
+                $asset->setCustomSetting($asset::CUSTOM_SETTING_PDF_SCAN_STATUS, PdfScanStatus::SAFE->value);
+                $note = 'safe';
+            } else {
+                $asset->setCustomSetting($asset::CUSTOM_SETTING_PDF_SCAN_STATUS, PdfScanStatus::UNSAFE->value);
+                $note = 'unsafe';
+            }
+            $asset->save(['versionNote' => 'PDF scan result:' . $note]);

@kingjia90 (Contributor, Author) commented on Apr 17, 2024

Hmm, this could be bad: if someone uploads a PDF that takes 5 seconds to check and, within that time, it is replaced by a new one that takes 1 second, then by the time the first task finishes it would add a newer version with an earlier date than the creation date of the latest one, or something like that.

That's probably why it uses saveAsset(), which avoids versioning.
Is it async or sync?
Is the versionNote necessary? Could it be moved to metadata?
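
One possible way to guard against the race described above, as a minimal sketch only: capture a checksum before scanning and discard the result if the content has changed by the time the scan finishes. `FakeAsset` and `scanAndStore()` are hypothetical names invented for this illustration and are not part of Pimcore; the `'safe'`/`'unsafe'` strings simply mirror the notes used in the diff.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for an asset; NOT Pimcore's Asset\Document.
final class FakeAsset
{
    public ?string $scanStatus = null;

    public function __construct(public string $content)
    {
    }

    public function checksum(): string
    {
        return hash('sha256', $this->content);
    }
}

/**
 * Runs $scan on the content captured at call time and stores the result
 * only if the asset content is still the same afterwards.
 * $scan receives the captured content and returns true when it is safe.
 * Returns false when the result was discarded as stale.
 */
function scanAndStore(FakeAsset $asset, callable $scan): bool
{
    $content = $asset->content;
    $checksumBefore = hash('sha256', $content);

    $isSafe = $scan($content);

    if ($asset->checksum() !== $checksumBefore) {
        // The asset was replaced while scanning: do not overwrite newer state.
        return false;
    }

    $asset->scanStatus = $isSafe ? 'safe' : 'unsafe';

    return true;
}

// Usage: simulate a replacement that happens while the slow scan is running.
$asset = new FakeAsset('%PDF-1.7 /JS (app.alert(1))');
$stored = scanAndStore($asset, function (string $content) use ($asset): bool {
    $asset->content = '%PDF-1.7 plain text only'; // upload of a new file mid-scan
    return !str_contains($content, '/JS') && !str_contains($content, '/JavaScript');
});
// $stored is false and $asset->scanStatus is still null: the stale result was dropped.
```

Whether a check like this is needed depends on the answer to the async/sync question; with a synchronous handler the window for a replacement is much smaller.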

         }

         $pageCount = $asset->getCustomSetting('document_page_count');
7 changes: 7 additions & 0 deletions models/Asset.php
@@ -17,6 +17,7 @@

 use Doctrine\DBAL\Exception\DeadlockException;
 use Exception;
+use Pimcore\Model\Asset\Enum\PdfScanStatus;
 use function in_array;
 use function is_array;
 use League\Flysystem\FilesystemException;
@@ -498,6 +499,12 @@ public function save(array $parameters = []): static

         $parameters['isUpdate'] = $isUpdate; // need for $this->update() for certain types (image, video, document)

+        if ($this->getDataChanged()) {
+            if ($this->getType() === 'document' && $this instanceof Asset\Document) {
+                $this->setCustomSetting($this::CUSTOM_SETTING_PDF_SCAN_STATUS, null);
+            }
+        }
+
         // we wrap the save actions in a loop here, to restart the database transactions in the case it fails
         // if a transaction fails it gets restarted $maxRetries times, then the exception is thrown out
         // especially useful to avoid problems with deadlocks in multi-threaded environments (forked workers, ...)
23 changes: 2 additions & 21 deletions models/Asset/Document.php
@@ -167,15 +167,6 @@ public function getText(int $page = null): ?string

     public function checkIfPdfContainsJS(): bool
     {
-        if (!$this->isPdfScanningEnabled()) {
-            return false;
-        }
-
-        $this->setCustomSetting(
-            self::CUSTOM_SETTING_PDF_SCAN_STATUS,
-            Model\Asset\Enum\PdfScanStatus::IN_PROGRESS->value
-        );
-
         $chunkSize = 1024;
         $filePointer = $this->getStream();

@@ -187,20 +178,10 @@ public function checkIfPdfContainsJS(): bool
             }

             if (str_contains($chunk, '/JS') || str_contains($chunk, '/JavaScript')) {
-                $this->setCustomSetting(
-                    self::CUSTOM_SETTING_PDF_SCAN_STATUS,
-                    Model\Asset\Enum\PdfScanStatus::UNSAFE->value
-                );
-
-                return true;
+                return false;
             }
         }

-        $this->setCustomSetting(
-            self::CUSTOM_SETTING_PDF_SCAN_STATUS,
-            Model\Asset\Enum\PdfScanStatus::SAFE->value
-        );
-
         return true;
     }

@@ -228,7 +209,7 @@ private function isTextProcessingEnabled(): bool
         return Config::getSystemConfiguration('assets')['document']['process_text'];
     }

-    private function isPdfScanningEnabled(): bool
+    public function isPdfScanningEnabled(): bool
     {
         return Config::getSystemConfiguration('assets')['document']['scan_pdf'];
     }