
batchAnnotateFiles failing silently (and taking php thread with it) #7288

Open · James-THEA opened this issue May 3, 2024 · 1 comment

James-THEA commented May 3, 2024

Environment details

  • OS: Amazon Linux 2023
  • PHP version: 8.2.15
  • Package name and version: v1.9.0

Steps to reproduce

  1. Use this file:
    faraone2005 (1).pdf

  2. Request pages 1-10
    a. Two batches of 5 pages. It works if I do only 1-9.

More context:
I have a setup to parse PDFs that relies on the Google Cloud Vision API. It has worked for the past several months, and anecdotally this is a new issue. There is no error thrown, and the PHP thread just dies.

Moreover, the issue doesn't occur in all of my environments. Locally everything works (PHP 8.2.4), and it also works on an Amazon Beanstalk server; the failing environment is the one listed above. The issue exists on both newly and previously provisioned servers, so one possible fix is to find the discrepancy between the servers and correct it; even so, I still think this should be filed as a bug.

I have added memory-usage logging, and nothing looks excessive (peaks above 100 MB). Usage spikes on the first request using batchAnnotateFiles and the thread dies on the second request, so it may be spiking again; I strongly suspect a memory limit is the problem.
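To make the memory-limit hypothesis testable, one can log peak usage against PHP's configured memory_limit around each request. This is an illustrative sketch only: memoryLimitBytes and logMemory are assumed helper names, not part of the original code, and the parsing of memory_limit shorthand ("128M", "1G", "-1") follows PHP's documented ini byte-shorthand rules.

```php
<?php
// Hypothetical helpers for the memory-limit hypothesis (names are illustrative).

// Convert PHP's memory_limit ini shorthand ("128M", "1G", "-1") to bytes.
function memoryLimitBytes(): int {
    $limit = ini_get('memory_limit');
    if ($limit === '-1') {
        return PHP_INT_MAX;          // -1 means no limit
    }
    $value = (int) $limit;           // leading number, e.g. 128 from "128M"
    return match (strtoupper(substr($limit, -1))) {
        'G' => $value * 1024 ** 3,
        'M' => $value * 1024 ** 2,
        'K' => $value * 1024,
        default => $value,           // plain byte count
    };
}

// Log peak usage vs. the limit, e.g. before/after each batchAnnotateFiles call.
function logMemory(string $label): void {
    $peakMb  = memory_get_peak_usage(true) / 1024 ** 2;
    $limitMb = memoryLimitBytes() / 1024 ** 2;
    error_log(sprintf('%s: peak %.1f MB of %.1f MB limit', $label, $peakMb, $limitMb));
}
```

Calling logMemory('chunk 0 done') after each chunk would show whether the second request pushes peak usage toward the limit before the thread dies.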

I found this bug report: https://www.googlecloudcommunity.com/gc/AI-ML/Vision-AI-OCR-Internal-server-error-Failed-to-process-features/m-p/735441

It looks almost identical to my issue, but it is for Vision AI, so the fix is not applicable.

Code example

The code is lightly edited for brevity, but I can confirm the problem still reproduces with this version.

private function myFunction($filePath, int $startingPage, int $lastPage): FileUploadResponse {
        $pdfContent = \Storage::get($filePath);
        $inputConfig = (new InputConfig())
            ->setMimeType('application/pdf')
            ->setContent($pdfContent);
        $feature = (new Feature())->setType(Type::DOCUMENT_TEXT_DETECTION);

        $totalPages = range($startingPage + 1, $lastPage + 1);
        $pageChunks = array_chunk($totalPages, 5);
        $overallText = '';
        $maxLength = self::MAX_UPLOAD_TEXT_LENGTH;

        for ($chunk = 0; $chunk < count($pageChunks); $chunk++) {
            // Construct the client outside the try block so that the finally
            // block never calls close() on an undefined variable.
            $imageAnnotator = new ImageAnnotatorClient(['credentials' => 'redacted']);
            try {
                $pages = $pageChunks[$chunk];
                $annotateFileRequest = (new AnnotateFileRequest())
                    ->setInputConfig($inputConfig)
                    ->setFeatures([$feature])
                    ->setPages($pages);
                try {
                    $response = $imageAnnotator->batchAnnotateFiles([$annotateFileRequest]); // request dies here
                } catch (\Exception $e) {
                    Logger($e->getMessage()); // json_encode() on an Exception just yields "{}"
                    continue;                 // skip this chunk; $response below would be undefined
                }
                $responses = $response->getResponses()[0]->getResponses();

                for ($x = 0; $x < min(count($pages), count($responses)); $x++) {
                    $pageResponse = $responses[$x];
                    if ($pageResponse->hasError()) {
                        continue;
                    }
                    if ($pageResponse->getFullTextAnnotation() !== null) {
                        $overallText .= $pageResponse->getFullTextAnnotation()->getText();
                    }
                }
            } finally {
                $imageAnnotator->close();
                gc_collect_cycles();
            }
        }
        return new FileUploadResponse(text: $overallText);
    }
James-THEA (Author) commented:
Adding some follow up investigation:

  • If we decrease the batch size to 1-4 pages, it works
  • If we don't chunk by pages and make one request, it works.
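Given those findings, a possible stopgap is to shrink each batchAnnotateFiles request from 5 pages to 4. This is a sketch under that assumption; CHUNK_SIZE and the hard-coded page range are illustrative, not from the original code:

```php
<?php
// Workaround sketch: chunk the page list into batches of 4 instead of 5,
// matching the observation that 1-4 page batches succeed.
const CHUNK_SIZE = 4;   // illustrative constant, not in the original code

$startingPage = 0;
$lastPage = 9;          // pages 1-10, as in the reproduction steps

$totalPages = range($startingPage + 1, $lastPage + 1);      // [1, 2, ..., 10]
$pageChunks = array_chunk($totalPages, CHUNK_SIZE);

// Ten pages now produce three requests of 4, 4, and 2 pages.
```

Each element of $pageChunks would then be passed to setPages() exactly as in the original loop, at the cost of one extra API round-trip per ten pages.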
