For some PDFs an error can cause the process to run out of memory #33

Open
Zirafnik opened this issue Apr 18, 2023 · 2 comments

Zirafnik commented Apr 18, 2023

A PDF with many pages causes the Node process to run out of memory.

This is a similar issue to the one in 'pdf2pic' library: yakovmeister/pdf2image#54

It can be worked around with manual batching, passing arrays of page numbers, but it is not an elegant solution: you first need to determine the number of pages in the document with an external library (such as 'pdfjs-dist') and then build the arrays, allowing for a page count that is not a multiple of the batch size.
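
A minimal sketch of that batching, assuming the page count is already known (for example from `numPages` on a document loaded with 'pdfjs-dist'). The helper name `batchPageNumbers` and the batch size are illustrative and not part of any library mentioned here.

```javascript
// Hypothetical helper: split pages 1..pageCount into arrays of at most
// batchSize page numbers. The last batch is shorter when pageCount is not
// a multiple of batchSize.
function batchPageNumbers(pageCount, batchSize) {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    const batch = [];
    for (let page = start; page <= end; page++) {
      batch.push(page);
    }
    batches.push(batch);
  }
  return batches;
}
```

Each inner array can then be passed as the page list for one conversion call, with the batches processed one after another rather than in parallel.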

Zirafnik changed the title from "With a big number of pages, the process runs out of memory." to "With a large number of pages, the process runs out of memory." Apr 18, 2023

Zirafnik commented Apr 18, 2023

Update: The batching doesn't work.

Even with the output folder set, the library still keeps the buffers in memory, which eventually fills up.

Update: Apparently I was testing with a PDF that had some kind of internal error but otherwise looked fine. Page 55 of 68 was broken, which in turn broke one of the dependencies. To rule out the page count as the cause, I tested with only pages 54-58, and it still crashed. As soon as I repaired the PDF with an external tool, it worked.

My assumption was a broken or missing font, because of the warnings below (output at verbosityLevel: 1), which led me to look for ways of fixing my PDF.
The warning is connected to Mozilla's pdf.js: mozilla/pdf.js#3768 (comment)

However, since the warning fired for ALL pages and none of them broke before page 55, I believe the problem lies elsewhere; the problematic dependency seems to be canvas.node.

I am not sure how to fix this or what specifically causes it, but perhaps some kind of error handling could be added?

Warnings:

...
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
...

Errors:

<--- Last few GCs --->

[20233:0x5bbd220]    71971 ms: Mark-sweep (reduce) 131.3 (170.1) -> 131.2 (136.8) MB, 140.6 / 0.0 ms  (average mu = 0.681, current mu = 0.000) external memory pressure; GC in old space requested
[20233:0x5bbd220]    72130 ms: Mark-sweep (reduce) 131.2 (136.8) -> 131.2 (136.6) MB, 159.3 / 0.0 ms  (average mu = 0.494, current mu = 0.000) external memory pressure; GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: v8::ArrayBuffer::New Allocation failed - process out of memory
 1: 0xb7a940 node::Abort() [node]
 2: 0xa8e823  [node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd5cdeb  [node]
 6: 0xd6d99d  [node]
 7: 0x7f4cd86d8b3c Context2d::GetImageData(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/user/folder1/folder2/node_modules/canvas/build/Release/canvas.node]
 8: 0x7f4cd86cb3d3  [/home/user/folder1/folder2/node_modules/canvas/build/Release/canvas.node]
 9: 0xdbaa30  [node]
10: 0xdbbf6f v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x16fb7b9  [node]
Aborted
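
On the error-handling question: the trace above ends in V8's fatal out-of-memory handler and `Aborted`, and a try/catch in the same process cannot intercept that kind of failure. One possible workaround, sketched here as an assumption and not as anything the library does, is to run the conversion in a separate Node process and inspect how it exited. `runIsolated` is a hypothetical helper name.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper (not part of any library in this thread).
// Runs `node <nodeArgs...>` in a separate process and reports how it ended,
// so a fatal abort in the child does not take down the caller.
function runIsolated(nodeArgs, timeoutMs = 60000) {
  const result = spawnSync(process.execPath, nodeArgs, {
    encoding: "utf8",
    timeout: timeoutMs,
  });
  return {
    // result.error is set when the child could not be spawned or timed out.
    ok: !result.error && result.status === 0,
    status: result.status, // exit code, or null if the child was killed by a signal
    signal: result.signal, // signal name if the child was killed by one, else null
    stderr: result.stderr,
  };
}
```

A caller could pass the path of its own conversion script plus a page range as `nodeArgs` and treat `ok === false` as "this batch failed", logging `stderr` and moving on. The call is synchronous, so it blocks the parent while the child runs; an asynchronous `spawn` would be the non-blocking variant.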

Zirafnik changed the title from "With a large number of pages, the process runs out of memory." to "For some PDFs an error can cause the process to run out of memory" Apr 20, 2023
@AChangXD

@Zirafnik Same issue here
