[Bug]: Memory Error #1290

Open
user1823 opened this issue Apr 8, 2024 · 0 comments
user1823 commented Apr 8, 2024

Describe the bug

I tried to OCR a file, but OCRmyPDF failed with a "MemoryError" and the OCR could not be completed. I assume this happens because my laptop does not have enough RAM for this file. Still, OCRmyPDF could probably do something to work within the available RAM.

Steps to reproduce

1. Run ocrmypdf -v1 --max-image-mpixels 1000 --tesseract-downsample-large-images --tesseract-downsample-above 3508 --output-type pdf 430.pdf ocr.pdf
2. See error message.

Files

430.pdf

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

ocrmypdf 16.1.2

Relevant log output

ocrmypdf 16.1.2                                                                                           __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Found tesseract 5.3.3.20231005                                                                           __init__.py:342
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\gs\\gs10.03.0\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Found gs 10.3.0                                                                                          __init__.py:342
Running: ['C:\\Program Files\\gs\\gs10.03.0\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs']                             __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):            __init__.py:73
eng
osd

No language specified; assuming --language eng                                                         _validation.py:61
pikepdf mmap enabled                                                                                      helpers.py:326
Gathering info with 1 thread workers                                                                         info.py:772
pikepdf mmap enabled                                                                                      helpers.py:326
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Using Tesseract OpenMP thread limit 3                                                               tesseract_ocr.py:183
pikepdf mmap enabled                                                                                      helpers.py:326
    1 Rasterize with png16m, rotation 0                                                                 _pipeline.py:528
    1 Running: ['C:\\Program Files\\gs\\gs10.03.0\\bin\\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH',  __init__.py:133
'-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1',
'-r3343.900814x3343.900814', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr',
'-dAutoRotatePages=/None', '-f',
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.qbvrc7a5\\origin.pdf']
    1 C:\Program Files\Python312\Lib\site-packages\PIL\Image.py:3218: DecompressionBombWarning: Image    warnings.py:110
size (1082319417 pixels) exceeds limit of 1000000000 pixels, could be decompression bomb DOS attack.
  warnings.warn(

    1 Rotating output by 0                                                                            ghostscript.py:149
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/1 -:--:--
An exception occurred while executing the pipeline                                                        _common.py:284
Traceback (most recent call last):
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipelines\_common.py",
line 249, in cli_exception_handler
    return fn(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipelines\ocr.py", line
191, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipelines\ocr.py", line
118, in exec_concurrent
    executor(
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_concurrent.py", line 78,
in __call__
    self._execute(
  File
"C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\builtin_plugins\concurrency.py",
line 144, in _execute
    result = future.result()
             ^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Program Files\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipelines\ocr.py", line
79, in _exec_page_sync
    ocr_image_out, pdf_page_from_image_out, orientation_correction = process_page(
                                                                     ^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipelines\_common.py",
line 391, in process_page
    ocr_image, preprocess_out = make_intermediate_images(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipelines\_common.py",
line 327, in make_intermediate_images
    rasterize_out = rasterize(
                    ^^^^^^^^^^
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_pipeline.py", line 532,
in rasterize
    page_context.plugin_manager.hook.rasterize_pdf_page(
  File "C:\Program Files\Python312\Lib\site-packages\pluggy\_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\pluggy\_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\pluggy\_callers.py", line 138, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "C:\Program Files\Python312\Lib\site-packages\pluggy\_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File
"C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\builtin_plugins\ghostscript.py",
line 105, in rasterize_pdf_page
    ghostscript.rasterize_pdf(
  File "C:\Users\User\AppData\Roaming\Python\Python312\site-packages\ocrmypdf\_exec\ghostscript.py",
line 160, in rasterize_pdf
    im.save(fspath(output_file), dpi=page_dpi)
  File "C:\Program Files\Python312\Lib\site-packages\PIL\Image.py", line 2421, in save
    self._ensure_mutable()
  File "C:\Program Files\Python312\Lib\site-packages\PIL\Image.py", line 595, in _ensure_mutable
    self._copy()
  File "C:\Program Files\Python312\Lib\site-packages\PIL\Image.py", line 589, in _copy
    self.im = self.im.copy()
              ^^^^^^^^^^^^^^
MemoryError
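
For a rough sense of scale, the log above reports a rasterized image of 1082319417 pixels, and the traceback ends in `self.im.copy()`, which means a second full copy of that image is being allocated at the moment of failure. The sketch below estimates how much memory that could need. The bytes-per-pixel values are assumptions, not measurements: 3 bytes corresponds to raw 24-bit RGB (the `png16m` device), and 4 bytes covers the case where the in-memory representation pads each pixel to 32 bits. `estimate_raster_bytes` is a hypothetical helper, not part of OCRmyPDF or Pillow.

```python
def estimate_raster_bytes(pixels: int, bytes_per_pixel: int, copies: int = 1) -> int:
    """Estimate memory for `copies` uncompressed image buffers of `pixels` pixels."""
    return pixels * bytes_per_pixel * copies


# Pixel count taken from the DecompressionBombWarning in the log above.
PIXELS = 1_082_319_417
GIB = 1024 ** 3

# Assumed storage sizes: 3 bytes (packed RGB) or 4 bytes (RGB padded to 32 bits).
for bpp in (3, 4):
    single = estimate_raster_bytes(PIXELS, bpp)
    doubled = estimate_raster_bytes(PIXELS, bpp, copies=2)
    print(
        f"{bpp} bytes/pixel: {single / GIB:.2f} GiB for one buffer, "
        f"{doubled / GIB:.2f} GiB while a copy exists"
    )
```

Under these assumptions a single buffer is already about 3 to 4 GiB, and the copy made during `im.save()` roughly doubles that, which would be enough to exhaust the RAM of a typical laptop.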