Save the OCRed PDF #1595

micos7 · 2024-05-13T14:48:53Z

🚀 The feature

I'd like to save the PDF after OCR.

Motivation, pitch

Alternatives

I tried something like this, but it throws exceptions all over:

    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # pdf_content holds the raw PDF bytes (or a path to the file)
    doc = DocumentFile.from_pdf(pdf_content)

    # Perform OCR using doctr
    model = ocr_predictor(pretrained=True)
    result = model(doc)

    # Extract text from the OCR result
    text = ""
    for page in result.pages:
        for block in page.blocks:
            for line in block.lines:
                for word in line.words:
                    text += word.value + " "

    # Save the OCR result back to the original file path
    # (this writes plain text only, so the output is not a valid PDF)
    with open(body.url, 'w') as pdf_file:
        pdf_file.write(text)

Additional context

Thanks for your work.

No response


felixdittrich92 · 2024-05-13T16:37:26Z

Hi @micos7 👋
What you want to create is a searchable PDF, i.e. a PDF with a text layer (typically saved as PDF/A).
Please take a look at https://mindee.com/blog/create-ocrized-pdfs-in-2-steps :)
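To make the "text layer" idea a bit more concrete, here is a minimal sketch of the data such a layer needs: each recognized word together with its position on the page. It assumes the dictionary returned by `result.export()`, where each page carries `dimensions` as `(height, width)` and each word carries a relative `geometry` of the form `((xmin, ymin), (xmax, ymax))`, which is the layout for straight (non-rotated) pages. The helper name `words_with_pixel_boxes` and the sample page are made up for illustration, and this is not the method from the linked post. As far as I can tell, that post builds on the hOCR output of `result.export_as_xml()` instead.

```python
from typing import Any

# (x0, y0, x1, y1) in pixels
Box = tuple[float, float, float, float]


def words_with_pixel_boxes(page: dict[str, Any]) -> list[tuple[str, Box]]:
    """Flatten one exported page into (text, pixel box) pairs.

    Hypothetical helper: assumes straight-page geometry with two
    relative corner points per word.
    """
    height, width = page["dimensions"]  # assumed order: (height, width)
    words = []
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                (x0, y0), (x1, y1) = word["geometry"]
                # Scale relative coordinates (0..1) to pixel coordinates
                box = (x0 * width, y0 * height, x1 * width, y1 * height)
                words.append((word["value"], box))
    return words


# Hand-built stand-in for result.export()["pages"][0]
sample_page = {
    "dimensions": (400, 800),
    "blocks": [
        {
            "lines": [
                {
                    "words": [
                        {"value": "Hello", "geometry": ((0.125, 0.25), (0.5, 0.5))},
                        {"value": "world", "geometry": ((0.5, 0.25), (0.75, 0.5))},
                    ]
                }
            ]
        }
    ],
}

for text, box in words_with_pixel_boxes(sample_page):
    print(text, box)
```

A PDF writer can then place each word as invisible text at its box on top of the page image, which is what makes the result searchable. Writing the concatenated text straight into the `.pdf` path, as in the snippet above, only produces a text file with a PDF extension.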

felixdittrich92 · 2024-05-17T13:15:33Z

Any updates @micos7 ? :)

micos7 added the type: enhancement Improvement label May 13, 2024

felixdittrich92 added the awaiting response Waiting for feedback label May 17, 2024

micos7 closed this as completed May 17, 2024
