Unable to extract text in both directions #1547

nk-alex · 2024-04-12T12:04:35Z

Bug description

Given an image with text in both directions (horizontal and vertical), I'm not able to extract text

Sample image:

Code snippet to reproduce the bug

from doctr.models import ocr_predictor

ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)

Returns " SMAL COMPANY M B GFEEL"

ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True, assume_straight_pages=False)

"You are trying to use a model trained on straight pages while not assuming your pages are straight. If you have only straight documents, don't pass assume_straight_pages=False, otherwise you should use one of these archs:['db_resnet50_rotation']"

ocr_predictor(det_arch='db_resnet50_rotation', reco_arch='crnn_vgg16_bn', pretrained=True, assume_straight_pages=False)

Returns "UST NE'RE FOR"

ocr_predictor(
    det_arch="fast_base",
    reco_arch="parseq",
    pretrained=True,
    det_bs=8,
    reco_bs=1024,
    assume_straight_pages=False,
    straighten_pages=False,
    detect_orientation=True,
)

Returns unknown architecture 'fast_base'

Error traceback

Environment

Python 3.9.18
python-doctr 0.7.0

Deep Learning backend

is_tf_available: False
is_torch_available: True


felixdittrich92 · 2024-04-12T12:10:27Z

Hi @nk-alex 👋,

Could you try one of the FAST models, please (only available on the main branch atm)? These work much better with rotated text.

And please upgrade to latest: 0.8.1 or use directly the main branch :)
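The "unknown architecture 'fast_base'" error in the report comes from requesting `fast_base` on python-doctr 0.7.0. Below is a minimal sketch of a guard against that. It assumes, based only on this thread, that `fast_base` is usable from release 0.8.1 onward. The helper names `release_tuple`, `pick_det_arch` and `installed_doctr_version`, and the `db_resnet50` fallback, are illustrative choices and not part of docTR.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from this thread: the fast_base weights are published under v0.8.1.
FAST_MIN_RELEASE = (0, 8, 1)


def release_tuple(version_string: str) -> tuple:
    """Turn '0.9.0a0' into (0, 9, 0), ignoring any pre-release suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part or 0) for part in match.groups())


def pick_det_arch(installed: str) -> str:
    """Pick a detection architecture name the installed release should know."""
    if release_tuple(installed) >= FAST_MIN_RELEASE:
        return "fast_base"
    return "db_resnet50"  # hypothetical fallback for older releases


def installed_doctr_version():
    """Return the installed python-doctr version string, or None if it is absent."""
    try:
        return version("python-doctr")
    except PackageNotFoundError:
        return None
```

With the versions mentioned in this thread, `pick_det_arch("0.7.0")` falls back to `db_resnet50`, while `pick_det_arch("0.9.0a0")` selects `fast_base`.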

nk-alex · 2024-04-12T12:22:52Z

Thank you for the quick response @felixdittrich92.

I tried the following:

pip install -U git+https://github.com/mindee/doctr.git@main

pip list now shows python-doctr 0.9.0a0

ocr_predictor(
    det_arch="fast_base",
    reco_arch="parseq",
    pretrained=True,
    det_bs=8,
    reco_bs=1024,
    assume_straight_pages=False,
    straighten_pages=False,
    detect_orientation=True,
)

I see it is trying to download from https://doctr-static.mindee.com/models?id=v0.8.1/fast_base-688a8b34.pt&src=0, which returns "HTTP Error 308: Permanent Redirect"

felixdittrich92 · 2024-04-12T12:25:15Z

Mh .. but it is available: https://github.com/mindee/doctr/releases/download/v0.8.1/fast_base-688a8b34.pt

CC @odulcy-mindee
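For anyone hitting the same redirect error, here is a minimal sketch of fetching the weights by trying the two URLs quoted in this thread in order. The function name `fetch_first_available` and its `opener` parameter are hypothetical. Where docTR expects a manually downloaded file is not covered in this thread, so the sketch only saves the file to a path you choose.

```python
import shutil
import urllib.error
import urllib.request

# Both URLs are quoted from this thread.
CANDIDATE_URLS = [
    "https://doctr-static.mindee.com/models?id=v0.8.1/fast_base-688a8b34.pt&src=0",
    "https://github.com/mindee/doctr/releases/download/v0.8.1/fast_base-688a8b34.pt",
]


def fetch_first_available(urls, destination, opener=urllib.request.urlopen):
    """Try each URL in order and save the first successful response.

    Returns the URL that worked; raises RuntimeError if every URL fails.
    `opener` is injectable so the logic can be exercised without network access.
    """
    errors = []
    for url in urls:
        try:
            with opener(url) as response, open(destination, "wb") as out:
                shutil.copyfileobj(response, out)
            return url
        except (urllib.error.URLError, OSError) as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all downloads failed:\n" + "\n".join(errors))
```

Calling `fetch_first_available(CANDIDATE_URLS, "fast_base-688a8b34.pt")` would fall through to the GitHub release URL if the first one keeps returning an HTTP error.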

felixdittrich92 · 2024-04-13T17:46:02Z

Could you please retry and report back if it's still not working?
Unfortunately I cannot reproduce this behaviour.

odulcy-mindee · 2024-04-15T08:38:36Z

Hello @nk-alex, any update?

nk-alex · 2024-04-15T12:56:23Z

Sorry for the delay. Now it successfully downloads.

This is my result in this case:

ocr_predictor(
    det_arch="fast_base",
    reco_arch="parseq",
    pretrained=True,
    det_bs=8,
    reco_bs=1024,
    assume_straight_pages=False,
    straighten_pages=False,
    detect_orientation=True,
)

Returns: "A - JUST I I - - N C WE'RE ( - Ju I I - - I - PA - - NV - A FOR - I - - 44 STRIVING"
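Output like this mixes real words with many spurious short tokens. One way to inspect it is to drop low-confidence words from the exported result. The sketch below assumes the nested dictionary layout produced by the result's `export()` method (pages, blocks, lines, words, each word carrying `value` and `confidence`). The helper name `confident_words` and the 0.5 cut-off are arbitrary illustrations, and the thread does not say whether the noise tokens actually have low confidence.

```python
def confident_words(export: dict, min_confidence: float = 0.5) -> list:
    """Collect word values whose confidence is at least `min_confidence`."""
    words = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    if word.get("confidence", 0.0) >= min_confidence:
                        words.append(word["value"])
    return words


# Hand-written stand-in for an exported result; the confidences are made up.
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "JUST", "confidence": 0.98},
                                {"value": "-", "confidence": 0.21},
                                {"value": "WE'RE", "confidence": 0.91},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

print(" ".join(confident_words(sample_export)))
```

On a real run, the dictionary would come from calling `export()` on the predictor output instead of `sample_export`.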

felixdittrich92 · 2024-04-15T14:17:49Z

You are right, all models have problems with the large text parts:

So the vertical text is detected and recognized correctly, but the large horizontal text isn't.

I think that's a problem with the dataset we use for pretraining, because it contains mostly commonly seen documents/receipts. @odulcy-mindee, correct me if it contains other data 😅

odulcy-mindee · 2024-04-16T12:24:34Z

"I think that's a problem with the dataset we use for pretraining, because it contains mostly commonly seen documents/receipts. @odulcy-mindee, correct me if it contains other data 😅"

Yeah, indeed, we don't have such images in our dataset.

nk-alex · 2024-04-16T14:01:06Z

Is there any way this could be achieved with the models in their current state? If not, are you considering integrating this feature in the near future? In Spain we have many documents where important information is written vertically on the left side (the best example I could find on the internet):

felixdittrich92 · 2024-04-17T06:08:43Z

Hey @nk-alex 👋,

Yeah, I see. It looks like the detection model has some problems with the vertical text in your example.
Normally, on the samples I have used for testing, it works pretty well.
Could you test it on some real samples you want to process and check whether it works?
But yeah, of course we will further optimize the models step by step :)

nk-alex · 2024-04-17T07:25:58Z

Hi @felixdittrich92, with the ocr_predictor configuration specified above, on real samples I get some of the vertical words but not most of them. Is there any other ocr_predictor configuration with better results for this use case?

Something like this is what I get in most cases:

Sample1	Sample2

wesamalnabki · 2024-04-18T15:45:18Z

I'm facing the same problem here. The vertical text is not detected at all.

from doctr.models import ocr_predictor

ocr_model = ocr_predictor(
    det_arch="db_resnet50",  # "linknet_resnet50",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    pretrained_backbone=True,
    assume_straight_pages=False,
    preserve_aspect_ratio=True,
    symmetric_pad=True,
    export_as_straight_boxes=True,
    detect_orientation=False,
    straighten_pages=True,
    detect_language=False,
)

# Modify the binarization threshold and the box threshold
ocr_model.det_predictor.model.postprocessor.bin_thresh = 0.3
ocr_model.det_predictor.model.postprocessor.box_thresh = 0.2
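To make the effect of those two overrides concrete, here is a toy illustration. It assumes a DB-style post-processor that first binarizes the probability map with `bin_thresh` and then discards a candidate box whose mean probability is below `box_thresh`. It is a simplified pure-Python sketch that treats all foreground pixels as one region, not docTR's implementation, and the probability values are made up.

```python
def detect_box(prob_map, bin_thresh, box_thresh):
    """Return (top, left, bottom, right) of the foreground region, or None.

    Step 1: keep pixels whose probability is at least `bin_thresh`.
    Step 2: score the bounding box of those pixels by its mean probability
            and drop it if the score is below `box_thresh`.
    """
    cells = [
        (r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if p >= bin_thresh
    ]
    if not cells:
        return None  # nothing survives binarization
    top = min(r for r, _ in cells)
    bottom = max(r for r, _ in cells)
    left = min(c for _, c in cells)
    right = max(c for _, c in cells)
    inside = [
        prob_map[r][c]
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
    ]
    score = sum(inside) / len(inside)
    return (top, left, bottom, right) if score >= box_thresh else None


# A faint text instance: probabilities only slightly above the background.
faint = [
    [0.05, 0.05, 0.05, 0.05],
    [0.05, 0.35, 0.40, 0.05],
    [0.05, 0.05, 0.05, 0.05],
]

print(detect_box(faint, bin_thresh=0.3, box_thresh=0.2))  # box is kept
print(detect_box(faint, bin_thresh=0.5, box_thresh=0.2))  # lost at binarization
```

In this toy setting, lowering either threshold lets fainter regions through at the cost of more false positives, which is the trade-off being tuned above.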

felixdittrich92 · 2024-04-19T14:07:02Z

Hi all 👋,

Thanks for sharing. I see we should think about that for the next detection model training iteration. It looks like the models have some problems with text instances that are light gray or close to the border:
CC @odulcy-mindee

Tested also with fast_base and -90° rotation
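If the rotation test is repeated programmatically (for example by rotating the page array with `numpy.rot90` before passing it to the predictor), the returned relative coordinates refer to the rotated page. The sketch below maps them back to the original page. It assumes relative `(x, y)` coordinates in the range 0 to 1 with the origin at the top left. The function names are hypothetical, and since the thread does not state which direction "-90°" refers to, both quarter-turn directions are handled.

```python
def unrotate_point(x, y, quarter_turns):
    """Map a relative point on the rotated page back to the original page.

    quarter_turns=1  : the page was rotated 90° counter-clockwise before OCR
                       (what numpy.rot90(page, k=1) does).
    quarter_turns=-1 : the page was rotated 90° clockwise before OCR
                       (numpy.rot90(page, k=-1)).
    """
    if quarter_turns == 1:
        return (1.0 - y, x)
    if quarter_turns == -1:
        return (y, 1.0 - x)
    raise ValueError("quarter_turns must be 1 or -1")


def unrotate_polygon(points, quarter_turns):
    """Map every vertex of a polygon (any number of points)."""
    return [unrotate_point(x, y, quarter_turns) for x, y in points]


def unrotate_straight_box(box, quarter_turns):
    """Map ((xmin, ymin), (xmax, ymax)) and re-normalise the corner order."""
    xs, ys = zip(*unrotate_polygon(box, quarter_turns))
    return ((min(xs), min(ys)), (max(xs), max(ys)))


# A box found on a page that had been rotated counter-clockwise before OCR.
print(unrotate_straight_box(((0.1, 0.2), (0.3, 0.6)), quarter_turns=1))
```

Working in relative coordinates keeps the mapping independent of the page size, so the same helpers apply to both straight boxes and four-point polygons.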

felixdittrich92 · 2024-05-22T07:50:34Z

Moved to #1604

nk-alex added the type: bug Something isn't working label Apr 12, 2024

felixdittrich92 added the awaiting response Waiting for feedback label Apr 15, 2024

felixdittrich92 mentioned this issue May 22, 2024

[experimential] [detection] model training iteration with updated augmentation pipeline #1604

Open

9 tasks

felixdittrich92 closed this as completed May 22, 2024
