Unable to extract text in both directions #1547

Closed
nk-alex opened this issue Apr 12, 2024 · 14 comments
Labels
awaiting response Waiting for feedback type: bug Something isn't working

nk-alex commented Apr 12, 2024

Bug description

Given an image with text in both directions (horizontal and vertical), I'm not able to extract the text in both directions.

Sample image:

[image: tUiSO]

Code snippet to reproduce the bug

ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)

  • Returns " SMAL COMPANY M B GFEEL"

ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True, assume_straight_pages=False)

  • Fails with: "You are trying to use a model trained on straight pages while not assuming your pages are straight. If you have only straight documents, don't pass assume_straight_pages=False, otherwise you should use one of these archs:['db_resnet50_rotation']"

ocr_predictor(det_arch='db_resnet50_rotation', reco_arch='crnn_vgg16_bn', pretrained=True, assume_straight_pages=False)

  • Returns "UST NE'RE FOR"

ocr_predictor(
    det_arch="fast_base",
    reco_arch="parseq",
    pretrained=True,
    det_bs=8,
    reco_bs=1024,
    assume_straight_pages=False,
    straighten_pages=False,
    detect_orientation=True,
)
  • Fails with: unknown architecture 'fast_base'
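
The strings quoted above look like the recognized words joined with spaces. A minimal sketch of how such a string could be assembled, assuming the nested dictionary that docTR's `result.export()` produces (pages, then blocks, lines and words, each word with `value` and `confidence`). The names `words_from_export` and `sample_export` are hypothetical, and the sample values are invented for illustration:

```python
from typing import Any, Dict, List


def words_from_export(export: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Collect word strings from a docTR-style export dict, in stored order."""
    words: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    # Assumption: each word carries "value" and "confidence" keys.
                    if word.get("confidence", 1.0) >= min_confidence:
                        words.append(word["value"])
    return words


# Hand-written stand-in for `result.export()`; the values are invented.
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "JUST", "confidence": 0.98},
                                {"value": "-", "confidence": 0.31},
                                {"value": "STRIVING", "confidence": 0.95},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

# Join everything, then join only the words above a confidence cut-off.
print(" ".join(words_from_export(sample_export)))
print(" ".join(words_from_export(sample_export, min_confidence=0.5)))
```

Filtering on confidence is only one possible way to suppress stray single-character detections; the cut-off of 0.5 is an arbitrary illustration value.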

Error traceback

Environment

Python 3.9.18
python-doctr 0.7.0

Deep Learning backend

is_tf_available: False
is_torch_available: True

nk-alex added the "type: bug" label Apr 12, 2024

felixdittrich92 (Contributor) commented Apr 12, 2024

Hi @nk-alex 👋,

Could you try one of the FAST models, please (only available on the main branch atm)? These work much better with rotated text.

And please upgrade to the latest release (0.8.1) or use the main branch directly :)
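
A minimal sketch for checking whether the installed package is at least the release recommended above, using only the standard library. The helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes such as `a0`, which is a simplification:

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string, ignoring suffixes like 'a0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two versions on their numeric release parts only."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_doctr_version() -> Optional[str]:
    """Return the installed python-doctr version, or None if it is not installed."""
    try:
        return metadata.version("python-doctr")
    except metadata.PackageNotFoundError:
        return None


# The two versions mentioned in this thread, checked against 0.8.1.
print(is_at_least("0.7.0", "0.8.1"))    # False
print(is_at_least("0.9.0a0", "0.8.1"))  # True
print(installed_doctr_version())
```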

nk-alex (Author) commented Apr 12, 2024

Thank you for the quick response @felixdittrich92.

I tried the following:

pip install -U git+https://github.com/mindee/doctr.git@main

pip list now shows python-doctr 0.9.0a0

ocr_predictor(
    det_arch="fast_base",
    reco_arch="parseq",
    pretrained=True,
    det_bs=8,
    reco_bs=1024,
    assume_straight_pages=False,
    straighten_pages=False,
    detect_orientation=True,
)

I see it is trying to download from https://doctr-static.mindee.com/models?id=v0.8.1/fast_base-688a8b34.pt&src=0 which returns "HTTP Error 308: Permanent Redirect".

felixdittrich92 (Contributor) commented

Could you please retry and report back if it's still not working?
Unfortunately, I cannot reproduce this behaviour.
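
If the download failure is transient, retrying can be automated. A minimal sketch with a hypothetical `retry` helper; it assumes the failure surfaces as an `OSError` (which `urllib.error.HTTPError` subclasses), and it does not address whatever caused the redirect in the first place:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay_s: float = 1.0) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed with OSError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                time.sleep(delay_s * attempt)
    assert last_error is not None
    raise last_error


# Usage sketch (assumes python-doctr is installed; not executed here):
# from doctr.models import ocr_predictor
# predictor = retry(
#     lambda: ocr_predictor(det_arch="fast_base", reco_arch="parseq", pretrained=True)
# )
```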

odulcy-mindee (Collaborator) commented

Hello @nk-alex, any update?

felixdittrich92 added the "awaiting response" label Apr 15, 2024

nk-alex (Author) commented Apr 15, 2024

Sorry for the delay. Now it successfully downloads.

This is my result in this case:

ocr_predictor(
    det_arch="fast_base",
    reco_arch="parseq",
    pretrained=True,
    det_bs=8,
    reco_bs=1024,
    assume_straight_pages=False,
    straighten_pages=False,
    detect_orientation=True,
)
  • Returns: "A - JUST I I - - N C WE'RE ( - Ju I I - - I - PA - - NV - A FOR - I - - 44 STRIVING"

felixdittrich92 (Contributor) commented Apr 15, 2024

You are right, all models have problems with the large text parts:

Screenshot from 2024-04-15 16-15-16

So the vertical text is detected and recognized correctly but the horizontal large text isn't.

I think that's a problem with the dataset we use for pretraining, because it contains mostly commonly seen documents/receipts. @odulcy-mindee, correct me if it contains other data 😅

odulcy-mindee (Collaborator) commented

> I think that's a problem with the dataset we use for pretraining, because it contains mostly commonly seen documents/receipts. @odulcy-mindee, correct me if it contains other data 😅

Yeah, indeed, we don't have such images in our dataset.

nk-alex (Author) commented Apr 16, 2024

Is there any way this could be achieved with the models in their current state? If not, are you considering adding this feature in the near future? In Spain we have many documents where important information is written vertically on the left side (the best example I could find on the internet):

image

felixdittrich92 (Contributor) commented

Hey @nk-alex 👋,

Yeah, I see. It looks like the detection model has some problems with the vertical text in your example.
Normally, on the samples I have used for testing, it works pretty well.
Could you test it on some real samples you want to process and check if it works?
But yeah, of course we will further optimize the models step by step :)

nk-alex (Author) commented Apr 17, 2024

Hi @felixdittrich92, with the ocr_predictor configuration specified above, on real samples I get some of the vertical words but not most of them. Is there any other ocr_predictor configuration that gives better results for this use case?

Something like this is what I get in most cases:

Sample1: [image: Captura de pantalla 2024-04-17 090901]
Sample2: [image: Captura de pantalla 2024-04-17 091521]
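
To put a number on "some of the vertical words but not most of them", detections could be split by direction. A minimal sketch, assuming word geometries are relative (x, y) points in [0, 1] (two corners or a four-point polygon) and that page dimensions are given as (height, width), as in docTR's export. The helper names and the 1.5 aspect ratio cut-off are hypothetical, and short words or diagonal text may be misclassified:

```python
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[float, float]


def is_vertical(geometry: Sequence[Point], page_dims: Tuple[int, int], ratio: float = 1.5) -> bool:
    """Treat a box as vertical when its pixel height clearly exceeds its pixel width."""
    page_h, page_w = page_dims
    xs = [p[0] for p in geometry]
    ys = [p[1] for p in geometry]
    # Convert relative extents to pixels so the two axes are comparable.
    box_w = (max(xs) - min(xs)) * page_w
    box_h = (max(ys) - min(ys)) * page_h
    return box_h > ratio * box_w


def count_by_direction(
    geometries: Iterable[Sequence[Point]], page_dims: Tuple[int, int]
) -> Dict[str, int]:
    """Count how many boxes look vertical and how many look horizontal."""
    counts = {"vertical": 0, "horizontal": 0}
    for geometry in geometries:
        key = "vertical" if is_vertical(geometry, page_dims) else "horizontal"
        counts[key] += 1
    return counts


# Invented example boxes on a page that is 1000 px high and 800 px wide.
tall_box = ((0.05, 0.2), (0.08, 0.8))
wide_box = ((0.2, 0.1), (0.6, 0.15))
tall_quad = ((0.02, 0.3), (0.04, 0.3), (0.04, 0.7), (0.02, 0.7))

print(count_by_direction([tall_box, wide_box, tall_quad], page_dims=(1000, 800)))
```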

wesamalnabki commented Apr 18, 2024

I'm facing the same problem here. The vertical text is not detected at all.

from doctr.models import ocr_predictor

ocr_model = ocr_predictor(
    det_arch="db_resnet50",  # or "linknet_resnet50"
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    pretrained_backbone=True,
    assume_straight_pages=False,
    preserve_aspect_ratio=True,
    symmetric_pad=True,
    export_as_straight_boxes=True,
    detect_orientation=False,
    straighten_pages=True,
    detect_language=False,
)

# Modify the binarization threshold and the box threshold
ocr_model.det_predictor.model.postprocessor.bin_thresh = 0.3
ocr_model.det_predictor.model.postprocessor.box_thresh = 0.2
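
For readers unfamiliar with the two thresholds, here is a toy illustration of the general idea behind this kind of detection post-processing: `bin_thresh` decides which pixels of the probability map count as text, and `box_thresh` discards candidate boxes whose mean score is too low. This is a sketch under those assumptions, not docTR's actual implementation; all names and values are invented:

```python
from typing import List, Tuple

ProbMap = List[List[float]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel indices


def binarize(prob_map: ProbMap, bin_thresh: float) -> List[List[int]]:
    """Mark a pixel as text (1) when its probability reaches bin_thresh."""
    return [[1 if p >= bin_thresh else 0 for p in row] for row in prob_map]


def box_score(prob_map: ProbMap, box: Box) -> float:
    """Mean probability of the pixels covered by the box."""
    x0, y0, x1, y1 = box
    values = [prob_map[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(values) / len(values)


def keep_box(prob_map: ProbMap, box: Box, box_thresh: float) -> bool:
    """Keep a candidate box only if its mean score reaches box_thresh."""
    return box_score(prob_map, box) >= box_thresh


# Toy 3x4 map: a faint vertical strip (0.25) on the left, a strong word (0.9) on the right.
toy_map = [
    [0.25, 0.0, 0.9, 0.9],
    [0.25, 0.0, 0.9, 0.9],
    [0.25, 0.0, 0.0, 0.0],
]
faint_strip = (0, 0, 0, 2)

print(binarize(toy_map, bin_thresh=0.3))  # the faint strip disappears
print(binarize(toy_map, bin_thresh=0.2))  # the faint strip survives
print(keep_box(toy_map, faint_strip, box_thresh=0.2))
```

In this toy picture, faint text such as light gray print can fall below the binarization threshold before any box is formed, which is why lowering the thresholds is a natural thing to try.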

vert_exp

felixdittrich92 (Contributor) commented Apr 19, 2024

Hi all 👋,

Thanks for sharing. I see we should think about that for the next detection model training iteration; it looks like the models have some problems with text instances which are light gray or close to the border:
CC @odulcy-mindee

Also tested with fast_base and a -90° rotation:

Screenshot from 2024-04-19 16-00-23
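
When comparing a run on a rotated copy of the page with a run on the original, detections from the rotated copy have to be mapped back to the original frame. A minimal sketch, assuming relative (x, y) coordinates in [0, 1] with the origin at the top-left, and a 90° clockwise rotation (sign conventions for "-90°" vary between tools, so the direction here is an assumption). The helper names are hypothetical:

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]  # ((xmin, ymin), (xmax, ymax)), relative coordinates


def rotate_point_cw90(point: Point) -> Point:
    """Where a point of the original image lands after a 90° clockwise rotation."""
    x, y = point
    return (1.0 - y, x)


def unrotate_point_cw90(point: Point) -> Point:
    """Inverse mapping: from the rotated image back to the original frame."""
    x_rot, y_rot = point
    return (y_rot, 1.0 - x_rot)


def unrotate_box_cw90(box: Box) -> Box:
    """Map a straight box from the rotated image back and re-normalize its corners."""
    corners = [unrotate_point_cw90(p) for p in box]
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return ((min(xs), min(ys)), (max(xs), max(ys)))


# A wide box found in the rotated image corresponds to a tall box in the original.
rotated_detection = ((0.25, 0.0), (0.75, 0.125))
print(unrotate_box_cw90(rotated_detection))
```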

felixdittrich92 (Contributor) commented

Moved to #1604


4 participants