
[reconstitution] Improve synthesize output quality #1528

Open
tzktz opened this issue Mar 26, 2024 · 20 comments
Labels: good first issue, help wanted, module: utils, type: enhancement

Comments

@tzktz

tzktz commented Mar 26, 2024

@felixdittrich92 The result image I get is not of good quality: the fonts are broken in the result image.

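from doctr.io import DocumentFile
from doctr.models import ocr_predictor
import matplotlib.pyplot as plt

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("bankstatement.pdf")
# Analyze
result = model(doc)
# Render the synthesized (reconstituted) first page
plt.imshow(result.synthesize()[0]); plt.axis('off'); plt.show()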

See the result image:
[image: Figure_1]

Originally posted by @tzktz in #1525 (comment)

tzktz changed the title "@felixdittrich92 i have face result image is not upto quality...fonts are breaks in result image.." to "i have face result image is not upto quality...fonts are breaks in result image.." on Mar 26, 2024
@felixdittrich92
Contributor

Yeah, we could align the y-coordinates of the elements (words) within a line and add a small default horizontal padding between detections.
CC @odulcy-mindee
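For the horizontal padding idea, here is a minimal sketch of one way to enforce a minimum gap between neighbouring word boxes on the same line. It assumes boxes use the ((xmin, ymin), (xmax, ymax)) relative-coordinate format of doctr's exported word geometries; `pad_horizontally` and its `min_gap` default are hypothetical, not part of doctr. The motivation, assuming each word is rendered into its own crop and pasted into the page, is that overlapping boxes would let a later word overwrite the end of the previous one.

```python
from typing import List, Tuple

# A box is ((xmin, ymin), (xmax, ymax)) in relative [0, 1] coordinates
# (assumed to mirror doctr's exported word geometry).
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def pad_horizontally(boxes: List[Box], min_gap: float = 0.005) -> List[Box]:
    """Enforce a minimum horizontal gap between consecutive boxes of one line.

    Boxes are sorted left to right. When a box starts less than `min_gap`
    after the previous box ends, the previous box's right edge is pulled
    back to respect the gap (never past its own left edge).
    """
    ordered = sorted(boxes, key=lambda b: b[0][0])
    out: List[Box] = []
    for box in ordered:
        if out:
            (pxmin, pymin), (pxmax, pymax) = out[-1]
            limit = box[0][0] - min_gap
            if pxmax > limit:
                # Shrink the previous box so the two crops no longer collide
                out[-1] = ((pxmin, pymin), (max(pxmin, limit), pymax))
        out.append(box)
    return out
```

Shrinking the left box rather than shifting the right one keeps every word's starting position unchanged; shifting would be the alternative if the words must keep their full width.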

@tzktz
Author

tzktz commented Mar 26, 2024

How do I change the font_family? @felixdittrich92

@felixdittrich92
Contributor

You can pass it directly: result.synthesize(font_family="XYZ")

Under the hood, this calls PIL:
font = ImageFont.truetype(font_family, font_size)
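To see where a font warning like the one below can come from, here is a minimal sketch of the try/fallback pattern: `ImageFont.truetype` raises `OSError` when it cannot open the font, and the code then falls back to Pillow's `ImageFont.load_default()`. The helper `load_font` is hypothetical and is not doctr's actual implementation, which may differ between versions.

```python
import logging
from typing import Tuple, Union

from PIL import ImageFont


def load_font(
    font_family: str, font_size: int
) -> Tuple[Union[ImageFont.FreeTypeFont, ImageFont.ImageFont], bool]:
    """Try to load a TrueType font, falling back to Pillow's default font.

    Returns the font and a flag telling whether `font_family` was actually used.
    """
    try:
        # Accepts a file path; a bare file name is also looked up by Pillow
        # in the platform's font directories if it is not found as given.
        return ImageFont.truetype(font_family, font_size), True
    except OSError:
        logging.warning("could not load %r, falling back to the default PIL font", font_family)
        return ImageFont.load_default(), False
```

Calling `load_font("Arial.ttf", 13)` from the same working directory as your script is a quick way to check whether Pillow can open the font at all, independently of doctr.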

@tzktz
Author

tzktz commented Mar 26, 2024

synthetic_pages = result.synthesize(font_family='Arial.ttf', font_size=13)
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()

I get the same warning even when I pass font_family. @felixdittrich92

WARNING:root:unable to load recommended font family. Loading default PIL font,font size issues may be expected.To prevent this, it is recommended to specify the value of 'font_family'.

@felixdittrich92
Contributor

Is the font installed on your system?

@tzktz
Author

tzktz commented Mar 27, 2024

Yes, I have that font in my project folder. @felixdittrich92

@felixdittrich92
Contributor

Ah, OK, got it. That's not enough: you need to install the font on your system: https://linuxiac.com/how-to-install-fonts-on-linux/#:~:text=Go%20to%20%E2%80%9CSystem%20Settings%E2%80%9D%20%3E,%E2%80%9CInstall%20from%20File%E2%80%9D%20button.&text=Then%20select%20the%20font%20files,%2Dwide%20or%20per%2Duser.
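As a possible alternative to a system-wide install: Pillow's `ImageFont.truetype` also accepts a file path, so passing the absolute path of the font in the project folder may work. A minimal sketch, assuming the font file sits next to your script or in the current working directory; `find_font` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Optional, Union


def find_font(name: str, search_dirs: Iterable[Union[str, Path]]) -> Optional[str]:
    """Return the absolute path of the first file called `name` in `search_dirs`.

    Returns None when the font is not found in any of the directories.
    """
    for directory in search_dirs:
        candidate = Path(directory).expanduser() / name
        if candidate.is_file():
            # Absolute path, so the lookup no longer depends on the working directory
            return str(candidate.resolve())
    return None
```

Usage: `font_path = find_font("Arial.ttf", [Path(__file__).parent, Path.cwd()])`, then `result.synthesize(font_family=font_path)` if it is not `None`. Whether this silences the warning depends on how your doctr version forwards `font_family` to PIL.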

@tzktz
Author

tzktz commented Mar 27, 2024

See the input and output results below. The result image quality is very poor; the pixels look broken. @felixdittrich92
Input image (1240 x 1754, 158.44 KB):
[image: input]

Result image (1907 x 965, 46 KB):
[image: Figure_1]
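The reported result size (1907 x 965) does not match the portrait shape of the input, so it may be a capture of the matplotlib window rather than the synthesized array itself; `plt.imshow` also resamples the image to fit the figure. A minimal sketch for saving the page at its native resolution with Pillow, assuming `result.synthesize()` returns a list of H x W x 3 arrays with values in [0, 255]; `save_synthesized_page` is hypothetical.

```python
import numpy as np
from PIL import Image


def save_synthesized_page(page: np.ndarray, path: str) -> Image.Image:
    """Save a synthesized page at its native resolution (no matplotlib resampling)."""
    # The array may not be uint8 (e.g. an integer canvas of another width),
    # so clip to [0, 255] and convert before handing it to Pillow.
    img = Image.fromarray(np.clip(page, 0, 255).astype(np.uint8))
    img.save(path)
    return img
```

Usage: `save_synthesized_page(result.synthesize()[0], "page0.png")`. Comparing that file with the input gives a fairer picture of the reconstitution quality than a screenshot of the plot.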

@tzktz
Author

tzktz commented Apr 2, 2024

@felixdittrich92 Any update?

@felixdittrich92
Contributor

Hi @tzktz 👋,

Unfortunately, I don't have time to work on this at the moment, so we'll need to address it later, or you can work on it yourself if you want (feel free to open a PR).

The related code can be found at:

def synthesize_page(

Best regards,
Felix

felixdittrich92 changed the title "i have face result image is not upto quality...fonts are breaks in result image.." to "[reconstitution] Improve synthesize output quality" on Apr 16, 2024
felixdittrich92 added this to the 2.0.0 milestone on Apr 16, 2024
felixdittrich92 added the labels type: enhancement, good first issue, help wanted, and module: utils on Apr 16, 2024
@SkaarFacee
Contributor

@felixdittrich92 Hey, sorry for being MIA; I needed to take some time off. I'm back now and was hoping I could take up this issue. Let me know :)

@felixdittrich92
Contributor

Hey @SkaarFacee 👋
Sure, feel free to work on it 😊
The code has moved a bit; it is now in:
https://github.com/mindee/doctr/blob/main/doctr/utils/reconstitution.py

@SkaarFacee
Contributor

Okay, thanks. Let me take a look at what I can do.

@SkaarFacee
Contributor

@felixdittrich92 Do you have any suggestions on how I can improve the quality of the image?

@felixT2K
Contributor

felixT2K commented Apr 29, 2024

@SkaarFacee
One thing we could do: if we have the line box information, we could align all boxes inside a line to the line's y-coordinate (to get a straighter view).
I found the following Hugging Face Space whose reconstitution looks quite good; maybe you can use it as a reference or for some inspiration ^^:
https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py

@SkaarFacee
Contributor

Okay, I will take a look and see what can be done over the weekend. This doesn't look that complex at a quick glance :)

@felixdittrich92
Contributor

Yeah, I think so too :)

felixdittrich92 changed the milestone from 2.0.0 to 1.0.0 on May 5, 2024
@SkaarFacee
Contributor

Hey, I am working on this; sorry for the delay. Something came up at work and kept me busy.

@SkaarFacee
Contributor

@felixT2K
I was using the link you mentioned as a reference (https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py).
I can't pinpoint exactly where the y-coordinate is used to align the line.
If the goal is to straighten the line, why don't we give every box in the line the same y-coordinates using a mathematical approach (e.g. the mean or centroid of the boxes)? If you could give me more insight into the HF reference, I would gladly implement that too 😄

@felixdittrich92
Contributor

Correct, that was what I had in mind: you know which boxes are in one line (with `resolve_lines=True`; otherwise, if only one line element is available, keep the y-coordinates of each box), then take the line's y-coordinates for each box to straighten the boxes along the line :)
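A minimal sketch of that approach on an exported page dictionary (e.g. from `result.pages[0].export()`), assuming the nested blocks, lines, words layout with straight ((xmin, ymin), (xmax, ymax)) geometries; rotated polygons are not handled, and `straighten_lines` is hypothetical. When lines are not resolved, each line should hold a single word whose box matches the line box, so words keep their own coordinates.

```python
import copy
from typing import Any, Dict


def straighten_lines(page: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an exported page where every word takes its line's y-range.

    Assumes page["blocks"][i]["lines"][j]["words"][k], with each "geometry"
    stored as ((xmin, ymin), (xmax, ymax)) in relative coordinates.
    """
    out = copy.deepcopy(page)  # leave the caller's export untouched
    for block in out["blocks"]:
        for line in block["lines"]:
            (_, line_ymin), (_, line_ymax) = line["geometry"]
            for word in line["words"]:
                (xmin, _), (xmax, _) = word["geometry"]
                # Keep the word's own x-extent, borrow the line's y-extent
                word["geometry"] = ((xmin, line_ymin), (xmax, line_ymax))
    return out
```

The mean/centroid variant suggested above would replace `line_ymin` and `line_ymax` with the averages of the words' own ymin and ymax values, which avoids depending on the line box at all.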

Development

No branches or pull requests

4 participants