
Sometimes predicts long, redundant, repetitive characters, like 'parseqqqqqqqqqqqq' (ground truth is 'parseq') #56

Open
WenjunLiu6146 opened this issue Dec 8, 2022 · 5 comments

Comments

@WenjunLiu6146

Is there any chance you know the reason? Thank you for your talented work!

@baudm
Owner

baudm commented Dec 8, 2022

Hello, please provide more details such as the model and weights used as well as the exact image you're using.

@WenjunLiu6146
Author

> Hello, please provide more details such as the model and weights used as well as the exact image you're using.

I used PARSeq trained on a Chinese dataset, which contains about 6K characters, and used it for inference on the test dataset.

@baudm
Owner

baudm commented Dec 9, 2022

Sorry, but I can't help you, since:

  1. I have no access to and am not familiar with the specific model you're referring to.
  2. I have no access to and am not familiar with the data you're using.
  3. PARSeq was developed and tested with Latin characters, primarily on English text. I am not familiar with the intricacies of Chinese text.

You might want to try increasing the number of decoder layers, or using a larger version of the model since the Chinese charset is much bigger than the Latin one.
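
For concreteness, here is a rough sketch of what a deeper/wider configuration could look like. The constructor path and argument names below are assumed from strhub/models/parseq/system.py and may not match your checkout, so treat this as an illustration, not a recipe:

```python
# Rough sketch only: a wider/deeper PARSeq for a ~6K-character charset.
# Constructor path and argument names assumed from strhub/models/parseq/system.py.
from strhub.models.parseq.system import PARSeq

# Hypothetical file holding the ~6K-character Chinese charset as one string.
charset = open('chinese_charset.txt', encoding='utf-8').read().strip()

model = PARSeq(
    charset_train=charset,
    charset_test=charset,
    max_label_length=25,
    batch_size=384,
    lr=7e-4,
    warmup_pct=0.075,
    weight_decay=0.0,
    img_size=(32, 128),
    patch_size=(4, 8),
    embed_dim=512,        # wider than the default 384
    enc_num_heads=8,
    enc_mlp_ratio=4,
    enc_depth=12,
    dec_num_heads=16,
    dec_mlp_ratio=4,
    dec_depth=2,          # default is 1; extra decoder capacity for the larger charset
    perm_num=6,
    perm_forward=True,
    perm_mirrored=True,
    decode_ar=True,
    refine_iters=1,
    dropout=0.1,
)
```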

@ceyxasm

ceyxasm commented Jul 6, 2023

So I tinkered a lot, and this is perhaps due to 'label_length' in main.yaml and the image size you are giving to the model.
In my case, with the model on Hugging Face, if you input an image with a single word, parseq, followed by whitespace equivalent to 30-35 characters in total, the result is correct.

However, if we exceed this and input an image whose length is, let's say, beyond 40 characters, redundant repetition of characters is seen.

[image] gave me gatery.comminFreedom

[image] gave me gateway.................

[image] gave me gateway.
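
For anyone who wants to reproduce this, the inference snippet from the repo's README can be used as-is; only the image path below is a placeholder:

```python
import torch
from PIL import Image
from strhub.data.module import SceneTextDataModule

# Load the pretrained model and its matching image transform (per the README demo).
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
img_transform = SceneTextDataModule.get_transform(parseq.hparams.img_size)

# 'gateway_wide.png' is hypothetical: one word padded with lots of whitespace.
img = img_transform(Image.open('gateway_wide.png').convert('RGB')).unsqueeze(0)

logits = parseq(img)  # shape: (1, max_label_length + 1, len(charset) + 1)
pred = logits.softmax(-1)
label, confidence = parseq.tokenizer.decode(pred)
print(label[0])
```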

It is probably due to the fact that the model was trained on 1-word images and will hallucinate for longer labels.
I trained a model with label-length set to 65 and it was able to overcome this problem.
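
For context, the repetition seems to line up with the maximum label length baked into the checkpoint. A minimal sketch to inspect it, assuming the hyperparameter is named max_label_length as in the repo's configs (the key may differ across versions):

```python
import torch

# Load the released weights and inspect the label-length cap stored in hparams.
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
print(parseq.hparams.max_label_length)

# Retraining with a longer cap would be a Hydra override along these lines
# (key name assumed from configs/main.yaml):
#   ./train.py model.max_label_length=65
```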

@WenjunLiu6146
Author

Thanks. I'll try your solution.
