
Model results #27

Open
ewan-m opened this issue Feb 9, 2024 · 9 comments
Comments

@ewan-m

ewan-m commented Feb 9, 2024

Hi!

I've been playing around with this model locally following the instructions in the README, and my results don't seem to be nearly as good as yours. I'm following your instructions here #11 (comment) and then running prepare.py in my fork.

For instance, even with different style prompts the model seems to generate very similar results for me.
Real on left, generated on right.

IAM style 1
[image: image-9-IAM]

IAM style 2
[image: image-6_1]

Secondly, the CVL and IAM models give very different results from each other, but each model gives quite consistent results across different styles.

CVL style 1
[image: image-9-CVL]

CVL style2
[image: image-6]

Is there something stupid I'm missing, or do I need to train it with these writers in the dataset to get better results? Does the Google Drive contain the fully trained models that were used to generate the results in the paper?

Very cool project though - congrats!!

@ankanbhunia
Owner

Thanks for sharing the results and your fork repo. I assume these examples are custom handwriting, not from the IAM/CVL datasets.

It seems from the results that the model does perform worse in in-the-wild scenarios. Can you share a zip file of the style examples used above, so that I can test it on my machine and confirm whether anything is missing?

@ewan-m
Author

ewan-m commented Feb 9, 2024

Of course! Here's a zip of 30 example 32x192-pixel word PNGs in style1 and style2.

styles.zip

Really appreciate the help, btw!

I've got the start of a web app where you take a picture of a page of your writing, and then it uses OCR to cut and scale it all into these PNG files, ready to feed into the model. My plan is to export the model to ONNX and use onnxruntime-web to do the generation in the browser itself... if I can get some cool results locally first! 😁
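
The export step I have in mind is roughly the sketch below. The `DummyGenerator`, the input shapes, and the file name are all placeholders I've made up for illustration, not the repo's actual model API:

```python
# Minimal sketch of exporting a PyTorch generator to ONNX for onnxruntime-web.
# DummyGenerator stands in for the real model so the export call is runnable;
# shapes and names are hypothetical, not the repo's actual interface.
import torch
import torch.nn as nn

class DummyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretend input: a batch of 15 style word crops treated as channels.
        self.conv = nn.Conv2d(15, 1, kernel_size=3, padding=1)

    def forward(self, style_imgs):
        # style_imgs: (batch, 15, 32, 192) -> one generated 32x192 image
        return self.conv(style_imgs)

generator = DummyGenerator().eval()
style_imgs = torch.randn(1, 15, 32, 192)

torch.onnx.export(
    generator,
    (style_imgs,),
    "handwriting_generator.onnx",
    input_names=["style_imgs"],
    output_names=["generated"],
    opset_version=17,
)
```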

@ewan-m
Author

ewan-m commented Feb 13, 2024

So I've been playing about with it a bunch more, and I've found that:

  • running the Jupyter notebook with no changes to get some baseline nice results,
  • then screenshotting the random style example words as PNGs and cutting/scaling them all to 32x192,
  • then running those back through the model
  • leads to wildly different results!

This suggests that the model is very sensitive to the exact resolution/scaling/thresholding of the original dataset, and doesn't handle images that have been upscaled/downscaled differently from how the source data was prepared.
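
For reference, "cutting/scaling them all to 32x192" on my side is basically the naive resize below (a quick Pillow sketch; file names are just illustrative):

```python
# Rough sketch of the naive scaling step: take a cropped word image and force
# it straight to 32 (high) x 192 (wide), ignoring aspect ratio entirely.
from PIL import Image

def naive_resize(in_path: str, out_path: str, size=(192, 32)) -> None:
    """Resize a cropped word image directly to (width=192, height=32)."""
    img = Image.open(in_path).convert("L")   # greyscale
    img = img.resize(size, Image.BILINEAR)   # no cropping, no padding
    img.save(out_path)

naive_resize("word_crop.png", "word_32x192.png")  # example file names
```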

Do you reckon it's worth training it further with a bunch of slightly differently rotated/scaled data, or do you reckon there's something else going wrong for me entirely? 😁

@ankanbhunia
Owner

Sorry for the late reply.

I suppose you are correct, but I am unsure whether training with differently rotated/scaled data would be beneficial. Also, doing so might make the training unstable.

I haven't been able to test the results of your examples yet. It's been a busy week. I will give it a try over the weekend.

@ankanbhunia
Owner

[image: Screenshot 2024-02-16 at 17 31 27]

@ewan-m,

I tried the model with your style1 samples. The results I got don't look bad.

You can have a look at how I preprocessed the style examples in the load_itw_samples(.) function. Here, I use a minimum boundary area crop followed by a resize/padding operation.

`def load_itw_samples(folder_path, num_samples=15):`

Also, I added a notebook file demo_custom_handwriting.ipynb. There, you just need to input the cropped word images of the custom handwriting. The images do NOT need to be scaled, resized, or padded; load_itw_samples(.) will take care of that.
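
For anyone reading along, the crop-then-pad idea looks roughly like the sketch below. This is only an illustration of the approach, not the actual load_itw_samples code (see the repo for the real implementation), and it assumes dark ink on a light background:

```python
# Illustrative "minimum bounding-box crop, then resize and pad" preprocessing,
# in the spirit of load_itw_samples(.) -- not the actual repo code.
import numpy as np
from PIL import Image

def crop_resize_pad(path: str, target_h: int = 32, target_w: int = 192) -> np.ndarray:
    img = np.array(Image.open(path).convert("L"))

    # Minimum bounding box around the ink (pixels darker than a loose threshold).
    ys, xs = np.where(img < 200)
    if len(xs):
        img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Resize to the target height, keeping the aspect ratio.
    h, w = img.shape
    new_w = min(target_w, max(1, int(w * target_h / h)))
    img = np.array(Image.fromarray(img).resize((new_w, target_h), Image.BILINEAR))

    # Pad with white up to the target width.
    out = np.full((target_h, target_w), 255, dtype=np.uint8)
    out[:, :new_w] = img
    return out
```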

I tried to find out why your results are poor. I think the preprocessing function, especially the minimum-area cropping, is different in your case. I also found that model.eval() was not called in the previous demo.ipynb file. That might have caused issues when feeding in images from outside the training corpus.
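
For context, eval mode matters because layers like BatchNorm and Dropout behave differently during training and inference. A generic sketch (the tiny model below is only a placeholder, not the repo's generator):

```python
# Why model.eval() matters at inference: BatchNorm switches to its running
# statistics and Dropout is disabled. Placeholder model, not the repo's one.
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, 3, padding=1),
    torch.nn.BatchNorm2d(8),   # uses running stats only in eval mode
    torch.nn.Dropout(0.5),     # turned off in eval mode
)

model.eval()                   # freeze BatchNorm stats, disable Dropout
with torch.no_grad():          # no gradients needed for generation
    out = model(torch.randn(1, 1, 32, 192))
```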

@ewan-m
Author

ewan-m commented Feb 18, 2024

Thanks so much for this! I think adding model.eval() makes a big difference, and not having to care about resizing and padding is great too! I can confirm I reproduce the results you've shared above 😁

It's still quite hit-and-miss with other samples and seems sensitive to how the image is preprocessed, but I think that's to be expected.

I'm experimenting with the IAM vs CVL models and different preprocessing of the images to find what gives the optimal results. Preliminarily, it seems some thresholding improves things greatly, but going to full binary black-or-white thresholding makes things worse again, and leaving all the noise of the white page background is worst. I can share some images if you'd be interested!
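
To be concrete, these are roughly the variants I'm comparing (an OpenCV sketch; the file name is illustrative and the exact threshold is what I'm still tuning):

```python
# Sketch of the preprocessing variants being compared: no thresholding,
# soft thresholding (whiten the page background but keep grey ink values),
# and hard binary thresholding.
import cv2
import numpy as np

img = cv2.imread("word_crop.png", cv2.IMREAD_GRAYSCALE)

# 1) raw: keep all the page-background noise (worst for me so far)
raw = img

# 2) soft: pixels lighter than the Otsu threshold become white,
#    darker pixels keep their grey values (best so far)
thr, _ = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
soft = np.where(img > thr, 255, img).astype(np.uint8)

# 3) hard: full black/white binarisation (worse again)
_, hard = cv2.threshold(img, thr, 255, cv2.THRESH_BINARY)
```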

@ankanbhunia
Owner

Nice!

During training, we maintain a fixed receptive field of 16 for each character. So, to get optimal results, try to resize the style images to [16*len(string) x 32]. For example, a 4-character word 'door' should have dimensions [64x32]. This can reduce the domain gap further.
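
In code, that rule looks roughly like the sketch below, assuming a width x height convention (i.e. 'door' becomes 64 pixels wide and 32 tall); the file name and function name are just illustrative:

```python
# Resize a style word crop so each character gets a fixed 16 pixels of width,
# as suggested above: target width = 16 * len(word), height = 32.
from PIL import Image

def resize_for_receptive_field(path: str, word: str,
                               px_per_char: int = 16, height: int = 32) -> Image.Image:
    target_w = px_per_char * len(word)          # e.g. 'door' -> 4 * 16 = 64
    img = Image.open(path).convert("L")
    return img.resize((target_w, height), Image.BILINEAR)

styled = resize_for_receptive_field("door_crop.png", "door")  # 64x32 result
```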

@shuangzhen361

Hi, why can't I produce good results despite trying many methods?
[image: 1656efcb184ad66b08348d2842724d3]

@shuangzhen361

Here is my zip file, thank you!
image.zip
