Issue with the correct labeling of selectionMarks - v2.1 - preview.3 #916

Open
at-philipp-heinrich opened this issue Apr 21, 2021 · 1 comment

Comments

@at-philipp-heinrich

at-philipp-heinrich commented Apr 21, 2021

Description:
We have several files with an identical layout and the selectionMarks in the same positions, but filled in differently. In the tag editor, the selection marks are labeled on some pages and not on others. Even if I draw a region myself, it only returns NULL. The behavior also differs depending on whether the mark is handwritten or not.

The analyzed result shows the same behavior: in some files the marks are labeled and in others they are not. Since the files are identical in layout, there is no real clue as to why it reacts this way.

Questions:

  • In this case, is it advisable to add even more files to give the AI more information? So far we have added 12 files (identical in layout but filled in differently).
  • Is it perhaps due to the shape of the selectionMarks that they are not properly labeled?

Edit:
The problem also occurs with other selectionMarks that are not so close to each other, as seen in the second image below, so in this case it can't be due to the layout.

Additional context
[Image: Fott_examples]

[Image: Fott_examples_2]

@RJWerning

I've seen similar issues with 'radio' type selectionMarks; checkbox-style marks seem to always work. It's actually an issue with the Form Recognizer layout analyze API, not FOTT. You can see this by using the "layout analyze" option in fott-preview: the selection marks you have issues with will not be found there either.
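
To check this outside the tool, one could inspect the JSON returned by the layout analyze call directly. The following is a minimal sketch, assuming the v2.1 response shape in which each entry of `analyzeResult.readResults` carries a `selectionMarks` list with `state` and `confidence` fields. The function name and the sample data are hypothetical, and the field names should be verified against the API version in use.

```python
from typing import Any, Dict, List, Tuple


def selection_marks_per_page(result: Dict[str, Any]) -> Dict[int, List[Tuple[str, float]]]:
    """Map page number -> [(state, confidence), ...] for detected selection marks.

    Assumed (not verified here) response shape:
    result["analyzeResult"]["readResults"][i]["selectionMarks"].
    Pages without that key are reported with an empty list.
    """
    pages = result.get("analyzeResult", {}).get("readResults", [])
    summary: Dict[int, List[Tuple[str, float]]] = {}
    for index, page in enumerate(pages, start=1):
        marks = page.get("selectionMarks") or []
        # Fall back to the 1-based position if the page number is missing.
        page_number = page.get("page", index)
        summary[page_number] = [(m["state"], m["confidence"]) for m in marks]
    return summary


# Hypothetical, hand-written response fragment for illustration only.
sample_result = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "selectionMarks": [
                    {"state": "selected", "confidence": 0.9},
                    {"state": "unselected", "confidence": 0.85},
                ],
            },
            {"page": 2},  # no selection marks detected on this page
        ]
    }
}

for page_number, marks in selection_marks_per_page(sample_result).items():
    print(page_number, marks)
```

A page reported with an empty list is one where layout detection found no marks, which would be consistent with the unlabeled marks and NULL regions seen in the tag editor.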

I posted a question on Stack Overflow about this and heard back from Microsoft. I sent them some of the images I was having issues with; they ran them through the new version of the detection API and said that the issues I was facing are fixed in the next preview release, scheduled for ~5/21.
https://stackoverflow.com/questions/67183842/training-custom-form-selectionmark-bounding-box-identification-issues

A couple of suggestions that may help:

  • When doing your custom model training, only use documents where all of the selectionMarks are found.
  • I've found that in many cases a higher-resolution image helps the Form Recognizer API detect them, but that's not always an option for images submitted by end users.
  • The more documents in your training set, the better it will work. The docs suggest a minimum of 5, and 10-15 if there are issues, so with 12 you should be OK. That said, I trained our model with 20 and am still seeing issues.
    https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/build-training-data-set#training-data-tips

-Rich W
