
[Feature][Core] Calculate thresholding confidence using data #39

Open
Udayraj123 opened this issue Sep 29, 2022 · 8 comments
@Udayraj123 (Owner) commented Sep 29, 2022

The core logic of OMRChecker revolves around finding the correct separation between Marked and Unmarked bubbles. We want to let the user know if it has been determined confidently.

(image: histogram of bubble intensities)

In the above image, there are two possible thresholds based on the jumps in the histogram. In such cases, a confidence metric would be useful for flagging bad-quality images.

More references are in the Rich Visuals section.
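The two-competing-jumps situation can be sketched numerically. Below is a hypothetical illustration (the name `jump_confidence` and the exact formula are assumptions, not OMRChecker's actual API): sort the mean bubble intensities, measure the gaps between consecutive values, and compare the largest gap with the runner-up. A dominant largest gap means one clear threshold; two similar gaps mean the ambiguous case shown above.

```python
import numpy as np

def jump_confidence(q_vals):
    # Sort mean bubble intensities and measure the gaps ("jumps")
    # between consecutive values.
    sorted_vals = np.sort(np.asarray(q_vals, dtype=float))
    jumps = np.diff(sorted_vals)
    if len(jumps) < 2:
        return 1.0, None
    # Compare the largest jump with the runner-up: a dominant largest
    # jump means one clear threshold; two similar jumps mean ambiguity.
    order = np.argsort(jumps)[::-1]
    largest, second = jumps[order[0]], jumps[order[1]]
    # Candidate threshold: midpoint of the largest gap.
    idx = int(order[0])
    threshold = (sorted_vals[idx] + sorted_vals[idx + 1]) / 2
    confidence = 1.0 - second / largest if largest > 0 else 0.0
    return float(confidence), float(threshold)
```

For clearly separated data like `[20, 25, 30, 200, 210, 215]` this returns a confidence near 0.94; for a histogram with two comparable jumps the confidence drops toward zero.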

Note: this issue is marked with the hacktoberfest label. Follow #hacktoberfest-discussions on Discord for further details.

@Udayraj123 Udayraj123 changed the title [Enhancement][Core] Calculate thresholding confidence using data [Feature][Core] Calculate thresholding confidence using data Sep 29, 2022
@grgkaran03 commented:
Hi, I would like to take up this issue.
Could you suggest an approach for getting started on this?

@Udayraj123 (Owner) commented:

Hi @grgkaran03, thanks for showing interest. Let's discuss it on Discord, and then you can share a brief summary of the plan here in a comment. Ping me in the channel mentioned in the description.

@Udayraj123 (Owner) commented:

Hi @grgkaran03, any updates/need help with anything?

@Udayraj123 (Owner) commented:

This task will be handled in a PR alongside ongoing work to improve the debugging experience.

@Udayraj123 (Owner) commented:

Sharing a sample histogram where the MIN_JUMP configuration seemed to be ineffective:
image

image image

Somehow the global threshold is also too high, because the overall image is bright.

@Udayraj123 (Owner) commented Feb 17, 2024

Analysis:
The global threshold logic was not working for these q-vals plots because the minimum value was too high.
(q-vals is the list of mean pixel values of all bubbles in the OMR template.)

image

Setting it to 100 also does not separate the red and green lines (ideally, the red line should auto-correct itself to the first large gap).

This happens when there is no sharp jump between two consecutive values in the above histogram.

A confidence metric is needed when there is no clear "first large jump", as the detector is likely to misclassify a few bubbles near that threshold (unless, of course, a local threshold saves the case).

image
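The "auto-correct to the first large gap" idea mentioned above can be sketched like this. This is an illustrative sketch, not OMRChecker's actual implementation; the function name and the `min_jump` default are assumptions:

```python
import numpy as np

def first_large_jump_threshold(q_vals, min_jump=25):
    # Scan sorted bubble intensities from darkest (marked) upward and
    # place the threshold inside the first gap of at least min_jump.
    sorted_vals = np.sort(np.asarray(q_vals, dtype=float))
    jumps = np.diff(sorted_vals)
    for i, j in enumerate(jumps):
        if j >= min_jump:
            return (sorted_vals[i] + sorted_vals[i + 1]) / 2, True
    # No sharp jump found: fall back to the midpoint of the value
    # range and signal that confidence should be reduced.
    return (sorted_vals[0] + sorted_vals[-1]) / 2, False
```

The boolean flag is the interesting part for this issue: when no gap of at least `min_jump` exists (the case in these plots), the fallback threshold is a guess, and the confidence metric should reflect that.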

For a particular set of images, we can configure the MIN_JUMP parameter via config.json to solve this:

```json
{
  "threshold_params": {
    "MIN_JUMP": 15
  }
}
```
image

But reducing MIN_JUMP increases wrong detections for images with shadows or low-contrast shades.

For example, in the above plot, positions 40-50 may contain marked bubbles with low contrast. The local thresholding technique should handle such cases most of the time, but OMRChecker is less confident about them.
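The per-field fallback described here could look roughly like the sketch below (hypothetical helper, not the project's actual code): if one field's own bubbles show a sharp jump, threshold inside that gap; otherwise fall back to the global threshold.

```python
import numpy as np

def local_threshold(field_q_vals, global_threshold, min_jump=15):
    # If this field's own bubbles show a sharp jump, threshold inside
    # that gap; otherwise fall back to the global threshold.
    sorted_vals = np.sort(np.asarray(field_q_vals, dtype=float))
    jumps = np.diff(sorted_vals)
    if len(jumps) and jumps.max() >= min_jump:
        i = int(np.argmax(jumps))
        return (sorted_vals[i] + sorted_vals[i + 1]) / 2
    return float(global_threshold)
```

This is how a low-contrast marked bubble around positions 40-50 could still be caught even when the global threshold misses it.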
image

The confidence metric should help us identify such cases and potentially find a solution. We can try labelling the questions in the plot itself to gather more insight.

@Udayraj123 (Owner) commented:

Added code to support field labels in the intensity plot to understand the ambiguity better.

image image
  • If, for any single field, the threshold turns out to be completely below the bubble values in that field (despite it having a marked bubble), then we've probably set a wrong threshold (reduce the confidence metric)

  • If the field labels are in close vicinity of the threshold (zoomed image), we need to ensure that local thresholding handles those cases (field-wise graphs)

  • For roll_5 - global threshold is really close, but just enough to distinguish

image
  • For q52 - empty field - global threshold and local threshold align
image
  • For q72 - empty field - global threshold and local threshold don't align, local threshold distinguishes due to low MIN_JUMP (false positive)
image
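The q72 case above suggests a simple disparity signal. As a sketch (names and the `max_disparity` scale are assumptions): measure how far each field's local threshold drifts from the global one, and reduce confidence proportionally.

```python
def disparity_confidence(local_thr, global_thr, max_disparity=50.0):
    # Fields whose local threshold drifts far from the global one
    # (like q72 above) are worth flagging for review.
    disparity = abs(local_thr - global_thr)
    confidence = max(0.0, 1.0 - disparity / max_disparity)
    return disparity, confidence
```

Aligned thresholds (like q52) score near 1.0; strongly diverging ones (like q72) score near 0.0 and become review candidates.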

@Udayraj123 (Owner) commented Mar 2, 2024

Turns out the confidence metric based on local-vs-global threshold disparity is already showing results!

In this scan from the community samples we see an ambiguous bubble mark (see Q.131):
image

It was found when looking at the confidence metrics output:
image

Such bubbles may require human intervention or better tuning to keep the output consistent across images.
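Flagging such bubbles for human intervention could be as simple as the sketch below (a hypothetical helper; the `margin` value and dict-of-means input shape are assumptions): collect every bubble whose mean intensity falls within a small margin of the threshold.

```python
def flag_ambiguous_bubbles(mean_vals_by_bubble, threshold, margin=10):
    # Collect bubbles whose mean intensity lies within `margin` of the
    # threshold -- candidates (like Q.131 above) for human review.
    return [
        name for name, val in mean_vals_by_bubble.items()
        if abs(val - threshold) <= margin
    ]
```

A bubble sitting right at the threshold (like the partially filled Q.131 mark) gets flagged, while clearly marked and clearly empty bubbles pass through silently.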

Master branch vs. new output (comparison images attached)

We've decided to honor the user's intention to mark, so the bubble will now be marked even if it is not fully filled.

Note: if your images contain bad-quality prints where the printed characters ('B' in the above case) are non-uniformly thick or bold, they may still get detected as marked bubbles.
