Add Gaussian Mixture based adaptive threshold #1051

yujiepan-work · 2023-04-26T17:37:09Z

Description

This PR implements a new adaptive threshold based on a Gaussian Mixture Model (GMM).

Problems with the existing AnomalyScoreThreshold and our corresponding solutions:

AnomalyScoreThreshold can only propose a limited number of candidate thresholds. For example, when the validation scores are [1,2,3] for normal samples and [4,5,6] for anomalous samples, it can only propose 1, 2, 3, 4, 5 or 6 as the threshold, although intuitively 3.5 might be a better choice. To solve this, we use a GMM to estimate the full distributions of the normal and anomalous scores. We can then calculate f1 at any threshold t from the cumulative distribution functions (CDFs) and search for the optimal threshold:
FP = (1 - CDF_normal(t) ) * normal_rate
TP = (1 - CDF_anomalous(t) ) * anomalous_rate
FN = CDF_anomalous(t) * anomalous_rate
f1_scores = (TP * 2) / (TP * 2 + FP + FN)
AnomalyScoreThreshold cannot account for the real anomalous rate. We might want to generate extra anomalous data to help estimate the anomalous score distribution, even though in real cases the anomalous rate is never that high. As the formulas above show, anomalous_rate affects the f1 score. To handle this, we let users specify an anticipated "anomalous rate" when initializing GMMthreshold. Regardless of how many anomalous samples are generated, the optimal threshold is then chosen using this pre-defined "anomalous rate".
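The formulas above can be sketched in a few lines of plain Python. This is only an illustration, not the code in this PR: the helper names (`normal_cdf`, `f1_at_threshold`), the single-Gaussian distributions N(2, 0.8) and N(5, 0.8), and the assumption that normal_rate = 1 - anomalous_rate are all chosen here for the example.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a single Gaussian, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def f1_at_threshold(t, cdf_normal, cdf_anomalous, anomalous_rate):
    """Expected f1 when every score above t is flagged as anomalous.

    Assumption for this sketch: normal_rate = 1 - anomalous_rate.
    """
    normal_rate = 1.0 - anomalous_rate
    fp = (1.0 - cdf_normal(t)) * normal_rate
    tp = (1.0 - cdf_anomalous(t)) * anomalous_rate
    fn = cdf_anomalous(t) * anomalous_rate
    return (2.0 * tp) / (2.0 * tp + fp + fn)


# Hypothetical score distributions, for illustration only.
def cdf_n(t):
    return normal_cdf(t, mean=2.0, std=0.8)


def cdf_a(t):
    return normal_cdf(t, mean=5.0, std=0.8)


# Same threshold, two different anticipated anomalous rates.
f1_balanced = f1_at_threshold(3.5, cdf_n, cdf_a, anomalous_rate=0.5)
f1_rare = f1_at_threshold(3.5, cdf_n, cdf_a, anomalous_rate=0.05)
print(f"f1 at t=3.5: balanced={f1_balanced:.3f}, rare={f1_rare:.3f}")
```

With the score distributions held fixed, lowering the anticipated anomalous rate lowers the f1 reached at the same threshold, which is why the rate is taken from the user rather than from the number of generated anomalous samples.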

In summary, this new method computes the threshold in three steps:

step1: Estimate the distributions of normal and anomalous scores with GMMs.
step2: Calculate f1 for different threshold candidates. Currently, we uniformly sweep 100K values between the min score and the max score. This is very fast (less than 1 sec) since the CDF of a GMM is easy to calculate.
step3: Select the candidate threshold with the best f1.
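A minimal sketch of these three steps, assuming scikit-learn's `GaussianMixture` and SciPy's `norm.cdf`, could look like the following. It is not the implementation in this PR: the function names, the single-component default, and the default `anomalous_rate=0.5` are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def fit_gmm(scores, n_components=1, seed=0):
    """step1: fit a 1-D GMM to a set of anomaly scores."""
    x = np.asarray(scores, dtype=float).reshape(-1, 1)
    return GaussianMixture(n_components=n_components, random_state=seed).fit(x)


def gmm_cdf(gmm, t):
    """CDF of a 1-D GMM: weighted sum of the component Gaussian CDFs."""
    means = gmm.means_.ravel()
    stds = np.sqrt(gmm.covariances_.ravel())
    t = np.asarray(t, dtype=float)[:, None]  # shape (n_candidates, 1)
    return (gmm.weights_ * norm.cdf(t, loc=means, scale=stds)).sum(axis=1)


def gmm_threshold(normal_scores, anomalous_scores,
                  anomalous_rate=0.5, n_candidates=100_000):
    """Return (threshold, f1) maximizing the expected f1 over a uniform sweep."""
    normal_scores = np.asarray(normal_scores, dtype=float)
    anomalous_scores = np.asarray(anomalous_scores, dtype=float)
    gmm_n = fit_gmm(normal_scores)
    gmm_a = fit_gmm(anomalous_scores)

    # step2: sweep candidates uniformly between the min and max score.
    all_scores = np.concatenate([normal_scores, anomalous_scores])
    candidates = np.linspace(all_scores.min(), all_scores.max(), n_candidates)
    cdf_n = gmm_cdf(gmm_n, candidates)
    cdf_a = gmm_cdf(gmm_a, candidates)
    fp = (1.0 - cdf_n) * (1.0 - anomalous_rate)
    tp = (1.0 - cdf_a) * anomalous_rate
    fn = cdf_a * anomalous_rate
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)

    # step3: pick the candidate with the best f1.
    best = int(np.argmax(f1))
    return float(candidates[best]), float(f1[best])


threshold, best_f1 = gmm_threshold([1, 2, 3], [4, 5, 6])
print(f"threshold={threshold:.3f}, f1={best_f1:.3f}")
```

On the [1,2,3] / [4,5,6] example from the description, the selected threshold falls between the highest normal score and the lowest anomalous score, a value that a method restricted to the observed scores cannot propose.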

Fixes [Task]: Enabling Unsupervised Workflow: Propose a new thresholding approach #1027

Notes

This threshold method works on both a real validation set and synthetic data.
We have also considered other density estimators. In fact, we have finished an internal draft that uses KDE (sklearn.neighbors.KernelDensity) instead of GMM, but it is slower than GMM. We can discuss whether other density estimators would be helpful.
The creation of AnomalyScoreThreshold needs to support other threshold methods. I noticed another PR dealing with that. For demo purposes, I have implemented a workaround in this PR.
Documentation and yaml config comments are not updated yet. We can do that later if this feature is accepted.
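For the KDE alternative mentioned in the notes, one possible way to obtain a CDF is to evaluate the estimated density on a grid and integrate it numerically. The sketch below is not the internal draft; the function name `kde_cdf_on_grid`, the bandwidth of 0.5, and the grid range are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def kde_cdf_on_grid(scores, grid, bandwidth=0.5):
    """Approximate the CDF of a Gaussian KDE on a sorted 1-D grid.

    The density is evaluated at every grid point and integrated with the
    trapezoidal rule, so the grid must cover the bulk of the density.
    """
    x = np.asarray(scores, dtype=float).reshape(-1, 1)
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(x)
    # score_samples returns log-density, so exponentiate it.
    density = np.exp(kde.score_samples(grid.reshape(-1, 1)))
    areas = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(areas)])
    return np.clip(cdf, 0.0, 1.0)


grid = np.linspace(-3.0, 10.0, 2001)
cdf_normal = kde_cdf_on_grid([1, 2, 3], grid)
print(f"CDF at the end of the grid: {cdf_normal[-1]:.4f}")
```

The resulting CDF arrays can be plugged into the same FP/TP/FN formulas as the GMM CDFs, with the grid points acting as the threshold candidates.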

Changes

Bug fix (non-breaking change which fixes an issue)
Refactor (non-breaking change which refactors the code base)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist

My code follows the pre-commit style and check guidelines of this project.
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing tests pass locally with my changes
I have added a summary of my changes to the CHANGELOG (not for minor changes, docs and tests).

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

yujiepan-work · 2023-04-27T17:04:53Z

Hi all, this PR by anomalib_Team3 implements another threshold estimator, which serves as a solution to #1027.

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

yujiepan-work · 2023-05-02T11:07:33Z

MVTec result comparison

Image f1 when using "same_as_test" for validation

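"Adaptive threshold" can be regarded as the "upper bound" here, since it is tuned directly on the test set, and we see GMM is close to that (avg 0.922 vs 0.932 for cflow, 0.982 vs 0.985 for patchcore).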

	model	avg	bottle	cable	capsule	carpet	grid	hazelnut	leather	metal_nut	pill	screw	tile	toothbrush	transistor	wood	zipper
1	cflow+adaptive_threshold	0.931896	1	0.8125	0.968037	0.949721	0.905983	0.971014	0.994536	0.994652	0.923077	0.872181	0.994083	0.9375	0.712871	0.983607	0.958678
2	cflow+gmm_threshold	0.922115	1	0.792453	0.955357	0.934066	0.885246	0.971014	0.98913	0.994595	0.915584	0.856031	0.988095	0.920635	0.7	0.97479	0.954732
3	patchcore+adaptive_threshold	0.984694	1	0.967742	0.977169	0.977012	0.973451	1	1	0.994595	0.960289	0.961702	0.987952	1	1	0.983051	0.987448
4	patchcore+gmm_threshold	0.981974	1	0.962567	0.972727	0.971429	0.964912	1	1	0.989247	0.956834	0.961702	0.987952	1	1	0.97479	0.987448

Image f1 when using "synthetic" data for validation

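Synthetic anomalies are generated with the random Perlin noise masks already in Anomalib.
GMM threshold can result in a better f1 score than the existing adaptive threshold: the average improves from 0.819 to 0.820 for cflow and from 0.868 to 0.899 for patchcore, although individual categories move in both directions.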

	model	avg	bottle	cable	capsule	carpet	grid	hazelnut	leather	metal_nut	pill	screw	tile	toothbrush	transistor	wood	zipper
1	cflow+adaptive_threshold+synthetic	0.818935	0.984375	0.758333	0.552632	0.864078	0.844444	0.934307	0.915423	0.872727	0.691244	0.684211	0.848485	0.8	0.742857	0.909091	0.881818
2	cflow+gmm_threshold+synthetic	0.820402	0.984375	0.760331	0.616352	0.858537	0.844444	0.934307	0.929293	0.872727	0.691244	0.606742	0.844221	0.847458	0.783505	0.888889	0.843602
3	patchcore+adaptive_threshold+synthetic	0.868119	1	0.760331	0.84375	0.864078	0.850746	1	0.880383	0.978261	0.864	0.734043	0.848485	0.723404	0.824742	0.882353	0.967213
4	patchcore+gmm_threshold+synthetic	0.898764	0.992126	0.760331	0.927536	0.864078	0.850746	1	0.884615	0.978261	0.864	0.858491	0.835821	0.888889	0.879121	0.930233	0.967213

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

github-actions bot added the Tests label Apr 26, 2023

yujiepan-work changed the title ~~GMM estimator based adaptive threshold~~ Gaussian Mixture estimator based adaptive threshold Apr 26, 2023

samet-akcay added Hackathon and removed Tests labels Apr 27, 2023

yujiepan-work force-pushed the adaptive-threshold branch from 6c3b47d to d75524f Compare April 27, 2023 16:16

github-actions bot added the Tests label Apr 27, 2023

yujiepan-work marked this pull request as ready for review April 27, 2023 16:49

yujiepan-work requested review from samet-akcay, ashwinvaidya17 and djdameln as code owners April 27, 2023 16:49

yujiepan-work force-pushed the adaptive-threshold branch 2 times, most recently from 16e5732 to d75524f Compare April 27, 2023 17:01

yujiepan-work added 5 commits April 28, 2023 01:02

initial implementation of GMM threshold estimator with some to-dos

ada218d

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

code style fix

99e77ff

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

update default value in cli

f601731

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

add gmm params for get_callbacks

b4c2df8

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

fix threshold creation, doc, test

3d93965

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

yujiepan-work force-pushed the adaptive-threshold branch from d75524f to 3d93965 Compare April 27, 2023 17:02

support cflow

f3847c0

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

yujiepan-work changed the title ~~Gaussian Mixture estimator based adaptive threshold~~ Gaussian Mixture based adaptive threshold Apr 27, 2023

yujiepan-work changed the title ~~Gaussian Mixture based adaptive threshold~~ Add Gaussian Mixture based adaptive threshold Apr 27, 2023

samet-akcay added the T3 label Apr 28, 2023

add support of gmm threshold for all models

f5020d1

Signed-off-by: Pan, Yujie <yujie.pan@intel.com>

yujiepan-work requested a review from nahuja-intel as a code owner May 2, 2023 11:26

samet-akcay removed the Feature label Aug 4, 2023

samet-akcay removed the Hackathon label Jan 2, 2024

samet-akcay removed the T3 label Apr 1, 2024
