Run summary show number of edges instead of number of images #74

Open
amirmk89 opened this issue Feb 21, 2023 · 2 comments

Comments

@amirmk89
Contributor

amirmk89 commented Feb 21, 2023

After a fastdup run with a lower threshold, the run summary reports counts and percentages that are inconsistent with the number of images; they appear to refer to the number of edges instead. The counts and percentages also don't agree with each other.

2023-02-19 09:54:00 [INFO] Found total 13394 images to run onimated: 0 Minutes 0 Features
2023-02-19 09:54:02 [INFO] 1752) Finished write_index() NN model
2023-02-19 09:54:02 [INFO] Stored nn model index file fastdup_imagenette/nnf.index
2023-02-19 09:54:03 [INFO] Total time took 19716 ms
2023-02-19 09:54:03 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-02-19 09:54:03 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-02-19 09:54:03 [INFO] Found a total of 1189 above threshold images (d>0.900), which are 2.96 %
2023-02-19 09:54:03 [INFO] Found a total of 1339 outlier images         (d<0.050), which are 3.33 %

Here, for outliers: if all 1,339 are images, they make up ~10% of the data (1,339 of 13,394). Conversely, if 3.33% of the images are outliers, the count should be roughly 446 images.
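The mismatch can be checked with a few lines of arithmetic. This is only a sketch using the numbers from the log above; the variable names are made up for illustration and are not fastdup identifiers.

```python
# Numbers copied from the run summary above.
n_images = 13394
outlier_count = 1339
reported_pct = 3.33

# If the 1339 outliers were all distinct images, this is their share of the data.
pct_if_images = 100 * outlier_count / n_images

# If 3.33 % of the images were outliers, this is how many images that would be.
count_if_pct_of_images = n_images * reported_pct / 100

print(f"{pct_if_images:.2f} % of images, or {count_if_pct_of_images:.0f} images")
```

Either reading gives a number that differs from the one printed in the summary, which is why the count and the percentage cannot both be image-based.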

Thanks!

@dbickson
Collaborator

Hi @amirmk89, I'm not sure how to fix this. The percentage is computed relative to the number of edges, so running with k=3 versus k=100 gives a different result.
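To illustrate the dependence on k, here is a minimal sketch that assumes the denominator is `n_images * k`, i.e. one edge per image per neighbor. That formula and the helper `pct_of_edges` are assumptions made for illustration, not fastdup's actual code. Under this assumption, k=3 happens to reproduce the percentages in the log from the original report.

```python
def pct_of_edges(count, n_images, k):
    """Percentage of `count` relative to an assumed total of n_images * k edges."""
    return 100 * count / (n_images * k)

n_images = 13394  # from the log in the original report

# With k=3 (an assumption), the reported percentages come out.
above_threshold_pct = round(pct_of_edges(1189, n_images, k=3), 2)  # 2.96
outlier_pct = round(pct_of_edges(1339, n_images, k=3), 2)          # 3.33

# Same count with k=100: only the denominator changes here. In a real run
# the count itself would also change with k.
outlier_pct_k100 = pct_of_edges(1339, n_images, k=100)

print(above_threshold_pct, outlier_pct, round(outlier_pct_k100, 2))
```

A percentage defined this way moves whenever k changes, even for the same dataset, which is the difficulty described above.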

@vtyw

vtyw commented Apr 11, 2023

It's bizarre to see counts that are higher than the number of images:

2023-04-12 10:39:26 [INFO] Found total 5278 images to run ontimated: 0 Minutes 0 Features
...
2023-04-12 10:39:27 [INFO] Found a total of 8271 above threshold images (d>0.900), which are 52.24 %

What I would expect is a hierarchy of similarity types, so that each image is binned as fully identical, nearly identical, similar, or outlier. If an image is fully identical to any other image, it is classed as fully identical, even if it is also nearly identical or similar to other images.
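The hierarchy proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not fastdup's implementation: the edge format `(image_a, image_b, similarity)`, the function `bin_images`, and the rule that an outlier is an image whose best similarity is below 0.05 are all hypothetical. The thresholds are the ones printed in the logs.

```python
from collections import Counter

# Thresholds from the log lines; checked from strictest to loosest.
BINS = [("fully identical", 0.99), ("nearly identical", 0.98), ("above threshold", 0.90)]
OUTLIER_MAX = 0.05

def bin_images(edges, n_images):
    """Count each image once, in the strictest bin that any of its edges reaches."""
    best = {}  # image -> highest similarity seen on any of its edges
    for a, b, sim in edges:
        for img in (a, b):
            best[img] = max(best.get(img, 0.0), sim)

    counts = Counter()
    for sim in best.values():
        label = next((name for name, t in BINS if sim > t), None)
        if label is None and sim < OUTLIER_MAX:
            label = "outlier"
        if label is not None:
            counts[label] += 1

    # Return (count, percentage of images) per bin.
    return {name: (c, 100 * c / n_images) for name, c in counts.items()}
```

Because every image lands in at most one bin, the counts can never exceed the number of images, and the percentages are relative to images rather than edges.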
