Duplicate Finder tool update! #21

CuddleBear92 · 2021-10-18T17:55:11Z

There are a few things that would improve the current duplicate tool for power users, giving you more info at a glance and allowing you to make a more informed decision.

Cache duplicate processing data locally and update it on new runs.
Mark a PAIR of comics as not duplicates. This should be stored per comic ID in LANraragi, on a per-pair basis, so the pair is skipped in future searches, like a blacklist. An override might be wanted to correct user errors.
A compare view that fills the window with both comics of the pair, each taking up half the screen, with thumbnails of all pages of each. This lets you compare them more easily at a glance. It would be much like the detailed page of a comic that displays all metadata and all thumbnails, just loaded side by side in the tab itself. An overlay would be wanted, but could be done with its own tab type. This could be made a bit easier by having a general split view tab type for all content, and allowing the dupe tool to open a new such tab prefilled with the comic issues in question.
List image sizes of either the original cover art or all images, if possible.
Display cached info as a comparison between the two comic issues. For example, if A is higher res than B, display the common res and highlight the better value in green. The same goes for other data, like number of tags and pages. Here is an example of such a thing from Hydrus, with import date, file size, image res and JPEG compression setting.
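
The per-pair "not duplicates" list described above could be sketched roughly like this. It is a minimal Python illustration, not the tool's actual code: the class name `NotDuplicateList` and its methods are hypothetical, and archive IDs are assumed to be plain strings. The pair is stored unordered, so (A, B) and (B, A) are the same entry, and `unmark` covers the override for user errors.

```python
class NotDuplicateList:
    """Remembers pairs of archive IDs the user marked as not duplicates.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._pairs = set()

    @staticmethod
    def _key(id_a, id_b):
        # A frozenset ignores order, so (A, B) and (B, A) share one key.
        return frozenset((id_a, id_b))

    def mark(self, id_a, id_b):
        """Record that this pair should be skipped in future searches."""
        self._pairs.add(self._key(id_a, id_b))

    def unmark(self, id_a, id_b):
        """Undo a mistaken mark; does nothing if the pair was never marked."""
        self._pairs.discard(self._key(id_a, id_b))

    def is_marked(self, id_a, id_b):
        return self._key(id_a, id_b) in self._pairs

    def filter(self, candidate_pairs):
        """Drop candidate pairs the user already rejected."""
        return [(a, b) for a, b in candidate_pairs if not self.is_marked(a, b)]


not_dupes = NotDuplicateList()
not_dupes.mark("id-aaa", "id-bbb")
remaining = not_dupes.filter([("id-bbb", "id-aaa"), ("id-aaa", "id-ccc")])
```

Keeping the list per pair rather than per archive means an archive that was cleared against one match can still show up against a different one.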

Guerra24 · 2021-11-22T08:28:53Z

Prototype UI 1. The actions are missing, I need to figure out where to put them.

EDIT: I might also add both the resolution and format of each page to the thumbnail overlay, just like the page number. The format(s) will be listed below the resolution in the same style as the tags.

2021-11-22_02-19-32.mp4

AbyssalMonkey · 2021-11-27T23:51:45Z

Would be nice to have a namespace for the folder the archives are found in. It would make quicker work of archives that are duplicates from an authoritative source. It's unclear whether that is what "Source" is in the demonstration.

Guerra24 · 2021-11-28T03:46:42Z

> Would be nice to have a namespace for which folder the archives are found in

The folder where it is stored on the server? Unfortunately I can't pull that. There is a script that converts folders to categories, but those are not used in this case.

> Unclear if that is what "Source" is in demonstration.

It is the source it came from, i.e. the URL, at least in my case.

The tags section is the same as the tags shown in the archive tab.

Guerra24 · 2021-12-21T21:22:15Z

Alright, everything but the improved caching has been implemented. The final design is like the video above, but with delete and mark as non-duplicate buttons. Pages also show their resolution, extension and number.

AbyssalMonkey · 2021-12-22T09:31:51Z

Failing to pull tags properly.

AbyssalMonkey · 2021-12-25T03:29:22Z

After using this extensively to remove duplicates from an automatic feed for a while, here are a few usability suggestions to make sorting duplicates faster, roughly in order of how feasible they probably are:

The ability to open an archive from this menu, and/or change the image scaling by showing more or fewer images per row. It can be difficult to spot labeling errors when two galleries are duplicates but in different languages (one is translated, the other not), or when the names of the galleries conflict with one another (one is labelled correctly, but if you delete it, you now have a mislabeled gallery to go hunt down).
The caching mentioned above would be useful. Color coding for quick parsing would be preferable, maybe something like this:

In the above image, aligning tag fields would also be an improvement.

The ability to decide what to do before an archive completely loads. You can't even back out to the tool while the archives are still loading. This is somewhat of a problem when dealing with hundred-page archives, or archives with large file sizes. A related improvement would be to load the first N pages first, so that you can make your decision before the rest of the archive finishes loading. Unsure how this works or if this is possible.
This follows from the above two: if an image fails to load, it appears to be counted as a 0x0 page, which drags down the reported resolution. If you have a 16 page 1360 x 1920 archive and one image fails to load, the archive will instead register as 1275 x 1800. Instead, flag that an image failed to load and exclude it from the resolution calculation. Also, when prompted to right click reload, the image will always register as 0x0, even after backing out of the current pairing, because that data is now cached. Right click reload should update the cache.
The ability to merge/replace tags in bulk from one archive into another. If dupes show up, and my x1920 archive is tagged and my x3200 is not, the ability to copy the tags over and then delete the x1920 would be handy.
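
To illustrate the resolution point above: the reported numbers are consistent with a plain average that includes the failed page as 0 x 0 (15/16 of 1360 is 1275, and 15/16 of 1920 is 1800). The Python sketch below compares that with an average that skips failed pages and reports them separately. How the tool actually computes the figure is an assumption here, and both function names are made up for the example.

```python
def naive_average(pages):
    """Average width/height over all pages, failed (0, 0) pages included."""
    n = len(pages)
    return (round(sum(w for w, _ in pages) / n),
            round(sum(h for _, h in pages) / n))


def average_resolution(pages):
    """Average over successfully loaded pages only.

    pages: list of (width, height); (0, 0) is assumed to mark a failed load.
    Returns ((avg_width, avg_height) or None, number_of_failed_pages).
    """
    loaded = [(w, h) for w, h in pages if w > 0 and h > 0]
    failed = len(pages) - len(loaded)
    if not loaded:
        return None, failed
    avg_w = round(sum(w for w, _ in loaded) / len(loaded))
    avg_h = round(sum(h for _, h in loaded) / len(loaded))
    return (avg_w, avg_h), failed


# 16 page archive at 1360 x 1920 where one page failed to load
pages = [(1360, 1920)] * 15 + [(0, 0)]
skewed = naive_average(pages)
resolution, failed_count = average_resolution(pages)
```

With the failed count returned next to the resolution, the UI could show a warning flag instead of silently reporting a lower resolution.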

The items above are probably easy to implement.
The items below are probably more in-depth, harder-to-solve problems.

Marking differences in the naming of galleries using transposition. This would make noticing errors in naming and mismatched volume numbers quicker.
An option for when an archive has been matched more than X times: spend extra time opening up the archive and doing a second diff check on the second page. Initially, maybe just quarantine these dupes out of the main queue. Many of the colored-cover multi-volume/chapter archives use extremely similar covers, with only a single number changed on the image (e.g. 4 -> 5). This could theoretically be solved with a higher image difference number, but that has its own penalties. With high volume counts, this means users have to tediously click "Not duplicates" on potentially up to N choose 2 (nCr) combinations for just one series (5 volumes is 10 pairs, 7 is 21).
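
As a rough illustration of the pair counts and the quarantine idea above, here is a short Python sketch. `split_queue` and `max_matches` are hypothetical names, and the rule (quarantine any pair where either archive appears in more than `max_matches` pairs) is only one possible reading of "matched more than X times".

```python
from collections import Counter
from itertools import combinations
from math import comb


def split_queue(pairs, max_matches):
    """Split candidate pairs into (main, quarantined).

    A pair is quarantined if either archive appears in more than
    max_matches candidate pairs overall.
    """
    counts = Counter()
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    main, quarantined = [], []
    for a, b in pairs:
        if counts[a] > max_matches or counts[b] > max_matches:
            quarantined.append((a, b))
        else:
            main.append((a, b))
    return main, quarantined


# Worst case for a series: every volume matches every other volume.
volumes = ["vol1", "vol2", "vol3", "vol4", "vol5"]
series_pairs = list(combinations(volumes, 2))
pairs_for_5 = comb(5, 2)
pairs_for_7 = comb(7, 2)

# One unrelated pair stays in the main queue, the series is set aside.
main, quarantined = split_queue(series_pairs + [("x", "y")], max_matches=3)
```

The quarantined pairs could then get the slower second-page check, or simply be reviewed as one group.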

Overall, it's been a fantastic tool to use, and I've already been able to remove hundreds of automatically downloaded duplicates, which would otherwise have taken many times longer to sort out. Keep up the good work.

Guerra24 added the enhancement label Oct 18, 2021

Guerra24 mentioned this issue Oct 20, 2021

Create Russian translation #22

Merged

Guerra24 pinned this issue Oct 28, 2021

Guerra24 mentioned this issue Nov 7, 2023

[api] file name/path and size in archive metadata Difegue/LANraragi#906

Closed
