Duplicate Finder tool update! #21

Open
2 of 5 tasks
CuddleBear92 opened this issue Oct 18, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@CuddleBear92

CuddleBear92 commented Oct 18, 2021

There are a few things that would improve the current duplicate tool for power users, giving you more info at a glance and allowing you to make a more informed decision.

  • Cache duplicate processing data locally and update on new runs.

  • Mark a PAIR of comics as not duplicates. This should be stored per pair of comic IDs in LANraragi so that the pair is skipped in future searches, like a blacklist. An override might be wanted for correcting user errors.

  • A compare view that fills the window with both comics of the pair, each taking up half the screen with thumbnails of all its pages. This lets you compare them more easily at a glance, much like the detailed page of a comic that displays all metadata and all thumbnails, just loaded side by side in the tab itself. An overlay would be wanted but could be done with its own tab type. This could be done a bit more easily by having a general split view tab type for all content, and allowing the dupe tool to create such a tab and prefill the comic issues in question.

  • Listing image sizes of either original cover art or all images if possible.

  • Display cached info as a comparison between the two comic issues. For example, show that A is higher res than B by displaying the resolution in green. Same with other data like the number of tags and pages. Here is an example of such a thing from Hydrus with import date, file size, image res and JPEG compression setting.
    image
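
The per-pair "not duplicates" marking from the list above could be sketched roughly like this. It is a minimal Python illustration, not code from this project or from LANraragi: the class `NonDuplicateStore` and its method names are hypothetical, and archive IDs are assumed to be plain strings.

```python
class NonDuplicateStore:
    """Hypothetical store of archive ID pairs marked as not duplicates."""

    def __init__(self):
        self._pairs = set()

    @staticmethod
    def _key(id_a, id_b):
        # A frozenset makes the pair order-independent: (a, b) equals (b, a).
        return frozenset((id_a, id_b))

    def mark(self, id_a, id_b):
        """Remember that this pair should be skipped in future searches."""
        self._pairs.add(self._key(id_a, id_b))

    def unmark(self, id_a, id_b):
        """Override for correcting a pair that was marked by mistake."""
        self._pairs.discard(self._key(id_a, id_b))

    def is_marked(self, id_a, id_b):
        return self._key(id_a, id_b) in self._pairs

    def filter_candidates(self, candidates):
        """Drop candidate (id_a, id_b) pairs that were marked earlier."""
        return [(a, b) for a, b in candidates if not self.is_marked(a, b)]
```

Keying on the unordered pair rather than on a single archive keeps the blacklist per pair, so an archive can still be matched against other archives.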

@Guerra24 Guerra24 added the enhancement New feature or request label Oct 18, 2021
@Guerra24 Guerra24 pinned this issue Oct 28, 2021
@Guerra24
Owner

Guerra24 commented Nov 22, 2021

Prototype UI 1. The actions are missing, I need to figure out where to put them.

EDIT: Also might add both resolution and format of each page into the thumbnail overlay just like the page number. The format(s) will be listed below resolution in the same style as the tags.

2021-11-22_02-19-32.mp4

@AbyssalMonkey

Would be nice to have a namespace for which folder the archives are found in. Would make quicker work for certain archives that are duplicates from an authoritative source. Unclear if that is what "Source" is in demonstration.

@Guerra24
Owner

Guerra24 commented Nov 28, 2021

Would be nice to have a namespace for which folder the archives are found in

The folder where it is stored on the server? Unfortunately I can't pull that. There is a script that converts folders to categories, but those are not used in this case.

Unclear if that is what "Source" is in demonstration.

It is the source it came from, i.e. the URL, at least in my case.

The tags section is the same as the tags shown in the archive tab.

@Guerra24
Owner

Alright, everything but the improved caching has been implemented. The final design is like the video above but with delete and mark-as-non-duplicate buttons. Pages also show their resolution, extension and number.

@AbyssalMonkey

Failing to pull tags properly.
image

@AbyssalMonkey

AbyssalMonkey commented Dec 25, 2021

After using this extensively for a while to remove duplicates from an automatic feed, here are a few usability changes that would make sorting duplicates faster, in order of probable ease of implementation:

  • The ability to open an archive from this menu, and/or change the image scaling options by having more/less images per row. It can be difficult to recognize errors in labeling when two galleries are duplicates, but are in different languages (one is translated the other not), or if the names of the galleries are competing with one another (one is labelled correctly, but if you delete it, you now have a mislabeled gallery to go hunt down).
  • The caching mentioned above would be useful. Color coding for quick parsing would be preferable, maybe something like this:
    image

In the above image, aligning tag fields would also be an improvement.

  • The ability to decide what to do before an archive completely loads. You can't even back out to the tool while they are still loading. This is somewhat of a problem when dealing with hundred page archives, or archives with large file sizes. A tangential improvement to this would be to try and load the first N pages such that you can make your decision before the rest of the archive finishes loading. Unsure how this works or if this is possible.
  • This follows from the above two: if an image fails to load, it still counts towards the resolution, as a 0x0 image. This means that if you have a 16 page 1360 x 1920 archive and one image fails to load, it will instead register as 1275 x 1800. Flag that an image failed to load, and exclude it from the current resolution calculation. Also, when prompted to right click reload, the image will always register as 0x0 even after backing out of the current pairing, since that data is now cached. Right click reload should update the cache.
  • The ability to merge/replace tags in bulk from one archive into another. If dupes show up, and my x1920 archive is tagged, and my x3200 is not, the ability to copy the tags, and then delete the x1920 would be handy.
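
The numbers in the resolution point above are consistent with a failed page being averaged in as 0x0: 15/16 of 1360 x 1920 is 1275 x 1800. The following Python sketch shows the requested behavior under that assumption. The thread does not confirm how the tool actually computes the displayed resolution, and both function names are made up for illustration.

```python
def naive_average(pages):
    """Average every page, including failed loads recorded as (0, 0)."""
    if not pages:
        return None
    avg_w = sum(w for w, _ in pages) / len(pages)
    avg_h = sum(h for _, h in pages) / len(pages)
    return round(avg_w), round(avg_h)


def average_resolution(pages):
    """Average only pages that loaded and report how many failed.

    pages: list of (width, height) tuples; a failed load is assumed
    to be recorded as (0, 0).
    """
    loaded = [(w, h) for w, h in pages if w > 0 and h > 0]
    failed = len(pages) - len(loaded)
    if not loaded:
        return None, failed
    avg_w = sum(w for w, _ in loaded) / len(loaded)
    avg_h = sum(h for _, h in loaded) / len(loaded)
    return (round(avg_w), round(avg_h)), failed


# 16 page archive where one page failed to load.
pages = [(1360, 1920)] * 15 + [(0, 0)]
print(naive_average(pages))       # (1275, 1800)
print(average_resolution(pages))  # ((1360, 1920), 1)
```

Returning the failed count alongside the resolution lets the UI flag the archive instead of silently showing a lower number.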

Above here are what are probably easy fixes.
Below here are probably more in-depth, harder problems to solve.

  • Marking differences in the naming of galleries using transposition. This would make noticing errors in naming and mismatched volume numbers quicker.
    image

  • An option for when an archive has been matched more than X times: spend extra time opening up the archive and doing a second diff check on the second page. Initially, maybe just quarantine these dupes out of the main queue. Many of the colored-cover multi-volume/chapter archives use extremely similar covers, with only a single number changed on the image (e.g. 4 -> 5). This could theoretically be solved with a higher image difference number, but that has its own penalties. With high volume counts, this means users have to tediously click "Not duplicates" on potentially up to N choose 2 combinations for just one series (5 volumes is 10 pairs, 7 is 21).
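
The pair counts quoted above can be checked with a few lines of Python. This is only a worked illustration of the arithmetic; the volume names are placeholders.

```python
from itertools import combinations
from math import comb


def pair_count(n):
    """Number of unordered pairs among n archives: n choose 2."""
    return comb(n, 2)


# Every pair a user would have to dismiss for a 5 volume series
# if all covers match each other.
volumes = [f"Vol. {i}" for i in range(1, 6)]
pairs = list(combinations(volumes, 2))

print(pair_count(5), pair_count(7))  # 10 21
print(len(pairs))                    # 10
```

The count grows quadratically with the number of volumes, which is why a long series with near-identical covers gets tedious quickly.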

Overall, it's been a fantastic tool to use, and I've already been able to remove hundreds of automatically downloaded duplicates, which would have taken many times longer to sort out by hand. Keep up the good work.

3 participants