Missing matches on announce where announce name is different #606

Open
skifavp opened this issue Mar 4, 2024 · 13 comments
Labels
enhancement (enhances existing features), search (Related to search)

Comments

@skifavp

skifavp commented Mar 4, 2024

Tracker A announces as: TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb
It matches, via search on completed downloads or via IRC announce, against trackers that use the same announce naming.
However, some sites announce it as: Showname S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb, and that won't match. So far I have tried toggling different settings between true and false. The actual filename is of course the same, and on one site the .torrent name is the way it is supposed to be, but the IRC announce name is not. Is there any way to improve this?

@zakkarry
Collaborator

zakkarry commented Mar 7, 2024

There's a threshold and distance variable we use for reverse lookups, and depending on the differences this will either match or not. Because releases can be named differently, loosening the restrictions to allow MORE (not all) releases like this to match would also let unrelated releases that differ by the same amount match, causing erroneous snatching of torrents and potential mismatches.

It's pretty difficult to tune this so that it is effective but not too loose. I have plans to look into tightening (not loosening) the reverse lookup matching. However, it may be possible to integrate some sort of parsing logic to match in situations like the one you describe, where we look up group, season/ep, and title, and match accordingly.

It's something I'm personally aware of, but haven't really worked much on.
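
To illustrate the trade-off described above, here is a minimal sketch. It assumes a plain Levenshtein edit distance with a single `max_distance` cutoff; that is an assumption for illustration only, not cross-seed's actual algorithm or variable names. The release names are adapted from the example in this issue, with `TV Show` standing in for the show name.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_fuzzy_match(local: str, announced: str, max_distance: int) -> bool:
    """Hypothetical matcher: accept when the case-insensitive distance is small enough."""
    return levenshtein(local.lower(), announced.lower()) <= max_distance


# Name of the torrent already being seeded.
local = "TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
# Same release, announced without the episode title and with different separators.
renamed = "TV Show S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb"
# A different episode that differs from `local` by a single character.
other_ep = "TV.Show.S01E02.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
```

Any `max_distance` large enough to accept `renamed` also accepts `other_ep`, which is only one edit away from `local`. That is the erroneous-match risk of simply loosening the threshold.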

@zakkarry zakkarry added the search and enhancement labels Mar 7, 2024
@zakkarry zakkarry changed the title from "Miss match on torrents with slightly different release names. The actual filename/.torrent file name is the same." to "Missing matches on announce where announce name is different" Mar 8, 2024
@MaddyTP

MaddyTP commented Mar 30, 2024

What if a library similar to 'guessit' were used to parse/sanitize filenames to improve matching?

https://guessit.readthedocs.io/en/latest/

GGBot uses guessit with excellent results when checking for duplicates. Obviously this is a Python package, but the method could probably be reverse engineered.
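
As a rough idea of what parsing-based matching could look like, here is a small standard-library sketch. It is not guessit and not cross-seed code: the `ReleaseKey` type, the `parse_release` helper, and both regular expressions are hypothetical, and they only handle `SxxEyy` names that end in `-GROUP`.

```python
import re
from typing import NamedTuple, Optional


class ReleaseKey(NamedTuple):
    """Fields compared instead of the raw name (hypothetical structure)."""
    title: str
    season: int
    episode: int
    group: str


# Title is everything before the first SxxEyy token.
_EPISODE = re.compile(
    r"^(?P<title>.+?)[ ._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})\b",
    re.IGNORECASE,
)
# Release group is assumed to be the trailing "-GROUP" suffix.
_GROUP = re.compile(r"-(?P<group>[A-Za-z0-9]+)$")


def parse_release(name: str) -> Optional[ReleaseKey]:
    """Return a comparable key, or None when the name does not fit the pattern."""
    name = name.strip()
    ep = _EPISODE.search(name)
    grp = _GROUP.search(name)
    if ep is None or grp is None:
        return None
    # Collapse separators so "TV.Show" and "TV Show" compare equal.
    title = re.sub(r"[ ._-]+", " ", ep.group("title")).strip().lower()
    return ReleaseKey(
        title,
        int(ep.group("season")),
        int(ep.group("episode")),
        grp.group("group").lower(),
    )
```

Two names are then treated as the same release when `parse_release` returns equal, non-`None` keys for both. A missing episode title no longer matters, but a different episode number still prevents a match. Names outside this pattern (anime-style releases, for example) return `None` and would need their own rules.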

@zakkarry
Collaborator

The problem is that trackers essentially obfuscate the real torrent name, for essentially no reason.

This generally isn't an issue with searches, only with reverse lookups from RSS and announce.

@ninboy

ninboy commented Apr 25, 2024

Some unsolicited advice:
Maybe an option for fuzzy search that ignores special characters like , ., +, &, - and _ , so it always compares a "sanitized" name. That would also increase matches against trackers that replace DD+5.1 with DD5.1.
Special cases could be the +, which is sometimes replaced by P (DDP5.1), or the &, which is sometimes replaced by and.
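
A minimal sketch of that kind of sanitized comparison follows. The `sanitize` helper and its replacement rules are hypothetical and only cover the cases mentioned above (separators, `DD+`/`DDP`, and `&`).

```python
import re

# "DD+" or "DDP" directly before a channel layout such as 5.1 collapses to "dd".
_DOLBY_PLUS = re.compile(r"dd(?:p|\+)(?=\s*\d)")
# Separators and punctuation that are ignored entirely.
_SEPARATORS = re.compile(r"[\s.,+&_-]+")


def sanitize(name: str) -> str:
    """Lowercase the name, normalise known aliases, and drop separators."""
    out = name.lower()
    out = _DOLBY_PLUS.sub("dd", out)
    out = out.replace("&", " and ")
    return _SEPARATORS.sub("", out)
```

With these rules, `sanitize("WEB-DL.DDP5.1")`, `sanitize("WEB-DL DD+ 5.1")` and `sanitize("WEB-DL DD5.1")` all produce the same string. Sanitizing does not help when an announce drops a whole token, such as an episode title.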

@zakkarry
Collaborator

Our fuzzy matching is loose enough that the separators used are not an issue in almost any case that would occur regularly.

Generally what we see is groups removing episode titles or something else significant with RSS or IRC announcements.

@ppkhoa

ppkhoa commented May 21, 2024

I have a few examples here showing that tracker announces do not contain the filename at all and therefore will not match:

Actual filename from open trackers/release group's RSS feed: [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv
1st tracker announce: Boukyaku Battery - 07 (2024) [SubsPlease] [WEBRip] [HD 1080p]
2nd tracker announce: [SubsPlease] Boukyaku Battery - 07 [Web][MKV][h264][1080p][AAC 2.0][Softsubs (SubsPlease)][Episode 7]

However, cross-seed later matched those via RSS feed. I assume this is because cross-seed grabbed the torrent via the download link included in the RSS feed and checked the filename/hash. Can we do the same for announces? (The torrent download link is already included in the payload anyway.)

@zakkarry
Collaborator

Snatching every torrent that is sent via announce is not really something we would want to do.

Snatches are preceded by quite a bit of filtering and verification, because most trackers do not appreciate torrent files being snatched without being downloaded/seeded.

@ppkhoa

ppkhoa commented May 21, 2024

Maybe have an option to run a search using the name from announces?

Relevant log entries to compare between announces and RSS match:

Announces:

2024-05-21 16:11:14 verbose: [server] POST /api/announce
2024-05-21 16:11:14 verbose: [server] Received announce from Tracker2: [SubsPlease] Oblivion Battery - 07  [2024][Web][MKV][h264][1080p][AAC 2.0][Softsubs (SubsPlease)]
2024-05-21 16:11:16 verbose: [decide] [SubsPlease] Boukyaku Battery - 02 (1080p) [E954FB4E].mkv - no match for Tracker2 torrent [SubsPlease] Oblivion Battery - 07  [2024][Web][MKV][h264][1080p][AAC 2.0][Softsubs (SubsPlease)] - its size does not match - (NaN bytes)  <---------- Does not match the correct name, wrong episode

Search results:

2024-05-21 16:19:11 info: [torznab] Searching 10 indexers for [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv
[a bunch of RSS feeds URLs]
2024-05-21 16:20:08 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:08 verbose: [rtorrent] Calling method load.start with params [ '',
  '/home/ppkhoa/watch/[SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv.tmp.1716308408900.torrent',
  'd.directory_base.set="/home/ppkhoa/cross-seed/xseed/Tracker2"',
  'd.custom1.set="cross-seed"',
  'd.custom.set=addtime,1716308409' ]
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - injected
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - exists
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - exists
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - exists
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 verbose: [rtorrent] Calling method load.start with params [ '',
  '/home/ppkhoa/watch/[SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv.tmp.1716308409651.torrent',
  'd.directory_base.set="/home/ppkhoa/cross-seed/xseed/Tracker1"',
  'd.custom1.set="cross-seed"',
  'd.custom.set=addtime,1716308410' ]
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker1 by MATCH - injected
2024-05-21 16:20:09 info: [server] Found 5 torrents for {
  path: '/home/ppkhoa/files/[SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv'
}

EDIT: Tracker2 announces 4 times in a row for the same torrent since anime has Japanese names, romanization of Japanese names, English names, and another with filename in torrent name

@zakkarry
Collaborator

If you want to search, use the webhook instead of the announce endpoint.

@ppkhoa

ppkhoa commented May 21, 2024

My point is, search could find the release, but announce/RSS never matched them.

What I'm doing is putting a sleep 120 in the curl webhook script (which rtorrent calls when a download finishes) before searching with the name/infoHash from rtorrent, so other trackers have some time to make the release searchable. If they are slower than that, I either have to get the torrent manually or run the search manually (via the webhook, with the name). Otherwise, the release will never get found and cross-seeded.
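
The delayed-search workaround could look roughly like this in Python instead of a shell script. This is a sketch under assumptions: the `/api/webhook` path, the `infoHash` form field, and the `apikey` query parameter should be checked against the cross-seed documentation for your version, and the helper names are made up for illustration.

```python
import time
import urllib.parse
import urllib.request


def build_webhook_request(base_url: str, api_key: str, info_hash: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking cross-seed to search for one torrent."""
    query = urllib.parse.urlencode({"apikey": api_key})         # assumed auth parameter
    body = urllib.parse.urlencode({"infoHash": info_hash}).encode()  # assumed field name
    url = f"{base_url.rstrip('/')}/api/webhook?{query}"          # assumed endpoint path
    return urllib.request.Request(url, data=body, method="POST")


def search_after_delay(request, delay_seconds=120, sleep=time.sleep, send=urllib.request.urlopen):
    """Wait so slower trackers can index the release, then trigger the search."""
    sleep(delay_seconds)
    return send(request)


# Example call from a download-finished hook (values are placeholders):
# request = build_webhook_request("http://localhost:2468", "<API_KEY>", "<infoHash>")
# search_after_delay(request)
```

The `sleep` and `send` parameters exist only so the delay and the network call can be swapped out when testing the script.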

@zakkarry
Collaborator

zakkarry commented May 21, 2024

You can schedule searches to run however often you like.

As I said, snatching every torrent given to the announce endpoint is not going to happen.

https://www.cross-seed.org/docs/basics/options#searchcadence

@zakkarry
Collaborator

Furthermore, if your torrents don't match the torrent name or file name, then your tracker is changing them, and this issue should be discussed with the tracker.

@ppkhoa

ppkhoa commented May 28, 2024

I believe I figured this one out, at least with autobrr. I came across this issue in the autobrr repo: autobrr/autobrr#1197. According to the response there, you can set Max size for the filter (I set mine to 50GiB, as most media falls into that range) and autobrr will use the trackers' API to get the file size, even though the IRC announce does not contain file size info. If your cross-seed config has matchMode set to risky, it will check the file size and match accordingly, even though the torrent name is different.

I have been testing it with AB and AnT for anime content (these 2 sites have torrent names that are almost totally different from public trackers, as they have their own naming scheme), and it is working pretty well so far.

You will need to adjust the webhook payload from autobrr to cross-seed /api/announce a little to include size information. Just add size to the example from the autobrr documentation, i.e.:

{
  "name": "{{ .TorrentName }}",
  "guid": "{{ .TorrentUrl }}",
  "link": "{{ .TorrentUrl }}",
  "size": "{{.Size}}",
  "tracker": "{{ .IndexerName | js}}"
}
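
To show why the size field matters, here is a small sketch of a size check on an announce payload. It is an illustration only: `announced_size`, `size_matches`, and the `tolerance` parameter are hypothetical, the size is assumed to arrive as a byte count in a string, and cross-seed's real risky-mode logic is not reproduced here. A payload without a usable size yields NaN, which is consistent with the "(NaN bytes)" in the announce log earlier in this thread.

```python
import math
from typing import Any, Mapping


def announced_size(payload: Mapping[str, Any]) -> float:
    """Return the announced size in bytes, or NaN when it is missing or unparsable."""
    try:
        return float(payload.get("size"))
    except (TypeError, ValueError):
        return math.nan


def size_matches(local_bytes: int, payload: Mapping[str, Any], tolerance: float = 0.0) -> bool:
    """Compare sizes; `tolerance` is a fraction of the local size (0.02 means 2%)."""
    size = announced_size(payload)
    if math.isnan(size):
        return False  # without a size there is nothing to compare against
    return abs(size - local_bytes) <= tolerance * local_bytes
```

For example, `size_matches(1000, {"name": "x"})` is `False` because no size was sent, while `size_matches(1000, {"name": "x", "size": "1000"})` is `True` regardless of the announced name.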

5 participants