Missing matches on announce where announce name is different #606
Comments
We use a threshold and a distance variable for reverse lookups, and depending on how much the names differ, a release will either match or not. Because releases can be named differently, loosening the restrictions to allow MORE (not all) releases like this to match would also let other releases with the same number of changes match, causing erroneous snatches and potential mismatches. It is difficult to find a setting that is effective without being too loose. I plan to look into tightening (not loosening) the reverse lookup matching. However, it may be possible to integrate some sort of parsing logic for situations like the one you describe, where we look up the group, season/episode, and title, and match accordingly. I'm aware of the problem, but haven't worked much on it yet.
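To illustrate the trade-off described above, here is a minimal sketch of distance-plus-threshold matching. It is not cross-seed's actual implementation: `levenshtein`, `is_fuzzy_match`, and the `max_ratio` values are hypothetical, and the release names are adapted from the example in the original report.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_fuzzy_match(announced: str, candidate: str, max_ratio: float = 0.1) -> bool:
    """Hypothetical rule: accept when the edit distance is at most
    max_ratio times the length of the longer name."""
    a, b = announced.lower(), candidate.lower()
    return levenshtein(a, b) <= max_ratio * max(len(a), len(b))


# Illustrative names only (assumed, adapted from the report).
FULL = "TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
NO_TITLE = "TV.Show.S01E01.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
OTHER_EPISODE = "TV.Show.S01E02.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"

# Same release with the episode title dropped: 13 edits, rejected at 10%.
print(levenshtein(FULL, NO_TITLE), is_fuzzy_match(FULL, NO_TITLE))
# A different episode is only 1 edit away, so a pure distance rule accepts it.
print(levenshtein(FULL, OTHER_EPISODE), is_fuzzy_match(FULL, OTHER_EPISODE))
# Loosening to 25% accepts the renamed release, along with anything else
# that is up to 15 edits away.
print(is_fuzzy_match(FULL, NO_TITLE, max_ratio=0.25))
```

In this sketch the renamed copy of the same release is further away than a genuinely different episode, which is why loosening the threshold alone lets in wrong matches.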
What if a library similar to 'guessit' were used to parse/sanitize filenames to improve matching? https://guessit.readthedocs.io/en/latest/ GGBot uses guessit with excellent results when checking for duplicates. guessit is a Python package, but the approach could probably be reimplemented.
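As a rough sketch of the parsing idea (matching on title, season/episode, and group rather than on the raw string), something like the following could work. This is a hand-rolled regex, not guessit and not cross-seed code; `ReleaseKey`, `parse_release`, and the patterns are assumptions for illustration, and it only understands `SxxEyy` numbering.

```python
import re
from typing import NamedTuple, Optional


class ReleaseKey(NamedTuple):
    title: str
    season: int
    episode: int
    resolution: Optional[str]
    group: Optional[str]


# Hypothetical patterns; real-world names need far more cases than this.
_EPISODE = re.compile(
    r"^(?P<title>.+?)[ ._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})\b",
    re.IGNORECASE,
)
_RESOLUTION = re.compile(r"\b(\d{3,4}p)\b", re.IGNORECASE)
_GROUP = re.compile(r"-([A-Za-z0-9]+)$")


def parse_release(name: str) -> Optional[ReleaseKey]:
    """Extract a comparable key, or None if the name has no SxxEyy marker."""
    name = name.strip()
    match = _EPISODE.search(name)
    if match is None:
        return None
    resolution = _RESOLUTION.search(name)
    group = _GROUP.search(name)
    # Collapse separators so "TV.Show" and "TV Show" compare equal.
    title = re.sub(r"[ ._-]+", " ", match.group("title")).strip().lower()
    return ReleaseKey(
        title=title,
        season=int(match.group("season")),
        episode=int(match.group("episode")),
        resolution=resolution.group(1).lower() if resolution else None,
        group=group.group(1).lower() if group else None,
    )


# Illustrative names (assumed): same release, announced two different ways.
dotted = parse_release("TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb")
spaced = parse_release("TV Show S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb")
print(dotted == spaced)
```

Because the episode title is not part of the key, the two announce styles compare equal here. Names without an `SxxEyy` marker, such as absolute-numbered anime releases, return `None` in this sketch and would need separate handling.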
The problem is that trackers essentially obfuscate the real torrent name, for essentially no reason. This isn't generally an issue with searches, only with reverse lookups from RSS and announce.
Some unsolicited advice:
Our fuzzy matching is loose enough that the separators used are not an issue in almost any case that would occur regularly. Generally what we see is groups removing episode titles or something else significant with RSS or IRC announcements.
I have a few examples here to show that tracker announces do not contain the filename at all and therefore will not match: Actual filename from open trackers/release group's RSS feed: [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv However, cross-seed later matched those via the RSS feed. I assume this is because cross-seed grabbed the torrent via the download link included in the RSS feed and checked the filename/hash. Can we do the same for announces? (The torrent download link is already included in the payload anyway.)
Snatching every torrent that is sent via announce is not really something we would want to do. Snatches are preceded by quite a bit of filtering and verification, because most trackers do not appreciate users snatching torrent files without downloading/seeding them.
Maybe have an option to run a search using the name from announces? Relevant log entries to compare between announces and RSS match: Announces:
Search results:
EDIT: Tracker2 announces 4 times in a row for the same torrent, since anime has Japanese names, romanizations of Japanese names, and English names, plus another announce with the filename in the torrent name
If you want to search, use the webhook instead of the announce endpoint.
My point is that search could find the release, but announce/RSS never matched it. What I'm doing is put a
You can schedule searches to run as often as you like. As I said, snatching every torrent given to the announce endpoint is not going to happen. https://www.cross-seed.org/docs/basics/options#searchcadence
Furthermore, if your torrents don't match the torrent name or file name, then your tracker is changing them, and this issue should be discussed with the tracker.
I believe I figured this one out, at least with I have been testing it with AB and AnT for anime content (these 2 sites have torrent name almost totally different from public trackers as they have their own naming scheme), working pretty well so far. You will need to adjust the webhook payload from
Tracker A announces as: TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb
That matches, via search on completed downloads or IRC announce, against trackers that use the same announce naming.
However, some sites announce as: Showname S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb, and that won't match. So far I have tried toggling different settings between true and false. The actual filename is of course the same; on one site the .torrent name is the way it is supposed to be, but the IRC announce name is not. Is there any way to improve this?