Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to add torrents with different hashes paused #570

Open
GordonFreemanK opened this issue Dec 16, 2023 · 21 comments
Open

Option to add torrents with different hashes paused #570

GordonFreemanK opened this issue Dec 16, 2023 · 21 comments
Labels
bt client integration Integration (API) with/for applications

Comments

@GordonFreemanK
Copy link

GordonFreemanK commented Dec 16, 2023

Hi!
Great tool, thanks! I have a slight issue with how it queues torrents in inject mode though, at least on Transmission.

If a torrent has the same file paths than another one but the hashes differ, on adding the new torrent, cross-seed should:

  • trigger a hash verification (already does that, which is great).
  • but add them paused.

The reason is that if the hashes genuinely differ on two different trackers, then adding the new one unpaused will make the client always overwrite the data for the old one as soon as verification completes (provided it has seeds for it on the new tracker). This in turn will make the data invalid for the pre-existing tracker...

Of course it's not great that different trackers have different hashes for the same torrent, but the choice of which tracker we should keep the data for should fall on the user - so in this case the torrents should be added paused.

My setup:

  • Transmission 4.0.4
  • All hosted on docker
  • config:
module.exports = {
    delay: 10,
    torznab: [
        "http://xxx:9696/yyy/api?apikey=zzz",
        // Some other ones here
    ],
    dataDirs: [],
    matchMode: "safe",
    dataCategory: undefined,
    linkDir: "/downloads",
    linkType: "hardlink",
    skipRecheck: false,
    maxDataDepth: 2,
    torrentDir: "/torrents",
    outputDir: "/cross-seeds",
    includeEpisodes: true,
    includeSingleEpisodes: false,
    includeNonVideos: true,
    fuzzySizeThreshold: 0.02,
    excludeOlder: undefined,
    excludeRecentSearch: "1y",
    action: "inject",
    rtorrentRpcUrl: undefined,
    qbittorrentUrl: undefined,
    transmissionRpcUrl: "http://aaa:bbb@ccc:9091/transmission/rpc",
    delugeRpcUrl: undefined,
    duplicateCategories: false,
    notificationWebhookUrl: undefined,
    port: 2468,
    host: undefined,
    apiAuth: true,
    rssCadence: "30min",
    searchCadence: "1 day",
    snatchTimeout: undefined,
    searchTimeout: undefined,
    searchLimit: 1000,
};

The workaround is to use the save action mode while making sure start_added_torrents is false in the Transmission config.

In terms of implementation I assume it would be as simple as using the paused property of the torrent-add RPC call to Transmission

@zakkarry
Copy link
Collaborator

It doesn't appear from the spec that Transmission allows for adding torrents WITHOUT rechecking via the RPC? Is this an option when you manually add a torrent? Likely, this implementation wasn't made this way due to it not being available.

https://github.com/transmission/transmission/blob/main/docs/rpc-spec.md#34-adding-a-torrent

If a torrent is added in a paused state, does it still check and just not download?

@zakkarry zakkarry added bt client integration Integration (API) with/for applications labels Dec 16, 2023
@GordonFreemanK
Copy link
Author

GordonFreemanK commented Dec 16, 2023

Thanks for your quick reply!
I just had a play with the RPC calls myself using a torrent with a single file that I know has different hashes on different trackers (the one on which I identified this issue), and one I know has the same hashes.

The verification is indeed automatic: Transmission only does the verification where the hashes don't match.
It then either starts or not the torrent once the verification is complete, depending on the value that is set to the paused parameter in the add-torrent request.
For the torrent where the hashes do match, it just adds it with a completed status, paused or unpaused depending on the parameter in the request.

If I don't add the parameter to the request, the verification behaviour is the same, but for the paused/unpaused behaviour it falls back to the behaviour defined by the start_added_torrents setting in the Transmission config.

Solution 1

Just remove the paused property on the RPC call.
This means the behaviour is defined by the Transmission configuration.

Pro: easiest
Cons:

  • users need to know to change their Transmission config
  • the transmission configuration impacts the watch folder (people might be using that for other workflows) and other RPC calls where the paused property isn't defined (I suspect *arr apps explicitly define paused = false so I think those aren't going to be impacted).

Solution 2

Add a configuration parameter to specify that property value, where the value can be:

  • true/false to explicitly set the value in the RPC call
  • undefined to keep the behaviour unchanged (so it doesn't break the behaviour for people upgrading xseed), which is paused = false
  • null to not set the RPC property (and fall back to the Transmission config). Not quite sure this has a lot of value though to be fair.

Con: setting this to true, all downloads are paused and the user needs to do maintenance on their Transmission

Solution 3

Gold plated solution

  • Add the torrents paused
  • Get the torrent immediately to get the status (e.g. 2-verifying or 0-stopped)
  • If the status is 0-stopped, start the torrent

Pro: best automation
Con: more involved

@zakkarry
Copy link
Collaborator

I don't think 2nd solution is going to happen, solution 3 seems best, however even if we get the torrent immediately we could miss the verifying status.

Transmission....ugh.

@zakkarry
Copy link
Collaborator

Has there been feature requests for this upstream (Transmission to skip rechecking on adding) ?

@GordonFreemanK
Copy link
Author

GordonFreemanK commented Dec 16, 2023

I wouldn't want it to skip rechecking, because the content of the file is fundamentally different.

I don't know if I've explained the negative impact of this well enough:
Imagine I have a file called My.Precious.File.mkv. Let's say I have two trackers TrackerA and TrackerB that refer to the same file name and size in their torrents, BUT the file content they refer to is slightly different. So the hashes for the two files are different, because the file content is different. Most of the time when I see this, the file is only different by like 0.1%. Maybe the first piece out of 10000 is different, but the rest is fine.

Now imagine I have the torrent already added to Transmission on TrackerA, because I got it straight from there. And it's seeding. And for good measure, I'm the last seeder on that torrent: no one is ever going to have 100% of that content on that tracker unless they get it from me.

Now xseed finds the file on TrackerB, and adds it. Transmission then sees that the hashes (and therefore the content) are different, so it triggers a full verification on the TrackerB torrent that has just been added. It goes all the way to 99.9%, until it finds the 0.1% that's different. If you have added it unpaused, and there are seeds, then Transmission happily downloads that different piece, and directly overwrites the content on the file for that piece.

Now that piece is invalid on TrackerA: if I trigger a verification on TrackerA, or if a peer tries to get that specific piece from me, then Transmission is going to realise that the piece is wrong, and it will set it to 99.9% complete. And as I was just saying, I was the last seeder on TrackerA: so now the torrent is incomplete on TrackerA, and can never be complete again. It's essentially dead.

Notes:

  • This isn't a client-specific issue. It's an issue that requires user intervention, because the two torrents on the two trackers are different and that's not going to be solved unless one of the trackers trumps and replaces theirs. So the user needs to make a choice at that point: seeding on TrackerA only, seeding on TrackerB only, or try to get the torrent trumped on either of the trackers.
  • You might think this doesn't happen a lot, but it does. A lot. This happens because usenet groups tend to automatically grab movie releases from the big trackers e.g. TrackerA, then ever so slightly changes the data before they upload them to the newsgroup indexers. It still works, but something in the file metadata was changed. Some good soul then downloads the release from usenet, and uploads it to TrackerB (possibly because they don't have access to TrackerA). I can give you literally dozen of examples of such torrents.

The worst about this is, when this happens, that issue might reveal itself way way later: when a peer tries to get that specific piece again from me, which might not happen for years. At that point, I'll most likely have no backup of the original data. So injecting torrents that we know are wrong is ever so slightly corrupting the swarm data in a way that might never be recoverable.

As I was just saying, all I have to do to solve it to myself is to disable the injection, fall back to the watch folder and make sure Transmission doesn't add torrents started. But I think xseed needs to change its RPC call so it doesn't inject torrents without the matching data, and certainly it should never skip recheck on purpose to avoid facing the issue.

Edit: if anything, I prefer to always recheck unless we're perfectly sure the data is the same (i.e. if there's a pre-existing torrent for a different tracker with the exact same hashes).

@mmgoodnow
Copy link
Collaborator

it should never skip recheck on purpose to avoid facing the issue

I disagree - in the other clients this is really the best option.

  • New torrent thinks it's done, it starts without user intervention ✅
  • In the event that the data doesn't match, it stops without overwriting any data ✅

I'm interested in solution 1 - does this fall back to some global setting in Transmission that people can set on their own?

@GordonFreemanK
Copy link
Author

GordonFreemanK commented Dec 16, 2023

In the event that the data doesn't match, it stops without overwriting any data ✅

At least not overwriting data sounds like a better strategy, to be honest! But if the torrent is added 100% complete from the beginning, the client won't know that the torrent data is faulty until it either verifies it or tries to upload it to a peer. One way or another, the issue could be latent for a long time, until someone tries to get the data from the client.

I'm interested in solution 1 - does this fall back to some global setting in Transmission that people can set on their own?

That's correct, see start-added-torrents here. This setting impacts the default behaviour when adding torrents, both via RPC and via the watch folder. This default behaviour can be overridden in individual RPC calls by using the property that xseed uses.

Solution 1 would be a good start for people who use Transmission, as long as they know to set the configuration value. Clients like radarr also set the property in their RPC calls here so changing the default in configuration doesn't break those apps.

@zakkarry
Copy link
Collaborator

This is actually a specific issue for transmission, every other client supports starting without checking, and error's when fault is found in a torrent rather than redownloading automatically. The behavior we use for every other client prevents overwriting.

While there is probably a way to handle this, it seems odd that Transmission would take this stance, as I previously said, have you searched their github for issues/requests for this to be enabled/changed?

@mmgoodnow
Copy link
Collaborator

Take a look at transmission/transmission#2626:

This PR lets newly-added torrents skip the full verify step if the torrent fulfills all these conditions:

  • The torrent has metadata (i.e. is not a magnet link)
  • All files exist
  • All files are not partial (.part) files
  • All files' sizes match the sizes listed in the .torrent file
  • All files' mtimes (timestamp of when they were last modified) are older than when the torrent was added
  • The first piece passes the checksum test

When this is true, the initial verify step is bypassed and the torrent is treated like a full seed. Individual pieces are lazy-checked on demand when peers request blocks from them.

I'm curious about the exact behavior you're experiencing here—which of these bullet points is being violated? Or is it coming from the

Individual pieces are lazy-checked on demand when peers request blocks from them

part?

@GordonFreemanK
Copy link
Author

I've seen that PR before too
I would assume it's the last bullet point that fails and triggers the verification

The first piece passes the checksum test

I don't mind Transmission verifying the data, since it's actually correctly detecting that the data is wrong. I do have a problem with Transmission overwriting it as soon at it finishes verifying, which it does because the torrent was added unpaused, and indeed sounds like a design issue with Transmission, not xseed - but not something that can easily be worked around.

So, you are right, it would be better if Transmission never overwrote data, and just paused the torrent, even if the torrent is added unpaused. But I can't find an option, or a bug or feature request, to that effect.

@zakkarry
Copy link
Collaborator

Sounds like you could make one and see what they say, since handling that in Transmission is the proper way to do it and not rely on fickle workarounds that could change at the whim of another project (Them)

@GordonFreemanK
Copy link
Author

Alright I opened one. Let's see where this goes!

xseed could still avoid setting the field, and let Transmission configuration deal with it. Even if Transmission gets it fixed, xseed wouldn't really need to set the field either - we could still use the configuration parameter to make Transmission behave as we want.

@mmgoodnow
Copy link
Collaborator

mmgoodnow commented Dec 16, 2023

I would assume it's the last bullet point that fails and triggers the verification

Is this something you can verify with a diff of your files that have this problem?

What would help here on the technical side is if Transmission had a way for us to specify that we expect this torrent to be a seed, not a download. I don't think the behavior should be the same between those two use cases. In the download case, I'd want it to recheck and then start. In the seed case, I'd want it not to recheck, but just stop if it ever realizes it's got incomplete or incorrect data.

@GordonFreemanK
Copy link
Author

GordonFreemanK commented Dec 16, 2023

Is this something you can verify with a diff of your files that have this problem?

I can actually confirm that in transgui missing_piece.

This is after verifying the data in the Tracker B torrent (the same file gets 100% verification successful in Tracker A). You can see the first piece is different (Edit: you can see that from those blue bars, they don't exactly cover 100% of the space), so it will have been caught by Transmission.

I can also see it when manually adding the torrent in transgui: if the file is already there, most torrents will get straight to 100% complete (I have the fast verify option on). This one goes straight to verification.

@GordonFreemanK
Copy link
Author

@mmgoodnow I've received a reply

Main thing is you should always add those torrents paused no matter what. Then, at a later date, you can decide to start a torrent depending on percentDone

Similar to what I was suggesting: would it be possible, as a quick fix, to remove the paused = false property on adding Transmission torrents, so that users can define the behaviour they prefer in the Transmission configuration?

@zakkarry
Copy link
Collaborator

That kind of removes some of the main benefits of cross-seeding the way we do it....

If they don't want to support what is standard in every other client, I don't really know what to say. If nothing else they could easily provide a switch/option to enable functionality. It would be minimal/trivial to conform to the industry standard.

@GordonFreemanK
Copy link
Author

I'm not that familiar with the Transmission architecture but I doubt it would be trivial - the way it's currently designed, it always verifies added torrents, not just torrents added for seeding, so at the very least they'd have to add a new flag to torrent-add and then use that flag to decide if to start the torrent after a verification ends and the data is found not to be complete (or even missing).

You guys made a great tool, and I'll use it either way, even queueing paused. For now that means using the watch folder instead of RPC, which loses the folder awareness that the RPC calls have.

Even if you make the change (Solution 1, the one-liner), it only adds more configurability, it doesn't remove the choice of behaviour for anyone - but it adds choice for me and anyone else who would like to use Transmission RPC without risking data corruption, which is arguably worse than data loss. I know it's not ideal to load torrents paused - but it's the only option really, if the alternative is that Transmission overwrites data it shouldn't.

I don't want to be annoying, and obviously I should be moving to another client just because of that incompatibility - but that would be very hard work. I'm obviously happy to make the PR myself if you need. I'm also aware that I can just fork so I can change that line for myself, but there's overhead to that too in terms of delivery, so I'm trying to avoid doing that.

@mmgoodnow
Copy link
Collaborator

I think there might be a way to support this. I definitely like the idea of solution 3, but we also already have a flag that represents rechecking and adding paused. I don't think we use that flag in the transmission adapter but we could.

@GordonFreemanK if you open a PR that does either of those, I think we could get it through.

@mmgoodnow
Copy link
Collaborator

I personally am not a fan of solution 1 after thinking it over - the configuration definitely belongs on cross-seed's side, not Transmission's side.

@GordonFreemanK
Copy link
Author

Sorry for the delay! I've made the PR above for Solution 3. Left it as Draft because I don't know how to test it. Any tips on how to do that?

@zakkarry
Copy link
Collaborator

Sorry for the delay! I've made the PR above for Solution 3. Left it as Draft because I don't know how to test it. Any tips on how to do that?

If you have npm/node installed, you can just npm run watch from the cross-seed directory, and then node dist/cmd.js daemon will execute the daemon (or change to whatever you want)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bt client integration Integration (API) with/for applications
Projects
None yet
Development

No branches or pull requests

3 participants