
Brainstorm Discussion notes

Jakob Borg edited this page Jun 1, 2015 · 4 revisions

These are just random brainstorm/discussion notes that might be relevant in the future.

Public shares and Read/Write access to nodes

<[0]Jmr> so I was thinking about encryption a bit yesterday, and basically came up with a few ideas which might be beneficial.
<[0]Jmr> first, we could have read/write permissions per device
<[0]Jmr> and getting rid of the master
<[0]Jmr> this way you could have the archiver et al behaviour
<[0]Jmr> everyone is talking about
<[0]Jmr> actually, not even per device, but per device folder combo
<[0]Jmr> this would allow us to use plain AES for encryption but still be able to have read only nodes which have the key, are able to decrypt the data, but if configured correctly, cannot participate in updating the global index
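
The per device/folder permission idea above could be sketched roughly as follows. This is only an illustration: the `Access` levels, the `ACL` type and the method names are hypothetical and are not taken from the Syncthing code base.

```go
package main

import "fmt"

// Access is a hypothetical permission level for one device on one folder.
type Access int

const (
	NoAccess  Access = iota // folder is not shared with the device
	ReadOnly                // may pull (and decrypt) data, index updates are ignored
	ReadWrite               // may also contribute changes to the global index
)

// shareKey identifies a device/folder combination.
type shareKey struct {
	Device string
	Folder string
}

// ACL maps device/folder combinations to an access level.
type ACL map[shareKey]Access

// AcceptIndexUpdate reports whether an index update sent by device for
// folder should be merged into the global index.
func (a ACL) AcceptIndexUpdate(device, folder string) bool {
	return a[shareKey{device, folder}] == ReadWrite
}

// MayPull reports whether device may request blocks of folder.
func (a ACL) MayPull(device, folder string) bool {
	return a[shareKey{device, folder}] >= ReadOnly
}

func main() {
	acl := ACL{
		{"LAPTOP", "photos"}:   ReadWrite,
		{"ARCHIVER", "photos"}: ReadOnly,
	}
	fmt.Println(acl.AcceptIndexUpdate("LAPTOP", "photos"))   // true
	fmt.Println(acl.AcceptIndexUpdate("ARCHIVER", "photos")) // false
	fmt.Println(acl.MayPull("ARCHIVER", "photos"))           // true
	fmt.Println(acl.MayPull("STRANGER", "photos"))           // false
}
```

Keying on the device/folder pair rather than on the device alone is what lets the same device be read only for one folder and read/write for another.
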
<[0]Jmr> the other thing which came to my mind
<[0]Jmr> given we already have per folder access control sort of (as you have to explicitly share the folder with the given device)
<[0]Jmr> the whole concept of adding devices is a bit redundant, as you can move this form of control a layer up
<[0]Jmr> and just refuse to share anything with a device from the model
<[0]Jmr> so given we accept all devices by default, and just explicitly specify which folders we share with who
<[0]Jmr> we could have a special flag for a folder by marking it public
<[0]Jmr> meaning that it would be provided in ClusterConfig of every node
<[0]Jmr> allowing "public shares"
<[0]Jmr> furthermore, ClusterConfig already has a list of peers which it is sharing the folder with, hence could act as tracker/introducer
<[0]Jmr> making the public share still use p2p for seeding
<calmh> sounds somewhat reasonableish
<calmh> only thing is we are going to deny devices "write" access from the outside, we sort of want the cluster to agree on that
<calmh> i.e. if I'm A and I don't want to allow updates from B
<calmh> then I don't want C to allow updates from B either, because I allow updates from C...
<[0]Jmr> I personally would feel that it's down to the user to make sure he configures that.
<[0]Jmr> Though I agree, for majority of the cases the default behaviour you want is to have a cluster wide consensus
<[0]Jmr> but I am sure someone will come up with some super special scenario where they don't want that to be the case
<calmh> haha for sure
<[0]Jmr> What do you think about the public share bit?
<calmh> it sounds like it could work
<calmh> might need some work ui wise, as adding device + folder + tying them together is annoying just to get access to a public share
<calmh> plus we'll run into the non unique folder name thing, as everyone will call their public folder "public"
<calmh> and it would be nice to connect to a cluster and not a specific device
<calmh> and then we have bittorrent reimplemented :)
<[0]Jmr> well you connect to a device
<[0]Jmr> to figure out the participants in the cluster
<calmh> ah, right, yes
<calmh> a temporary introducer thing or whatever
<[0]Jmr> well at the point you connect
<[0]Jmr> you get ClusterConfig
<[0]Jmr> which has the public shares + devices which have been notified about the share
<[0]Jmr> or rather ones which explicitly tried to get that share off of the device
<[0]Jmr> as it has to be part of ClusterConfig on both sides
<calmh> i don't think it needs to be present on both sides? well, currently it's an error if it isn't, but if you're exposing all public shares by default
<calmh> then the "client" can just connect, get the list of public shares
<calmh> ask the user or auto add them
<calmh> then respond with the appropriate clusterconfig
<calmh> and off it goes
<calmh> or if that's a pain internally, drop the connection and reconnect when the folder has been created
<[0]Jmr> I guess the handshake part would be easier to handle
<[0]Jmr> given you know exactly what you are after
<[0]Jmr> before you connect
<[0]Jmr> this way we don't need to change the current schema
<[0]Jmr> so the way you'd share things would be <Device ID>:<Folder ID>, and then we know that the CM for <Device ID> from leechers side needs to have <Folder ID> present.
<calmh> mmm
<[0]Jmr> and it's there from the seeders side by default, because the share is public.
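
A minimal sketch of the `<Device ID>:<Folder ID>` share reference discussed above, from the leecher's side. `ShareRef`, `ParseShareRef` and `NeedsFolder` are hypothetical names, and the config shape (device ID to set of folder IDs) is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ShareRef is a hypothetical parsed form of "<Device ID>:<Folder ID>".
type ShareRef struct {
	DeviceID string
	FolderID string
}

// ParseShareRef splits s at the first colon into device and folder IDs.
func ParseShareRef(s string) (ShareRef, error) {
	device, folder, ok := strings.Cut(s, ":")
	if !ok || device == "" || folder == "" {
		return ShareRef{}, errors.New("share reference must look like <Device ID>:<Folder ID>")
	}
	return ShareRef{DeviceID: device, FolderID: folder}, nil
}

// NeedsFolder reports whether the leecher still has to add the folder for
// the seeder before connecting. shared maps device ID to the folder IDs
// that would be announced to that device (an assumed config shape).
func NeedsFolder(shared map[string]map[string]bool, ref ShareRef) bool {
	return !shared[ref.DeviceID][ref.FolderID]
}

func main() {
	ref, err := ParseShareRef("SEEDER-ID:public-isos")
	if err != nil {
		panic(err)
	}
	shared := map[string]map[string]bool{}
	fmt.Println(ref.DeviceID, ref.FolderID, NeedsFolder(shared, ref))

	// After the user accepts the share, the folder is present for that device.
	shared[ref.DeviceID] = map[string]bool{ref.FolderID: true}
	fmt.Println(NeedsFolder(shared, ref))
}
```

Because the leecher knows the folder ID before it connects, it can have the folder configured up front and the existing handshake does not need to change.
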
<calmh> i wonder if it'll scale though
<calmh> i don't think it will
<[0]Jmr> well one device will get overwhelmed
<calmh> with the poor public seeder having to get and keep track of indexes from all the clients
<[0]Jmr> due to it acting as a seeder
<calmh> it could discard them though
<[0]Jmr> it doesn't need to track indexes
<[0]Jmr> why would it?
<calmh> it doesn't need to
<calmh> but they'll send them
<calmh> unless we make sure not to
<[0]Jmr> the thing is
<calmh> but they're needed if we're going to shuffle files between clients directly
<[0]Jmr> there is only one important index in this case
<[0]Jmr> the master index
<[0]Jmr> that's the right index
<[0]Jmr> then leechers connect to other leechers, and track their indexes
<[0]Jmr> given they match the master one
<[0]Jmr> this way deciding what to get from where
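
One way to picture "deciding what to get from where": treat the seeder's index as the only authoritative one and pull a block from another leecher only when that leecher advertises the same hash for it. The sketch below is an assumption about how that could look; `Peer` and `PlanPulls` are made-up names and block hashes are plain strings for brevity.

```go
package main

import "fmt"

// Peer is a hypothetical leecher together with the block hashes it advertises.
type Peer struct {
	ID     string
	Blocks map[int]string // block index -> hash the peer claims to have
}

// PlanPulls returns, for every block in the master index, the ID of the
// device to request it from. A leecher is preferred when its advertised
// hash matches the master index; otherwise the seeder is used.
func PlanPulls(master []string, seeder string, peers []Peer) []string {
	sources := make([]string, len(master))
	for i, want := range master {
		sources[i] = seeder
		for _, p := range peers {
			if have, ok := p.Blocks[i]; ok && have == want {
				sources[i] = p.ID
				break
			}
		}
	}
	return sources
}

func main() {
	master := []string{"h0", "h1", "h2"} // index received from seeder A
	b := Peer{ID: "B", Blocks: map[int]string{0: "h0", 2: "stale"}}
	fmt.Println(PlanPulls(master, "A", []Peer{b})) // [B A A]
}
```

Preferring leechers over the seeder is a design choice made here to take load off the single seeding device; the discussion does not settle on a particular selection policy.
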
<calmh> mmm
<calmh> so some mechanics changes in that department
<calmh> nothing radical though
<calmh> but it would again be interesting to have it built as a hash tree
<calmh> so a node could say "i have index 3987492387423984" and the others would know what that means
<calmh> because they have the same from the seeder
<[0]Jmr> A is the seeder, B connects gets A's index, pulls some files. C connects to A, discovers B via A, connects to B, B advertises which parts it has, C now pulls parts from A parts from B.
<[0]Jmr> well if the hash doesn't match
<[0]Jmr> then you need to transfer log(n) of the index anyway
<calmh> yes
<[0]Jmr> but the RTT would make it super expensive to work out what is the log(n)
<[0]Jmr> due to you having to peel one layer of the onion at a time
<calmh> but in the probably common case of a lot of devices being 100% up to date it would be nice
<calmh> it could be added as just that; sent as part of the cluster config, and only used to determine if we know of exactly this index (send nothing) or not (usual exchange)
<calmh> but it's an optimization, not something critical
<[0]Jmr> well the existing version number works for that too?
<calmh> not really as that is per device
<calmh> it's only usable if i've talked to you before
<calmh> it doesn't help in saying "i have the same index as that other guy"
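
The "i have index 3987492387423984" idea can be illustrated with a single digest over the whole index, which is enough for the "do we have exactly the same index" check. Note that this is a flat hash and not the hash tree mentioned above, and that `FileEntry` and `IndexFingerprint` are hypothetical names with a deliberately simplified entry layout.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"sort"
)

// FileEntry is a simplified, hypothetical index entry.
type FileEntry struct {
	Name   string
	Blocks [][]byte // block hashes in file order
}

// IndexFingerprint returns a digest that is identical on every device
// holding the same set of entries, regardless of the order they arrive in.
func IndexFingerprint(files []FileEntry) string {
	sorted := append([]FileEntry(nil), files...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	h := sha256.New()
	var n [8]byte
	writeLen := func(l int) {
		binary.BigEndian.PutUint64(n[:], uint64(l))
		h.Write(n[:])
	}
	for _, f := range sorted {
		// Length prefixes keep field boundaries unambiguous.
		writeLen(len(f.Name))
		h.Write([]byte(f.Name))
		writeLen(len(f.Blocks))
		for _, b := range f.Blocks {
			writeLen(len(b))
			h.Write(b)
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	seeder := []FileEntry{
		{Name: "a.iso", Blocks: [][]byte{{1}, {2}}},
		{Name: "b.iso", Blocks: [][]byte{{3}}},
	}
	leecher := []FileEntry{seeder[1], seeder[0]} // same content, other order
	fmt.Println(IndexFingerprint(seeder) == IndexFingerprint(leecher))
}
```

Unlike the per-device version number, such a fingerprint depends only on the index content, so two devices that have never talked before can still tell that they hold the same index.
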

Crypto talks about: Crypto-Proposal

<calmh> right yeah
<calmh> it's probably full of holes that need patching, but i did spend a few hours thinking about it today. first in an attempt to not introduce new protocol messages, but that doesn't seem possible to me
<calmh> indexing twice (or at least hashing twice, both encrypted and unencrypted) and keeping two indexes sucks as well, but not sure what to do about it.
<[0]Jmr> the mac business is a bit confusing
<[0]Jmr> you provide macs for unencrypted devices to allow them to send indexes around
<[0]Jmr> which can be verified by others
<[0]Jmr> but then you don't have macs for the data
<[0]Jmr> why cannot we just mac index/indexupdate as a whole
<[0]Jmr> ?
<[0]Jmr> I guess I need to read it through with a bit more focus
<calmh> thing is the crypto primitives i propose (nacl/secretbox and aes+gcm) all take cleartext+nonce to encrypt, thus generating ciphertext+mac. to decrypt we need nonce+ciphertext+mac, resulting in the cleartext
<calmh> but things get annoying if all blocks are suddenly 128KiB + 16 bytes (mac)
<calmh> hence why i propose chopping it off the ciphertext and stashing it in the index instead
<calmh> (plus "key" of course, in the crypto talk above)
<calmh> for all i'm concerned, we could skip the entire MAC thing, but then we need to invent our own crypto stuff which is nice to avoid
<calmh> well, aes-cbc doesn't need the mac
<calmh> just an IV, equivalent to the nonce
<calmh> so that may be simpler
<[0]Jmr> well yeah
<[0]Jmr> that was my initial idea
<calmh> that's probably better
<[0]Jmr> but the IV still needs to be transferred
<[0]Jmr> the IV could be the block hash
<[0]Jmr> but you could still deduce that two blocks are the same
<[0]Jmr> also, you don't necessarily need to have an encrypted index either
<[0]Jmr> as hashes have no value
<calmh> but it should be possible to bootstrap a new trusted node from an untrusted one
<calmh> so it needs to contain 
<calmh> so whatever index it sends needs to contain all info needed
<calmh> including iv:s etc
<[0]Jmr> yeah ok
<[0]Jmr> mtime and permissions can be mac'ed using the plaintext filename
<[0]Jmr> or something
<[0]Jmr> the untrusted node doesn't need to verify that the content it has matches the hashes.
<[0]Jmr> this is left to the trusted node, as the receiving end
<[0]Jmr> oh ok
<[0]Jmr> I see the problem, the untrusted node will not be able to sign the index
<calmh> but it does need to be able to do that if it's to reuse blocks and things like that... and i guess for the usual reason, block might have changed between request and response and we get the wrong data
<calmh> i think we'll probably need something like `syncthing -verify-index` as well to check that files are what we expect them to be on an untrusted node
<calmh> on a normal box it can look at the file and verify that it's what it's supposed to be
<calmh> can't do that when it's encrypted
<calmh> also probably `syncthing -decrypt-with-key=askdjalksdj`
<calmh> although the latter doesn't depend on any hashes
<calmh> that should probably go in the document
<calmh> that conversion from untrusted to trusted should be possible with only the local data plus the encryption key
<calmh> it kind of follows automatically from being able to bootstrap a new trusted node from it, but still
<[0]Jmr> yeah ok
<[0]Jmr> having an encrypted index helps us
<[0]Jmr> it's just that scans are going to get more than twice as heavy
<calmh> yeah, that's a bit painful
<calmh> at least the initial ones
<calmh> for the incremental ones we can compare our hashes with the old index and only re-encrypt-and-hash the ones that differ
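
The incremental rescan shortcut can be sketched as a comparison of plaintext block hashes against the old index, returning only the blocks that need the expensive encrypt-and-hash pass. `HashBlocks` and `BlocksToReencrypt` are hypothetical names, and the tiny block size in `main` is only there to keep the example readable.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// HashBlocks splits data into blockSize chunks and hashes each one.
func HashBlocks(data []byte, blockSize int) [][32]byte {
	if blockSize <= 0 {
		return nil
	}
	var hashes [][32]byte
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[start:end]))
	}
	return hashes
}

// BlocksToReencrypt returns the indexes of blocks whose plaintext hash is
// new or differs from the old index. Only these need to be re-encrypted
// and re-hashed; the encrypted entries of the others can be reused.
func BlocksToReencrypt(oldHashes, newHashes [][32]byte) []int {
	var changed []int
	for i, h := range newHashes {
		if i >= len(oldHashes) || oldHashes[i] != h {
			changed = append(changed, i)
		}
	}
	return changed
}

func main() {
	oldIdx := HashBlocks([]byte("aaaabbbbcccc"), 4)
	newIdx := HashBlocks([]byte("aaaaXbbbccccdd"), 4)
	fmt.Println(BlocksToReencrypt(oldIdx, newIdx)) // [1 3]
}
```

The plaintext still has to be hashed in full on every scan, so this only saves the second (encrypt and hash) pass for unchanged blocks.
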
<calmh> i guess the alternative is to skip the hashes completely...
<calmh> then we lose the ability of the untrusted node to know if it has good data or not
<calmh> but there's still some nasty corner case there
<calmh> the hypothetical new trusted node will only have the hashes for the encrypted blocks in my proposal
<calmh> so it has nothing to verify against after decryption
<calmh> actually having the MAC in there (from secretbox or aes-gcm) would solve that, since then we know it's the same data as a trusted device once produced, at least
<[0]Jmr> yeah
<[0]Jmr> I guess we could also send both hashes 
<[0]Jmr> to the untrusted node
<calmh> yeah, and then we could use the hash of the unencrypted data as the IV as you suggested. but that exposes information. if i have a certain file, and i have access to an untrusted device, i can verify my guess that you have the same file
<calmh> not sure how much of an attack that is though
<[0]Jmr> yeah I had the same idea
<calmh> someone would object that RIAA could now prove they were warezing mp3:s or something
<calmh> on the other hand
<calmh> nothing prevents us from sending a random IV, the original hash encrypted, and the hash of the encrypted data
<calmh> just more bytes in the hypothetical encrypted index
<calmh> well
<calmh> except we then need two IV:s because it's super forbidden to reuse them :)
<calmh> but still, just more bytes
<calmh> basically any short string like filename is anyway going to become a []byte{iv..., ciphertext...}
<calmh> so the hash becomes 16+32 bytes instead of just 32
<calmh> no biggie
<[0]Jmr> so you'd have a mac, plus an IV for the plaintext hash?
<[0]Jmr> which basically means you need a nonce for the MAC too
<calmh> i guess skip the mac then. so we have { IV_for_hash, hash(original data), IV_for_data, hash(encrypted data) }
<calmh> the untrusted device verifies hash(encrypted data)
<calmh> erm
<calmh> { IV_for_hash, encrypted(hash(original data)), IV_for_data, hash(encrypted data) }
<calmh> the trusted device decrypts the original hash and uses that to verify the decrypted data