Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bencoded dictionaries' keys are not always sorted #520

Open
cypheon opened this issue Mar 30, 2021 · 5 comments · May be fixed by #521
Open

Bencoded dictionaries' keys are not always sorted #520

cypheon opened this issue Mar 30, 2021 · 5 comments · May be fixed by #521

Comments

@cypheon
Copy link

cypheon commented Mar 30, 2021

BEP 3 requires that bencoded dictionaries' keys are in sorted order BEP3.
In the current implementation, the key-value pairs are bencoded in the order of a standard Go for range loop. However Go's language specification does not guarantee a sorted (or even any stable) order for map iteration [Go Language Specification: For statements with range clauses].

I have a proposed fix ready which I will submit as a PR.

@jzelinskie
Copy link
Member

We've known about this for since roughly around the time chihaya/bencode#2 was posted.

Bencode is performance-critical to the HTTP frontend, so I'm hesitant to add the sort unless you know of broken clients/behavior that are caused by the nondeterminism. This focus on performance and cutting out unnecessary functionality is why we use our own bencode implementation.

@cypheon
Copy link
Author

cypheon commented Apr 3, 2021

Hi, thanks for the quick response! It's good to hear you're aware of the issue, unfortunately it did not turn up in my quick search.

I initially noticed this, because BitTornado (the Python library) was crashing/throwing exceptions regarding invalid bencoded data, so there's at least one client having trouble with keys not being in-order.

I honestly didn't really consider performance while writing the fix, since most (all) of the dicts in often-occurring messages (i.e. responses to announce messages) only have in the order of magnitude about 4 or 5 keys (interval, min_interval, peers). But then again, I didn't do any benchmarks, so it's just a hunch.

All in all, I think complying to the spec is in itself valuable. If this impacts performance too much, maybe the interface could be switched from a "Dict" to a slice of (Key, Value) tuples? That would move the responsibility of sorting the keys to the part of code where the response is generated. Of course that would require a much bigger change in the code.

@pushrax
Copy link
Member

pushrax commented Apr 5, 2021

There are some cases where the dict can potentially be very large

filesDict := bencode.NewDict()
for _, scrape := range resp.Files {
filesDict[string(scrape.InfoHash[:])] = bencode.Dict{
"complete": scrape.Complete,
"incomplete": scrape.Incomplete,
}
}
return bencode.NewEncoder(w).Encode(bencode.Dict{
"files": filesDict,
})
}

I agree that the best way would be to construct the dicts in order, to avoid creating another copy of keys and sorting later. However, I also agree that in practise it's probably not much of a problem to just do the sort when marshalling. A benchmark would definitely be the important deciding factor here.

@Aranjedeath
Copy link

I am running this patch in production and we will see if there is some performance issue with this or not.

@Aranjedeath
Copy link

Performance of this patch is not making me scream, so if it is unoptimized... the optimized solution will be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants