Visibility into dictionary size and entropy. #27

Open
garretwilson opened this issue May 10, 2023 · 13 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@garretwilson

From the screenshots it looks like the plugin allows various dictionaries to be used. Does the plugin provide any visibility into the size of the dictionary being used, so that the user might have an idea of the entropy (a simple calculation based upon the size of the dictionary and the number of words used) of the resulting passphrase?

I can't find the word "dictionary" or "entropy" in the readme. Maybe the plugin provides this information somewhere else?

Otherwise, without having a lot of experience with the various dictionaries, a user might not know offhand how the generated passphrase compares to a completely random password of a smaller length.
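
One rough way to make that comparison is to convert the word-level entropy into the number of uniformly random characters that would match it. This is only a sketch: the 7,776-word list size (the classic five-dice Diceware size, 6^5) and the 94-character printable ASCII alphabet are assumptions for illustration, not values taken from the plugin.

```python
import math

def equivalent_random_length(dictionary_size: int, word_count: int,
                             alphabet_size: int = 94) -> int:
    """Smallest number of uniformly random characters whose entropy
    matches or exceeds that of `word_count` words picked uniformly
    from a list of `dictionary_size` words."""
    passphrase_bits = word_count * math.log2(dictionary_size)
    bits_per_char = math.log2(alphabet_size)
    return math.ceil(passphrase_bits / bits_per_char)

# Assumed values: 7776 = 6**5 (classic Diceware list size),
# 94 = printable ASCII characters excluding the space.
print(equivalent_random_length(7776, 4))  # 8
```

Under those assumptions, four dictionary words are roughly on par with an 8-character fully random password.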

nitz self-assigned this May 19, 2023
nitz added the enhancement and question labels May 19, 2023
@nitz
Contributor

nitz commented May 19, 2023

Hey, apologies for the delay in getting to this.

There's currently no way to peek at the dictionaries through the plugin. I'm going to leave this ticket open as a reminder that it would be a good feature, along with a per-dictionary word count, if I ever get around to adding custom dictionary support, which I hope to at some point.

As for the actual number of options per dictionary and the related entropy: I honestly have always just relied on KeePass' entropy calculation (e.g., the "quality" field in entries), but perhaps there's a way to at the very least expose that through the plugin too.

I think the trouble with actual entropy values vs. generated passphrase starts to get muddy quick, since the plugin is capable of more than just selecting random words from the list. While it's perfectly capable of generating "correcthorsebatterystaple", it also introduces several options for different mutators so that the base generation like that would end up coming out something like "correct horse battery staple", or "C0rr3c7-H0r5e-B4tt3ry-S74pl3", or even "CoRrECT%h0r53%battery%sTAPlex^¾", all of which would have vastly differing entropy values while being based on the same 4 dictionary words. (For instance, KeePass' quality reports entropy of 81 bits, 99 bits, 129 bits, and 149 bits respectively for those examples.)

Sorry, went off on a bit of a tangent there. At the very least, if you'd like to look at the dictionaries used at this very moment, you can see them in the repo, in the Resources directory: https://github.com/cmdwtf/KeePassDiceware/tree/main/Resources

@garretwilson
Author

garretwilson commented May 19, 2023

I think the trouble with actual entropy values vs. generated passphrase starts to get muddy quick …

To some extent, yes …

… I honestly have always just relied on KeePass' entropy calculation …

… but the KeePass entropy calculation in this case produces a value that has no resemblance to the real entropy. It's not just a little off—it's completely wrong.

Let's say you have a dictionary with three words, "foo", "bar", and "baz". You generate a password "foobar". KeePass would think that the unit of variation (I made up that term; I'm not a mathematician) is each letter, for six positions, so it would assume 26^6 = 308,915,776 combinations, or log2(308,915,776) ≈ 28 bits of entropy. (Actually KeePass says 12 bits, perhaps accounting for the fact that "o" is doubled. Typing "fozbar" shows 29 bits of entropy, closer to what is expected.)

In reality, we're not using one of 26 letters for each position, but rather one of three words for each of two positions, so the number of combinations is 3^2 = 9, or log2(9) ≈ 3 bits of entropy. That's not a good password if the attacker knows that the dictionary being used is "foo", "bar", and "baz".
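
The two views of "foobar" can be checked with a few lines of Python. This is just the naive uniform-choice formula (positions × log2(choices)); it is not how KeePass computes its quality figure, and the function name is made up for the example.

```python
import math

def uniform_entropy_bits(choices_per_position: int, positions: int) -> float:
    """Entropy in bits when each position is filled independently and
    uniformly from `choices_per_position` options."""
    return positions * math.log2(choices_per_position)

letter_view = uniform_entropy_bits(26, 6)  # "foobar" as six lowercase letters
word_view = uniform_entropy_bits(3, 2)     # "foobar" as two picks from {foo, bar, baz}

print(f"{letter_view:.1f} bits vs {word_view:.1f} bits")  # 28.2 bits vs 3.2 bits
```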

So while I realize that "correct horse battery staple" probably has more entropy than "correcthorsebatterystaple", and "C0rr3c7-H0r5e-B4tt3ry-S74pl3" probably has much more, and while I acknowledge we'd have to think in intricate and subtle mathematical terms to figure out how the variations are adding entropy, the point here is that users just need to get at least a general idea of the baseline entropy based upon the dictionary size.

Thus for "C0rr3c7-H0r5e-B4tt3ry-S74pl3", the first pass of this feature could simply calculate:

  • four positions are being used
  • the dictionary has (for example) 5,000 words
  • thus there are 5000^4 = 625,000,000,000,000 combinations; log2(625,000,000,000,000) ≈ 49 bits of entropy
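
The arithmetic in the list above can be reproduced exactly with Python's arbitrary-precision integers. The 5,000-word dictionary is the same made-up example size; `baseline_entropy_bits` is a hypothetical helper name, not something the plugin provides.

```python
import math

def baseline_entropy_bits(dictionary_size: int, word_count: int) -> float:
    """Bits of entropy from picking `word_count` words uniformly and
    independently from a dictionary of `dictionary_size` words."""
    combinations = dictionary_size ** word_count  # exact integer arithmetic
    return math.log2(combinations)

print(5000 ** 4)                                   # 625000000000000
print(math.floor(baseline_entropy_bits(5000, 4)))  # 49
```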

Thus if the plugin simply said "this password has >=49 bits of entropy" (at least 49 bits of entropy, ignoring the extra variations that were added), that would be a huge improvement, because what KeePass shows is based upon a completely different understanding of the password.

Finally I'll note that you'll need to take the minimum of the KeePass entropy (or calculate it yourself) and the dictionary-based entropy value, to take into consideration that an attacker could use a brute force attack based upon the individual letters or on the dictionary.
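
A minimal sketch of that min() rule follows. It assumes the character-level figure is supplied from elsewhere (for example, read off the KeePass quality field); the function name and the sample numbers other than 81 bits and 5,000 words are illustrative only.

```python
import math

def conservative_entropy_bits(char_level_bits: float,
                              dictionary_size: int,
                              word_count: int) -> float:
    """Return the lower of the character-level estimate and the
    dictionary-level estimate, since an attacker can pick whichever
    search space is smaller."""
    word_level_bits = word_count * math.log2(dictionary_size)
    return min(char_level_bits, word_level_bits)

# Character-level estimate of 81 bits, four words from 5,000:
print(round(conservative_entropy_bits(81, 5000, 4), 1))  # 49.1

# The other direction: two very short words (four lowercase letters in
# total) from a hypothetical 100,000-word list. Letters are the weak spot.
four_letters = 4 * math.log2(26)
print(round(conservative_entropy_bits(four_letters, 100_000, 2), 1))  # 18.8
```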

I'm not a cryptographer nor a mathematician, so please feel free to point out any errors.

@ThisMakesSenseToMe
Contributor

I don’t agree with Garret. He’s assuming the attacker knows what dictionary was used, which is probably almost never the case, and salt (numbers) at random places makes it so that the words cannot be found in ‘the’ dictionary (which is different from just replacing some letters with numbers). Plus a lot of people will use multiple dictionaries, maybe in multiple languages. In reality an attacker will probably have to use brute force. So calculating the entropy based on that assumption seems reasonable. Of course, it would be even better if a user could add his own dictionary.

@nitz
Contributor

nitz commented May 19, 2023

I'm not a cryptographer nor a mathematician, so please feel free to point out any errors.

I know this feeling! I'm just a fan that has the t-shirts 😂

Thus if the plugin simply said "this password has >=49 bits of entropy"

Actually, that may be a good way to display something like that: a baseline ("pre-enhancement") level of entropy. I quite like the "at least x bits", plus it keeps the logic for calculating it relatively simple.
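
A display string along those lines could be as simple as the sketch below. The wording of the label and the function name are hypothetical; the only design point is rounding down, so the label never claims more than the word-level math supports.

```python
import math

def baseline_label(dictionary_size: int, word_count: int) -> str:
    """Format an 'at least N bits' label for the pre-enhancement entropy."""
    if dictionary_size < 1 or word_count < 0:
        raise ValueError("dictionary_size must be >= 1 and word_count >= 0")
    # Floor rather than round, so the figure is never overstated.
    bits = math.floor(word_count * math.log2(dictionary_size))
    return f"at least {bits} bits (pre-enhancement)"

print(baseline_label(5000, 4))  # at least 49 bits (pre-enhancement)
```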

… but the KeePass entropy calculation in this case produces a value that has no resemblance to the real entropy. It's not just a little off—it's completely wrong.

I've never actually looked at the source of KeePass' entropy calculator, but it does seem to have knowledge of at least some dictionary (English) words, as I regularly see cases where appending a new character to a phrase that takes a non-dictionary word and makes it a dictionary word actually decreases the entropy despite the longer length. For example:

deepe (5 characters) claims 16 bits:
[screenshot]

But deeper (6 characters) claims only 12 bits:
[screenshot]

While deepee (also 6 characters) increases to 17 bits from deepe:
[screenshot]

Now I'm completely curious as to what their calculation is. I wonder if it's something like zxcvbn, which exhibits the same behavior. (Though zxcvbn doesn't seem to return entropy estimates as much as it does just score the password and provide that "guesses log10" value.)
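
For what it's worth, a "guesses log10" figure converts to bits with a change of base: bits = guesses_log10 × log2(10). The sketch below is plain arithmetic and does not call zxcvbn itself; the sample value 14.8 is made up for illustration.

```python
import math

def guesses_log10_to_bits(guesses_log10: float) -> float:
    """Convert a base-10 log of the guess count into bits (base-2 log)."""
    return guesses_log10 * math.log2(10)

# A guess count of 10**14.8 (illustrative value) is about 49 bits.
print(round(guesses_log10_to_bits(14.8), 1))  # 49.2
```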

@nitz
Contributor

nitz commented May 19, 2023

Of course, it would be even better if a user could add his own dictionary.

That's the long and short of it, isn't it? 😅

@ThisMakesSenseToMe
Contributor

Maybe you can also look at StrongBox’s implementation?

@garretwilson
Author

He’s assuming the attacker knows what dictionary was used, which is probably almost never the case …

So does it take a large stretch of the imagination to think that an attacker would start with all the most common dictionaries, such as those this plugin uses?

In reality an attacker will probably have to use brute force.

This whole discussion is based upon the assumption that the attacker is using brute force. Only the most unsophisticated attacker would simply use the ASCII letters. I would imagine that an attacker with any sense at all would start millions of parallel brute force attacks, some based upon the ASCII letters, and others based upon the most common dictionaries.

@garretwilson
Author

garretwilson commented May 19, 2023

Now I'm completely curious as to what [the KeePass] calculation is.

I'm curious too, but friends, let's not get sidetracked too much in this ticket away from the original simple request. The idea is that a user would simply like to have a general idea of how much variation the dictionary-based password has, so that they can decide whether to simply use a shorter random string instead. Some general "at least this much" number would be very useful. Currently the only way to find this out is for the user to go to GitHub or somewhere, find the dictionary files, pull out a calculator, etc.
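
To give an idea of what that manual step amounts to, here is a sketch that counts the words in a dictionary's text and derives the baseline. It assumes a plain one-word-per-line format, which may not match how the plugin's resource files are actually laid out; the function names and the sample text are made up.

```python
import math

def count_dictionary_words(text: str) -> int:
    """Count distinct non-empty lines, assuming one word per line."""
    return len({line.strip() for line in text.splitlines() if line.strip()})

def baseline_bits(text: str, word_count: int) -> float:
    """Word-level entropy for `word_count` uniform picks from the list."""
    return word_count * math.log2(count_dictionary_words(text))

# Tiny stand-in for a real word list; the duplicate "bar" is counted once.
sample = "foo\nbar\nbaz\nbar\n\n"
print(count_dictionary_words(sample))      # 3
print(round(baseline_bits(sample, 2), 2))  # 3.17
```

For a real list, the text would come from reading the file (for instance with `pathlib.Path(...).read_text()`), after checking how that particular file is formatted.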

@garretwilson
Author

I know this feeling! I'm just a fan that has the t-shirts 😂

haha I need more T-shirts. Tell me where to get them. (Now I'm getting off the subject. 😅 )

@ThisMakesSenseToMe
Contributor

Why would the attacker assume that the diceware plug-in is used at all? And even KeePass?

@garretwilson
Author

[KeePass] does seem to have knowledge of at least some dictionary (English) words, as I regularly see cases where appending a new character to a phrase that takes a non-dictionary word and makes it a dictionary word actually decreases the entropy despite the longer length

That's interesting. Still KeePass says "correcthorsebatterystaple" has 81 bits of entropy, while above I illustrated that an example dictionary size of 5,000 words would produce only 49 bits of entropy. My whole point here is that the user might like to know a general "at least" entropy calculation based upon the known dictionary size (using the min() function with the character-based calculation).

@garretwilson
Author

Why would the attacker assume that the diceware plug-in is used at all? And even KeePass?

The attacker doesn't "assume" anything. The attacker tries things. (That's the whole point of a brute force attack.) The attacker doesn't try a single thing. The attacker tries many things. The attacker likely starts trying the most common things, and the diceware dictionaries and dictionaries used by KeePass plugins are some of the most obvious common things to start with.

@ThisMakesSenseToMe, if you were a nefarious consultant, and an attacker were paying you to advise them on which dictionaries to use in a brute force attack, and the attacker asked you for a list of 10 of the most common dictionaries to start with … hopefully you get the idea.
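
To put a rough number on it: if the attacker has to work through several candidate dictionaries, the total search space is the sum of each dictionary's combinations. The sketch below assumes ten dictionaries of 5,000 words each and four-word passphrases; those figures are invented for illustration.

```python
import math
from typing import Iterable

def multi_dictionary_bits(dictionary_sizes: Iterable[int], word_count: int) -> float:
    """log2 of the total number of guesses needed to exhaust every
    candidate dictionary at the given passphrase length."""
    total_guesses = sum(size ** word_count for size in dictionary_sizes)
    return math.log2(total_guesses)

one = multi_dictionary_bits([5000], 4)
ten = multi_dictionary_bits([5000] * 10, 4)
print(f"{one:.1f} bits -> {ten:.1f} bits")  # 49.2 bits -> 52.5 bits
```

Under these assumptions, not knowing which of ten equally sized dictionaries was used costs the attacker only about log2(10) ≈ 3.3 extra bits.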

@ThisMakesSenseToMe
Contributor

I honestly don’t agree. Why couldn’t it have been generated with LastPass, Bitwarden, 1Password, etc.? When an attacker doesn’t have specific knowledge, it is very hard to crack such a long password.


3 participants