Add support for bcrypt_sha256 #297

Open
Jonny007-MKD opened this issue Jan 8, 2023 · 2 comments

Comments

@Jonny007-MKD

BCrypt only hashes the first 72 bytes of a password. With UTF-8 that is at most 72 characters, and fewer when the password contains multi-byte characters. That's why BCrypt-SHA256 was created:

https://passlib.readthedocs.io/en/stable/lib/passlib.hash.bcrypt_sha256.html
https://gitlab.com/gitlab-org/gitlab/-/issues/220580
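
To make the limit concrete, here is a small Python sketch. It does not call a real bcrypt library: `effective_input` is a hypothetical helper that only models the truncating behaviour described above, and some implementations may reject over-long input instead of truncating it.

```python
BCRYPT_MAX_BYTES = 72  # the limit stated in this issue


def effective_input(password: str) -> bytes:
    """Return the bytes a truncating bcrypt would actually hash."""
    return password.encode("utf-8")[:BCRYPT_MAX_BYTES]


ascii_pw = "a" * 80          # 80 characters, 80 bytes
umlaut_pw = "\u00e4" * 40    # 40 characters, but 2 bytes each in UTF-8

print(len(ascii_pw), len(ascii_pw.encode("utf-8")))
print(len(umlaut_pw), len(umlaut_pw.encode("utf-8")))

# Two different passwords that share their first 72 bytes become
# indistinguishable once the input is truncated.
pw_a = "x" * 72 + "first-suffix"
pw_b = "x" * 72 + "second-suffix"
print(effective_input(pw_a) == effective_input(pw_b))
```

With multi-byte characters the limit is reached after fewer than 72 characters, which is why the count is in bytes and not in characters.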

@Fusion
Collaborator

Fusion commented Jan 9, 2023

Hi @Jonny007-MKD thank you for bringing this up.

Let's go down that rabbit hole, if you don't mind, because this is an important topic.

While the bcrypt specification calls for 72 bytes (or 71 plus null termination), these are not guaranteed in any given implementation, and it is safer to assume that the key part is limited to about 54 bytes.
Overall, GLAuth should work with Unicode-encoded strings. When it comes to BCrypt (here: $2y$), UTF-8 is the recommended encoding.
Trying to figure out how many printable characters exist in UTF-8, I added up the printable characters from the various ranges (1 to 4 bytes) and came up with about 1,007,615 code points.
If, however, ISO Latin-1 encoding is used, as many in the Western world would, we are down to 191 printable characters.
For the sake of simplicity, I am going to assume the latter (the former can also create some worst-case scenarios, depending on the ranges being used).

Since each byte is encoded using two bytes (hexadecimal), our worst-case scenario is that we encode 27 bytes (54 / 2).
This gives us an entropy of: log2(191^27) ≈ 204 bits (rounded down).
If, more reasonably, we expect people to use e.g. basic US keyboard characters, our pool of characters drops to 94.
Now, our entropy is: log2(94^27) ≈ 176 bits (rounded down).
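
For anyone who wants to redo the arithmetic, here is a minimal Python sketch. `entropy_bits` is a name made up for this example, and it assumes every character is picked uniformly at random from the pool, which real passwords rarely are.

```python
import math


def entropy_bits(pool_size: int, length: int) -> float:
    """Entropy of `length` symbols drawn uniformly from `pool_size` options.

    log2(pool_size ** length) is the same as length * log2(pool_size).
    """
    return length * math.log2(pool_size)


LENGTH = 27  # worst case from the estimate above: 54 bytes / 2

for label, pool in [("ISO Latin-1 printable", 191), ("US keyboard printable", 94)]:
    print(f"{label}: {entropy_bits(pool, LENGTH):.1f} bits")
```

The exact values land between 204 and 205 bits for the 191-character pool and just under 177 bits for the 94-character pool, so the figures above are the results rounded down.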

An advantage that bcrypt holds over pure key derivation algorithms is that it is RAM-heavy, which is not very GPU-friendly, and that Blowfish is not easy to vectorize (unlike SHA256).

OK, so... why not wrap a SHA256 hash in our bcrypt call?
Depending on who you ask, it's a good idea, a not terribly bad idea, or actually a bad idea. The last view is due to "hash shucking": if an attacker already has the SHA256 in a cracking dictionary, or from another leak (think users reusing passwords, but not only), they can simply pass this hash to bcrypt() (keep in mind that the bcrypt hash contains its own salt), and it's as if you had not used bcrypt() in the first place.
Looking at the recent LastPass story, it is clear that the thing to worry about is your database being leaked, rather than an external attack. Say hello to my little salt (and iteration count).

A way to mitigate shucking is to "pepper" your sha256 hash before hashing it again. Unfortunately, if your database is stolen, your pepper may come along if it lives anywhere near it.
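
As a rough illustration of the pepper idea, here is a Python sketch. Keying HMAC-SHA256 with the pepper is an assumption on my part (it is one common way to apply a pepper, not something GLAuth does today), and `PEPPER` is a made-up placeholder value.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only. A real pepper would have to be
# stored away from the user database, otherwise it leaks with it.
PEPPER = b"example-pepper-not-a-real-secret"


def plain_sha256(password: str) -> str:
    """Unpeppered digest, as it might appear in someone else's leak."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def peppered_sha256(pepper: bytes, password: str) -> str:
    """Inner hash keyed with the pepper; this is what bcrypt would receive."""
    return hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).hexdigest()


password = "correct horse"
# A digest leaked elsewhere no longer matches our inner hash.
print(peppered_sha256(PEPPER, password) != plain_sha256(password))
```

An attacker holding only plain SHA256 digests cannot reproduce the inner value without the pepper, which is exactly why losing the pepper together with the database undoes the benefit.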

So, back to 176 bits of entropy. After all the previously mentioned caveats, it is not so bad.
Is it good, though? Am I saying "wontfix?"
No, because dropping the remainder of the user's password, even if it's an insanely long password, is nothing to be proud of.

I am open to further discussion on what the best approach would be.

@Jonny007-MKD
Author

Jonny007-MKD commented Jan 15, 2023

Hi @Fusion,

I am not a password hashing specialist; I read about this on the Python passlib manual page. I have now spent some time trying to understand the issues, though :)

These are the sources I used:

I'll summarize the hash shucking first, in case you don't want to spend 40 minutes watching the video. It's worth it, though!

Hash shucking means:

  • an attacker finds SHA256 hashes in some user database somewhere else
  • an attacker finds bcrypt hashes in our config file

What can the attacker do with this information?
Assuming we had implemented bcrypt(salt, sha256($password)), the attacker could take the SHA256 hashes from the first leak, feed them into bcrypt with the user's salt, and see whether the result matches the hash in our config file. This tells the attacker whether the same password is used in both places. Now the attacker knows that they only need to brute-force the SHA256 hash and do not have to spend computing power on bcrypt.
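
The attack can be sketched in a few lines of Python. To keep the example stdlib-only, `slow_hash` uses PBKDF2 as a stand-in for bcrypt; it is not bcrypt, and all names here are invented for the illustration. I also assume the inner SHA256 is passed on as a hex string.

```python
import hashlib
import os


def slow_hash(salt: bytes, data: bytes) -> bytes:
    """Stand-in for bcrypt(salt, data). PBKDF2 is used only to avoid a dependency."""
    return hashlib.pbkdf2_hmac("sha256", data, salt, 10_000)


def wrapped_hash(salt: bytes, password: str) -> bytes:
    """The naive construction: slow_hash(salt, sha256(password))."""
    inner = hashlib.sha256(password.encode("utf-8")).hexdigest().encode("ascii")
    return slow_hash(salt, inner)


def shuck(salt: bytes, stored: bytes, leaked_digests: list[str]) -> list[str]:
    """Return the leaked SHA256 digests that reproduce the stored hash.

    One slow_hash call per leaked digest, and no password guessing at all.
    """
    return [d for d in leaked_digests
            if slow_hash(salt, d.encode("ascii")) == stored]


# What the attacker reads from our config file: salt and wrapped hash.
salt = os.urandom(16)
stored = wrapped_hash(salt, "correct horse")

# What the attacker got from some other site: unsalted SHA256 digests.
leaked = [hashlib.sha256(p.encode("utf-8")).hexdigest()
          for p in ("hunter2", "correct horse", "letmein")]

matches = shuck(salt, stored, leaked)
print(matches)  # the one digest that belongs to our user's password
```

After this step the attacker can crack the matching plain SHA256 digest offline at SHA256 speed, which is the whole point of shucking.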

In his talk, Sam Croley makes a small statement at the end (minute 42): hash shucking does not work if the inner hash was salted in the first place.

That's exactly what is implemented in passlib's bcrypt_sha256 (version 2): It does not use plain SHA256, but HMAC-SHA256 with the same salt that bcrypt gets.
Now what is the difference? With the salt in the SHA256 hash, the attacker cannot use a SHA256 hash from another user database to shuck off the bcrypt layer, unless by chance the same salt was used. The attacker cannot even find out whether the same password was used in different places.
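
Here is a small Python sketch of why the salted inner hash helps. It only shows the inner HMAC-SHA256 step with a random salt as the key; `salted_inner` is a made-up name, and I make no claim that this matches passlib's exact byte format.

```python
import hashlib
import hmac
import os


def salted_inner(salt: bytes, password: str) -> str:
    """Inner hash keyed with the per-user salt (HMAC-SHA256, hex encoded)."""
    return hmac.new(salt, password.encode("utf-8"), hashlib.sha256).hexdigest()


password = "correct horse"

# Unsalted digest of the same password, leaked from some other site.
leaked_digest = hashlib.sha256(password.encode("utf-8")).hexdigest()

our_salt = os.urandom(16)
other_salt = os.urandom(16)

ours = salted_inner(our_salt, password)
theirs = salted_inner(other_salt, password)

# The leaked digest is not what we feed into bcrypt, so it cannot be shucked.
print(ours != leaked_digest)
# Same password, different salt: the inner hashes do not match either.
print(ours != theirs)
```

Since the value that goes into bcrypt now depends on our salt, a digest collected elsewhere is useless as a shortcut unless it was produced with the very same salt.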

Another problem that the OWASP cheat sheet mentions: one must not feed raw bytes from hash functions into bcrypt, as hashes may contain \0 bytes and bcrypt ignores everything after the first one. So I would rather implement bcrypt($salt, base64(hmac-sha256($salt, $password))).
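
A quick Python sketch of the \0 problem and the base64 fix. The helper names and the fixed example salt are invented for this illustration, and `as_c_string` merely models an API that stops reading at the first NUL byte; no real bcrypt call is made.

```python
import base64
import hashlib
import hmac


def raw_inner(salt: bytes, password: str) -> bytes:
    """Raw 32-byte HMAC-SHA256 digest; may contain NUL bytes."""
    return hmac.new(salt, password.encode("utf-8"), hashlib.sha256).digest()


def safe_inner(salt: bytes, password: str) -> bytes:
    """Base64 of the digest: 44 ASCII bytes, never a NUL, well under 72."""
    return base64.b64encode(raw_inner(salt, password))


def as_c_string(data: bytes) -> bytes:
    """What a NUL-terminated API would see: the bytes before the first NUL."""
    return data.split(b"\x00", 1)[0]


salt = b"example-salt-16b"  # hypothetical fixed salt, for reproducibility only

# Roughly one in eight 32-byte digests contains a NUL byte, so a short
# search over dummy passwords finds one quickly.
candidate = next(
    f"password{i}" for i in range(10_000)
    if b"\x00" in raw_inner(salt, f"password{i}")
)

raw = raw_inner(salt, candidate)
safe = safe_inner(salt, candidate)

print(len(raw), len(as_c_string(raw)))    # 32 bytes, but fewer survive
print(len(safe), len(as_c_string(safe)))  # all 44 bytes survive
```

With the raw digest, part of the hash is silently dropped whenever a NUL byte appears, while the base64 form always reaches bcrypt in full.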

OWASP discourages exactly this solution, though 🤔 and I don't want to recommend ignoring OWASP recommendations, even though I don't agree with them.
