Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multiple pw hashes, use secret key and user-binding #558

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

nbraud
Copy link
Contributor

@nbraud nbraud commented Aug 8, 2021

Hi,

I'm opening this as a draft PR, as there are aspects that make it not ready IMO (like pyca/pynacl#676 not being merged yet, not having a lossless DB migration script, etc.) but I feel this might require extensive discussion prior to merging.

Rationale

This PR implements two main changes:

  • an abstraction layer over keyed and unkeyed password oracles, and the capability to define arbitrary oracles for storing user passwords;
  • an XChacha20 wrapper that turns any unkeyed oracle into a keyed one, using AEAD.

Those are submitted as a single PR as they depend on one-another: XChacha20 needs the KeyedOracle abstraction to get its private key, and the PasswordKind enum to construct the AAD value, while PasswordKind needs at least one KeyedOracle implementation to be populated (otherwise, password login is not possible). Allowing UnkeyedOracles in PasswordKind would break this dependency loop, and was considered, but ultimately rejected as strictly-worse design: unkeyed constructions are less secure, so only keyed constructions should be used anyhow, and the long-term benefit outweight the short-term cost of a larger PR to review as a single block.

This also motivates the KeyedOracle vs. UnkeyedOracle distinction, since this makes the cryptographic distinction readily apparent in the code. KeyedOracle uses the Blake2b keyed hash function as a KDF, to automatically derive a unique private key for each derived class; the use of a KDF is to prevent using the same (or related) keys in different constructions, which can be insecure, and Blake2b was selected as it is fast, secure, supports personalisation, and readily available in the standard library since Python 3.6.

The design of the XChacha20 wrapper was motivated by multiple consideration:

  • encrypting users' password hashes under a site-specific key, protects user passwords from offline attacks by an adversary who got access to the contents of the databases (through SQL injection, compromising the database server itself, or gaining access to a backup copy of the database) ;
  • the XChacha20-Poly1305 construction is fast regardless of whether specialized hardware support is available, suffers from no restrictions on the number of messages that can be encrypted under a given key or their size (unlike other AEAD constructions such as AES-GCM), and a high-quality implementation is available as part of libsodium (I am exposing it in PyNaCl, the binding maintained by the Python Cryptographic Authority) ;
  • adding PyNaCl as a dependency makes other password hash functions available (such as Argon2 and scrypt) that are more modern than bcrypt ;
  • using AEAD instead of plain AE (such as libsodium's SecretBox AE construction) allows binding ciphertexts to a given pw_kind and id, to protect user accounts from an adversary who gained write access to the users table of the database: this prevents attacks such as replacing the pw_kind field of a victim user's record (substituting a weak oracle for a strong one, under the AEAD wrapper) or replacing the pw_blob from that of a “donor” account whose password is known to the attacker.

Assumptions

  • The user.id primary key is immutable. If the id of a given user record is changed, decryption of their pw_blob will fail, effectively removing their ability to log in using a password.
  • The user set a PW_SITE_KEY environment variable, containing a uniformly-random, hex-encoded, 256b private key; if it isn't set, they are told to use python3 -c 'import secrets; print(secrets.token_hex(32))' to correctly generate one.

Changes

  • Add the PW_SITE_KEY configuration variable.
  • Add the KeyedOracle and UnkeyedOracle abstract base classes, and related tests.
  • Expose bcrypt as an UnkeyedOracle
  • Implement XChacha20, using encryption to convert any UnkeyedOracle into a KeyedOracle.
  • Implement a migration that encrypts/decrypts existing password hashes, so the change is transparent to users and can be rolled back.
  • Implement a mechanism to rollover the site's secret key.
  • Automatically upgrade user records to the (current) preferred password oracle.
    For instance, a user who last set their password when DEFAULT_KIND was AEAD_XCHACHA20_BCRYPT, should have it upgraded to another oracle construction when a newer one is implemented.

Discussion needed

  • Should the PW_SITE_KEY be renamed to something more generic, and potentially be used in a similar way as here to encrypt other secrets?
  • Should there be support for multiple site keys, so key rollovers can be performed without downtime, and in the presence of non-upgradable KeyedOracles (such as Argon2, using its support for keying and AAD) ?

@nbraud
Copy link
Contributor Author

nbraud commented Aug 8, 2021

Rebased on a current master

@nguyenkims
Copy link
Contributor

The PR looks good though I must admit it's a bit advanced given my knowledge about password oracles :).

About the questions

Should the PW_SITE_KEY be renamed to something more generic, and potentially be used in a similar way as here to encrypt other secrets?

I think the naming is good. Do you know if it requires a database migration or any value can be chosen? If data migration is needed, can we document it in https://github.com/simple-login/app/blob/95c8f14ea50240c1b59d008ea0dc911dd94da3cd/docs/upgrade.md#L1-L0 so people who self-host can run the migration too?

Should there be support for multiple site keys, so key rollovers can be performed without downtime, and in the presence of non-upgradable KeyedOracles (such as Argon2, using its support for keying and AAD) ?

Yes without downtime is quite important in my opinion.

@nbraud
Copy link
Contributor Author

nbraud commented Aug 26, 2021

The PR looks good though I must admit it's a bit advanced given my knowledge about password oracles :).

Thanks; I would have sent it in smaller, easier-to-review chunks, but I couldn't find a clean way to split it up.

Do you know if it requires a database migration or any value can be chosen? If data migration is needed, can we document it in https://github.com/simple-login/app/blob/95c8f14ea50240c1b59d008ea0dc911dd94da3cd/docs/upgrade.md#L1-L0 so people who self-host can run the migration too?

Yes, the PR will require a data migration, which I haven't implemented yet. (I'm no Alembic expert, I'm afraid)

Should there be support for multiple site keys, so key rollovers can be performed without downtime, and in the presence of non-upgradable KeyedOracles (such as Argon2, using its support for keying and AAD) ?

Yes without downtime is quite important in my opinion.

OK; I will come back to you on that later, as there are 2 main strategies I'm aware of for managing key rollover in password storage. and I'd prefer to discuss them with a few cryptographers before committing to either approach, to check that my analysis is correct:

  • A “password onion” à la Facebook, where a key rollover or algorithm change is achieved by wrapping the existing hash in an additional layer.
    This seems to me like a brittle construction, and would require special handling for the encryption layers.
    It's also a lot of additional complexity, and requires holding on to all site keys “forever”.
  • Only use password oracles that support key “upgrades”; for instance, w/ AEAD_XCHACHA20_BCRYPT, one can simply decrypt the existing data (with the “old” key) and re-encrypt it (under the “new” key)
    This seems much simpler to me, both to implement and to operate, as at most 2 keys need to be handled at any given time.
    Moreover, the security argument is straightforward, and older keys can be discarded (which hedges against scenario where password data protected under an older key was leaked, but they key itself hasn't yet)

In both cases, the rollover process is a bit involved, if it has to be done without downtime, as one needs to ensure that all running instances can decrypt data produced by any other instance:

  • replace the running instances of SimpleLogin with ones that have the new key in the configuration, but do not produce data encrypted under it (they can only decrypt such data);
  • replace the running instances (again!) with ones configured to produce data encrypted under the new key;
  • decrypt/re-encrypt all password data under the new key, as a batch process.

@nbraud
Copy link
Contributor Author

nbraud commented Aug 26, 2021

@nguyenkims Could you help me out with the ORM? Under PostgreSQL, it rejects both db.BINARY and db.BLOB for the pw_blob column, even though PostgreSQL has a binary type.

Also added support for extending to other password hashes.
This is harder to misuse and more obviously correct.  It does change the private keys generated for subclasses,
which is fine to do in the PR that introduces encryption.
This is only necessary temporarily, as there are no binary wheels available for
the `aead` branch of PyNaCl.
This saves a tiny bit of computation when building multiple instances.
`functools.cache` was introduced in Python 3.9, while this is expected
to run on Python 3.7, so the older solution is used instead.
This should work under ~all usual databases.
@nbraud
Copy link
Contributor Author

nbraud commented Aug 26, 2021

Had to update the dependency on PyNaCl (my branch for pyca/pynacl#676 doesn't exist anymore, and was merged upstream) and rebase to ensure that the deps can be resolved and installed at any point in the git history

@nguyenkims
Copy link
Contributor

@nguyenkims Could you help me out with the ORM? Under PostgreSQL, it rejects both db.BINARY and db.BLOB for the pw_blob column, even though PostgreSQL has a binary type.

@nbraud I tried to run sh scripts/new-migration.sh and it seems to work fine and LargeBinary is indeed the right type to use according to https://docs.sqlalchemy.org/en/14/core/type_basics.html#sqlalchemy.types.LargeBinary

What error do you have when running the code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants