Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative enhancement #136

Open
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

Erotemic
Copy link

@Erotemic Erotemic commented Jun 14, 2022

This is an alternative to #132 that aims to be simpler. The new CLI args are:

--digest=<openssl-digest> 
--salt-method=<'password' | 'random' | custom_string>
--kdf=<"none" | "pbkdf2">

By default these will always write a new .transcrypt/config file that is then git add-ed to the repo. In my own experience it is a pain to have to re-enter these non-secret data that could instead be stored in plaintext in the repo. We should also make an option to optionally disable it, but I think 99% of users benefit.

The priority of loading a variable is:

  1. CLI args
  2. unversioned .git/config
  3. versioned .transcrypt/config
  4. user prompt or crash if non-interactive

The exception is transcrypt.password and transcrypt.openss-path which are never saved or loaded from the .transcrypt/config file (only the unversioned .git/config file). The other behavior of note is that when --salt=random, the program will replace it with a random value before the main logic starts.

@elasticdog @jmurty The other posts and writeup I've made haven't generated much discussion. I think enhancements along this line are a clear win, and I'd like to start recommending transcrypt to people, so it would be helpful if I knew what it would take to get this merged here. Otherwise I can continue using my fork.

This also adds:

  • environ: TRANSCRYPT_TRACE - if set to 1, will set -x to help with debugging
  • environ: EDITABLE_INSTALL - if set to 1, will symlink transcrypt into the git repo instead of copying it. Also helpful for debugging.
  • Used bpkg conventions to allow the script to be sourced as a library. Again, this is a debugging QOL improvement, and does not impact user functionality.

@jmurty
Copy link
Collaborator

jmurty commented Jun 14, 2022

Thanks for the simplified PR. I'm aiming to get a new 2.2.0 release with the fairly urgent OpenSSL v3 fix out to clear the way for this change which I think may be large enough to warrant a 3 major release, especially since it will break backwards compatibility for the new, improved, and stronger encryption which we should probably recommend by default.

The main block on getting this merged is for myself or @elasticdog to have sufficient time available. I can't promise when this will happen, but after we've cut the 2.2.0 release I will be willing to merge to the main branch and iterate on things there, rather than leave your work marooned on a PR branch.

Thanks for your patience; we'll get there.

@jmurty
Copy link
Collaborator

jmurty commented Jun 14, 2022

Hi @Erotemic I have updated this branch with changes, mostly focussed on getting the existing BATS tests to pass to give us a solid base to work from. I haven't quite managed it yet, but it's getting close

@Erotemic
Copy link
Author

I do have a bit more work that needs to be done on this branch. Specifically, I need to rename the arguments used in the program to the arguments I've described above.

I agree that getting 2.2.0 out with the SALTED changes is by far the most important thing. Keep me updated with status on that.

@Erotemic
Copy link
Author

I squashed everything down to a single commit in order to make rebases nicer. I also change the "--use-pbkdf2" arg to simply "kdf", which has a much better UX feel.

I decided it's probably best to keep "--salt_method" as-is, so the user doesn't think that the salt they specify is actually the "final_salt" to be used in encryption. Also there already was an existing "-s" option, so I wanted to keep that distinguished.

If this is the version we go with, I'll need to clean up the writeup to target this specific implementation (which should basically be removing pieces because this is simpler).

@jmurty
Copy link
Collaborator

jmurty commented Jun 27, 2022

Hi @Erotemic I have pushed some updates to get the existing BATS tests to pass for all platforms. With these tests now passing I'm more confident we can progress the feature changes in this PR without breaking things.

I am finding the salt method configuration and storage confusing, the salt-method config setting seems to hold either the config name password or an actual salt value, that is then applied in the code via an $extra_salt variable per:

transcrypt/transcrypt

Lines 257 to 264 in 61a352d

if [[ "$salt_method" == "password" ]]; then
extra_salt=$password
elif [[ "$salt_method" == "" ]]; then
die "salt must be specified"
else
extra_salt=$salt_method
fi

I think we can simplify this as follows to reduce the number of configuration keys and variables in the code:

  • store the real salt value in the configuration, perhaps renamed to base-salt or similar to clarify it isn't the final salt value but is used to derive that salt value
  • when a key derivation function (pbkdf2) is enabled, the init process should always generate – or prompt for – a random base salt value. I don't think we want to support the password-based salt generation mechanism at all when a KDF is used, since it undoes much of the security gained by using a KDF
    • when a KDF is not enabled, the init process will not prompt for a base salt value
  • for backwards compatibility with non-KDF repos, we don't load the base-salt config setting and fall back to the password-based salt generation mechanism
  • BUT for security, when a KDF is enabled we treat the absence of a base-salt config setting as an error.

Does this make sense, or am I missing something?

@Erotemic
Copy link
Author

Erotemic commented Jun 27, 2022

Yes, the salt-method currently is a bit weird. I did that to reduce the parameter space, but I think it's worth revisiting.

I like the idea of disallowing the password mode when a KDF is enabled.

I also like the idea of storing the real value we are going to use in the config as "base-salt".

One important use-case that I'm not sure this covers is the one of "plausible deniability" if the user does not want to check in a "versioned" .transcrypt config so they would have to explicitly specify all the parameters to the repo at configuration time: e.g.

transcrypt -c 'aes-256-cbc' -p '<my-very-secure-password>' -md 'sha512' --kdf 'pbkdf2' -sm 'deadbeaf12345678deadbeaf12345678deadbeaf12345678deadbeaf12345678'

This way there is no indication that transcrypt was used to encrypt the files (I suppose other than the fact that all the encrypted files will start with "salted", so maybe this isn't that important). I think it's also a TODO on my part to enable the user to disable the versioned config.

I also think that the logic to conditionally skip prompting for a salt method is somewhat inelegant. How about this modification?

  • The config stores two variables related to salt: "salt-method" and "base-salt". The former corresponds to user specified input, and the latter is the actual value we will use (unless the salt-method is password).

  • If a KDF is enabled the "salt-method" defaults to "random", but it will default to "password" if the KDF is disabled.

  • If KDF is enabled a salt method of "password" will give the user an error that explains why its not allowed.

  • The user is never prompted for the "salt-method". It just defaults based on on their KDF setting. However, the user is allowed to pre-specify it via the command line. I think this is the major change from your outline.

  • If salt-method is random, the and the "base-salt" in the configuration is empty, a new random base salt is generated and written. If the base-salt in the config exists, then it is loaded.

  • If the salt-method is neither "random", nor "password", then the value of salt method is interpreted as the actual value to be used as "base-salt". This will allow for the unversioned config case.

The only thing I'm unsure about in the above scheme is what happens on a rekey. My thought was: check if the salt-method is "random", and if so always regenerate a new base-salt. But if it is explicitly specified, then just use that. But I also want to make it easy for the user to configure a repo with a single command that specifies the entire configuration. My thought is maybe if "salt-method" is prefixed with "random:" then it signals so store "random" in the salt-method and the base-salt should be whatever is after the prefix on this configuration, so when you rekey you do get a randomized salt. But on the other hand, we might just force that the user always randomizes the salt on a rekey for security (which is what it currently does).

So the above case single-line configuration would be:

transcrypt -c 'aes-256-cbc' -p '<my-very-secure-password>' -md 'sha512' --kdf 'pbkdf2' -sm 'random:deadbeaf12345678deadbeaf12345678deadbeaf12345678deadbeaf12345678'

Thoughts?

@jmurty
Copy link
Collaborator

jmurty commented Jun 27, 2022

I think a rekey should regenerate a new base-salt. We might need to give the user a way to keep the old base salt if they want to, but this wouldn't be the default.

Overall I still don't see the point of prompting for or storing a salt method. Is there a scenario not covered in the table below where we need to know the salting method?

Using PBK? Base salt in config? Outcome
Yes Yes OK
Yes No Error
No Yes Ignored
No No Ignored

@Erotemic
Copy link
Author

Yes: the case where the user wants to keep their base salt on a rekey. We need to know if the base-salt was generated via a random process or if the user explicitly set it to that. But we could always not support that case. It does weaken the security.

@jmurty
Copy link
Collaborator

jmurty commented Jun 28, 2022

We should definitely discourage keeping the same base salt on a rekey. In this case I don't think we need to know whether an existing base salt was randomly or explicitly set, just whether the user wants to keep it.

For an interactive rekey we could prompt to keep the existing base salt – with No as the default – then let them enter a new base salt or leave blank to generate new a random one.

For a non-interactive rekey, accept a new argument like --base-salt with the base salt value to keep, otherwise generate a new random one.

@jmurty
Copy link
Collaborator

jmurty commented Jun 28, 2022

Though I'm very tempted to always regenerate the base salt on a rekey, and add preservation of the base salt only if/when someone actually asks for it. As you say it weakens security, and it complicates the UI and our code for an unlikely and ill-advised use-case.

Erotemic and others added 13 commits July 3, 2022 13:33
Catch validation errors on CLI
Remove bad echo
Update references to obsolete argument --use-pbdkf2 to --kdf
`--kdf` default is now 'none' not '0'
`--md` default is now 'md5' not 'MD5'
- Use `:+` bash parameter expansion to include pbkdf2 argument to
  openssl only if variable is set
- Simplify variable `pbkdf2_arg` from list to string, since the `[@]`
  referencing doesn't work in all cases for MacOS (at least not
  for the unit tests): an empty list errors with `unbound variable`
@jmurty
Copy link
Collaborator

jmurty commented Jul 9, 2022

Hi @Erotemic I have just released version 2.2.0 of Transcrypt, so the main branch is now clearer to prepare the improvements in this branch for merging.

I'd like to focus on the KDF/PBKDF2 key derivation function (and base salt) first, since these are the most important changes. I'd like to focus on the usability of the new arguments, add some explanations for what the settings mean at important points (e.g. the interactive config step), and make sure it's as easy as possible to use transcrypt in the new and best way: with PBKDF2 and SHA256.

Some initial thoughts/questions:

  • Could we reduce the size of the generated random base salt? 32 hex characters is a lot, would 16 do (like OpenSSL's own salt value)
  • Can we detect whether the user's openssl version is compatible with KDFs, and which ones exactly? The command openssl list -kdf-algorithms seems to work
    • Once we can detect KDF compatibility, we can show a useful error message if the user tries to configure a KDF and/or prompt the user to install a newer OpenSSL version for better security
  • The --kdf argument should accept any valid KDF value – not just pbkdf2 – or if Transcrypt will only work with pbkdf2 we might as well only add the -pbkdf2 shortcut argument
    • As such, I'd like to change the values accepted by --kdf from 0, 1, none, pbkdf2 to any valid KDF value
    • Instead of accepting --kdf=none could we interpret the absence of the --kdf argument as none?
    • For consistency we should add a -k argument as a shorthand for --kdf
  • I'd like to make the default digest and KDF settings sha256 and pbkdf2 respectively, at least for the interactive configuration process. Defaults are important, and since we're allowing for better security with these changes we should strongly encourage users to do what's best for them.

By the way, I don't expect you to make all these changes, I'll work on them too. This list ☝️ is mainly a brain-dump of things I plan to start on.

Although I think it makes sense to keep the versioned config feature in this branch alongside the KDF changes, I expect I will merge just the KDF changes first, then follow up on the versioned config feature in a second pass.

We don't have algorithm-specific shorthands for other things like
cipher, and if transcrypt will support multiple potential KDFs it
will be unwieldy to add arguments for every one.
@Erotemic
Copy link
Author

Erotemic commented Jul 9, 2022

On Versioned Configs

So your thought is that we will store the base salt and everything else locally without the versioned config, which adds in all of those features, but just forces the user to copy-paste rather long commands to initialize new clones (as there is no other mechanism for passing around the salt value)? I think that's fine in terms of splitting up the PR, but I just want to make sure you're not thinking about releasing transcrypt without the option for a versioned config.

Ok KDFs

I like removing -pbkdf2 and using -k. I'm all for removing 1 and 0, they were really placeholders anyway to make getting the proof-of-concept out easier.

From what I understand openssl doesn't support anything except pbkdf2. It's unfortunate. It would have been nice to use something like Argon2 or scrypt. Turns out I'm wrong. openssl list -kdf-algorithms does this (looks like a pain & a half to parse though). We should absolute support things beyond pbkdf2, up until now I thought it was the only one. Looks like scrypt is available, but unfortunately argon2 is not in 3.0.2. Maybe soon. Looks like the PR adding it exists, but is in danger of going stale: openssl/openssl#12256

I strongly recommend against removing kdf=none. I'm ok changing it to --kdf='' as an empty string though. Having an explicit value that represents the absence of a KDF is important to me. One of my biggest pet peeves is when I can't specify every argument to a CLI explicitly. It makes it more tedious to use things programatically. I'm a big believer that CLIs should have a key/value mechanism for specifying all ways they could be parameterized. This makes it easy to represent sets of arguments as a dictionary, which I use in my tests. I'm just a contributor here, so I'll accept pushback on this and deal with the annoyance, but I hope this mini-rant is enough to convince you that having an explicit way to specify all key/value pairs of arguments on the CLI is a desirable thing.

Also, using the absence of --kdf on the command line doesn't work when pbkdf2 is the default (how do you specify legacy behavior?)

On Salt

I don't have a problem with using 16 instead of 32 characters. It was an arbitrary choice, and I don't think it is a significant reduction in any security. We might want to bump it up a little from 16, maybe to 20 or 24, because unlike openssl we aren't choosing a random salt each time, we are reusing it. It might make sense to push up the entropy here just in case. I don't have any theoretical justification for this, it's just a feel-fact.

On Defaults

I'm all for using sha256 and pbkdf2 as defaults... but I'm not sure about the UX for people already using legacy settings. If they upgrade transcrypt and try to initialize their legacy repo does that work? The salt change isn't a big deal, although every file in the repo will show up as having a diff, but the kdf change will not correctly decrypt the data unless the user who is upgrading realizes that their has been a change they need to account for (which 9 / 10 times they will not).

I think we need to write a few test cases that test how the program works through the upgrade process from 2.x to 3.x. I don't have those in my python test suite right now, but I can put some effort into adding them. Not sure how easy it would be to add those cases to the bats suite.

My thought is that the first release of 3.0 should contain all of these options, but not modify any defaults. It could emit warnings to user to give them a heads up that they are using legacy settings as well as instruction on how to upgrade them and that defaults will change in the future. I think a 3.1 release could then switch the defaults with minimal disruption.

Note, if we do release a version with kdf, digest, and salt without the versioned config, then we should certainly not change the defaults in that case. But once the versioned config is available, I think there is a path for sensible default upgrades.

@jmurty
Copy link
Collaborator

jmurty commented Jul 9, 2022

Quick clarifications:

  • I don’t plan to release without the versioned config option – that’s helpful – but first up I’d like to work through the UX without relying on that config help so it is as usable and clean without it as possible
  • I agree the parsing the KDF listing looks like a massive pain. Maybe we can just hard-code pbkdf2 and scrypt for now
  • I’d prefer --no-kdf or similar to --kdf=none. I haven’t come across other tools that use “none” or “0” in this way, and since we’re built on Git following it’s argument conventions seems reasonable
  • Making more secure options the default will be tricky for backwards compatibility. We can probably only make them the default (recommendations) in the interactive config prompts, but even that would be a win
  • I also don’t have an informed opinion on the best length for base salt. The long value is only really a problem for sharing configs as command + args only, without the versioned config, but it does make that command more unwieldy. I'll need to give this some more thought, and maybe research

@Erotemic
Copy link
Author

So, it doesn't seem to be straightforward to switch between the underlying kdfs. openssl enc supports the -pbkdf2 option, which just enables that particular one, to my earlier rant, there does not seem to be a key/value mechanism that allows a user to simply specify which one you want to use on the command line. This might involve looking into the approach you suggested earlier where we run the KDF once and then store the key and IV and use that directly (which would also give a nice speed boost for diff and status operations).

It would be nice to get an opinion about the salt length from someone who really knows what they are doing wrt to cryptography. I know it's best to use a unique salt for each file --- and we effectively do by mixing in the hash of the file --- but I don't have a sense for if lengthening the salt provides any extra security that could make up for the fact that there is this public content that contributes to the security of all data in the repo. I also think we shouldn't worry too much about it. Using a salt of length 16 is probably fine, and there is nothing stopping the user from specifying a custom base-salt that is longer. It might be best to just change it to 16 and leave this as a known open question.

Again on the key/value

I'm going to argue again in favor of reserving some value for kdf that indicates it is not used (I don't care what it is). The "--option" and "--no-option" makes sense in the case where option takes on a binary value. For things that could take on more than one value (we would like to be able to say --kdf=pbkdf2 and --kdf=scrypt), using a key/value CLI specification makes the most sense. This pattern does exist in git, especially in the git config command, where everything is key/value based. I'm a python guy so --kdf=none makes a lot of sense to me, but --kdf=null would also be fine. I'd be happy with any of the following --kdf= ... : none, off, disabled, 0, null, etc...

(Note: previously I said --kdf='' would be ok, but after thinking about it I realize this is actually a bad idea because it becomes ambiguous with the case where the user wants no kdf versus the user didn't specify what they wanted).

I could live with --no-kdf, but I'd strongly recommend against it. The reason is that if you are scripting something its a lot easier when one part of the argument stays the same in all cases, adding the --no- convention means anyone who is scripting on top of this will need to have a special case to fixup the command line when they need to disable the feature, whereas every other change would just require changing the right-hand-side of the --kdf=<rhs> argument.

@Erotemic
Copy link
Author

FYI: I noticed I accidentally reverted the fix in #119. That is back in now.

@Erotemic
Copy link
Author

Erotemic commented Aug 8, 2022

I want to note that I've been using this in my day-to-day, and it does seem that there is an issue with it that should be resolved before merging.

When I checkout different branches, even though they were encrypted with this branch, it complains that there would be conflicts in the encrypted files (even though in the plaintext version of them there isn't). That means that there is some non-deterministic or different thing happening either between machines (where perhaps I have different version of openssl installed) or on the same machine. I haven't localized it yet. Just noting it here for reference.

@jmurty
Copy link
Collaborator

jmurty commented Aug 16, 2022

Hi @Erotemic thanks for noting those cross-machine issues. Can you note down anything that might help to debug these kinds of problems, like the platform and OpenSSL versions on the different machines?

@Erotemic
Copy link
Author

They were both Ubuntu machines. I think one is 22.04 and the other is 20.04, so they come with their respective versions of openssl. Should be fairly simple to fire up a few docker containers to test this.

@Erotemic
Copy link
Author

I made a small proof-of-concept file: tests/test_multi_docker.py

This ports the docker-management file oci_container.py from cibuildwheel which lets me start two docker images in a single python session and then send commands to each. In the example I set them both up with a minimal transcrypt repo and I try to perform some actions, commit on one, and then pull on the other.

I think this is exposing some issues in this branch, and I'll have to nail those down... eventually.

@jmurty
Copy link
Collaborator

jmurty commented Nov 10, 2022

In excellent news, it looks like the latest version of MacOS 13 Ventura includes a newer version of LibreSSL (3.3.6) with an openssl command that supports PBKDF2:

Using the built-in MacOS version of openssl from LibreSSL:

$ /usr/bin/openssl version
LibreSSL 3.3.6

# Encrypt with PBKDF2, 2500 iterations
$ echo "TEST" | ENC_PASS=abc123 /usr/bin/openssl enc -e -a -aes-256-cbc -md sha256 \
  -S deadbeef00000000 -pass env:ENC_PASS -pbkdf2 -iter 2500
U2FsdGVkX1/erb7vAAAAAN4Q77gUNYGeQo3tucwenVM=

# Decrypt with PBKDF2, 2500 iterations
$ echo U2FsdGVkX1/erb7vAAAAAN4Q77gUNYGeQo3tucwenVM= \
  | ENC_PASS=abc123 /usr/bin/openssl enc -d -a -aes-256-cbc -md sha256 \
  -S deadbeef00000000 -pass env:ENC_PASS -pbkdf2 -iter 2500
TEST

The MacOS / LibreSSL version is also compatible with OpenSSL version 1.1:

$ echo "TEST" | ENC_PASS=abc123 /opt/homebrew/opt/openssl@1.1/bin/openssl enc -e -a -aes-256-cbc -md sha256 \
  -S deadbeef00000000 -pass env:ENC_PASS -pbkdf2 -iter 2500
U2FsdGVkX1/erb7vAAAAAN4Q77gUNYGeQo3tucwenVM=

And with OpenSSL version 3, once our (ugly) work-around is applied to include the salted prefix that version 3 stopped including (see #133):

# OpenSSL v3+ without salt prefix DOES NOT MATCH
$ echo "TEST" | ENC_PASS=abc123 /opt/homebrew/opt/openssl@3/bin/openssl enc -e -a -aes-256-cbc -md sha256 \
  -S deadbeef00000000 -pass env:ENC_PASS -pbkdf2 -iter 2500
3hDvuBQ1gZ5Cje25zB6dUw==

# OpenSSL v3+ WITH salt prefix work-around does match
$ (
    printf "Salted__" && printf "%s" "deadbeef00000000" | xxd -r -p && \
    echo "TEST" | ENC_PASS=abc123 /opt/homebrew/opt/openssl@3/bin/openssl enc -e -aes-256-cbc -md sha256 \
    -S deadbeef00000000 -pass env:ENC_PASS -pbkdf2 -iter 2500
  ) | /opt/homebrew/opt/openssl@3/bin/openssl base64
U2FsdGVkX1/erb7vAAAAAN4Q77gUNYGeQo3tucwenVM=

Annoyingly our work-around for OpenSSL v3 no longer including the salt prefix is probably broken on MacOS Ventura, because LibreSSL's openssl command reports a major version of 3 but still includes the salt prefix while we forcibly add the salt prefix for major version 3. It's good that LibreSSL doesn't make breaking changes as freely as OpenSSL does, but it means we need to be smarter in applying our work-around for OpenSSL v3.

[Edit: LibreSSL vs OpenSSL version checking is fixed via #147]

@jmurty
Copy link
Collaborator

jmurty commented Mar 19, 2023

As an example of the kinds of compatibility problems we're facing that I worried about in #134 (comment), I just wasted some time trying to get the BATS tests to pass for this PR's code using the LibreSSL 3+ openssl command included with MacOS 13.

The stumbling block was in trying to get a list of available digests from LibreSSL, which:

  • does not support listing available digests with either of the list-digest-commands or list -digest-commands used by different versions of OpenSSL, and instead requires list-message-digest-commands
  • returns a zero error code (no error) when you use one an incorrect command, so we can't easily tell that the command failed.

I'm not sure how best to work around the faulty error code result to detect failed commands?

  • check for a zero-length stderr stream after the command
  • look for the word "invalid" in the output since both OpenSSL and LibreSSL happen to use this word (though with different capitalisation)
  • some other much more sensible option?

@jmurty
Copy link
Collaborator

jmurty commented Apr 3, 2023

Hi @Erotemic I just created (yet another) PR for the PBKDF2 work: #162

I have made some time recently to push the PBKDF2 improvements ahead, and thanks to your work here I have made a lot of progress. I started a new PR instead of building on this one so I could build things up slowly, keep everything in my head as I went, and make the smallest number of changes possible to get things working. I have been lifting your code pretty liberally across to the new PR, but only the pieces I need as I need them.

A key difference in the new PR is how the base salt is handled. I think I've come up with an easier and strong-enough way of doing this so we don't need to store or prompt for a long salt value: the base (called "project" in the new PR) salt is derived from the password using the same PBKDF2 settings as all other files, to make the partial hash value used in per-file salts just as difficult to brute-force as any other file in the repo.

I describe this in more detail in the PR comment section Deterministic salting for PBKDF2 (Feedback needed). I'd really appreciate your thoughts on this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants