Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPG advanced key management #446

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SlugFiller
Copy link
Contributor

@SlugFiller SlugFiller commented Sep 14, 2023

This is a big one. Let me try to cover this:

  • Separated trezor-gpg init to trezor-gpg init for creating the homedir and trezor-gpg add for creating keys. Allows adding multiple keys, instead of having multiple identities for the same key.
  • Removed the abuse of the Policy URI packet for a custom marker. Like a "License URI" entry in a package manager, just because no program parses it, doesn't mean it's free for arbitrary use. It also only identifies keys made with the program, and not with which wallet/master key. This has been replaced with a simple check of whether the public key matches on regenerating from the derivation path or not.
  • Add an ability to set key derivation path for subkeys with --subkey-path, either for the auto-generated encryption subkey, or for keys made with --subkey. The derivation path is stored in signature subpacket with id 100. That's a "private use" id, meaning it's free for use under the assumption that it might clash with other programs (As defined by RFC2434). This is fine, since the derivation path is verified by deriving and comparing public keys, so a false positive is impossible. It is theoretically possible to enable this for primary keys as well, as well as store the derivation path regardless of whether a user id is available (So the key can survive having its first user id deleted). I didn't do this, but it would be a simple modification to make.
  • It's now possible to specify the home directory for creating subkeys. And due to the availability of add, it's now possible to create custom subkeys to hardware-generated primary keys. That means users can set up their own key layout.
  • Adding to the above, it's now possible to configure the flags for keys. This includes making a certify-only (no signing) primary, creating one key instead of two (sign+encrypt) at once, and limiting encryption to communication or storage.
  • Allow deleting keys using gpg --delete-secret-and-public-keys <key>.

This is a pretty big change, but it makes GPG far more usable and versatile. I also considered adding a certify command to turn an OpenSSH-format public key into a subkey file that can be imported to another home directory. However, I realized that doing that would require adding serialization and keygrip generation for any format of public key that could be delivered, since third-party subkeys are not limited to device-supported curves. I decided against messing with encryption stuff for this PR. But if you can provide some advice, or point at appropriate documentation, I can try tackling it for the next PR.

Fixes #341. Makes #358 redundant.

Note: I've tested various combinations of hardware primary keys and subkey, but did not test non-hardware primary key with hardware subkey, which should work, but might not work when signing/encrypting, due to not being able to run commands on the primary key that's in another folder. This requires testing.

@SlugFiller
Copy link
Contributor Author

Okay, I have not a single clue why CI is failing. pydocstyle libagent is working fine for me both locally and inside docker. Any hints welcome.

@SlugFiller SlugFiller force-pushed the advanced-gpg branch 2 times, most recently from 239df93 to c0357c0 Compare September 14, 2023 02:00
@SlugFiller
Copy link
Contributor Author

Nevermind, I figured it out

@romanz
Copy link
Owner

romanz commented Sep 16, 2023

Many thanks for the contribution!
Would it please be possible to split it into separate PRs?

@SlugFiller
Copy link
Contributor Author

I'm... not sure how. I know it's big, but it's all part of the same thing. I don't think I could even split it into several commits in any way that makes sense and/or compiles.

Just as an example, removing the policy URI packet (which is also done in #298) would prevent the ability to trezor-gpg add without adding the 100 subpacket, since there'd be no way to tell apart a Trezor key (and derivation path) from a non-Trezor one, which changes which signer you need to use to certify it.

It's all interconnected.

@romanz
Copy link
Owner

romanz commented Sep 16, 2023

Maybe it would be please possible to split the following parts into separate commits/PRs?

  • Changed the way SIGINT is handled, so the agent can be stopped either with SIGINT or with a KILLAGENT message. Tested Linux version using Docker, but couldn't test if it works on Darwin as well. Also removed --timeout, as it no longer relies on busy-looping to listen for connections. This is rather ugly, and I have half a mind to make a PR converting everything to asyncio, so "waiting for a thing to happen" does not block the built-in signal handling.
  • Allow the GPG agent to handle concurrent requests, similar to the SSH agent. Also, it no longer crashes if a single request experiences an error.

@SlugFiller
Copy link
Contributor Author

@romanz Sorry for the delayed response. I wanted to be 100% certain about my answer.

I'm rather unhappy with the changes you've mentioned. They were mostly hacked together to make testing the GPG changes easier. To a certain extent, the GPG depends on them, but not vice versa, so you are correct in that it can be separated to another PR.

However, with your permission, I would like to try porting trezor-agent to trio. I've tested, and it seems to be able to handle signals in a sane way on both Windows and Linux. Although I've only done basic tests and didn't try spawning a server, it's showing signs of having less gotchas than asyncio and Python itself. And it's certainly a better solution that abusing ctypes and hoping it's not broken on certain platform variants in unexpected ways.

It would be a big patch, though, since it's essentially replacing all the IO.

Your thoughts?

(P.S. In the mean time, I've finished another PR that's a bit simpler)

@romanz
Copy link
Owner

romanz commented Sep 17, 2023

It would be a big patch, though, since it's essentially replacing all the IO.

IIRC, woudn't that require trezorlib to be async-friendly, to prevent blocking the event loop?

@SlugFiller
Copy link
Contributor Author

No. You just put all the device stuff in a thread. There are three ways to handle the concurrent calls it would inevitably create.

First, avoid them at the caller level, since you can (or rather, have to) just await the thread. If you have concurrent tasks, you can do locks between them.

Second, have a regular lock inside the thread. It won't influence the awaiting, since that would allow other tasks to run.

Third, and this is one I've done in not-Python a lot, spawn a single thread that acts as a job queue, receiving tasks from the other threads, and executing them one at a time in order.

TL;DR You just use trio.to_thread

@romanz
Copy link
Owner

romanz commented Sep 18, 2023

However, with your permission, I would like to try porting trezor-agent to trio. I've tested, and it seems to be able to handle signals in a sane way on both Windows and Linux.

Just to be sure - is the main issue cross-platform signal handling?

@SlugFiller
Copy link
Contributor Author

Just to be sure - is the main issue cross-platform signal handling?

It's not so much the signal handling itself. Rather, it's the lack of a way to wait for the first of multiple blocking actions, and signals is where it manifests the worst.

Another way to look at it is that there's no straightforward way to wait for the first of two threading.Events to be set. However, in this case there's a(n ugly) workaround: Create two threads, each waiting for one of the signals. When the thread ends, trigger a common event that represents the combination of the two. When the program ends, force-trigger the two events, in order to kill the threads.

For something like sockets, you can create a thread per socket, and you can close the socket from another thread to cancel the operation (which is how you'd usually use a cancel, anyway). Not efficient, but works.

With signals, such a workaround does not exist. The main reason is that Python's signal handler doesn't actually run registered callbacks, but simply sets a flag that makes the handler run after the next IO/blocking function ends in the main thread. This means, the following would freeze:

hup_event = threading.Event()
def handler(_):
  hup_event.set()
  
  signal.signal(SIGHUP, handler)
  hup_event.wait()

Even if SIGHUP is sent, the handler will not be called until hup_event.wait returns, which creates a deadlock. This wouldn't be an issue if Python had a signal.signal_spawn_thread which spawns a thread for the handler, instead of queuing it on the main thread. But it doesn't, so you have to use an async library instead.

Even if signal.pause was available on al platforms (it isn't), unlike events and sockets, there's no action you can take from another thread to force the pause to break on another action. So, for example, if you're waiting for a SIGINT to close the program, but then a KILLAGENT is received over a GPG socket, you can close the server, but the main thread would still be stuck waiting for a SIGINT that will never come. And there's no programmatic way to break it out of that wait.

Mind you, this issue is unique to Python (specifically, CPython), due to the way it designed its signal handling and GIL.

@romanz
Copy link
Owner

romanz commented Sep 20, 2023

Unfortunately I won't be available to review this and the suggested PR in the upcoming weeks...
Maybe you can try to rewrite the existing code to use trio and if it doesn't make it more complex and doesn't introduce too much external dependencies, I will be happy to review it :)

@SlugFiller
Copy link
Contributor Author

SlugFiller commented Sep 24, 2023

Rebased on top of #456 and added support for deleting keys (requires --delete-secret-and-public-keys)

I'm still contemplating extending derivation path setting to primary keys, and refactoring the way the derivation path is extracted for a key.

@SlugFiller
Copy link
Contributor Author

@romanz Okay, I've had a thought about taking this PR in a completely different direction. And I want your opinion.

Normally, GPG-agent is supposed to manage private/secret keys. It stores them, plain or encrypted, in ${GNUPGHOME}/private-keys-v1.d/{bin2hex(keygrip)}.key. Meanwhile, GPG itself manages the store of public keys, and certification signatures of said keys.

Trezor-GPG naturally doesn't store any private keys. Instead, it pretends to store them, by querying the public keys stored, and claiming it has a private key for any public key which has a specific metadata attached to it. This method creates two problems:

  1. Trezor-GPG can't respond to secret key deletions, since it never keeps a record of stored secret keys in the first place.
  2. Public keys are marked in a way that makes it easily detectable if a key was produced by Trezor-GPG or a different agent. This signature is needed by Trezor-GPG in order to recreate the matching private key, but it is visible not only to Trezor-GPG and GPG itself, but also to any third parties that need to verify the public key.

My suggestion is therefore to copy GPG-agent's behavior, but replace the "secret key" with a derivation path. Trezor-GPG would deterministically create "secret key" files named after the matching keygrip, and containing the derivation path. This method gives the following advantages:

  1. The public key doesn't need to contain a single hint about what the derivation path is. The derivation path can therefore be arbitrary, and simply default to the user id if not specified.
  2. Trezor-GPG can delete a secret key without deleting the macthing public key, and assert it as being deleted, if queried by a GPG client.
  3. It can also recreate it at any time if called with the same parameters used originally, since the secret key is deterministic.
  4. It requires fewer calls to GPG while running. Specifically, it doesn't need to call gpg --export just to find a key by keygrip, as it can simply look for a file with a matching name, and check that it contains a valid derivation path.

What do you think?

@doolio
Copy link
Contributor

doolio commented Jan 9, 2024

Any further thought given to this?

@SlugFiller SlugFiller force-pushed the advanced-gpg branch 2 times, most recently from 7f88219 to 03ccef3 Compare January 11, 2024 00:52
@SlugFiller
Copy link
Contributor Author

Since I didn't get any response in a while, so I decided to go ahead and implement key storage in private-keys-v1.d. Now it's possible to separately delete public or secret keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subkey generation cannot proceed past GNUPGHOME directory check
3 participants