GPG advanced key management #446

SlugFiller · 2023-09-14T00:26:26Z

This is a big one. Let me try to cover this:

Separated trezor-gpg init to trezor-gpg init for creating the homedir and trezor-gpg add for creating keys. Allows adding multiple keys, instead of having multiple identities for the same key.
Removed the abuse of the Policy URI packet for a custom marker. Like a "License URI" entry in a package manager, just because no program parses it, doesn't mean it's free for arbitrary use. It also only identifies keys made with the program, and not with which wallet/master key. This has been replaced with a simple check of whether the public key matches on regenerating from the derivation path or not.
Add an ability to set key derivation path for subkeys with --subkey-path, either for the auto-generated encryption subkey, or for keys made with --subkey. The derivation path is stored in signature subpacket with id 100. That's a "private use" id, meaning it's free for use under the assumption that it might clash with other programs (As defined by RFC2434). This is fine, since the derivation path is verified by deriving and comparing public keys, so a false positive is impossible. It is theoretically possible to enable this for primary keys as well, as well as store the derivation path regardless of whether a user id is available (So the key can survive having its first user id deleted). I didn't do this, but it would be a simple modification to make.
It's now possible to specify the home directory for creating subkeys. And due to the availability of add, it's now possible to create custom subkeys to hardware-generated primary keys. That means users can set up their own key layout.
Adding to the above, it's now possible to configure the flags for keys. This includes making a certify-only (no signing) primary, creating one key instead of two (sign+encrypt) at once, and limiting encryption to communication or storage.
Allow deleting keys using gpg --delete-secret-and-public-keys <key>.

This is a pretty big change, but it makes GPG far more usable and versatile. I also considered adding a certify command to turn an OpenSSH-format public key into a subkey file that can be imported to another home directory. However, I realized that doing that would require adding serialization and keygrip generation for any format of public key that could be delivered, since third-party subkeys are not limited to device-supported curves. I decided against messing with encryption stuff for this PR. But if you can provide some advice, or point at appropriate documentation, I can try tackling it for the next PR.

Fixes #341. Makes #358 redundant.

Note: I've tested various combinations of hardware primary keys and subkey, but did not test non-hardware primary key with hardware subkey, which should work, but might not work when signing/encrypting, due to not being able to run commands on the primary key that's in another folder. This requires testing.

SlugFiller · 2023-09-14T00:33:45Z

Okay, I have not a single clue why CI is failing. pydocstyle libagent is working fine for me both locally and inside docker. Any hints welcome.

SlugFiller · 2023-09-14T04:20:39Z

Nevermind, I figured it out

romanz · 2023-09-16T10:29:31Z

Many thanks for the contribution!
Would it please be possible to split it into separate PRs?

SlugFiller · 2023-09-16T13:31:38Z

I'm... not sure how. I know it's big, but it's all part of the same thing. I don't think I could even split it into several commits in any way that makes sense and/or compiles.

Just as an example, removing the policy URI packet (which is also done in #298) would prevent the ability to trezor-gpg add without adding the 100 subpacket, since there'd be no way to tell apart a Trezor key (and derivation path) from a non-Trezor one, which changes which signer you need to use to certify it.

It's all interconnected.

romanz · 2023-09-16T17:30:53Z

Maybe it would be please possible to split the following parts into separate commits/PRs?

Changed the way SIGINT is handled, so the agent can be stopped either with SIGINT or with a KILLAGENT message. Tested Linux version using Docker, but couldn't test if it works on Darwin as well. Also removed --timeout, as it no longer relies on busy-looping to listen for connections. This is rather ugly, and I have half a mind to make a PR converting everything to asyncio, so "waiting for a thing to happen" does not block the built-in signal handling.

Allow the GPG agent to handle concurrent requests, similar to the SSH agent. Also, it no longer crashes if a single request experiences an error.

SlugFiller · 2023-09-17T15:36:21Z

@romanz Sorry for the delayed response. I wanted to be 100% certain about my answer.

I'm rather unhappy with the changes you've mentioned. They were mostly hacked together to make testing the GPG changes easier. To a certain extent, the GPG depends on them, but not vice versa, so you are correct in that it can be separated to another PR.

However, with your permission, I would like to try porting trezor-agent to trio. I've tested, and it seems to be able to handle signals in a sane way on both Windows and Linux. Although I've only done basic tests and didn't try spawning a server, it's showing signs of having less gotchas than asyncio and Python itself. And it's certainly a better solution that abusing ctypes and hoping it's not broken on certain platform variants in unexpected ways.

It would be a big patch, though, since it's essentially replacing all the IO.

Your thoughts?

(P.S. In the mean time, I've finished another PR that's a bit simpler)

romanz · 2023-09-17T17:12:22Z

It would be a big patch, though, since it's essentially replacing all the IO.

IIRC, woudn't that require trezorlib to be async-friendly, to prevent blocking the event loop?

SlugFiller · 2023-09-17T17:23:17Z

No. You just put all the device stuff in a thread. There are three ways to handle the concurrent calls it would inevitably create.

First, avoid them at the caller level, since you can (or rather, have to) just await the thread. If you have concurrent tasks, you can do locks between them.

Second, have a regular lock inside the thread. It won't influence the awaiting, since that would allow other tasks to run.

Third, and this is one I've done in not-Python a lot, spawn a single thread that acts as a job queue, receiving tasks from the other threads, and executing them one at a time in order.

TL;DR You just use trio.to_thread

romanz · 2023-09-18T19:24:40Z

However, with your permission, I would like to try porting trezor-agent to trio. I've tested, and it seems to be able to handle signals in a sane way on both Windows and Linux.

Just to be sure - is the main issue cross-platform signal handling?

SlugFiller · 2023-09-19T06:47:43Z

Just to be sure - is the main issue cross-platform signal handling?

It's not so much the signal handling itself. Rather, it's the lack of a way to wait for the first of multiple blocking actions, and signals is where it manifests the worst.

Another way to look at it is that there's no straightforward way to wait for the first of two threading.Events to be set. However, in this case there's a(n ugly) workaround: Create two threads, each waiting for one of the signals. When the thread ends, trigger a common event that represents the combination of the two. When the program ends, force-trigger the two events, in order to kill the threads.

For something like sockets, you can create a thread per socket, and you can close the socket from another thread to cancel the operation (which is how you'd usually use a cancel, anyway). Not efficient, but works.

With signals, such a workaround does not exist. The main reason is that Python's signal handler doesn't actually run registered callbacks, but simply sets a flag that makes the handler run after the next IO/blocking function ends in the main thread. This means, the following would freeze:

hup_event = threading.Event()
def handler(_):
  hup_event.set()
  
  signal.signal(SIGHUP, handler)
  hup_event.wait()

Even if SIGHUP is sent, the handler will not be called until hup_event.wait returns, which creates a deadlock. This wouldn't be an issue if Python had a signal.signal_spawn_thread which spawns a thread for the handler, instead of queuing it on the main thread. But it doesn't, so you have to use an async library instead.

Even if signal.pause was available on al platforms (it isn't), unlike events and sockets, there's no action you can take from another thread to force the pause to break on another action. So, for example, if you're waiting for a SIGINT to close the program, but then a KILLAGENT is received over a GPG socket, you can close the server, but the main thread would still be stuck waiting for a SIGINT that will never come. And there's no programmatic way to break it out of that wait.

Mind you, this issue is unique to Python (specifically, CPython), due to the way it designed its signal handling and GIL.

romanz · 2023-09-20T18:42:11Z

Unfortunately I won't be available to review this and the suggested PR in the upcoming weeks...
Maybe you can try to rewrite the existing code to use trio and if it doesn't make it more complex and doesn't introduce too much external dependencies, I will be happy to review it :)

SlugFiller · 2023-09-24T01:47:52Z

Rebased on top of #456 and added support for deleting keys (requires --delete-secret-and-public-keys)

I'm still contemplating extending derivation path setting to primary keys, and refactoring the way the derivation path is extracted for a key.

SlugFiller · 2023-09-26T15:15:07Z

@romanz Okay, I've had a thought about taking this PR in a completely different direction. And I want your opinion.

Normally, GPG-agent is supposed to manage private/secret keys. It stores them, plain or encrypted, in ${GNUPGHOME}/private-keys-v1.d/{bin2hex(keygrip)}.key. Meanwhile, GPG itself manages the store of public keys, and certification signatures of said keys.

Trezor-GPG naturally doesn't store any private keys. Instead, it pretends to store them, by querying the public keys stored, and claiming it has a private key for any public key which has a specific metadata attached to it. This method creates two problems:

Trezor-GPG can't respond to secret key deletions, since it never keeps a record of stored secret keys in the first place.
Public keys are marked in a way that makes it easily detectable if a key was produced by Trezor-GPG or a different agent. This signature is needed by Trezor-GPG in order to recreate the matching private key, but it is visible not only to Trezor-GPG and GPG itself, but also to any third parties that need to verify the public key.

My suggestion is therefore to copy GPG-agent's behavior, but replace the "secret key" with a derivation path. Trezor-GPG would deterministically create "secret key" files named after the matching keygrip, and containing the derivation path. This method gives the following advantages:

The public key doesn't need to contain a single hint about what the derivation path is. The derivation path can therefore be arbitrary, and simply default to the user id if not specified.
Trezor-GPG can delete a secret key without deleting the macthing public key, and assert it as being deleted, if queried by a GPG client.
It can also recreate it at any time if called with the same parameters used originally, since the secret key is deterministic.
It requires fewer calls to GPG while running. Specifically, it doesn't need to call gpg --export just to find a key by keygrip, as it can simply look for a file with a matching name, and check that it contains a valid derivation path.

What do you think?

doolio · 2024-01-09T15:52:04Z

Any further thought given to this?

SlugFiller · 2024-01-11T02:11:25Z

Since I didn't get any response in a while, so I decided to go ahead and implement key storage in private-keys-v1.d. Now it's possible to separately delete public or secret keys.

SlugFiller force-pushed the advanced-gpg branch 2 times, most recently from 239df93 to c0357c0 Compare September 14, 2023 02:00

Port to trio async library

5719f1a

SlugFiller force-pushed the advanced-gpg branch from c0357c0 to 719bce7 Compare September 24, 2023 01:45

SlugFiller force-pushed the advanced-gpg branch 2 times, most recently from 7f88219 to 03ccef3 Compare January 11, 2024 00:52

GPG advanced key management

b41b483

SlugFiller force-pushed the advanced-gpg branch from 03ccef3 to b41b483 Compare January 11, 2024 02:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPG advanced key management #446

GPG advanced key management #446

SlugFiller commented Sep 14, 2023 •

edited

SlugFiller commented Sep 14, 2023

SlugFiller commented Sep 14, 2023

romanz commented Sep 16, 2023

SlugFiller commented Sep 16, 2023

romanz commented Sep 16, 2023

SlugFiller commented Sep 17, 2023

romanz commented Sep 17, 2023

SlugFiller commented Sep 17, 2023

romanz commented Sep 18, 2023

SlugFiller commented Sep 19, 2023

romanz commented Sep 20, 2023

SlugFiller commented Sep 24, 2023 •

edited

SlugFiller commented Sep 26, 2023

doolio commented Jan 9, 2024

SlugFiller commented Jan 11, 2024

GPG advanced key management #446

Are you sure you want to change the base?

GPG advanced key management #446

Conversation

SlugFiller commented Sep 14, 2023 • edited

SlugFiller commented Sep 14, 2023

SlugFiller commented Sep 14, 2023

romanz commented Sep 16, 2023

SlugFiller commented Sep 16, 2023

romanz commented Sep 16, 2023

SlugFiller commented Sep 17, 2023

romanz commented Sep 17, 2023

SlugFiller commented Sep 17, 2023

romanz commented Sep 18, 2023

SlugFiller commented Sep 19, 2023

romanz commented Sep 20, 2023

SlugFiller commented Sep 24, 2023 • edited

SlugFiller commented Sep 26, 2023

doolio commented Jan 9, 2024

SlugFiller commented Jan 11, 2024

SlugFiller commented Sep 14, 2023 •

edited

SlugFiller commented Sep 24, 2023 •

edited