SBAT security generations for the Linux kernel #590

Open
jsetje opened this issue Jul 27, 2023 · 11 comments

@jsetje
Collaborator

jsetje commented Jul 27, 2023

This issue is for discussing what such a security generation or set of security generations should track. These generations do not need to apply to anyone building a small enough chain of trust to track their revocations via simpler methods such as revocations by hash and regular certificate rotations.

SBAT generation numbers are not trying to put a number next to how secure a binary is. They serve as milestones that signify that a binary has addressed specific issues. In the context of Secure Boot these are limited to issues that could introduce untrusted privileged code.

At the bare minimum, anyone using a signed shim needs to protect boot services state, so any bypasses in the EFI stub need to be covered. At the other end of the spectrum, someone who is extremely concerned about integrity may want to be able to revoke for any unauthorized kernel memory write issue.

Obviously the more extreme case is quite hard to track accurately and will, at least at the current time, be hard to distinguish from revoking every single kernel binary. So, why not just use utsname? The answer is that distributions encode their versions differently from each other, and even if shim learned how to decode and compare every one of them, forcing someone building a private ecosystem to implement their own shim code to do so does not seem reasonable. A variation of this that is implementable is some form of external mapping table, although at that point mapping lists of CVE fixes to generation numbers is the better choice since it actually captures what shouldn't be missed.
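As a rough illustration of the mapping-table idea, the sketch below assigns a generation number to a set of fixed CVEs. Everything in it is an assumption made for illustration: the table layout, the function name `highest_generation`, and the CVE identifiers, which are placeholders rather than real CVEs. It also assumes generations are cumulative, so a binary only reaches generation N if it carries the fixes required by every generation up to N.

```python
# Hypothetical mapping table: generation number -> CVE fixes that the
# generation adds as a requirement. The identifiers are placeholders.
GENERATION_REQUIREMENTS = {
    1: set(),
    2: {"CVE-EXAMPLE-0001"},
    3: {"CVE-EXAMPLE-0002", "CVE-EXAMPLE-0003"},
}


def highest_generation(fixed_cves):
    """Return the highest generation whose cumulative requirements are all fixed.

    Returns 0 if not even the lowest generation is satisfied.
    """
    fixed = set(fixed_cves)
    required = set()
    achieved = 0
    for generation in sorted(GENERATION_REQUIREMENTS):
        # Requirements accumulate: generation N implies all earlier fixes.
        required |= GENERATION_REQUIREMENTS[generation]
        if not required <= fixed:
            break
        achieved = generation
    return achieved
```

A tree that has only picked up the fixes for generation 3 but misses the one for generation 2 would still be reported as generation 1 under this scheme, which is the property that makes a single number comparable across vendors.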

The major use case for shim and SBAT today is things trusted by the UEFI CA. For that use case we obviously need to protect boot services, so any bugs in the admittedly simple and rather stable EFI stub code need to be tracked. It may (or may not) make sense to break out the UEFI stub explicitly and track any issues in it with a generation number that only applies to Linux kernel code that runs while boot services are still active.

Since a set of binaries that can be leveraged to bypass integrity on one system can be brought to another and used to compromise it, it must be possible to issue a revocation for a binary or set of binaries actively being used as part of an attack on any system that trusts the UEFI CA. In order for SBAT to be useful there, the generation needs to be bumped often enough to limit the scope of a revocation as much as possible. Distros also need to know or be able to compute which CVEs have been addressed with which generation number. This also means that kexec should enforce these revocations, at least in binaries signed by a chain of trust that enforces integrity and relies on SBAT-based revocations.
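For reference, the comparison behind an SBAT revocation is small: a binary is rejected when any component it advertises has a generation lower than the minimum recorded for that component. The sketch below shows that rule, plus a hypothetical `may_kexec` gate that applies the same check before loading a new kernel. The function names and the dictionary representation are assumptions for illustration, not shim or kernel APIs.

```python
def is_revoked(binary_entries, sbat_level):
    """Return True if any component of the binary is below the allowed minimum.

    binary_entries: component name -> generation advertised by the binary
    sbat_level:     component name -> minimum generation still allowed
    Components that the revocation data does not mention are not restricted.
    """
    return any(
        generation < sbat_level[component]
        for component, generation in binary_entries.items()
        if component in sbat_level
    )


def may_kexec(binary_entries, sbat_level):
    """Hypothetical kexec gate: refuse to load a kernel that SBAT would revoke."""
    return not is_revoked(binary_entries, sbat_level)
```

With a minimum of `{"linux": 2}`, a kernel advertising `linux,1` would be refused both at boot and by this kexec gate, while one advertising `linux,2` or later would pass.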

From an OS lifecycle management point of view this could be as simple as bumping the vendor generation number every quarterly release, and for intermediate kernel updates only if a particularly concerning vulnerability is fixed. The global number would then track trivial to exploit CVEs in large groups.

I'm open to any ideas here, especially if they do a good job of addressing the various postures that folks implement with shim without compromising what is needed to satisfy the (continuously evolving) requirements for the UEFI CA.

@jsetje
Collaborator Author

jsetje commented Jul 27, 2023

We are already approaching the need for some infrastructure beyond the header file history that maps generation numbers to CVE lists. Part of the outcome here will almost certainly be something that offers these lists in human readable form and also generates the binaries required to apply revocations.
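One possible shape for such tooling is a single data source that renders both outputs. The sketch below is only an assumption about how that might look: it emits a CSV payload using the `component,generation` line layout seen in SbatLevel_Variable.txt (with a leading `sbat,1,<datestamp>` line) and a human-readable summary of the CVEs behind each minimum. The function name, the datestamp value, and the CVE identifier are placeholders, and packaging the payload into an installable revocation binary is out of scope here.

```python
def render_revocation(datestamp, minimums, cve_lists):
    """Build a CSV payload and a human-readable summary from the same data.

    datestamp: string identifying this revocation level (placeholder value)
    minimums:  component name -> minimum allowed generation
    cve_lists: (component, generation) -> list of CVE ids fixed by that generation
    """
    payload_lines = [f"sbat,1,{datestamp}"]
    summary_lines = []
    for component in sorted(minimums):
        generation = minimums[component]
        payload_lines.append(f"{component},{generation}")
        cves = cve_lists.get((component, generation), [])
        listed = ", ".join(sorted(cves)) or "none listed"
        summary_lines.append(f"{component} >= {generation}: {listed}")
    payload = "\n".join(payload_lines) + "\n"
    summary = "\n".join(summary_lines)
    return payload, summary


payload, summary = render_revocation(
    "2023010100",                                # placeholder datestamp
    {"linux": 2},
    {("linux", 2): ["CVE-EXAMPLE-0001"]},        # placeholder CVE id
)
```

Generating both views from one structure keeps the published CVE list and the applied revocation from drifting apart.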

@aronowski
Contributor

IMHO, the SbatLevel_Variable.txt file is a good start, though maybe in the long term it would be viable to rewrite the concept from this file to a format that is both parsable by utilities and readable by humans. Then introduce the concept suggested in the introductory post: that a new quarterly release would result in revoking the older one (add an additional entry to such file).

Then consult with the Linux Foundation, to introduce the generation numbers, similar to what has been done with GRUB2 SBAT support. What's more, the upstream implementation shall also honor kexec and other mechanisms.

This idea is for the Linux Foundation (upstream) to be the main source of truth behind mapping generation numbers to CVEs, which downstream vendors honor.
What if this idea got realized, but then an older kernel (from the pre-realization period) was shipped by a company providing long-term support for it? In that case, I suppose it would be something similar to what has been described in the SBAT.md document, with the Acme Corporation shipping their GRUB2 without FSF's entry, am I right?

If so, then:

  • each downstream party would be introducing their own conventions for mapping CVEs in the kernel they ship—those that, e.g., were present in their older kernel and fixed but no upstream mapping exists?
  • what about kexec and the proper protections? A backport from upstream would then be required for a secure operation, as well as the signing procedure?

If there has already been some work done on that or some consultations performed, which I'm not aware of, please let me know. Once an agreement has been reached, I will think about technicalities like the mapping file format, among others.

@aronowski
Contributor

Well, looks like I've been indeed missing out - there already are some ongoing consultations: 1, 2.

Considering that, is the idea of a single-source-of-truth file in a human-readable and parsable format still applicable?

@jsetje
Collaborator Author

jsetje commented Aug 2, 2023

I don't think it makes sense to track any of this in the upstream branches. The generation number advertised by a binary is fundamentally tied to the source branch that it is produced from, and none of the distros really ship upstream stable with a default config. Even if the upstream branch had a number assigned, how would we know that a required fix was merged correctly in the tree used to produce the binary?

@aronowski
Contributor

Would it be appropriate that applicants for a secure bootchain implement their own convention for mapping CVEs to generation numbers in the kernel(s) they ship? In this case, it would still, I believe, need to be discussed with a verifier and tracked somewhere in a single source of truth if required for issuing a shim signing request. In particular, if this is something that may change in the future.

In other words, I think it would become an additional requirement for a shim review issue to have a clear procedure on this that both official and unofficial reviewers can verify.

For instance, let's imagine that a fictional vendor named MyCompany initially got their bootchain reviewed and successfully implemented, with their SBAT entry starting with linux.mycompany. This was written into a file that's a single source of truth and is publicly available, something similar to SbatLevel_Variable.txt.

Later, this vendor decided to start shipping a new kernel variant with a different configuration, let's say a more hardened variant that disables some features for security. So, from now on, these different flavors will have distinct ways of tracking generation numbers (because the hardened variant never had security issues addressed in the initial "stock" variant). Furthermore, what are we going to do from now on with the sensible name of that vendor? Should it be something like freezing their linux.mycompany entry, treating it as an upstream one (that won't ever change in that vendor's kernel SBAT entries unless something big happens?) and making the vendor ship "downstream" entries below that one, with the names linux.mycompany.stock and linux.mycompany.hardened?

Or maybe a more applied and simple example that does not account for the aforementioned splitting: distinct entries like linux.oracle.rhck and linux.oracle.uek with no "upstream" linux.oracle entry, as they all would have started being implemented at the same point in time?
What would something like this look like from the point of view of Oracle's kernel developers/maintainers, as well as shim reviewers? Would it be optimal?

In a matter of future-proofing, how would such an additional requirement, which the shim reviewers would then need to review and maintain, influence shim reviewing, especially if right now, when it's not yet implemented, the ongoing thing (those who know, know...) has already stalled the reviews from being reviewed? Would it then only be a matter of having more humanpower, i.e., more developers and reviewers?


> Even if the upstream branch had a number assigned, how would we know that a required fix was merged correctly in the tree used to produce the binary?

Unless someone writes and maintains an extensive OpenQA infrastructure for automated testing of whether certain CVEs have been resolved, we do not.
I'm not promising anything, but maybe, just maybe, sometime in the future, someone will accomplish this or something similar. ;-)

@jsetje
Collaborator Author

jsetje commented Aug 9, 2023

Thank you for helping me work through this!

> Would it be appropriate that applicants for a secure bootchain implement their own convention for mapping CVEs to generation numbers in the kernel(s) they ship? In this case, it would still, I believe, need to be discussed with a verifier and tracked somewhere in a single source of truth if required for issuing a shim signing request. In particular, if this is something that may change in the future.

If at all possible I would like to come up with some guidelines that we can have orgs follow, but as long as we can come back to the organizations with a CVE or list of CVEs and get the cutoff for where they are fixed, this is actually not unreasonable. FWIW I'm making that statement in the context of things that are signed with the public UEFI CA. If someone is doing this on a large enough private deployment to require SBAT to avoid running out of dbx, then they could either copy what's done for the public trust, or implement their own policy. (I could see governments ending up in this space eventually, but AFAIK, they are not there yet and I do not expect current dbx space limitations to exist forever either.)

> Or maybe a more applied and simple example that does not account for the aforementioned splitting: distinct entries like linux.oracle.rhck and linux.oracle.uek with no "upstream" linux.oracle entry, as they all would have started being implemented at the same point in time? What would something like this look like from the point of view of Oracle's kernel developers/maintainers, as well as shim reviewers? Would it be optimal?

Let's suppose that there is a CVE in some of these binaries and at least one of them is being used in an active widespread attack. The first three things we would want to be able to do are:

  • Provide some level of mitigation to the organizations dealing with the attack. Exactly how would depend on the exact circumstances, but would likely be highly targeted with respect to the attack taking place.
  • Make sure there are kernels out there for our customers that will not be revoked by a complete revocation. For a new CVE this would involve releasing updates to all the supported OS releases.
  • Push out a complete revocation to be applied both by Linux distros and via a Windows update. (Whether this should be forcefully applied or not, or whether there should be a time gap between the two, would ultimately also depend on the exact circumstances. I can envision circumstances that would be bad enough to consider immediate forced revocation, but we have not seen anything like that yet.)

This means we need to look at which kernels are impacted and can hopefully map that to an SBAT generation that's as far in the past as possible. If the bug came from upstream, it would be nice to tie it to a global generation number and coordinate with other distros. If this is in an RH-like non-upstream patch, ideally this would be coordinated with all the impacted distros. If the code is UEK specific, then this could be limited to just revoking the impacted linux.oracle.uek generations.
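The scoping decision described here can be written down as a small rule. The sketch below is a simplified assumption of how that choice might be encoded: an upstream bug maps to the global `linux` component, while a downstream patch maps to the vendor components that carry it, whether that is several distros or just one. The `Origin` enum and `revocation_targets` function are hypothetical names, and `linux.rhel` is used only as an example of a second vendor component.

```python
from enum import Enum


class Origin(Enum):
    UPSTREAM = "upstream"                    # bug exists in upstream Linux
    SHARED_DOWNSTREAM = "shared downstream"  # non-upstream patch carried by several distros
    VENDOR_SPECIFIC = "vendor specific"      # code shipped by a single vendor kernel


def revocation_targets(origin, affected_components):
    """Pick the SBAT component names whose generation should be bumped/revoked."""
    if origin is Origin.UPSTREAM:
        # One global entry covers every vendor that tracks the global number.
        return ["linux"]
    # Otherwise revoke only the vendor components that actually carry the bug.
    return sorted(affected_components)


targets = revocation_targets(Origin.VENDOR_SPECIFIC, {"linux.oracle.uek"})
```

The narrower the target list, the fewer unaffected kernels are caught by the revocation, which is the reason for wanting per-vendor components in addition to a global number.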

Obviously we want an approach that makes it more likely that step two does not have to result in creating and releasing updates.

> In a matter of future-proofing, how would such an additional requirement, which the shim reviewers would then need to review and maintain, influence shim reviewing, especially if right now, when it's not yet implemented, the ongoing thing (those who know, know...) has already stalled the reviews from being reviewed? Would it then only be a matter of having more humanpower, i.e., more developers and reviewers?

I suspect you're asking if getting this implemented is blocking current shim reviews. I do not speak for the folks making the final signing decisions, but I have not heard anyone say that this is a hard requirement with a date attached to it. However the sooner we get this in place, the less likely it is that we will need to do this in a panic situation.

> > Even if the upstream branch had a number assigned, how would we know that a required fix was merged correctly in the tree used to produce the binary?
>
> Unless someone writes and maintains an extensive OpenQA infrastructure for automated testing of whether certain CVEs have been resolved, we do not. I'm not promising anything, but maybe, just maybe, sometime in the future, someone will accomplish this or something similar. ;-)

In the interest of not exploding the metadata in the SbatLevel variable, I do think it makes sense to track some sort of a global number. Even if it is largely date driven and is only used to reconcile the metadata for revocations that are a couple years behind.

The other option would be to dictate this number via lists of fixed CVEs in shim review, although that is slightly strange since it's shim and not kernel (or GRUB2) review, even if we do this for GRUB2 today. The obvious difference there is the amount of change that distros routinely ship in the kernel vs GRUB2 which is updated a bit less aggressively.

@aronowski
Contributor

> Thank you for helping me work through this!

That's great to hear!

> > In a matter of future-proofing, how would such an additional requirement, which the shim reviewers would then need to review and maintain, influence shim reviewing, especially if right now, when it's not yet implemented, the ongoing thing (those who know, know...) has already stalled the reviews from being reviewed? Would it then only be a matter of having more humanpower, i.e., more developers and reviewers?
>
> I suspect you're asking if getting this implemented is blocking current shim reviews. I do not speak for the folks making the final signing decisions, but I have not heard anyone say that this is a hard requirement with a date attached to it. However the sooner we get this in place, the less likely it is that we will need to do this in a panic situation.

Nope.

For context, there's been a security-related situation ongoing for some time. I'm specifically not giving the details, as I'm unsure if I'm allowed to talk about it in public. Therefore, I used the euphemism: the ongoing thing. It looks like I just did not express myself as clearly as I intended.

Some time ago, I got informed that that situation got security experts and bootchain developers involved, hence why shim reviews are currently on hold. So, while having a proposal for SBAT in the kernel implemented is not yet required, I try my best to help out with it, so it itself would not require a suboptimal time and humanpower involvement of the reviewers once we have the implementation. Or, in other words, I wish we could combine a fairly simple and easy-to-maintain design with real-world scenarios that may happen, like the example I provided in my earlier comment, where a company decides to split a product into two.

> In the interest of not exploding the metadata in the SbatLevel variable, I do think it makes sense to track some sort of a global number. Even if it is largely date driven and is only used to reconcile the metadata for revocations that are a couple years behind.
>
> The other option would be to dictate this number via lists of fixed CVEs in shim review, although that is slightly strange since it's shim and not kernel (or GRUB2) review, even if we do this for GRUB2 today. The obvious difference there is the amount of change that distros routinely ship in the kernel vs GRUB2 which is updated a bit less aggressively.

An idea sparkles in my mind. What if we had a list of several distros that are commonly used as a base for forks/rebuilds and used them as some kind of base for tracking generation numbers in their kernels (for instance, we would be storing them in a file common_kernel_configs_for_sbat.md in the shim-review repository)? Then, downstream forks would be describing whether their fork is just a rebuild that provides a kernel with the same CVEs or if they configure their kernel differently.

This idea ultimately must not discriminate against distros that are tailored products, which use way different configurations than the common ones.

So, maybe we could have a question in the shim-review template like:

If your product's kernel config is the same as one in the common_kernel_configs_for_sbat.md file, please point out, which one of these it is.
If you use a custom kernel config, that is much different than the common ones in that file, please describe in a simple way your strategy of implementing generations.

Once an organization gets their review accepted, their custom configuration would then be added to that file. Why? I'll explain my reasoning later. For now, I'll write down how I believe this idea would solve the problem of the fictional MyCompany splitting their product into two I described earlier, as they would start their first product (being based on an upstream one made by a fictional MegaCorp, Inc.) with adding the entries:

linux.megacorp,X,[...]
linux.mycompany,1,[...]

and once the product splitting is made available to the market, the first product would have that entry stay the same (being still based on upstream), while the other one (the "MyCompany Hardened Kernel") may have an entry like:

linux.mycompany.hardened,1,[...]

(no upstream entry, since it would be using a tailored config not found in the common_kernel_configs_for_sbat.md file.)

The distinction and naming sensibility would indeed be there, as there would be a clear indication of what's based on what.

Once there's a major security issue in MegaCorp's kernel, an entry regarding their product is added to a revocation list, while MyCompany's is not. Therefore, flash space is saved.
(And since MyCompany's initial product is just a rebuild of MegaCorp's one, it always has the number 1 as a product-specific generation number.)
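To make the scenario concrete, here is a minimal sketch of how a single revocation entry for MegaCorp would play out under this proposal. It assumes the placeholder `X` in the entries above is 1 before the issue and that the revocation raises the minimum for `linux.megacorp` to 2. The kernel names, the `blocked` helper, and the dictionary layout are illustrative only.

```python
# SBAT entries advertised by each (fictional) kernel: component -> generation.
KERNELS = {
    "MegaCorp kernel": {"linux.megacorp": 1},
    "MyCompany rebuild": {"linux.megacorp": 1, "linux.mycompany": 1},
    "MyCompany Hardened Kernel": {"linux.mycompany.hardened": 1},
}

# A single revocation entry: linux.megacorp must now be at generation 2 or later.
REVOCATION = {"linux.megacorp": 2}


def blocked(entries, revocation):
    """True if the kernel advertises a revoked component below its minimum."""
    # A component the kernel does not advertise cannot trigger the revocation.
    return any(
        entries.get(component, minimum) < minimum
        for component, minimum in revocation.items()
    )


still_bootable = sorted(
    name for name, entries in KERNELS.items() if not blocked(entries, REVOCATION)
)
```

Here one `linux.megacorp` entry covers both the MegaCorp kernel and the MyCompany rebuild, so no separate `linux.mycompany` entry is needed, and the hardened variant is untouched because it never advertised the upstream component.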

> This means we need to look at which kernels are impacted and can hopefully map that to an SBAT generation that's as far in the past as possible. If the bug came from upstream, it would be nice to tie it to a global generation number and coordinate with other distros. If this is in an RH-like non-upstream patch, ideally this would be coordinated with all the impacted distros. If the code is UEK specific, then this could be limited to just revoking the impacted linux.oracle.uek generations.

In regard to the aforementioned largely date-driven global generation number, maybe we could have one that only gets updated after there's a major issue that affects all the kernels that implement SBAT and have their implementation approved in that file mentioned above. This way we would have an ecosystem that works with all approved products, no matter if these are just rebuilds or tailored ones, i.e., no discrimination, and it wouldn't be updated that often, I think—how often would there be issues that would affect the whole diverse ecosystem like that one?

> If at all possible I would like to come up with some guidelines that we can have orgs follow, but as long as we can come back to the organizations with a CVE or list of CVEs and get the cutoff for where they are fixed, this is actually not unreasonable.

If the ideas I proposed are worthwhile, maybe then I could write a draft of a document that would cover the whole thing like a generic walkthrough. The details and specifics would be added later on, once the "common distros" thing had been agreed on. And the exact CVEs and their impact I would leave for security experts—I'm not one yet.

@aronowski
Contributor

In regard to the ongoing consultation I mentioned earlier, I tried to email Emanuele Giuseppe Esposito, the person who started the thread on a kernel mailing list I linked to there, but I suppose the provider of the redhat.com domain treated my message as spam, since I got no response. It was sad to be treated like this.

The thread started with the sentence:

> Important: this is just an RFC, as I am not expert in this area and
> I don't know what's the best way to achieve this.

but after reading the discussions and seeing barely any chances for an upstream implementation (for the wise reasons the kernel developers described), I thought it might be viable to continue with the proposals, which would have better chances of succeeding as well as implementing something practical with efficient management right here, in this GitHub issue. Being a non-expert in kernel development doesn't prohibit anyone from being an expert in downstream implementations, I think.

@esposem, could you please take a look at this GitHub issue and share your thoughts there?

@esposem

esposem commented Sep 20, 2023

Hi @aronowski,

As you also understood, there is no way SBAT is going to be used in the upstream Linux kernel. Despite the answers being way too aggressive from the kernel maintainers, they have some points on why it is not practical to keep track of SBAT generation numbers upstream. I invite you to read the original thread that my patch caused (filtering all inappropriate answers of course).

I think your proposal might be interesting, but it requires collaboration from all distros right? That requires a lot of work, I mean you are welcome to try but good luck with that... I tried my part upstream and you saw the answer I got.

We at RH are definitely going to use SBAT for UKI. What the others will do and if we are going to use another common shared mechanism it's up to you :)

CCing also @berrange and @vittyvk

@vittyvk

vittyvk commented Sep 20, 2023

For the record, this is what RHEL's UKI SBAT looks like:

sbat,1,SBAT Version,sbat,1,https://github.com/rhboot/shim/blob/main/SBAT.md
systemd,1,The systemd Developers,systemd,252,https://systemd.io/
systemd.rhel,1,Red Hat Enterprise Linux,systemd,252-16.el9,https://bugzilla.redhat.com/
linux,1,Red Hat,linux,5.14.0-349_bug2225529_v2.el9.x86_64,https://bugzilla.redhat.com/
linux.rhel,1,Red Hat,linux,5.14.0-349_bug2225529_v2.el9.x86_64,https://bugzilla.redhat.com/
kernel-uki-virt.rhel,1,Red Hat,kernel-uki-virt,5.14.0-349_bug2225529_v2.el9.x86_64,https://bugzilla.redhat.com/

The idea behind 'linux,1' in the absence of upstream agreement was that shim now owns this information so it can define which particular CVEs/issues/... need to be fixed to declare a certain generation. Non-UKI kernel can adopt this too but unless we stop loading binaries without '.sbat' section in shim, doing so requires an update of the signing certificate and this is no easy process.
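For anyone who wants to inspect entries like the ones above, the `.sbat` section is plain CSV with six fields per line as described in SBAT.md: component name, component generation, vendor name, vendor package name, vendor version, and vendor URL. The parser below is a minimal sketch, not shim's implementation. The `SbatEntry` type and `parse_sbat` function are names made up for illustration, and the sample text is a subset of the listing above.

```python
import csv
from typing import NamedTuple


class SbatEntry(NamedTuple):
    component: str
    generation: int
    vendor_name: str
    package_name: str
    version: str
    url: str


def parse_sbat(text):
    """Parse the CSV text of a .sbat section into a list of SbatEntry.

    Raises ValueError if a line has fewer than six fields or a
    non-numeric generation.
    """
    entries = []
    for row in csv.reader(text.splitlines()):
        if not row:
            continue  # skip blank lines
        component, generation, vendor, package, version, url = row[:6]
        entries.append(
            SbatEntry(component, int(generation), vendor, package, version, url)
        )
    return entries


SAMPLE = """\
sbat,1,SBAT Version,sbat,1,https://github.com/rhboot/shim/blob/main/SBAT.md
linux,1,Red Hat,linux,5.14.0-349_bug2225529_v2.el9.x86_64,https://bugzilla.redhat.com/
linux.rhel,1,Red Hat,linux,5.14.0-349_bug2225529_v2.el9.x86_64,https://bugzilla.redhat.com/
"""

generations = {entry.component: entry.generation for entry in parse_sbat(SAMPLE)}
```

Only the first two fields take part in revocation decisions. The remaining fields are informational, which is why a free-form vendor version string like the one above does no harm.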

@berrange

Practically speaking, any downstream can add 'linux,1' without coordination. Only once someone decides there is a first CVE that justifies bumping the generation does the question of coordination need to be answered, which I guess is where the shim community could come into play to assign generations to CVEs (or sets of CVEs)?
