
Add Secure Connections Standard #548

Open
wants to merge 24 commits into main

Conversation

markus-hentsch
Contributor

Closes #547

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
markus-hentsch added the SCS-VP10 (Related to tender lot SCS-VP10) label on Apr 4, 2024
markus-hentsch changed the title from "Add Secure Communication Standard" to "Add Secure Connections Standard" on Apr 4, 2024
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Contributor

@josephineSei left a comment

nice first draft - I have a few questions inline

## Conformance Tests

Conformance tests are limited to communication channels exposed to users, such as the public API interfaces.

Contributor

What would be necessary to test here besides the exposed channels?
RPC and DB? Maybe Libvirt? Can we audit this somehow, maybe with a tcpdump looking for such traffic?

However we decide on this, it should also be documented here, so CSPs will know how to deal with possible audits.

Contributor Author

Essentially everything that mentions a "MUST" or "MUST NOT" in this standard, except for the external API.

  • AMQP/RPC
  • SQL/DB
  • internal API
  • Libvirt port

Most of it could be determined based on a network traffic recording (e.g. tcpdump) by looking at the port numbers and traffic in conjunction with the target addresses/ports configured in the OpenStack *.confs. For the Libvirt port exposure, a netstat on the compute nodes might help, or simply grepping the Libvirt configuration for listen_addr.
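To illustrate the node-local part of such an audit, a rough sketch (the file locations and option names are assumptions based on common deployments and would need adjusting, e.g. for containerized setups):

```python
# Rough sketch of a node-local spot check as described above. The file paths
# and option names are assumptions (common defaults), not part of the standard.
import configparser
import re

LIBVIRTD_CONF = "/etc/libvirt/libvirtd.conf"  # assumed location
NOVA_CONF = "/etc/nova/nova.conf"             # assumed location


def libvirt_listen_addr(path=LIBVIRTD_CONF):
    """Return the configured listen_addr from libvirtd.conf, or None if unset."""
    with open(path) as conf:
        for line in conf:
            match = re.match(r'\s*listen_addr\s*=\s*"?([^"\s]+)"?', line)
            if match:
                return match.group(1)
    return None


def nova_connection_targets(path=NOVA_CONF):
    """Collect the AMQP and DB target URLs from nova.conf for manual review."""
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read(path)
    return {
        "transport_url": cfg.get("DEFAULT", "transport_url", fallback="<not set>"),
        "database connection": cfg.get("database", "connection", fallback="<not set>"),
    }


if __name__ == "__main__":
    addr = libvirt_listen_addr()
    print(f"libvirt listen_addr: {addr}")
    if addr not in (None, "127.0.0.1", "::1"):
        print("WARNING: libvirt appears to listen on a non-loopback address")
    for name, target in nova_connection_targets().items():
        print(f"{name}: {target}")  # cross-check against the traffic capture
```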

Most of this is deep in the infrastructure though. A filtered network capture might work but I don't think CSPs would be especially happy about it.

Is there any existing standard where a non-external audit is proposed? I'm unsure how to tackle this in the standard.

Contributor

At least, the standard should mention that conformance tests only check external endpoints. Internal endpoints MUST be checked with manual audits. We will have manual audits in other conformance tests as well; e.g. in #555 we investigate how to check flavor implementation, and I guess manual audits are required in this context, too. In favor of this PR, I suggest opening a new issue to propose a general concept for manual audits.

Contributor

I agree with Anja, we need a central place for manual audits. And a document that includes all necessary checks, so any auditor can simply follow through.

https://crypto.stackexchange.com/a/95660

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
and implement all tests based on the current standard draft

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
markus-hentsch marked this pull request as ready for review on April 9, 2024, 12:25
bitkeks self-requested a review on April 10, 2024, 08:38
@martinmo
Member

@markus-hentsch I have a general remark: Because TLS configuration and security is a moving target, have you considered basing the recommended configuration on one of the profiles offered by Mozilla? For example, the "intermediate" profile, see https://ssl-config.mozilla.org/ and https://wiki.mozilla.org/Security/Server_Side_TLS. (AFAIK, these can be checked with sslyze.)

@artificial-intelligence
Contributor

@markus-hentsch I have a general remark: Because TLS configuration and security is a moving target, have you considered basing the recommended configuration on one of the profiles offered by Mozilla? For example, the "intermediate" profile, see https://ssl-config.mozilla.org/ and https://wiki.mozilla.org/Security/Server_Side_TLS. (AFAIK, these can be checked with sslyze.)

lol, I had the same idea and actually checked our haproxy TLS implementation; it seems there is some opportunity to do some hardening there:

COMPLIANCE AGAINST MOZILLA TLS CONFIGURATION
 --------------------------------------------

    Checking results against Mozilla's "MozillaTlsConfigurationEnum.INTERMEDIATE" configuration. See https://ssl-config.mozilla.org/ for more details.

    a.regiocloud.tech:443: FAILED - Not compliant.
        * ciphers: Cipher suites {'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA', 'TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384', 'TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA', 'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256'} are supported, but should be rejected.
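For anyone who wants to reproduce this check, a minimal wrapper around the sslyze CLI would look roughly like the sketch below; the `--mozilla_config` option and the wording of the output should be verified against the installed sslyze version:

```python
# Minimal sketch: run sslyze's built-in Mozilla TLS profile check against a
# public endpoint. Assumes an sslyze CLI (5.x/6.x) on PATH; matching on the
# human-readable output is a heuristic, not a stable interface.
import subprocess
import sys


def check_mozilla_compliance(host, port=443, profile="intermediate"):
    """Return True if sslyze reports the endpoint as compliant with the profile."""
    result = subprocess.run(
        ["sslyze", f"--mozilla_config={profile}", f"{host}:{port}"],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    return "FAILED - Not compliant" not in result.stdout


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "a.regiocloud.tech"
    sys.exit(0 if check_mozilla_compliance(target) else 1)
```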

@artificial-intelligence
Contributor

working on a fix for upstream: https://bugs.launchpad.net/kolla-ansible/+bug/2060787

anjastrunk self-requested a review on April 15, 2024, 07:21
Contributor

@anjastrunk left a comment

Just some minor cosmetic changes:

  • I prefer to write Cloud Service Provider instead of CSP, as "CSP" is not an official abbreviation
  • I do not like writing "SCS proposes...", "SCS decides...". AFAIK, SCS stands for Sovereign Cloud Stack, which is a software stack and cannot decide anything. I prefer to write "SCS project" or "SCS community"

But again, this is just cosmetics.

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
@markus-hentsch
Contributor Author

Just some minor cosmetic changes:

* I prefer to write Cloud Service Provider instead of CSP, as "CSP" is not an official abbreviation

* I do not like writing "SCS proposes...", "SCS decides...". AFAIK, SCS stands for Sovereign Cloud Stack, which is a software stack and cannot decide anything. I prefer to write "SCS project" or "SCS community"

But again, this is just cosmetics.

I adjusted the SCS references. I left "CSP" as-is and added a glossary instead, like I did with some other standards. We seem to use CSP a lot in other standards so I'd like to stay consistent. The glossary at the top should introduce the abbreviation sufficiently now.

Contributor

@josephineSei left a comment

Overall I think this is a good starting point. We need something for the manual audits, but this may be something to discuss in the standards SIG or in the IAM and Security meeting.

## Conformance Tests

Conformance tests are limited to communication channels exposed to users, such as the public API interfaces.

Contributor

I agree with Anja, we need a central place for manual audits. And a document that includes all necessary checks, so any auditor can simply follow through.

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Contributor

@artificial-intelligence left a comment

Thanks for all the work done so far on this.

Unfortunately I think there's still some work left to do.

Thanks.

3. For the message queue, AMQP-based solutions such as RabbitMQ and QPid do offer TLS natively which is also supported by OpenStack. ZeroMQ does not and requires IPsec or CIPSO instead.
4. External API endpoints can be protected easily by using a TLS proxy. They can then be registered with their HTTPS endpoint in the Keystone service catalog. The certificates of external APIs usually need to be signed by a well-known CA in order to be accepted by arbitrary external clients.
5. Internal API endpoints can be treated and secured similarly to the external ones. Optionally, the TLS certificate provider and PKI can differ from the external ones. It is often sufficient for the CA of internal TLS endpoints to be accepted within the infrastructure; it doesn't need to be a common public CA.
6. For protecting the data transferred between compute nodes during live-migration of VMs, [Nova offers support for QEMU-native TLS](https://docs.openstack.org/nova/latest/admin/secure-live-migration-with-qemu-native-tls.html). As an alternative, SSH is also a channel that Nova can be configured to use between hosts for this but requires passwordless SSH keys with root access to all other compute nodes which in turn requires further hardening.
Contributor

This does not necessarily require root SSH access. Also note that certain migration scenarios require SSH-based transport instead. One scenario is the use of secure boot VMs that use the NVRAM file provided by Nova to store their secure boot keys. This file can currently only be transferred via SSH, as far as I know.

Also, if you have a volume-backed instance you need SSH, see the PTG notes:

https://etherpad.opendev.org/p/nova-dalmatian-ptg#L617

to quote it, as it's an editable etherpad, which might vanish in the future:

    (tobias-urdin, not attending thu-fri): live migration of a volume-backed instance requires SSH access between compute nodes
    the RemoteFS code is used to mkdir the volume directory on the destination, could we move that to be done by the nova-compute service over the RPC layer instead?

    same RemoteFS code is used to copy vtpm and config-drive from /var/lib/nova/instances directory when a volume backed instance is used, but I don't see any good way to replace that since we need a datapath and nova-compute <-> nova-compute API would probably be messy

    (mikal) This sort of came up for me the other day. The config-drive code is a bit weird at the moment. Its either on the hypervisor, or in your network block store as long as the network block store is exactly ceph. Is it weird that we can't store config-drives on cinder volumes? I think maybe we ended up this way because we didn't want to create lots of small cinder volumes, but maybe we should revisit that so that live migration doesn't need to copy a config-drive around at all? Should that also be true for UEFI NVRAM persistence? We could just pretend they were small disks too?

This is also why SSH-based migration was, in fact, undeprecated again (well, it's still in the process):

https://review.opendev.org/c/openstack/nova/+/915481

Still, QEMU-native TLS is good, if it works for the needed use case.

But it would certainly be possible to secure the SSH access fairly trivially so that it only supports certain locked-down commands; see the "command" option in the authorized_keys man page:

https://manpages.debian.org/bullseye/openssh-server/authorized_keys.5.en.html#AUTHORIZED_KEYS_FILE_FORMAT

So maybe we could extend this to make SSH-based live migration secure as well.
I have opened an issue for that on our side, as we need to implement it in kolla either way: osism/issues#1001


### Hypervisor and Live Migration Connections

- If using libvirt on compute nodes, the libvirt port (as per `listen_addr`) MUST NOT be exposed to the network in an unauthenticated and unprotected fashion. It SHOULD be bound to `127.0.0.1`.
Contributor

I don't think the part "It SHOULD be bound to 127.0.0.1" can really be done.
This address is used for libvirt live migration and must be reachable by other compute nodes, which obviously doesn't work via localhost.

See e.g. this old security notice for details: https://wiki.openstack.org/wiki/OSSN/OSSN-0007

Contributor Author

I'll need to look into this again. There was a way with just unix domain sockets in conjunction with SSH iirc: https://gitlab.com/yaook/operator/-/issues/164#note_553333437

Contributor

Thanks for that insightful source! Though the comment is very wrong about the SSH access; I'm talking about this part specifically:

[..] because it would result in a full host access instead of only libvirt[..]

It's certainly possible to lock down SSH access as you like; I assume the person commenting was just not familiar with how this is possible and was only referring to the upstream (honestly quite confusing) Nova docs around this:

https://docs.openstack.org/nova/latest/admin/configuring-migrations.html#general-configuration

specifically point 3, which reads:

Enable password-less SSH so that root on one compute host can log on to any other compute host without providing a password.

As I said already, this can of course be locked down; the Nova docs are just wrong here, you don't need to expose arbitrary root access here.

Contributor Author

As I said already, this can of course be locked down; the Nova docs are just wrong here, you don't need to expose arbitrary root access here.

Are you referring to your other comment which states:

But it would certainly be possible to secure the SSH access fairly trivially so that it only supports certain locked-down commands; see the "command" option in the authorized_keys man page

?
If so, have you looked at steps 8 and onward in the comment I linked [1]? Isn't that essentially what the comment author was proposing? Or am I missing something?
In this setup you'd have libvirt exposed only as a local UNIX domain socket, and migration connections from other compute hosts would use SSH as a tunnel with the local UNIX socket as the target. The SSH tunnel would be restricted to the commands known to be used by Nova for migration purposes.

Footnotes

  1. https://gitlab.com/yaook/operator/-/issues/164#note_553333437

Contributor

Ah I missed that step 8, thanks for bringing it to my attention.

So a local UNIX domain socket might work; I never tested that, but it seems like a clever idea.

I don't know the package used in that comment ("restricted-ssh-commands"), to be honest; I would like to review that. Usually you don't need an extra package for this, and you need to craft a list of allowed commands manually out of pure necessity either way, I think. So I would be sceptical about this package.

Reading the man-page: https://manpages.debian.org/bookworm/restricted-ssh-commands/restricted-ssh-commands.1.en.html

This seems to just allow for a regex, which I don't think is fine-grained enough.

The way this is usually done is that you either write a program in your favorite secure compiled language or just use a scripting language, if it can be made secure enough.

You encode all the allowed invariants directly into your program/script, maybe add some parameters which can be supplied via the SSH command, and place that with sufficient permissions inside the correct Nova containers.

Then you alter your authorized_keys file like this:

command="/path/to/necessary_nova_command $with_option1 $with_option2" ssh-ed25519 AAAAC[...]

This way the connection via this SSH key can only invoke this command, nothing else; you get no shell access and only need to take care of the usual environment sanitization to avoid env attacks like LD_PRELOAD et al. This is pretty robust and easy, no need for additional packages.
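A minimal sketch of what such a forced-command wrapper could look like (the allowlist below is purely hypothetical; the real one would have to be derived from the commands Nova actually issues during migration):

```python
#!/usr/bin/env python3
# Hypothetical forced-command wrapper for the "command=" entry shown above.
# sshd passes the client's requested command in SSH_ORIGINAL_COMMAND; anything
# not matching the allowlist is rejected. The allowlist is a placeholder only.
import os
import shlex
import subprocess
import sys

ALLOWED_PATH_PREFIX = "/var/lib/nova/instances/"  # placeholder restriction
ALLOWED = {
    "mkdir": ["-p"],  # executable -> fixed argument prefix (placeholder)
}


def main():
    argv = shlex.split(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if not argv or argv[0] not in ALLOWED:
        print("command rejected", file=sys.stderr)
        return 1
    prefix = ALLOWED[argv[0]]
    paths = argv[1 + len(prefix):]
    if argv[1:1 + len(prefix)] != prefix or not paths or not all(
        p.startswith(ALLOWED_PATH_PREFIX) for p in paths
    ):
        print("arguments rejected", file=sys.stderr)
        return 1
    # run with a minimal, sanitized environment (no LD_PRELOAD and friends)
    return subprocess.run(argv, env={"PATH": "/usr/bin:/bin"}).returncode


if __name__ == "__main__":
    sys.exit(main())
```

It would then be referenced from authorized_keys via command="/usr/local/bin/nova-migration-wrapper" (path hypothetical), analogous to the example above.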

A regex is, on the one hand, not expressive enough, because Nova might need to call different commands or needs access to different folders etc., so you need to wrap that inside a script; on the other hand, a regex is too complex because you might end up allowing more than you originally wanted.

The man page even has a security warning which really just reiterates my point: a regex is not a good format for this kind of thing.

Also, this tool is written in bash, which can be fine, but I would do a security code review before using it; I have seen innocent bash proxy scripts lead to remote code execution before.

The code can be seen here:
https://github.com/bdrung/restricted-ssh-commands/blob/master/restricted-ssh-commands

But this is really an implementation detail, not relevant to the standard itself I would say.
Feel free to ping me if you or one of your colleagues wants to chat more about this topic. 🙂

Contributor Author

I put a bit more research into this today and, for now, added a whole new section on this to the design considerations.
It seems like there are two ways in total to address OSSN-0007, i.e. the problem of unauthorized access to the libvirt interface:

  1. Switching libvirt to TLS + using QEMU-native TLS for live migration + employing TLS client certificates (while restricting the libvirt port to known identities)
  2. Restricting the libvirt port to localhost (127.0.0.1) + enabling libvirt UNIX socket + using SSH for live migration + restricting SSH user permissions

In case 1, the access restriction would be implemented by distributing X.509 client certificates and allowlisting them; in case 2, it would be implemented using SSH key pairs in a similar fashion.

I still need to pursue this further and derive a better recommendation in the actual standard section (i.e. the line this comment thread was started on).


## Conformance Tests

Conformance tests are limited to communication channels exposed to users, such as the public API interfaces.
Contributor

Why? At least there should be some justification either here or in a decision record document, no?

Contributor

The automatic conformance tests, as I understand them, are executed on SCS infrastructure and run against remotely reachable cloud infrastructure. Is this correct? This simply would not allow looking into internal communication channels automatically. E.g. the connection from Nova to the DB can only be checked by inspecting the config files. The same goes for the message queue. This can only be done if you have access to the nodes and not just to the OpenStack infrastructure.

Contributor

While I understand that, I find it difficult to standardize something when the conformance tests for the standard cannot really guarantee that the standard is being correctly implemented in its full form. IMHO this leads to a false sense of security.

At least it must be made clear to end users what is not tested (e.g. live migration traffic), and/or maybe the CSP must somehow declare that they in fact follow the standard to the letter, but I find this a much weaker signal than the result of an automated test.

Maybe create an automated test that only the CSP can run internally and mandate that the results are published? You still have to trust the CSP not to tamper with the test, but at least the output can be parsed automatically by machines and made available on scs.community this way.

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
@markus-hentsch
Contributor Author

Updated standard and test script to use the Mozilla TLS "intermediate" preset now.
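Conceptually, the endpoint enumeration part of such a test could look like the sketch below (a simplified illustration, not the actual test script from this PR; the cloud name and service type list are assumptions), with each HTTPS endpoint then fed into the sslyze Mozilla profile check shown earlier:

```python
# Simplified illustration of enumerating a cloud's public API endpoints and
# asserting they are HTTPS. Not the test script from this PR; the cloud name
# and service type list are assumptions.
from urllib.parse import urlparse

import openstack

SERVICE_TYPES = ["identity", "compute", "image", "network", "volumev3", "object-store"]


def public_endpoints(cloud_name):
    """Yield (service_type, public URL) pairs from the service catalog."""
    conn = openstack.connect(cloud=cloud_name)
    for service_type in SERVICE_TYPES:
        try:
            url = conn.session.get_endpoint(service_type=service_type, interface="public")
        except Exception:
            continue  # service not present in this cloud's catalog
        if url:
            yield service_type, url


def check_cloud(cloud_name):
    ok = True
    for service_type, url in public_endpoints(cloud_name):
        if urlparse(url).scheme != "https":
            print(f"{service_type}: public endpoint {url} is not HTTPS")
            ok = False
    return ok


if __name__ == "__main__":
    raise SystemExit(0 if check_cloud("my-cloud") else 1)
```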

@@ -0,0 +1,2 @@
openstacksdk>=3.0.0
sslyze>=6.0.0
Contributor Author

TODO: we need proper pinning for this, preferably originating from Tests/scs-compatible-iaas.yaml as discussed with @mbuechse

We need to document the corresponding Mozilla JSON [1] tightly coupled with the latest SSLyze library version that ships that exact JSON [2] for testing.

Requires: #595

Footnotes

  1. https://ssl-config.mozilla.org/guidelines/5.7.json

  2. https://github.com/nabla-c0d3/sslyze/blob/761892bf6a613726f272eb4e19b33faee2272b6f/sslyze/mozilla_tls_profile/5.7.json

Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Signed-off-by: Markus Hentsch <markus.hentsch@cloudandheat.com>
Labels
SCS-VP10 Related to tender lot SCS-VP10
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

Secure communication standard for IaaS infrastructure
5 participants