Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerting: Add options to configure TLS for HA using Redis #87567

Merged
merged 8 commits into from
May 14, 2024

Conversation

fayzal-g
Copy link
Contributor

@fayzal-g fayzal-g commented May 9, 2024

What is this feature?
Updates the unified_alerting configuration with additional fields to configure TLS on the Redis client when enabling alerting high availability using Redis.

The following fields have been added:

  • ha_redis_tls_enabled
  • ha_redis_tls_cert_path
  • ha_redis_tls_key_path
  • ha_redis_tls_ca_path
  • ha_redis_tls_server_name
  • ha_redis_tls_insecure_skip_verify
  • ha_redis_tls_cipher_suites
  • ha_redis_tls_min_version

Why do we need this feature?
Some users may have an instance of Redis that is only able to accept communications with TLS but it is currently not possible to configure client-side TLS on the Grafana side

Special notes for your reviewer:

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

Comment on lines +1104 to +1105
# The maximum number of simultaneous redis connections.
# ha_redis_max_conns = 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this change but added it in to make it consistent with defaults.ini

@fayzal-g fayzal-g changed the title Add Alerting HA Redis Client TLS config Alerting: Add options to configure TLS for HA using Redis May 10, 2024
@fayzal-g fayzal-g marked this pull request as ready for review May 10, 2024 15:20
@fayzal-g fayzal-g requested review from torkelo, a team and brendamuir as code owners May 10, 2024 15:20
@fayzal-g fayzal-g requested review from rwwiv, JacobsonMT, yuri-tceretian, grobinson-grafana, JohnnyQQQQ, alexweav and stevesg and removed request for a team May 10, 2024 15:20
Copy link
Contributor

@alexweav alexweav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - everything below is a nitpick or a question

ha_redis_tls_enabled = false

# Path to the PEM-encoded TLS client certificate file used to authenticate with the redis server.
ha_redis_tls_cert_path =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these required, or can you leave them blank and the client will use the certificates installed in the OS?

Whichever the answer is, might be worth noting in the doc comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, they're required if TLS is enabled. Will update the docs to make this clear.

CertPath: certPaths.clientCert,
KeyPath: certPaths.clientKey,
CAPath: certPaths.ca,
ServerName: "localhost",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, will miniredis retry on randomized ports if the default redis port is already taken on the host? Just wondering about test portability, because without port retry this could fail on a dev machine with an unrelated redis running.

I don't expect us to add dynamic ports as part of this PR, we could just implement it in the future if someone complains - just curious if it already works or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe there is port retry, from what I can see it will try a random port on localhost and error if it is already in use.

"github.com/stretchr/testify/require"
)

func TestNewRedisPeerWithTLS(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very cool test 👍

@@ -66,6 +66,7 @@ database for HA and cannot support the meshing of all Grafana servers.
1. Set `ha_redis_address` to the Redis server address Grafana should connect to.
1. [Optional] Set the username and password if authentication is enabled on the redis server using `ha_redis_username` and `ha_redis_password`.
1. [Optional] Set `ha_redis_prefix` to something unique if you plan to share the redis server with multiple Grafana instances.
1. [Optional] Set `ha_redis_tls_enabled` to `true` and configure the corresponding `ha_redis_tls_*` fields to secure communications between Grafana and redis with TLS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. [Optional] Set `ha_redis_tls_enabled` to `true` and configure the corresponding `ha_redis_tls_*` fields to secure communications between Grafana and redis with TLS.
1. [Optional] Set `ha_redis_tls_enabled` to `true` and configure the corresponding `ha_redis_tls_*` fields to secure communications between Grafana and Redis with Transport Layer Security (TLS).

@@ -66,6 +66,7 @@ database for HA and cannot support the meshing of all Grafana servers.
1. Set `ha_redis_address` to the Redis server address Grafana should connect to.
1. [Optional] Set the username and password if authentication is enabled on the redis server using `ha_redis_username` and `ha_redis_password`.
1. [Optional] Set `ha_redis_prefix` to something unique if you plan to share the redis server with multiple Grafana instances.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. [Optional] Set `ha_redis_prefix` to something unique if you plan to share the redis server with multiple Grafana instances.
1. [Optional] Set `ha_redis_prefix` to something unique if you plan to share the Redis server with multiple Grafana instances.

@@ -66,6 +66,7 @@ database for HA and cannot support the meshing of all Grafana servers.
1. Set `ha_redis_address` to the Redis server address Grafana should connect to.
1. [Optional] Set the username and password if authentication is enabled on the redis server using `ha_redis_username` and `ha_redis_password`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. [Optional] Set the username and password if authentication is enabled on the redis server using `ha_redis_username` and `ha_redis_password`.
1. [Optional] Set the username and password if authentication is enabled on the Redis server using `ha_redis_username` and `ha_redis_password`.

Copy link
Contributor

@stevesg stevesg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks for doing this. Just one comment about configuration error handling and this looks good to go.

if cfg.tlsEnabled {
tlsClientConfig, err := cfg.tls.GetTLSConfig()
if err != nil {
logger.Error("Failed to get TLS config", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would return the error here, a quick skim at the errors from GetTLSConfig shows they are mostly configuration issues and are likely unrecoverable. I believe the argument for not failing on a failed Ping lower down the code, is because the failure is transient and could resolve after startup.

Suggested change
logger.Error("Failed to get TLS config", "err", err)
logger.Error("Failed to get TLS config", "err", err)
return nil, err

Comment on lines 82 to 89
HARedisTLSEnabled bool
HARedisTLSCertPath string
HARedisTLSKeyPath string
HARedisTLSCAPath string
HARedisTLSServerName string
HARedisTLSInsecureSkipVerify bool
HARedisTLSCipherSuites string
HARedisTLSMinVersion string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] We could use dstls.ClientConfig here, would save an instance of re-listing the options (here)

Copy link
Contributor

@stevesg stevesg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes, looks ready.

Copy link
Member

@JohnnyQQQQ JohnnyQQQQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 🚀

@fayzal-g fayzal-g merged commit 7a2fbad into main May 14, 2024
14 checks passed
@fayzal-g fayzal-g deleted the alerting-redis-tls branch May 14, 2024 13:21
@ephemeral-instances-bot
Copy link

Error building instance: Contact #proj-ephemeral-hg-instances if it is not a compile error. Logs

Error message

handling pull request closed event: deleting instance by slug: unexpected response status: status=502 responseBody=

<title>502 Server Error</title>

Error: Server Error

The server encountered a temporary error and could not complete your request.

Please try again in 30 seconds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants