User Request: allow targeting subset of destination service #843

Open
dasch opened this issue Mar 7, 2024 · 7 comments

@dasch commented Mar 7, 2024

Is your feature request related to a problem? Please describe.
A typical scenario I want to test is how service A responds to its direct dependency, service B, being partially unavailable; I basically want to verify that A has proper timeouts and retries in place so it can gracefully handle e.g. a single B pod being overloaded or in a bad state.

Describe the solution you'd like
I see that it's possible to scope a network disruption to a list of specific IP addresses with the network.hosts field. However, I do not know the IP addresses of the B pods at the time of writing the Disruption. Instead, I would like to be able to provide a count (or percentage) of the destination service's pods that should be in scope for the disruption, which would be dynamically translated into a list of IPs (see the sketch below).

Describe alternatives you've considered
I can create a disruption on B instead of A, and set the count as I wish. However, that causes a disruption to all clients of B, whereas I want to limit the scope to A, which is the subject under test. We do not have dedicated environments for this, so limiting the impact of disruptions is key to staying popular with my colleagues :D
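
For illustration only, here is a rough sketch of what such a knob could look like. The type and field names below are invented for this example and are not part of the chaos-controller API; they just make the request concrete.

```go
package api // illustrative package name, not a real chaos-controller package

// HostTargetSpec is a hypothetical host entry carrying a "percentage of
// resolved IPs" knob. Nothing here exists in chaos-controller today; it only
// illustrates the shape of the feature being requested.
type HostTargetSpec struct {
	// Host is the hostname (or Kubernetes service DNS name) to resolve.
	Host string `json:"host"`

	// TargetPercentage would scope the disruption to this percentage of the
	// IPs resolved for Host, e.g. 50 means roughly half of the endpoints
	// behind the hostname are disrupted while the rest are left untouched.
	TargetPercentage int `json:"targetPercentage,omitempty"`
}
```

In the scenario above, the Disruption would still be applied to service A's pods, but its network effect would only cover TargetPercentage of B's resolved endpoints, so other clients of B stay unaffected.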

@Devatoria (Contributor)

Hey, just to clarify: you would like to drop all packets, but only for a subset of the hosts behind the hostname you provide to the disruption, right?

In other words, as an example, your use case would be: I want to drop 100% of packets going to 50% of the hosts behind the provided hostname.

And that is a different use case from: I want to drop 50% of packets going to all of the hosts behind the provided hostname.

@dasch (Author) commented Mar 7, 2024

Yup; in my concrete case I probably want to delay rather than drop, but it's only for a subset of hosts behind the hostname, yes.

@Devatoria (Contributor)

OK, I think there's a simple way to implement such a feature: resolve the given hostname and pick x% of the returned IPs in the injector component.
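
A minimal sketch of that idea, assuming a plain DNS lookup and a random pick; the function below is illustrative and not actual injector code:

```go
// Illustrative sketch only: resolve a hostname and keep a random x% of the
// returned IPs. Function and variable names are made up for this example and
// are not taken from the chaos-controller codebase.
package main

import (
	"fmt"
	"math"
	"math/rand"
	"net"
)

// pickRandomSubset resolves host and returns roughly `percentage` percent of
// the resolved IPs, chosen at random (so different injectors may end up with
// different subsets).
func pickRandomSubset(host string, percentage int) ([]net.IP, error) {
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, err
	}

	// Shuffle, then keep the first ceil(n * percentage / 100) entries.
	rand.Shuffle(len(ips), func(i, j int) { ips[i], ips[j] = ips[j], ips[i] })

	keep := int(math.Ceil(float64(len(ips)) * float64(percentage) / 100.0))
	if keep < 1 {
		keep = 1 // always keep at least one target
	}
	if keep > len(ips) {
		keep = len(ips)
	}

	return ips[:keep], nil
}

func main() {
	// Hypothetical service hostname, e.g. service B's cluster DNS name.
	ips, err := pickRandomSubset("b.default.svc.cluster.local", 50)
	if err != nil {
		panic(err)
	}
	fmt.Println("hosts in scope for the disruption:", ips)
}
```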

@ptnapoleon wdyt?

@ptnapoleon (Contributor)

Is it literally just x% of the returned IPs, or is there any other filtering you want to do on those hosts? Do you need the same x% of IPs to be picked across all injectors, or is it fine if each one just picks a random x%? And do you need this to work only for the spec.network.hosts field, or for spec.network.services as well?

@dasch (Author) commented Mar 8, 2024

It would probably be better if the same IPs were picked across all the selected pods, but that's not a hard requirement. I'm thinking it would be relatively easy to do with a consistent hash? It doesn't have to be the same across runs, so maybe throw some Disruption-specific value in there as a seed.
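
A minimal sketch of that consistent-hash idea (names are made up; the seed stands in for some Disruption-specific value such as its name or UID): every injector that hashes the same resolved IP set with the same seed picks the same subset, without any coordination.

```go
// Illustrative sketch only, not chaos-controller code.
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"net"
	"sort"
)

// pickDeterministicSubset orders the IPs by the FNV-64a hash of seed+IP and
// keeps the first ceil(n * percentage / 100) of them. The ordering depends
// only on the seed and the IPs themselves, so the result is stable across
// injectors (and changes when the seed changes, i.e. per Disruption).
func pickDeterministicSubset(ips []net.IP, seed string, percentage int) []net.IP {
	type keyed struct {
		ip  net.IP
		key uint64
	}

	keys := make([]keyed, 0, len(ips))
	for _, ip := range ips {
		h := fnv.New64a()
		h.Write([]byte(seed))
		h.Write([]byte(ip.String()))
		keys = append(keys, keyed{ip: ip, key: h.Sum64()})
	}

	sort.Slice(keys, func(i, j int) bool { return keys[i].key < keys[j].key })

	keep := int(math.Ceil(float64(len(keys)) * float64(percentage) / 100.0))
	if keep > len(keys) {
		keep = len(keys)
	}

	picked := make([]net.IP, 0, keep)
	for _, k := range keys[:keep] {
		picked = append(picked, k.ip)
	}
	return picked
}

func main() {
	ips := []net.IP{
		net.ParseIP("10.0.0.1"),
		net.ParseIP("10.0.0.2"),
		net.ParseIP("10.0.0.3"),
		net.ParseIP("10.0.0.4"),
	}

	// Same seed and same IPs => same pick on every injector.
	fmt.Println(pickDeterministicSubset(ips, "my-disruption", 50))
}
```

Note that this only stays identical across injectors as long as they all resolve the same set of IPs; if DNS answers differ between pods, the picks can still diverge, which seems acceptable given that it's not a hard requirement.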

By the way, this would also work well for testing resilience to e.g. a single Aurora database reader being unavailable; there's a single hostname for the reader endpoint, with the configured number of reader instances behind it, so you can't target a disruption at just one of the readers via the hostname alone. It's another case where it's valuable to test that applications retry connections, for example.

@Devatoria (Contributor)

Sounds good, and it should be easy to integrate into the host filters that already do the hostname resolution: https://github.com/DataDog/chaos-controller/blob/main/injector/network_disruption.go#L1177

Passing a percentage of resolved IPs to keep would probably be enough, and it would be a valuable feature.

@ptnapoleon (Contributor)

I'll open a ticket for this internally so we can track it.
