
balancerd: Handle canceling connections #24081

Closed

ParkMyCar opened this issue Dec 21, 2023 · 12 comments
Labels
C-bug Category: something is broken

Comments

@ParkMyCar
Member

What version of Materialize are you using?

v0.81.0-dev

What is the issue?

It appears that balancerd isn't properly canceling requests. If you start running a subscribe in psql and then try to cancel it with Ctrl+C, the subscribe will not stop. It also seems that when you close the terminal window that's running psql, the session does not get cleaned up. See this Slack thread for more details.

@ParkMyCar ParkMyCar added the C-bug Category: something is broken label Dec 21, 2023
@benesch
Member

benesch commented Dec 21, 2023

The code bears this out!

// Balancer ignores cancel requests.
//
// TODO: Can/should we return some error here so users are informed
// this won't ever work?

@benesch
Member

benesch commented Dec 21, 2023

Two ideas that I talked about with @mjibson way back when:

  1. If we have SNI, we're good: balancerd: use pgwire SNI if present #23907
  2. Without SNI, we could send the cancellation request to all environments. Feels bad, but the combination of (conn_id, secret_key) is probably unique enough, especially if there are only a few hundred environments behind each balancer.

@benesch
Member

benesch commented Jan 3, 2024

Talked about this in the TL sync today.


@def- suggested that we could have our PostgreSQL sessions cycle the cancellation keys every few minutes. This would introduce more randomness, to make guessing the cancellation key for a session harder. I pointed out that the protocol doesn't support this, but @def- pointed out on Slack that libpq nonetheless handles changes to the cancellation key mid-session. Weird! I'd still be nervous about relying on this, though, because other pgwire protocol implementations are likely not so kind.


Here's an idea I proposed. We have 64 bits to play with here: 32 bits of connection ID and 32 bits of secret key. We don't need 2^32 connection IDs! At the moment max_connections is constrained to 1000. Let's be conservative and assume one day we want to support 1 million connections. That means we need 20 bits for the actual connection ID. The remaining 12 bits can be used to represent the first 12 bits of the environment ID. That's not enough to identify a full UUID (128 bits), but at our current scale we'd have very few collisions, and even in the limit where we have hundreds or thousands of environments per Kubernetes cluster, we'd still expect each 12-bit UUID prefix to map to only a handful of environments.

We can also randomly assign connection IDs. Today we assign connection IDs sequentially, which leaves a fair bit of randomness on the table. If both connection IDs and secret keys are randomly generated, that gives us 52 bits of randomness (4 quadrillion possibilities) instead of 32 bits of randomness (4 billion possibilities).

On the Kubernetes side, we could still leverage DNS here. Have the environment controller create a service named after the first 12 bits of each environment's organization ID by using the first three hex digits of the UUID. An environment for org ID 2b826f1e-4691-40b5-b627-8f7e4fc6369f would use service name 2b8. A given service will map to multiple environments in the case of collision, and that's fine. balancerd, when it gets a cancellation request, will look up the service for the given org ID prefix, and then crucially forward the cancellation request on to all of the IPs that the DNS record resolves to.
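To make the broadcast step concrete, here's a minimal sketch assuming a tokio-based balancerd; the `broadcast_cancel` helper name, the service address, and the 6875 port are illustrative, while the CancelRequest wire format (Int32 length 16, Int32 code 80877102, Int32 connection ID, Int32 secret key) is fixed by the PostgreSQL protocol:

use std::io;

use tokio::io::AsyncWriteExt;
use tokio::net::{lookup_host, TcpStream};

// Sketch: forward a pgwire CancelRequest to every environmentd behind the
// org-ID-prefix service. Environments that don't own (conn_id, secret_key)
// simply ignore it.
async fn broadcast_cancel(service: &str, conn_id: u32, secret_key: u32) -> io::Result<()> {
    // CancelRequest: Int32 length (16), Int32 code (80877102),
    // Int32 connection ID, Int32 secret key, all big-endian.
    let mut msg = Vec::with_capacity(16);
    msg.extend_from_slice(&16i32.to_be_bytes());
    msg.extend_from_slice(&80877102i32.to_be_bytes());
    msg.extend_from_slice(&conn_id.to_be_bytes());
    msg.extend_from_slice(&secret_key.to_be_bytes());

    // A headless service resolves to one address per backing environmentd.
    for addr in lookup_host(service).await? {
        if let Ok(mut stream) = TcpStream::connect(addr).await {
            let _ = stream.write_all(&msg).await; // best effort
        }
    }
    Ok(())
}

So a cancellation for org prefix 2b8 would be something like broadcast_cancel("cancel-2b8:6875", conn_id, secret_key).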

tl;dr

  • High order 12 bits of connection IDs are the low order 12 bits of environment IDs
  • Low order 20 bits of connection IDs are a randomly generated connection ID
  • Secret key is a randomly generated 32-bit integer, as it is today
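As a minimal sketch of that layout (helper names are hypothetical, `env_bits` stands for whichever 12 environment-ID bits we settle on, and the rand crate is an assumed dependency):

use rand::Rng; // assumed dependency, for illustration

// Pack 12 bits of environment ID and a random 20-bit connection ID into a
// single 32-bit pgwire connection ID.
fn generate_conn_id(env_bits: u32) -> u32 {
    let random: u32 = rand::thread_rng().gen_range(0..1 << 20); // 20 random bits
    ((env_bits & 0xFFF) << 20) | random
}

// Recover the 12 routing bits from the connection ID in a cancel request.
fn env_bits_of(conn_id: u32) -> u32 {
    conn_id >> 20
}

// The secret key stays a fully random 32-bit integer, as it is today.
fn generate_secret_key() -> u32 {
    rand::thread_rng().gen()
}

Together the random 20-bit connection ID and the 32-bit secret key supply the 52 bits of randomness described above.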

cc @alex-hunt-materialize @mjibson

@ParkMyCar
Member Author

> The remaining 12 bits can be used to represent the first 12 bits of the environment ID.

As @mjibson has pointed out to me before, bits 6, 7, and 12 of a UUIDv4 are constant. What do you think about using the last 12 bits of the environment ID for more randomness?

@benesch
Member

benesch commented Jan 3, 2024

Oh, wow, good to know. That's too bad, because it's much harder to sort the environments by their last 12 bits than their first, but agreed that we should do it, because losing 3 of the 12 bits of UUID randomness would be a big hit.

@alex-hunt-materialize
Contributor

alex-hunt-materialize commented Jan 4, 2024

What should be responsible for maintaining the services? We could probably do this in the environment-controller, but we currently don't have anything there that operates on all environments, only on one environment at a time.

I guess on environment reconciliation we could query any existing service, but it's a bit clunky. We'd probably want to serialize interactions with these services, so that we don't race.

@benesch
Member

benesch commented Jan 5, 2024

I think it should be fine to handle the creation of the service during the normal environment reconciliation. It will need to do two things:

  1. Add a label to the environmentd pod like: cancel-suffix.materialize.cloud=<last twelve bits of org id>
  2. Ensure a service named cancel-<last twelve bits of org id> exists with a pod selector that matches on the label from (1).

It's true that environments that are reconciled in parallel will race on (2). But that should be fine, because they're both going to try to create identical services. IIUC, the service membership doesn't need to be managed, since that's handled by Kubernetes automatically, and so I think it all works out pretty smoothly.
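A rough sketch of step (2) with kube-rs server-side apply, which also makes the race benign, since identical concurrent applies converge on the same object (the "cancellation" namespace and field-manager name are assumptions):

use k8s_openapi::api::core::v1::Service;
use kube::api::{Api, Patch, PatchParams};
use kube::Client;
use serde_json::json;

// Idempotently ensure the cancel Service for a 12-bit org-ID suffix exists.
async fn ensure_cancel_service(client: Client, suffix: &str) -> Result<(), kube::Error> {
    let services: Api<Service> = Api::namespaced(client, "cancellation"); // assumed namespace
    let name = format!("cancel-{suffix}");
    let svc = json!({
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": { "name": name },
        "spec": {
            // Headless, so DNS returns one record per matching pod.
            "clusterIP": "None",
            "selector": { "cancel-suffix.materialize.cloud": suffix },
        }
    });
    // Server-side apply: parallel reconcilers applying the same manifest
    // don't conflict, so no extra serialization is needed.
    services
        .patch(&name, &PatchParams::apply("environment-controller"), &Patch::Apply(&svc))
        .await?;
    Ok(())
}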

@alex-hunt-materialize
Contributor

Ah, that's a nice way to handle it for creation. I'm not sure how to handle delete that way, though.

@benesch
Member

benesch commented Jan 8, 2024

It's plausibly fine just to leave them around? There are only going to be 4096 of them at most.

@pH14
Contributor

pH14 commented Jan 10, 2024

If the issue is the mechanics of the reconciler, it's not too uncommon to have a Kube controller that considers all CRs for reconciliation if any one of them changes. kube-rs supports this through reconcile_all_on, which I think we can use to solve the creation/deletion issues here? Any time an Environment changes, we scan through all cancellation Services and all Environments and ensure the right ones exist.
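Roughly, assuming a recent kube-rs with the runtime feature plus tokio-stream (the reconcile body and the Environment watch that feeds the trigger channel are elided assumptions):

use std::{sync::Arc, time::Duration};

use futures::StreamExt;
use k8s_openapi::api::core::v1::Service;
use kube::runtime::controller::{Action, Controller};
use kube::runtime::watcher;
use kube::{Api, Client};
use tokio_stream::wrappers::ReceiverStream;

async fn reconcile(_svc: Arc<Service>, _ctx: Arc<()>) -> Result<Action, kube::Error> {
    // Ensure the cancel Services that should exist do, and prune stale ones.
    Ok(Action::await_change())
}

fn error_policy(_svc: Arc<Service>, _err: &kube::Error, _ctx: Arc<()>) -> Action {
    Action::requeue(Duration::from_secs(60))
}

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    let client = Client::try_default().await?;
    let services: Api<Service> = Api::namespaced(client, "cancellation"); // assumed namespace
    // An Environment watch (elided) sends () here on every change.
    let (_trigger_tx, trigger_rx) = tokio::sync::mpsc::channel::<()>(1);
    Controller::new(services, watcher::Config::default())
        // Each () re-queues *all* Services for reconciliation at once.
        .reconcile_all_on(ReceiverStream::new(trigger_rx))
        .run(reconcile, error_policy, Arc::new(()))
        .for_each(|_| async {})
        .await;
    Ok(())
}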

maddyblue added a commit to maddyblue/materialize that referenced this issue Jan 11, 2024
This will decrease their predictability, which is desirable for the balancer.

See MaterializeInc#24081 (comment)
maddyblue added a commit to maddyblue/materialize that referenced this issue Jan 13, 2024
This will decrease their predictability, which is desirable for the balancer.

See MaterializeInc#24081 (comment)

Bump connection id space to 20 bits as part of balancer cancellation.
maddyblue added a commit that referenced this issue Jan 14, 2024
This will decrease their predictability, which is desirable for the balancer.

See #24081 (comment)

mgreene pushed a commit to shepherdlyinc/materialize that referenced this issue Jan 15, 2024
maddyblue added a commit that referenced this issue Jan 17, 2024
Use the lower 12 bits of the environment id as the upper 12 bits of the connection id.

See #24081 (comment)

maddyblue added a commit that referenced this issue Jan 20, 2024
When balancer receives a cancel request, broadcast it to all envds with matching ids.

See #24081 (comment)

@maddyblue
Contributor

Adapter parts of this are complete. There's a new arg to balancerd (see #24503) for a cancel broadcast DNS lookup address.

@benesch
Member

benesch commented Feb 14, 2024

@alex-hunt-materialize pointed out today on Slack that our plan to have cancellation Services in a cancellation namespace doesn't quite work as previously described, because the environmentd pods the service wants to reference are scattered across many different namespaces.

I think we can still make this work by manually managing the endpoint slices with the "FQDN" address type:

Service:

apiVersion: v1
kind: Service
metadata:
  name: cancel-b3f
spec:
  clusterIP: "None"

EndpointSlice:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: cancel-b3f-1
  labels:
    kubernetes.io/service-name: cancel-b3f
addressType: FQDN
endpoints:
  - addresses:
      - "environmentd.environment-b3fxxxx.svc.cluster.local"
      - "environmentd.environment-b3fyyyy.svc.cluster.local"
