Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CosmosClient initialization ergonomics #4405

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

onionhammer
Copy link
Contributor

Pull Request Template

Description

Improves real-world ergonomics over using CreateAndInitializeAsync, by making InitializeContainersAsync public.

serviceBuilder.AddSingleton(services =>
{
  return CosmosClient.CreateAndInitializeAsync(...containers...).GetAwaiter().GetResult();
});

A better option would be

serviceBuilder.AddSingleton(services =>
{
  var client = new CosmosClient(...);
  _ = client.InitializeContainersAsync(... containers ...);
  return client;
});

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • [*] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [*] This change requires a documentation update

Closing issues

To automatically close an issue: closes #4404

@kirankumarkolli
Copy link
Member

Both the async and the callers needs to wait. Is the challenge here about calling async over sync?

At-least the factory pattern is guaranteeing

  • Client is pre-initialized otherwise it becomes a convention
  • Initialization called once (in-correct usage pattern are common)

Overall there is very little value with the side-affect of initialization called multiples.

@onionhammer
Copy link
Contributor Author

onionhammer commented Apr 12, 2024

Both the async and the callers needs to wait.

@kirankumarkolli Not sure what you mean by this? I don't believe the SDK cares; you can grab things from containers while this initialize function is still going on.

I've been struggling to think of another Microsoft SDK that has something like this; an async factory method to initialize the client. This should be done in the background, and this PR gives the user' that opportunity, it doesn't prescribe an opinionated way of how to do it.

Take EntityFramework as an example; it has extension methods to add the DbContext to the servicebuilder; where is the async dbContext.Database.Migrate() function?

The cosmos SDK's CreateAndInitializeAsync tries to have it two ways:

  1. You want a call to be async so other things can happen at the same time on the same thread (thats what async is for)
  2. You have to call this factory method only once at initialization

Which is it?

@ealsur
Copy link
Member

ealsur commented Apr 12, 2024

Let me try to chip in here and see if I can shed some light into the topic.

The reason CreateAndInitializeAsync exists is to fulfill the requirement: "I want to use the SDK client once it is initialized because in my business scenario I cannot pay the initialization cost during the first couple of data plane operations". For this requirement, the client instance needs to be initialized, any failure initializing cannot be ignored, otherwise it would break the premise of having an initialized client, either you get an initialized CosmosClient instance or an error.

you can grab things from containers while this initialize function is still going on.

If your application can handle this, which means that you are fine paying the cost of initialization during data plane calls, then why not simply create a client instance and let data plane initialize it, paying the latency cost hit at that time? Because that is what will happen if you ignore and not await the initialization, right? Triggering the initialization as a background process does not ensure anything (that the initialization completes successfully and when), so what's the point?

@onionhammer
Copy link
Contributor Author

onionhammer commented Apr 12, 2024

Let me try to chip in here and see if I can shed some light into the topic.

The reason CreateAndInitializeAsync exists is to fulfill the requirement: "I want to use the SDK client once it is initialized because in my business scenario I cannot pay the initialization cost during the first couple of data plane operations". For this requirement, the client instance needs to be initialized, any failure initializing cannot be ignored, otherwise it would break the premise of having an initialized client, either you get an initialized CosmosClient instance or an error.

Then it should be sync, not async

you can grab things from containers while this initialize function is still going on.

If your application can handle this, which means that you are fine paying the cost of initialization during data plane calls, then why not simply create a client instance and let data plane initialize it, paying the latency cost hit at that time? Because that is what will happen if you ignore and not await the initialization, right? Triggering the initialization as a background process does not ensure anything (that the initialization completes successfully and when), so what's the point?

This PR leaves it up to the implementer to decide what is best for their scenario. The application can still hold off until the the client is fully initialized by awaiting the result. If you don't pre-initialize ALL your containers, then EVERY operation that happens that accesses an uninitialized container will have to pay that cost on first-run.

If you have a /startup healthcheck endpoint that waits until the container connections are initialized, for instance, that app wont receive any traffic until that returns true.

This just gives you more choices.

@ealsur
Copy link
Member

ealsur commented Apr 12, 2024

Then it should be sync, not async

Can you expand on that? Initialization requires I/O calls, they are async by nature. If the method is sync then it would block the threads during those executions? You can make it sync if you want with GetAwaiter().GetResult() but it has the same issues as if the SDK API would be sync, so why would the SDK API be sync? Not all environments have the limitation that they cannot perform async calls during initialization so why do it?

@ealsur
Copy link
Member

ealsur commented Apr 12, 2024

This PR leaves it up to the implementer to decide what is best for their scenario.

The point is that this freedom exists today already. Create the client and move on or await the CreateAndInitialize. Either you need the client to be fully initialized or you don't. The proposed API and usage of it in this PR is simply doing a background initialization that might or might not benefit the data plane calls that the application does afterwards. With that level of uncertainty in the outcome, what's the benefit?

@ealsur
Copy link
Member

ealsur commented Apr 12, 2024

serviceBuilder.AddSingleton(services =>
{
  var client = new CosmosClient(...);
  _ = client.GetContainer().ReadItem(); // <-- for each container you want
  return client;
});

Would achieve the same, a background initialization of the connections with no guarantees.

@onionhammer
Copy link
Contributor Author

onionhammer commented Apr 12, 2024

Would achieve the same, a background initialization of the connections with no guarantees.

@ealsur only for a single container. Suppose you have 10 API endpoints, one for each of 10 containers; you can either pay the cost once very early on in the application's lifecycle, or you can pay it 10 times, one for every time a new container is called.

_ = client.GetContainer().ReadItem(); // <-- for each container you want

  1. If that was all there was to it, why wouldn't that be how CreateAndInitializeAsync was doing it underneath the sheets? I suspect ReadItem() would only find a single partition.
  2. This PR does that more ergonomically, and retains the existing logic for users who still want to block everything else.

Agreeing/disagreeing with this PR essentially comes down to three viewpoints:

  1. For: Users should have choice, where they can initialize a connection to multiple containers early in the lifecycle of their app, or block their app from being initialized with async-over-sync
  2. Against: No, I'm right and anyone who disagrees is wrong; users should not be given that choice.
  3. Against: Users could be given a choice, but it's already 'good enough', and we don't need the code flux

The view of the members here appears to fall under 2.

@kirankumarkolli
Copy link
Member

With current public API, few ways applications can achieve the goals are

Lazy initialization:

  • Create CosmosClient but caches and connections are initialized on demand
  • Cold client initialization might impact some requests latency which are first to a Container/Partition
  • It's still functional and most CX do today use this pattern successfully (as most initialization overhead is amortized and max impacts tail latency)
// Initialization/Startup block
CosmosClient client = new CosmosClient();

// Data flows
var result = await client.GetContainer(database, container).ReadItemAsync();

It's also possible for startup/initialization to hold the Container object as well.

Pre-initialize clients:

  • Even any start-up or tail latency is not acceptable from application perspective
  • Initialization:
    • Involves NW IO with service interactions and creating connections.
    • Initialization efforts required is proportional to the opted collections partitions.
    • Any number of collections from the account can be initialized
    • Its async call and the caller needs to wait
      • Why not sync API:
        • Sync API will not be conveying the underlying unit of work it needs to do.
        • Even as convenience what SDK needs to do it just Task.Wait() or Task.Result (not much value)

Option1:

// Initialization/Startup block
IReadOnlyList<(string databaseId, string containerId)> containers = ...;
CosmosClient client = new CosmosClient.CreateAndInitializeAsync(, containers);

// Data flows
var result = await client.GetContainer(database, container).ReadItemAsync();

Option2:

// Initialization/Startup block
CosmosClientBuilder clientBuilder = new CosmosClientBuilder();
IReadOnlyList<(string databaseId, string containerId)> containers = ...;

// Async startup
CosmosClient client = await clientBuilder.BuildAndInitializeAsync(containers);

// Sync startup
CosmosClient client = clientBuilder.BuildAndInitializeAsync(containers).Result;

// Data flows
var result = await client.GetContainer(database, container).ReadItemAsync();

Overall the application developers can achieve both scenarios.
Also builder pattern do also help make the code ergonomics similar to the proposal above.

The one possible corner scenario is creating client sync but application initialize it in background. This model is also non-deterministic in behavior the application. We would like to understand this better if this what is referred above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve client initialization ergonomics
3 participants