Routing: Adds Parallel Request Hedging #4198

Open
wants to merge 125 commits into
base: master

NaluTripician
Contributor

@NaluTripician NaluTripician commented Nov 27, 2023

Pull Request Template

Description

See issue #3782.

The goal of this PR is to introduce parallel request hedging and availability strategies to the .NET SDK. Users will be able to create an availability strategy with a threshold, which specifies when request hedging triggers, and a step time, which specifies how often additional parallel requests are sent once the strategy has triggered. If the step is 0, only one parallel request will be sent out. If it is set to TimeSpan.Zero, all parallel requests will be sent out simultaneously.

Design

Parallel requests will be sent from the RequestInvokerHandler. Before a request is sent out, we first check whether it is eligible for parallel request hedging. Currently, only document read requests can use this feature.

Next, the request is cloned, and the parallel requests are routed to all available read regions by setting the exclude regions property in the request options. Finally, all requests are sent out with the appropriate delay, and once the SDK receives a response, all in-flight parallel requests are canceled.
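
The flow described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `HedgingSketch`, `SendWithHedgingAsync`, and the `sendToRegion` delegate are hypothetical names, the regions are plain strings, and the sketch assumes a zero step means the remaining requests are sent immediately. It returns whichever request completes first, including a failure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the SDK.
public static class HedgingSketch
{
    public static async Task<T> SendWithHedgingAsync<T>(
        IReadOnlyList<string> regions,
        Func<string, CancellationToken, Task<T>> sendToRegion,
        TimeSpan threshold,
        TimeSpan step,
        CancellationToken cancellationToken = default)
    {
        if (regions.Count == 0)
        {
            throw new ArgumentException("At least one region is required.", nameof(regions));
        }

        // One linked token source so every in-flight request can be canceled together.
        using CancellationTokenSource cts =
            CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        try
        {
            // The primary request goes to the first region right away.
            List<Task<T>> inFlight = new List<Task<T>> { sendToRegion(regions[0], cts.Token) };

            for (int next = 1; next < regions.Count; next++)
            {
                // Wait for the threshold before the first hedge, then for the step.
                TimeSpan wait = next == 1 ? threshold : step;
                Task delay = Task.Delay(wait, cts.Token);

                Task first = await Task.WhenAny(inFlight.Cast<Task>().Append(delay));
                if (first != delay)
                {
                    // A request finished before the timer; return its result (or rethrow its failure).
                    return await (Task<T>)first;
                }

                // Timer elapsed with no response: hedge to the next region.
                inFlight.Add(sendToRegion(regions[next], cts.Token));
            }

            // Every region has a request in flight; take whichever completes first.
            Task<T> winner = await Task.WhenAny(inFlight);
            return await winner;
        }
        finally
        {
            // Cancel whatever is still running once a result (or error) is available.
            cts.Cancel();
        }
    }
}
```

With a slow first region and a fast second one, a call such as `SendWithHedgingAsync(regions, send, TimeSpan.FromMilliseconds(50), TimeSpan.Zero)` would be expected to return the second region's response shortly after the 50 ms threshold.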

Parallel Hedging

When building a new CosmosClient, there will be an option to enable parallel hedging for that client.

CosmosClient client = new CosmosClientBuilder("connection string")
    .WithAvailabilityStrategy(
        new ParallelHedging(
            threshold: TimeSpan.FromMilliseconds(500)))
    .Build();

or

CosmosClientOptions options = new CosmosClientOptions()
{
    AvailabilityStrategyOptions = new AvailabilityStrategyOptions(
        new ParallelHedging(
            threshold: TimeSpan.FromMilliseconds(500)))
};

CosmosClient client = new CosmosClient(
    accountEndpoint: "account endpoint",
    authKeyOrResourceToken: "auth key or resource token",
    clientOptions: options);

The examples above create a CosmosClient instance with an AvailabilityStrategy enabled at a 500 ms threshold. This means that if a request takes longer than 500 ms, the SDK will send a new request to the backend, choosing regions in the order of the Preferred Regions List. If there is still no response after the step time, another parallel request will be sent to the next region. The SDK then returns the first response that comes back from the backend. The threshold parameter is required and can be set to any value greater than 0. There will also be options to specify all AvailabilityStrategyOptions settings at the request level and to enable or disable the strategy per request. If no client-level AvailabilityStrategy is set, adding AvailabilityStrategyOptions to the request options allows that request to use an AvailabilityStrategy.
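
As a worked example of the timing described here, the small sketch below computes when each request would be sent relative to the start of the operation. `HedgeSchedule` and `SendOffsets` are hypothetical names used only for illustration, and the 100 ms step in the usage note is an assumed value, since the examples above set only the threshold.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the SDK.
public static class HedgeSchedule
{
    // Returns the send time of each request, measured from the start of the operation.
    // Index 0 is the primary request; index i >= 1 is the i-th hedged request.
    public static IReadOnlyList<TimeSpan> SendOffsets(TimeSpan threshold, TimeSpan step, int regionCount)
    {
        List<TimeSpan> offsets = new List<TimeSpan>(regionCount);
        for (int i = 0; i < regionCount; i++)
        {
            // The first hedge fires at the threshold; each later one fires one step after the previous.
            offsets.Add(i == 0
                ? TimeSpan.Zero
                : threshold + TimeSpan.FromTicks(step.Ticks * (i - 1)));
        }

        return offsets;
    }
}
```

For a 500 ms threshold, an assumed 100 ms step, and three preferred regions, `SendOffsets` yields 0 ms, 500 ms, and 600 ms, provided no response arrives in the meantime.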

Override AvailabilityStrategy:

RequestOptions requestOptions = new RequestOptions()
{
    AvailabilityStrategyOptions = new AvailabilityStrategyOptions(new ParallelHedging(TimeSpan.FromMilliseconds(400)))
};

Disabling availability strategy:

RequestOptions requestOptions = new RequestOptions()
{
    AvailabilityStrategyOptions = new AvailabilityStrategyOptions(new DisableStrategy(), enabled: false)
};

Type of change

Please delete options that are not relevant.

  • [] New feature (non-breaking change which adds functionality)
  • [] This change requires a documentation update

Closing issues

To automatically close an issue: closes #IssueNumber

@NaluTripician NaluTripician marked this pull request as ready for review December 28, 2023 18:14
@NaluTripician NaluTripician changed the title from "Routing: Adds Parallel Request Hedging in Preview Mode" to "Routing: Adds Parallel Request Hedging" Dec 28, 2023
CancellationTokenSource cancellationTokenSource)
{
RequestMessage clonedRequest;
using (clonedRequest = request.Clone(request.Trace.Parent))
Member


Thought: Simpler model might be to also support include region as concept.
That way the caller here can set the exact one region.

The list creations are expensive.

Member


We discussed this in Java as well. The reason we only allowed excludedRegions is that when you allow the caller to specify an arbitrary region, you need to come up with a new error for the case where that region is no longer valid. At least for the public surface area, I still think that was a good idea. If some internal API really helps with perf, that is an option. But I would get a CPU profile before making that change.
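
To make the trade-off in this thread concrete, here is a minimal sketch of how targeting one region through an exclude list might look. `RegionTargeting` and `ExcludeAllBut` are hypothetical names, not the PR's code. The point it illustrates is that each hedged request needs its own freshly allocated list, which is the cost raised above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the SDK.
public static class RegionTargeting
{
    // Builds the exclude list that leaves only `target` eligible for routing.
    public static List<string> ExcludeAllBut(IReadOnlyList<string> availableRegions, string target)
    {
        // One new list per hedged request.
        List<string> excluded = new List<string>(availableRegions.Count);
        foreach (string region in availableRegions)
        {
            if (!string.Equals(region, target, StringComparison.OrdinalIgnoreCase))
            {
                excluded.Add(region);
            }
        }

        return excluded;
    }
}
```

If the target is no longer in the available list, this approach simply excludes nothing extra and routing falls back to the remaining regions, so no new "invalid region" error is needed.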

Projects
Status: Reviewer_Waiting

7 participants