Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanner latency issue on streaming read #2757

Open
amb-sebastian-podgorski-pt opened this issue Dec 11, 2023 · 7 comments
Open

Spanner latency issue on streaming read #2757

amb-sebastian-podgorski-pt opened this issue Dec 11, 2023 · 7 comments
Assignees
Labels
api: spanner Issues related to the googleapis/java-spanner API.

Comments

@amb-sebastian-podgorski-pt

I found our service has problems with latency and every single read statements is taking more than 40ms.

I tried to use native/derived queries and it always takes more than 40ms.

Reading data is really fast, it is taking below 1ms but from logs I can see that establishing grcp connection to Spanner takes most of the time.

logs from reading

10:16:36.485 [http-nio-8080-exec-2] [,] DEBUG o.s.web.servlet.DispatcherServlet - GET "/test", parameters={}
10:16:36.485 [http-nio-8080-exec-2] [,] DEBUG o.s.w.s.m.m.a.RequestMappingHandlerMapping - Mapped to edu.Controller#test()
10:16:36.485 [http-nio-8080-exec-2] [,] DEBUG o.s.o.j.s.OpenEntityManagerInViewInterceptor - Opening JPA EntityManager in OpenEntityManagerInViewInterceptor
10:16:36.485 [http-nio-8080-exec-2] [,] DEBUG c.g.c.s.d.s.c.SpannerTransactionManager - Creating new transaction with name [Repository.findById]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT,readOnly
10:16:36.485 [http-nio-8080-exec-2] [,] DEBUG c.g.c.s.d.s.core.SpannerTemplate - Executing read on table test_table with keys: {[6fb703b4-59b4-4381-abd5-0594924c093c]} and columns: id, name
10:16:36.485 [http-nio-8080-exec-2] [,] DEBUG c.g.c.s.d.s.core.SpannerTemplate - Read elapsed milliseconds: 0
10:16:36.487 [grpc-default-worker-ELG-1-5] [,] DEBUG i.g.n.s.i.g.netty.NettyClientHandler - [id: 0xea23po983, L:/192.168.1.1:50000 - R:spanner.googleapis.com/200.200.200.200:443] OUTBOUND HEADERS: streamId=11 headers=GrpcHttp2OutboundHeaders[:authority: spanner.googleapis.com:443, :path: /google.spanner.v1.Spanner/BeginTransaction, :method: POST, :scheme: https
10:16:36.487 [grpc-default-worker-ELG-1-5] [,] DEBUG i.g.n.s.i.g.netty.NettyClientHandler - [id: 0xea23po983, L:/192.168.1.1:50000 - R:spanner.googleapis.com/200.200.200.200:443] OUTBOUND DATA:
10:16:36.525 [grpc-default-worker-ELG-1-5] [,] DEBUG i.g.n.s.i.g.netty.NettyClientHandler - [id: 0xea23po983, L:/192.168.1.1:50000 - R:spanner.googleapis.com/200.200.200.200:443] INBOUND HEADERS:

as you can see from timestamps of the last two logs it takes 39ms, I tried to use read only transactions, makes read without transaction, set rcp priority on SpannerTemplate to low but nothing helps and simple call to rest endpoint which is fetching object by id takes more than 40ms when reading from db takes less than 0ms.

Application details

Java 17
google-cloud-spanner - 6.55
spring-cloud-gcp-starter-data-spanner - 4.8.4

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/java-spanner API. label Dec 11, 2023
@arpan14
Copy link
Collaborator

arpan14 commented Dec 11, 2023

A few queries to start investigating

  1. What's your session configuration? Please provide the min/max sessions which you are configuring using SessionPoolOptions in SpannerOptions.
  2. What is the max/average QPS for your application?
  3. What is the latency percentile? Can you provide details about the p99, p95 and p50?
  4. Are you facing troubles post using a new client version? Or is this a general problem which has always been present?
  5. Could you provide some details about your schema and a sample query which you are executing?

@amb-sebastian-podgorski-pt
Copy link
Author

  1. Default values (max = -1, min = -1)
  2. I don't know yet. I am trying to achieve 1k request per sec for one instance now it is around 100 RPS
  3. p50 - 85ms, p95 - 136ms, p99 - 190ms
  4. It is general problem, I've created a simple app with really simple table and I am facing same issue
  5. Schema is really basic, some strings / integers, query has to be read to face this issue

it looks like for read queries fetching meta data using stream reading takes up to 50ms each time, reading elapsed time is from 0 to 1ms but fetching metadata takes more most of the time.

I also tried native queries / derived / using jdbc and still I am facing same issue.

Exmaple query

select * from test_table limit 1;

@amb-sebastian-podgorski-pt
Copy link
Author

For every query stream seems to be in state not_ready what causes starting new stream for every request and it fetches metadata about the table using streaming and it takes more than 40ms which seems to be very long I don't know.
As long as you don't call next(); on result set then it is fine but calling next function on resultSet from package com.google.cloud.spanner causes starting this long stream process

@arpan14
Copy link
Collaborator

arpan14 commented Dec 19, 2023

The default session pool options have 100 min sessions and 400 max sessions. Can you share with me the code snippet how the client object is being created?

@amb-sebastian-podgorski-pt
Copy link
Author

@arpan14 the client is being created with default values using spring-cloud-gcp-autoconfigure:5.0.0

config

The values are set to -1 for min and max connections

@olavloite
Copy link
Collaborator

@amb-sebastian-podgorski-pt I think that there is an underlying misunderstanding here.

| Reading data is really fast, it is taking below 1ms but from logs I can see that establishing grcp connection to Spanner takes most of the time.

This observation is where the misunderstanding is. The Java client does not execute the query directly when you call the executeQuery(..) method. Instead, the execution of the query is delayed until the first call to ResultSet#next(). That means that:

  1. The less-than-1-ms time that you are seeing when calling executeQuery(..) is not and indication that reading the data is really fast. It is only an indication that the client library is directly ready to execute the query.
  2. The 40ms time that you are seeing the first time that you are calling ResultSet#next() is the actual query time + network latency.

40ms is however a too high value for a very simple query, which indicates that there might be more going on here. It is not clear to me whether the select * from test limit 1 query is the one that you are testing with in all cases, but the best way to test query latency is by:

  1. Execute a query like select * from test where id=@id (so use a parameterized query to select one row based on the primary key).
  2. Make sure you execute the query from a Google Cloud Compute Engine VM (or similar) in the same region as where your Cloud Spanner database is. That will minimize network latency, and also reduce any randomness in network latency that might be introduced if you run from for example your local laptop.

Using the above strategy should bring a test query like the above down to around 5ms. Note that the very first query might be slower, as the first time that Cloud Spanner sees a query, it needs to parse and plan it. Subsequent executions will re-use the plan that has already been compiled and cached.

@arpan14 arpan14 assigned rahul2393 and unassigned arpan14 Apr 15, 2024
@olavloite
Copy link
Collaborator

@amb-sebastian-podgorski-pt Is this still an issue? If so, could you try the steps that I described in the previous comment to verify the actual latency for your query?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/java-spanner API.
Projects
None yet
Development

No branches or pull requests

4 participants