Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation on timeout options needs clarification #167

Open
timtucker-dte opened this issue Aug 28, 2023 · 13 comments
Open

Documentation on timeout options needs clarification #167

timtucker-dte opened this issue Aug 28, 2023 · 13 comments
Labels
documentation Improvements or additions to documentation

Comments

@timtucker-dte
Copy link

"timeout is the maximum time to execute an operation. It has Buffer type because timestamp in Hive has capacity 64. So for such value, you should use node-int64 npm module."

Right now the documentation just says that the timeout represents "time", but doesn't have any indication of the unit.

Is it the amount of time in milliseconds?
Seconds?
Minutes?

Seems like a pretty easy update to documentation to add in the unit (and potentially an example of setting the timeout).

@koesper
Copy link

koesper commented Aug 30, 2023

I ran into this as well, i'm making the assumption that it should be like this?

const Int64 = require('node-int64');   
await session.executeStatement(query, {runAsync: true, queryTimeout: new Int64('0x000120000')});

And hopefully that then means 2 minutes timeout?

@kravets-levko
Copy link
Collaborator

Hi @timtucker-dte and @koesper! That's true that our docs are not perfect, and we're working on improving them, so your feedback is really valuable to us. Regarding your question: this is how the query timeout option is described in our thrift service definition: The number of seconds after which the query will timeout on the server. I don't think I can add anything to this. Currently you should pass an Int64 instance from node-int64 library - please refer to that library's docs on how to initialize it. I think it's a good idea to make library accept a regular JS number in addition to Int64 - will implement it soon

@xdumaine
Copy link

+1 for improved types and docs on this, as well as accepting a raw number. I'd love to just say { queryTimeout: 300 } for a 5 minute timeout without having to introduce a new library 😬

@xdumaine
Copy link

I'm not seeing that passing a timeout with Int64 is effective - I'm passing a timeout of 1 second, and never getting any error, while queries are taking 3-5 seconds.

@kravets-levko
Copy link
Collaborator

Hi @xdumaine! Can you please share your code where you're trying to use a query timeout?

@xdumaine
Copy link

Definitely, I appreciate you looking.

  const query = await buildDatabricksQuery(input);

  const queryTimeout = options?.timeoutInSeconds || 1;
  const int64Timeout = new Int64(queryTimeout);

  console.log(
    'Running query with timeout',
    queryTimeout,
    int64Timeout.toString()
  ); // both evaluate to `1`

  const queryOperation = await input.databricksSession.executeStatement(query, {
    queryTimeout: int64Timeout,
  });

  const result = (await queryOperation.fetchAll({})) as any[];
  await queryOperation.close();

@kravets-levko
Copy link
Collaborator

kravets-levko commented Mar 14, 2024

@xdumaine Can you please check one more thing for me. So can you please open a query history page, filter by your warehouse, and check the Duration column. It shows the actual query execution time, without various overheads (processing results by server and/or client, data exchange overhead, etc.) - that's the value to which query timeout is applied. await op.executeStatement() often takes more than that due to many reasons, and if you measure the execution time of await op.executeStatement() you most likely will not get the right value

@xdumaine
Copy link

xdumaine commented Mar 14, 2024

image

This is what I see, maybe you can help me understand to which value the timeout is applied? I would expect it to apply to running or executing at least, but it would be ideal if I could also apply it to the scheduling time.

My use case is an API request which as a 5 minute timeout on the ELB, meaning I must respond within 5 minutes. If the databricks query can't complete (due to any reason) within, say 4 minutes, I want to cancel is and return an error.

@kravets-levko
Copy link
Collaborator

Thank you! Yes, you're correct - if you set a 1s timeout, this particular query execution had to be interrupted. I remember that it was definitely working last time I touched this code. It may be some regression, I need to reach the Thrift backend team

@kravets-levko kravets-levko added the documentation Improvements or additions to documentation label Apr 11, 2024
@kravets-levko
Copy link
Collaborator

Hi @xdumaine! Sorry for the delay, sometimes even simple things take longer that expected. Turns out, that this option is actually used, but only on Compute clusters. If you run query against SQL Warehouse, the option is ignored and some server configuration is used instead. We're still discussing this behavior with other Databricks developers, but I doubt that this behavior is going to be changed (at least, in the near future). If you need to have a query timeout with SQL Warehouse - you can try to work it around using timer and cancelling the operation. Sorry for the invonvenience.

P.S. PR to allow regular numbers for a queryTimeout option is coming very soon

@xdumaine
Copy link

Thanks @kravets-levko - I've moved on from this for now, but FWIW, I was not able to work around it using a timer and cancelling the operation, because the handle to cancel the operation isn't returned until after it completes:

/** This does not work */
  
  // start the query
  const queryOperation = await input.databricksSession.executeStatement(query);
  setTimeout(() => {
    // I need a handle on the query operation to cancel it
    // but I don't get that handle until `executeStatement` resolves
    queryOperation.cancel().catch(() => console.error.bind(console))
  }, 5000);
  // the cancel would work if `fetchAll` was the slow part, but it's not, the `executeStatement` is
  const result = (await queryOperation.fetchAll({})) as any[];

Basically - how can I cancel executeStatement if the operation handle isn't accessible until it resolves?

@kravets-levko
Copy link
Collaborator

kravets-levko commented Apr 20, 2024

By default, executeStatement will ask Thrift server to wait for query execution and return a part of data - we call this feature DirectResults. It saves some requests to server and is especially handy with small result sets. But you can disable it by passing maxRows: null to executeStatement - in this case, server will just enqueue your query and return immediately

@kravets-levko
Copy link
Collaborator

kravets-levko commented Apr 22, 2024

Okay, probably the last update on this: this behavior is by design. To configure query timeout for SQL Warehouse, other mechanism is available: https://docs.databricks.com/en/sql/language-manual/parameters/statement_timeout.html

@kravets-levko kravets-levko mentioned this issue Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

4 participants