Is there a way to disable query cache (both for CPU and GPU)? #26

dongheuw · 2022-01-13T00:26:20Z

Hi there,

I'm benchmarking OmnisciDB via pyomnisci, is there a way to disable query cache (both for CPU and GPU) between runs for a single query (disabling omnisci to use any result of the previously run queries but letting it use the tables loaded in memory)?

Also, is there a way to set the number of CPU threads used during query execution (on CPU)? Is num-executors the correct flag to set?

Thanks so much,
Dong

cdessanti · 2023-01-31T11:05:53Z

Hi @dongheuw ,

Sorry for the extremely late response.

There isn't any proper automatic result cache in the database; the speedup you get between the first and subsequent runs is because of a series of factors. The most important ones are the following.

all the data is loaded in the proper cache layer (on system ram for CPU execution, on VRAM for GPU execution).
the llvm code that serves the query has been generated, compiled, and cached.
The hash indexes have been calculated and cached in case a join is occurring.
other minors.

To test that, try running some queries and changing the filter. As an example using the NYC Open data of yellow cab.

this query
heavysql> select vendorid,extract(month from tpep_pickup_datetime), sum(total_amount), avg(passenger_count) from yellow_tripdata where tpep_pickup_datetime between timestamp '2010-01-01 00:00:00' and timestamp '2011-01-01 00:00:00' group by 1,2;

┌────────┬──────┬───────────┬─────────────────┐
│vendorid│EXPR$1│   EXPR$2  │      EXPR$3     │
├────────┼──────┼───────────┼─────────────────┤
│VTS     │     4│88972158.69│2.040251185236914│
[CUT RESULTS]
│VTS     │    12│87473959.67│2.039379561238744│
│CMT     │    12│78540890.76│ 1.31927503873061│
└────────┴──────┴───────────┴─────────────────┘
 22 rows returned. Exec: 1781 ms, Total: 1875 ms

it took 1781 ms to parse, optimize, generate the llvm code, read around 3GB data from disk to system memory, and execute the query.

heavysql> \memory_summary
HeavyDB Server CPU Memory Summary:
            MAX            USE      ALLOCATED           FREE
   51.411,41 MB    2.938,84 MB    8.192,00 MB    5.253,16 MB

Clearing the caches and re-running the query it takes 300ms less, because the 0plan generation step has been

heavysql> alter system clear cpu memory;
heavysql> select vendorid,extract(month from tpep_pickup_datetime), sum(total_amount), avg(passenger_count) from yellow_tripdata where tpep_pickup_datetime between timestamp '2010-01-01 00:00:00' and timestamp '2011-01-01 00:00:00' group by 1,2;
┌────────┬──────┬─────────────┬─────────────────┐
│vendorid│EXPR$1│    EXPR$2   │      EXPR$3     │
├────────┼──────┼─────────────┼─────────────────┤
│VTS     │     4│88.972.158,69│2,040251185236914│
[CUT RESULT]
│VTS     │    12│87.473.959,67│2,039379561238744│
│CMT     │    12│78.540.890,76│ 1,31927503873061│
└────────┴──────┴─────────────┴─────────────────┘
 22 rows returned. Exec: 1423 ms, Total: 1433 ms

re-running the query with everything in the cache, takes just 1000ms less

heavysql> select vendorid,extract(month from tpep_pickup_datetime), sum(total_amount), avg(passenger_count) from yellow_tripdata where tpep_pickup_datetime between timestamp '2010-01-01 00:00:00' and timestamp '2011-01-01 00:00:00' group by 1,2;
[Omitted results]
22 rows returned. Exec: 731 ms, Total: 732 ms

If we change the filters, the query will reuse the plan and the data cached in memory, so it will take more or less the same

heavysql> select vendorid,extract(month from tpep_pickup_datetime), sum(total_amount), avg(passenger_count) from yellow_tripdata where tpep_pickup_datetime between timestamp '2010-01-02 00:00:00' and timestamp '2010-12-31 00:00:00' group by 1,2;
[Results omitted]
22 rows returned. Exec: 767 ms, Total: 768 ms

At the time of writing, the number of cores used to run a query is determined by the number of fragments processed by the query. The default fragment size is 32 Million, so a table of 1B of rows will be of 31 fragments, and the queries will use up to 31 threads to run the queries.

The num-executor flag isn't related to the number of CPU threads used to run the query, and in the latest version of the software we added the option to run queries concurrently on CPU and on CPU and GPUs, so depending on the workload is better having less threads run per query rather than all the resources used for a single query.

If you have any further question, let me know.

Candido

dongheuw · 2023-01-31T22:46:20Z

Thank you! I will let you know!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is there a way to disable query cache (both for CPU and GPU)? #26

Is there a way to disable query cache (both for CPU and GPU)? #26

dongheuw commented Jan 13, 2022 •

edited

cdessanti commented Jan 31, 2023

dongheuw commented Jan 31, 2023

Is there a way to disable query cache (both for CPU and GPU)? #26

Is there a way to disable query cache (both for CPU and GPU)? #26

Comments

dongheuw commented Jan 13, 2022 • edited

cdessanti commented Jan 31, 2023

dongheuw commented Jan 31, 2023

dongheuw commented Jan 13, 2022 •

edited