Is there a way to disable query cache (both for CPU and GPU)? #26
Comments
Hi @dongheuw, sorry for the extremely late response. There isn't any proper automatic result cache in the database; the speedup you see between the first and subsequent runs comes from a series of factors, the most important of which are the following.
To test that, try running some queries and changing the filter. As an example, using the NYC Open Data yellow cab dataset, this query:

```sql
heavysql> select vendorid, extract(month from tpep_pickup_datetime), sum(total_amount), avg(passenger_count)
          from yellow_tripdata
          where tpep_pickup_datetime between timestamp '2010-01-01 00:00:00' and timestamp '2011-01-01 00:00:00'
          group by 1, 2;
```

took 1781 ms to parse, optimize, generate the LLVM code, read around 3 GB of data from disk into system memory, and execute.

Clearing the caches and re-running the query takes about 300 ms less, because the plan generation step has been cached. Re-running the query with everything in the cache takes about 1000 ms less. If we change the filters, the query will reuse the plan and the data cached in memory, so it will take more or less the same time.
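For example (a hypothetical follow-up query I'm sketching here, changing only the date range in the filter), a re-run over a different year should hit the cached plan and, for whatever data is already resident in memory, complete in roughly the same time:

```sql
heavysql> select vendorid, extract(month from tpep_pickup_datetime), sum(total_amount), avg(passenger_count)
          from yellow_tripdata
          where tpep_pickup_datetime between timestamp '2011-01-01 00:00:00' and timestamp '2012-01-01 00:00:00'
          group by 1, 2;
```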
At the time of writing, the number of cores used to run a query is determined by the number of fragments processed by the query. The default fragment size is 32 million rows, so a table of 1B rows will have roughly 31 fragments, and the query will use up to that many threads. The num-executors flag isn't related to the number of CPU threads used to run a query. In the latest version of the software we added the option to run queries concurrently on CPUs and GPUs, so depending on the workload it can be better to have fewer threads per query rather than devoting all the resources to a single query. If you have any further questions, let me know.

Candido
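The fragment arithmetic above can be sketched as follows (a rough illustration only; the "31" in the reply counts the full fragments, and the exact rounding inside the database may differ):

```python
import math

FRAGMENT_SIZE = 32_000_000  # default fragment size: 32 million rows

def fragment_count(num_rows: int, fragment_size: int = FRAGMENT_SIZE) -> int:
    """Total fragments needed to hold num_rows (the last one may be partial)."""
    return math.ceil(num_rows / fragment_size)

rows = 1_000_000_000                  # a 1B-row table
full = rows // FRAGMENT_SIZE          # 31 full fragments of 32M rows each
total = fragment_count(rows)          # 32 fragments counting the final partial one
print(full, total)
```

Since threads are assigned per fragment, this fragment count is also the upper bound on the CPU threads a single query will use.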
Thank you! I will let you know!
Hi there,
I'm benchmarking OmniSciDB via pyomnisci. Is there a way to disable the query cache (both for CPU and GPU) between runs of a single query, i.e. preventing OmniSci from using any result of previously run queries, while still letting it use the tables already loaded in memory?
Also, is there a way to set the number of CPU threads used during query execution on CPU? Is num-executors the correct flag to set?
Thanks so much,
Dong