Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

treat query files as entire scripts or multiline queries #246

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

vicenteg
Copy link
Contributor

Fixes #244

Read all lines in the query file, and join them to create a single
query or script.

Read all lines in the query file, and join them to create a single
query or script.
@codecov
Copy link

codecov bot commented May 17, 2020

Codecov Report

Merging #246 into dev will not change coverage.
The diff coverage is 0.00%.

@@            Coverage Diff            @@
##                dev     #246   +/-   ##
=========================================
  Coverage     83.01%   83.01%           
  Complexity      203      203           
=========================================
  Files            34       34           
  Lines          1101     1101           
  Branches         66       66           
=========================================
  Hits            914      914           
  Misses          178      178           
  Partials          9        9           
Impacted Files Coverage Δ Complexity Δ
...m/google/cloud/pontem/config/WorkloadSettings.java 44.23% <0.00%> (ø) 10.00 <0.00> (ø)

@ldanielmadariaga
Copy link
Collaborator

Query files by design expect to contain 1 query per line as multiple queries (forming a workload) will usually be included in one file, the proposed change defeats the purpose of the query file.

If you'd like to implement something like this you'd need to have a parsing scheme that identifies whether a line represents a complete query or an unfinished query and adjust parsing accordingly.

@vicenteg
Copy link
Contributor Author

Thanks for looking at this.

Query files by design expect to contain 1 query per line as multiple queries (forming a workload) will usually be included in one file, the proposed change defeats the purpose of the query file.

Here's an example of how I've been configuring workloads:

concurrencyLevel: 5
isRatioBasedBenchmark: true
benchmarkRatios: [0.2, 0.5, 1.0, 2.0]
outputFileFolder: ./
workloads:
- name: my_workload
  projectId: my_project
  queryFiles:
    - queries/query1.sql
    - queries/query2.sql
    - queries/query3.sql

Is this not equivalent to having a single query file with multiple queries?

One query per line makes complex queries and scripts much harder to read, and it requires pre-processing of the query to remove newlines. While this is not necessarily hard to do, it does represent friction.

If you'd like to implement something like this you'd need to have a parsing scheme that identifies whether a line represents a complete query or an unfinished query and adjust parsing accordingly.

Parsing SQL is non-trivial and BigQuery already parses SQL. I'd suggest that a query file's contents should be passed to BigQuery unmodified, so that BigQuery can do the parsing. This is especially important since people may want to test scripts, which are very naturally multiple line files and should be interpreted by BigQuery and not the tool submitting the script.

Diverging from the precedent set by bq is a bit unexpected as well. The bq tool does not parse the query but instead lets BigQuery handle that. For example, a query file like the one below works with bq but would fail with the current implementation of the workload tester:

$ cat <<EOF > /tmp/simplequeries.sql
> SELECT *
> FROM (SELECT 1);
> SELECT 2;
> EOF
$ bq query < /tmp/simplequeries.sql
Waiting on bqjob_r7d03abef82f68ffd_0000017233128fb5_1 ... (1s) Current status: DONE   
SELECT *
FROM (SELECT 1); -- at [1:1]
+-----+
| f0_ |
+-----+
|   1 |
+-----+
SELECT 2; -- at [3:1]
+-----+
| f0_ |
+-----+
|   2 |
+-----+

Thanks for the consideration!

--vince

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WorkloadSettings.readQueries() assumes one query per line
2 participants