Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanner/spannertest: support STARTS_WITH #4661

Closed
sryoya opened this issue Aug 23, 2021 · 6 comments · Fixed by #4670
Closed

spanner/spannertest: support STARTS_WITH #4661

sryoya opened this issue Aug 23, 2021 · 6 comments · Fixed by #4670
Assignees
Labels
api: spanner Issues related to the Spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@sryoya
Copy link
Contributor

sryoya commented Aug 23, 2021

Is your feature request related to a problem? Please describe.
I suppose spannertest doesn't support the STARTS_WITH clause yet.
It would be great if it's supported.
As the official doc recommends using it rather than LIKE for a forward match, there should be certain motivation to use it.

Describe the solution you'd like
I tried to implement it, but, due to the following characteristic of STARTS_WITH, it looks there is a certain point to consider in spansql.

  • STARTS_WITH is a kind of comparison clause same as LIKE and BETWEEN
    But,
  • It takes both comparison targets as arguments unlike the other comparison clauses take one of them before the clause.

Due to the above, we cannot simply parse it as with the other comparison clauses even though what it does is the same as them.

I think there are two implementation options to handle that (I suppose the first one is better).

  • Create another Op struct and the parsing func for STARTS_WITH.
  • Put the parsing code for STARTS_WITH in the same func with the other comparison clauses, and add code in the parent parsing funcs so that STARTS_WITH won't be parsed in the different code.
@sryoya sryoya added the triage me I really want to be triaged. label Aug 23, 2021
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Aug 23, 2021
@hengfengli hengfengli added priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Aug 23, 2021
@hengfengli hengfengli assigned olavloite and unassigned hengfengli Aug 23, 2021
@hengfengli
Copy link
Contributor

hengfengli commented Aug 23, 2021

@olavloite Any thoughts about this?

@olavloite
Copy link
Contributor

STARTS_WITH is a function that will return a BOOL, so I don't think we should parse this in the same way as normal comparison operators. There are are a lot of other functions that also return BOOL, and all other functions could in theory also be used in comparisons, so the correct way to implement this would be:

  1. Add generic support for parsing function expressions (currently the highest item on the 'not implemented' list: https://github.com/googleapis/google-cloud-go/blob/master/spanner/spannertest/README.md)
  2. Add an implementation for this specific function.

By doing the above you would get a solution that would work automatically, as the implementation of STARTS_WITH would be an expression that returns a BOOL, and the further evaluation of the entire query expression would just use that as any other expression that would return a BOOL value.

@sryoya
Copy link
Contributor Author

sryoya commented Aug 23, 2021

@olavloite Thanks for the answer.
I agree with your view that STARTS_WITH should be treated as a function that returns BOOL rather than a comparison operator.

Also, your solution looks fine and I'm trying to implement with that way.

Let me ask you a bit about implementing the generic support.
Is it ok to have a dedicated operator for functions that return BOOL ?
spansql already has the Func struct, which seems the generic operator for function expressions. But, what you meant and "function expressions" in the "not implemented" list are different from that, I guess?
I feel we may want to have the dedicated operator for functions that returns BOOL apart from this struct as it doesn't know if an operation has the "NOT" clause and there are a lot of functions that also return BOOL

@olavloite
Copy link
Contributor

Is it ok to have a dedicated operator for functions that return BOOL ?

I'm not entirely sure I understand what you mean by that, but more in general: I don't think you need to (or should) implement function parsing differently for functions that return a BOOL compared to any other function. A function call is an expression that in the end will just return a value, just like any other expression. If that value is part of a SELECT list, then the value should be returned in the query result. If the value is used in a WHERE condition, then the value should be used in the comparison that you were already parsing.

spansql already has the Func struct, which seems the generic operator for function expressions. But, what you meant and "function expressions" in the "not implemented" list are different from that, I guess?

No, that is what I meant, but there seems to be more implemented already than I thought. So there is a bare-bone implementation for functions already, but it's very limited at the moment:

  1. The only functions that are implemented are aggregate functions (COUNT, SUM, ...):
    var aggregateFuncs = map[string]aggregateFunc{
  2. It seems to only be used for parsing functions in the SELECT expression list:
    if name := strings.ToUpper(tok.value); funcs[name] && p.sniff("(") {
    . It should however also work (at least in theory) in a WHERE clause.

The above means that:

  1. A statement like SELECT COUNT(*) FROM Foo will work.
  2. A statement like SELECT STARTS_WITH(Bar, 'B') FROM Foo will (probably) be recognized as a valid statement with a function expression in the select, but there is no implementation for the function STARTS_WITH, so the query engine has no idea what value to return.
  3. A statement like SELECT * FROM Foo WHERE STARTS_WITH(Bar, 'B') will (probably) also be recognized as a valid statement, but will also fail as the query engine does not know how to calculate the value of the function expression for each row in the table.

I feel we may want to have the dedicated operator for functions that returns BOOL apart from this struct as it doesn't know if an operation has the "NOT" clause and there are a lot of functions that also return BOOL

No. Parsing a function should be completely independent of any operators before or after the function itself. The expression that is returned by the parser is added to the evaluation tree, just as the NOT operator that preceded it, and the evaluator is responsible for evaluating the entire tree once all expressions have been parsed.

You should think of a function exactly as any other expression, such as for example a constant literal (for example the boolean value TRUE). Any literal in a SQL string could be replaced by a random function call and the parser should not care or know. All it does is parse the different expressions and add it to the expression tree. The entire tree will then be evaluated once all expressions have been parsed.

Consider the following two statements:

SELECT * FROM Foo WHERE NOT STARTS_WITH(Bar, 'B')

SELECT * FROM Foo WHERE NOT TRUE

There's nothing different or magical about the NOT before the STARTS_WITH function call compared to the NOT before the literal TRUE.

olavloite added a commit that referenced this issue Aug 24, 2021
- Adds support for the STARTS_WITH function.
- Adds an example for how further functions can be implemented.

Fixes #4661
@olavloite
Copy link
Contributor

@sryoya I've added an implementation in #4670 for this specific function. I think that also shows how other functions could be implemented if there are additional functions that you are using and would like to include in the testing using spansql/spannertest.

@sryoya
Copy link
Contributor Author

sryoya commented Aug 24, 2021

Thanks for the very clear and detailed explanation 😆 Now I understand the design concept of the SQL parsing for function expression.

Also, thanks for #4670. According to your explanation, your solution looks reasonable.
Looking forward to seeing this PR merged and released 🐈

hengfengli added a commit that referenced this issue Aug 25, 2021
- Adds support for the STARTS_WITH function.
- Adds an example for how further functions can be implemented.

Fixes #4661

Co-authored-by: Hengfeng Li <hengfeng@google.com>
BrennaEpp pushed a commit to BrennaEpp/google-cloud-go that referenced this issue Aug 27, 2021
…is#4670)

- Adds support for the STARTS_WITH function.
- Adds an example for how further functions can be implemented.

Fixes googleapis#4661

Co-authored-by: Hengfeng Li <hengfeng@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants