-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FriendlySQL] Unpacked COLUMNS() Expression #11872
base: feature
Are you sure you want to change the base?
Conversation
… with unpacked columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good - some comments:
|
||
# IN (...) | ||
query I | ||
select 2 in (*COLUMNS(*)) from (select 1, 2, 3) t(a, b, c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add some more tests?
- Combining
COLUMNS(*)
with*COLUMNS(*)
- Multiple
*COLUMNS(*)
in the same statement? *COLUMNS(*)
inside the lambda of a*COLUMNS(*)
statement?- Can we test this with a varargs function, e.g.
CONCAT
? - Can we try
*COLUMNS(*) + 42
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that COLUMNS(COLUMNS(COLUMNS(*)))
was allowed, because it wasn't checking for "columns" in the transformer, that now properly errors
This was caused by the check that enables COLUMNS(*)
, COLUMNS
is a StarExpression and *
is a StarExpression
When a StarExpression (*) appears in COLUMNS we just merge them.
But that also allowed COLUMNS(COLUMNS(COLUMNS(*)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! Some more comments from my side:
@@ -361,10 +361,13 @@ class Transformer { | |||
public: | |||
static void SetQueryLocation(ParsedExpression &expr, int query_location); | |||
static void SetQueryLocation(TableRef &ref, int query_location); | |||
void SetQuery(const string &query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this again
vector<unique_ptr<ParsedExpression>> &replacements, | ||
ColumnUnpackResult &parent) { | ||
D_ASSERT(expr); | ||
if (expr->GetExpressionClass() == ExpressionClass::STAR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code can be simplified a bit by passing the vector<unique_ptr<ParsedExpression>>
child list in, or using some other callback for star expressions. I don't think the gathering into the ColumnUnpackResult
first is the simplest way of doing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting back into the code, I remember why I went this route
Every input expression can create n
output expressions, in that sense it's kind of like a table-in-out function
GetChild() gives back the list of expressions that the input expression expanded into, this works like an iterator, forwarding it internally on every call:
// Replace children
vector<unique_ptr<ParsedExpression>> new_children;
for (auto &unused : function_expr.children) {
(void)unused;
auto child_expressions = children.GetChild();
for (auto &child : child_expressions) {
new_children.push_back(std::move(child));
}
}
function_expr.children = std::move(new_children);
// Replace FILTER
if (function_expr.filter) {
auto child_expressions = children.GetChild();
if (child_expressions.size() != 1) {
throw NotImplementedException("*COLUMNS(...) is not supported in the filter expression");
}
function_expr.filter = std::move(child_expressions[0]);
}
This is done because ParsedExpressionIterator has no way to provide me with context as to what expression I am iterating over, is it a projection list or are we looking at a filter?
So here I am iterating through the expressions in the same way that the ParsedExpressionIterator does so I can relate the resulting expansion to the destination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I can simplify this if I create an extended version of the ParsedExpressionIterator that not only provides the ParsedExpression but wraps it in a struct that contains the origin of that expression, so I can make replacements directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm actually I don't think so
Let's say we're replacing one child of the children of a FunctionExpression.
We can't insert into this vector directly, that would invalidate the iterator that we're using to iterate through it
This is not a problem if we were only doing a 1->1 replacement, then we can change the unique_ptr without disturbing the vector, but if we replace with 2 or more expressions we can't get away with this.
break; | ||
} | ||
default: { | ||
throw BinderException("Unpacked columns (*COLUMNS(...)) are not allowed in this expression"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want to provide the actual expression type here?
} | ||
auto unpacked_expressions = children.GetChild(); | ||
for (auto &unpacked_expr : unpacked_expressions) { | ||
new_select_list.push_back(std::move(unpacked_expr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be able to use *COLUMNS(*)
at the root level so this should never result in more than 1 expression
expr = make_uniq<ConstantExpression>(Value::LIST(LogicalType::VARCHAR, values)); | ||
return true; | ||
} | ||
if (in_columns) { | ||
throw BinderException("COLUMNS expression is not allowed inside another COLUMNS expression"); | ||
throw BinderException("(*)COLUMNS expression is not allowed inside another (*)COLUMNS expression"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave this error message in the original format? I don't think we need to treat *COLUMNS(*)
as something separate from COLUMNS
throw BinderException(*expr, | ||
"Multiple different STAR/COLUMNS in the same expression are not supported"); | ||
throw BinderException( | ||
*expr, "Multiple different STAR/COLUMNS/*COLUMNS in the same expression are not supported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
This PR adds the
*COLUMNS(...)
expression that changes the behavior of the existingCOLUMNS(...)
expression.Instead of expanding vertically, instead the list expands horizontally.
To explain this in more detail, here's some SQL:
COLUMNS behavior
When the
COLUMNS
expression is expanded, the resulting query becomes:The result of this query is:
*COLUMNS behavior
When the
*COLUMNS
expression is expanded, the resulting query becomes:The result of this query is: