Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FriendlySQL] Unpacked COLUMNS() Expression #11872

Draft
wants to merge 15 commits into
base: feature
Choose a base branch
from

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Apr 30, 2024

This PR adds the *COLUMNS(...) expression that changes the behavior of the existing COLUMNS(...) expression.
Instead of expanding vertically, instead the list expands horizontally.

To explain this in more detail, here's some SQL:

COLUMNS behavior

select COALESCE(COLUMNS(*)) from (select NULL, 2, 3) t(a, b, c)

When the COLUMNS expression is expanded, the resulting query becomes:

select COALESCE(a) a, COALESCE(b) b, COALESCE(c) c from (select NULL, 2, 3) t(a, b, c)

The result of this query is:

┌───────┬───────┬───────┐
│   a   │   b   │   c   │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│       │     2 │     3 │
└───────┴───────┴───────┘

*COLUMNS behavior

select COALESCE(*COLUMNS(*)) from (select NULL, 2, 3) t(a, b, c)

When the *COLUMNS expression is expanded, the resulting query becomes:

select COALESCE(a, b, c) from (select NULL, 2, 3) t(a, b, c)

The result of this query is:

┌─────────────────────────┐
│ COALESCE(t.a, t.b, t.c) │
│          int32          │
├─────────────────────────┤
│                       2 │
└─────────────────────────┘

@Mytherin Mytherin changed the base branch from main to feature May 3, 2024 14:43
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good - some comments:


# IN (...)
query I
select 2 in (*COLUMNS(*)) from (select 1, 2, 3) t(a, b, c);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some more tests?

  • Combining COLUMNS(*) with *COLUMNS(*)
  • Multiple *COLUMNS(*) in the same statement?
  • *COLUMNS(*) inside the lambda of a *COLUMNS(*) statement?
  • Can we test this with a varargs function, e.g. CONCAT?
  • Can we try *COLUMNS(*) + 42?

Copy link
Contributor Author

@Tishj Tishj May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that COLUMNS(COLUMNS(COLUMNS(*))) was allowed, because it wasn't checking for "columns" in the transformer, that now properly errors

This was caused by the check that enables COLUMNS(*), COLUMNS is a StarExpression and * is a StarExpression

When a StarExpression (*) appears in COLUMNS we just merge them.
But that also allowed COLUMNS(COLUMNS(COLUMNS(*)))

test/sql/parser/test_columns_unpacked.test Show resolved Hide resolved
test/sql/parser/test_columns_unpacked.test Show resolved Hide resolved
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 22, 2024 08:03
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! Some more comments from my side:

@@ -361,10 +361,13 @@ class Transformer {
public:
static void SetQueryLocation(ParsedExpression &expr, int query_location);
static void SetQueryLocation(TableRef &ref, int query_location);
void SetQuery(const string &query);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this again

vector<unique_ptr<ParsedExpression>> &replacements,
ColumnUnpackResult &parent) {
D_ASSERT(expr);
if (expr->GetExpressionClass() == ExpressionClass::STAR) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code can be simplified a bit by passing the vector<unique_ptr<ParsedExpression>> child list in, or using some other callback for star expressions. I don't think the gathering into the ColumnUnpackResult first is the simplest way of doing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting back into the code, I remember why I went this route
Every input expression can create n output expressions, in that sense it's kind of like a table-in-out function

GetChild() gives back the list of expressions that the input expression expanded into, this works like an iterator, forwarding it internally on every call:

			// Replace children
			vector<unique_ptr<ParsedExpression>> new_children;
			for (auto &unused : function_expr.children) {
				(void)unused;
				auto child_expressions = children.GetChild();
				for (auto &child : child_expressions) {
					new_children.push_back(std::move(child));
				}
			}
			function_expr.children = std::move(new_children);

			// Replace FILTER
			if (function_expr.filter) {
				auto child_expressions = children.GetChild();
				if (child_expressions.size() != 1) {
					throw NotImplementedException("*COLUMNS(...) is not supported in the filter expression");
				}
				function_expr.filter = std::move(child_expressions[0]);
			}

This is done because ParsedExpressionIterator has no way to provide me with context as to what expression I am iterating over, is it a projection list or are we looking at a filter?
So here I am iterating through the expressions in the same way that the ParsedExpressionIterator does so I can relate the resulting expansion to the destination.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I can simplify this if I create an extended version of the ParsedExpressionIterator that not only provides the ParsedExpression but wraps it in a struct that contains the origin of that expression, so I can make replacements directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually I don't think so

Let's say we're replacing one child of the children of a FunctionExpression.
We can't insert into this vector directly, that would invalidate the iterator that we're using to iterate through it

This is not a problem if we were only doing a 1->1 replacement, then we can change the unique_ptr without disturbing the vector, but if we replace with 2 or more expressions we can't get away with this.

break;
}
default: {
throw BinderException("Unpacked columns (*COLUMNS(...)) are not allowed in this expression");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to provide the actual expression type here?

}
auto unpacked_expressions = children.GetChild();
for (auto &unpacked_expr : unpacked_expressions) {
new_select_list.push_back(std::move(unpacked_expr));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be able to use *COLUMNS(*) at the root level so this should never result in more than 1 expression

expr = make_uniq<ConstantExpression>(Value::LIST(LogicalType::VARCHAR, values));
return true;
}
if (in_columns) {
throw BinderException("COLUMNS expression is not allowed inside another COLUMNS expression");
throw BinderException("(*)COLUMNS expression is not allowed inside another (*)COLUMNS expression");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave this error message in the original format? I don't think we need to treat *COLUMNS(*) as something separate from COLUMNS

throw BinderException(*expr,
"Multiple different STAR/COLUMNS in the same expression are not supported");
throw BinderException(
*expr, "Multiple different STAR/COLUMNS/*COLUMNS in the same expression are not supported");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants