[FriendlySQL] Unpacked COLUMNS() Expression #11872

Tishj · 2024-04-30T07:23:32Z

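This PR adds the *COLUMNS(...) expression, a variant of the existing COLUMNS(...) expression with different expansion behavior.
Instead of expanding vertically (one copy of the enclosing expression per column), the list expands horizontally (all columns are spliced into a single expression).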

To explain this in more detail, here's some SQL:

COLUMNS behavior

select COALESCE(COLUMNS(*)) from (select NULL, 2, 3) t(a, b, c)

When the COLUMNS expression is expanded, the resulting query becomes:

select COALESCE(a) a, COALESCE(b) b, COALESCE(c) c from (select NULL, 2, 3) t(a, b, c)

The result of this query is:

┌───────┬───────┬───────┐
│   a   │   b   │   c   │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│       │     2 │     3 │
└───────┴───────┴───────┘

*COLUMNS behavior

select COALESCE(*COLUMNS(*)) from (select NULL, 2, 3) t(a, b, c)

When the *COLUMNS expression is expanded, the resulting query becomes:

select COALESCE(a, b, c) from (select NULL, 2, 3) t(a, b, c)

The result of this query is:

┌─────────────────────────┐
│ COALESCE(t.a, t.b, t.c) │
│          int32          │
├─────────────────────────┤
│                       2 │
└─────────────────────────┘
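
As a rough illustration of the two expansion modes above, here is a toy string-based model. It is only a sketch: `Join`, `ExpandColumns` and `ExpandUnpackedColumns` are hypothetical names that do not exist in DuckDB, and the real expansion works on parsed expressions in the binder, not on strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins strings with a separator.
static std::string Join(const std::vector<std::string> &parts, const std::string &sep) {
	std::string result;
	for (std::size_t i = 0; i < parts.size(); i++) {
		if (i > 0) {
			result += sep;
		}
		result += parts[i];
	}
	return result;
}

// COLUMNS(*): the enclosing call is repeated once per column ("vertical"),
// and each copy is aliased to the column name.
std::string ExpandColumns(const std::string &func, const std::vector<std::string> &cols) {
	std::vector<std::string> items;
	for (const auto &col : cols) {
		items.push_back(func + "(" + col + ") " + col);
	}
	return Join(items, ", ");
}

// *COLUMNS(*): all columns are spliced into the argument list of one call ("horizontal").
std::string ExpandUnpackedColumns(const std::string &func, const std::vector<std::string> &cols) {
	return func + "(" + Join(cols, ", ") + ")";
}
```

Calling both helpers with `"COALESCE"` and the columns `a, b, c` reproduces the two select lists from the expanded queries shown above.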

Mytherin

Thanks for the PR! Looks good - some comments:

Mytherin · 2024-04-30T19:49:03Z

test/sql/parser/test_columns_unpacked.test

+
+# IN (...)
+query I
+select 2 in (*COLUMNS(*)) from (select 1, 2, 3) t(a, b, c);


Could we add some more tests?

Combining COLUMNS(*) with *COLUMNS(*)

Multiple *COLUMNS(*) in the same statement?

*COLUMNS(*) inside the lambda of a *COLUMNS(*) statement?

Can we test this with a varargs function, e.g. CONCAT?

Can we try *COLUMNS(*) + 42?

Tishj · 2024-05-05

I noticed that COLUMNS(COLUMNS(COLUMNS(*))) was allowed because the transformer wasn't checking for "columns"; that now properly errors.

This was caused by the check that enables COLUMNS(*): COLUMNS is a StarExpression and * is a StarExpression.

When a StarExpression (*) appears in COLUMNS we just merge them.
But that also allowed COLUMNS(COLUMNS(COLUMNS(*))).
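
The merge-versus-reject distinction can be sketched with a toy node type. Everything here is an assumption for illustration: `ToyStar`, `IsNestedColumns` and `MergeColumns` are made-up names, the error text is a placeholder, and DuckDB's actual StarExpression carries far more state.

```cpp
#include <stdexcept>

// Toy stand-in for a star node: `columns` is false for a bare `*`
// and true once the node represents COLUMNS(...).
struct ToyStar {
	bool columns;
};

// The missing check: is the argument of COLUMNS(...) already a COLUMNS node?
bool IsNestedColumns(const ToyStar &inner) {
	return inner.columns;
}

// Merging COLUMNS(<star>) into one node is only valid when the inner star is a bare `*`.
ToyStar MergeColumns(const ToyStar &inner) {
	if (IsNestedColumns(inner)) {
		// Placeholder message, not the text used in the PR.
		throw std::invalid_argument("COLUMNS is not allowed inside another COLUMNS");
	}
	return ToyStar {true};
}
```

Without the `IsNestedColumns` check, `MergeColumns(MergeColumns(star))` would silently collapse into a single node, which is the toy equivalent of accepting COLUMNS(COLUMNS(*)).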

test/sql/parser/test_columns_unpacked.test

Mytherin

Thanks for the changes! Some more comments from my side:

Mytherin · 2024-05-31T11:33:43Z

src/include/duckdb/parser/transformer.hpp

@@ -361,10 +361,13 @@ class Transformer {
 public:
 	static void SetQueryLocation(ParsedExpression &expr, int query_location);
 	static void SetQueryLocation(TableRef &ref, int query_location);
+	void SetQuery(const string &query);


Can we remove this again?

Mytherin · 2024-05-31T11:45:36Z

src/planner/binder/expression/bind_star_expression.cpp

+                                           vector<unique_ptr<ParsedExpression>> &replacements,
+                                           ColumnUnpackResult &parent) {
+	D_ASSERT(expr);
+	if (expr->GetExpressionClass() == ExpressionClass::STAR) {


I think this code can be simplified a bit by passing the vector<unique_ptr<ParsedExpression>> child list in, or using some other callback for star expressions. I don't think the gathering into the ColumnUnpackResult first is the simplest way of doing this

Tishj · 2024-06-06

Getting back into the code, I remember why I went this route.
Every input expression can create n output expressions; in that sense it's kind of like a table-in-out function.

GetChild() returns the list of expressions that the input expression expanded into. It works like an iterator, advancing internally on every call:

// Replace children
vector<unique_ptr<ParsedExpression>> new_children;
for (auto &unused : function_expr.children) {
	(void)unused;
	auto child_expressions = children.GetChild();
	for (auto &child : child_expressions) {
		new_children.push_back(std::move(child));
	}
}
function_expr.children = std::move(new_children);

// Replace FILTER
if (function_expr.filter) {
	auto child_expressions = children.GetChild();
	if (child_expressions.size() != 1) {
		throw NotImplementedException("*COLUMNS(...) is not supported in the filter expression");
	}
	function_expr.filter = std::move(child_expressions[0]);
}

This is done because ParsedExpressionIterator has no way to tell me which expression I am iterating over: is it a projection list, or are we looking at a filter?
So here I iterate through the expressions in the same order that ParsedExpressionIterator does, so I can relate each resulting expansion to its destination.
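
A minimal sketch of that cursor-style access, assuming expressions are plain strings. `ToyUnpackResult` and `AddChild` are hypothetical names; only the GetChild() idea comes from the description above, and the PR's ColumnUnpackResult may be structured differently.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the result of expanding child expressions.
class ToyUnpackResult {
public:
	// Record the expansion of the next input expression, in traversal order.
	void AddChild(std::vector<std::string> expansion) {
		expansions.push_back(std::move(expansion));
	}

	// Hand out the expansion of the next input expression and advance the cursor.
	// Throws std::out_of_range when called more often than AddChild was.
	std::vector<std::string> GetChild() {
		return std::move(expansions.at(cursor++));
	}

private:
	std::vector<std::vector<std::string>> expansions;
	std::size_t cursor = 0;
};
```

Because the consumer calls GetChild() in the same order in which the children were visited, it knows from its own position whether a given expansion belongs to the argument list or to the filter.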

Perhaps I can simplify this if I create an extended version of ParsedExpressionIterator that not only provides the ParsedExpression but wraps it in a struct that contains the origin of that expression, so I can make replacements directly.

Hmm, actually I don't think so.

Let's say we're replacing one child of the children of a FunctionExpression.
We can't insert into this vector directly, because that would invalidate the iterator that we're using to iterate through it.

This would not be a problem if we were only doing a 1->1 replacement: then we could change the unique_ptr without disturbing the vector. But if we replace one child with 2 or more expressions, we can't get away with this.
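
The 1->n case can be sketched by building a fresh vector instead of inserting into the one being traversed. This is a toy model under stated assumptions: children are strings, the literal "*COLUMNS(*)" marks the child to unpack, and `SpliceUnpacked` is a hypothetical helper, not code from the PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Replace every "*COLUMNS(*)" placeholder with the full column list.
// Writing into `new_children` leaves the vector we are iterating over untouched,
// so no iterator is invalidated even when one child turns into several.
std::vector<std::string> SpliceUnpacked(std::vector<std::string> children,
                                        const std::vector<std::string> &columns) {
	std::vector<std::string> new_children;
	for (auto &child : children) {
		if (child == "*COLUMNS(*)") {
			new_children.insert(new_children.end(), columns.begin(), columns.end());
		} else {
			new_children.push_back(std::move(child));
		}
	}
	return new_children;
}
```

Inserting into `children` inside the loop would be the problematic variant: `std::vector::insert` may reallocate, which invalidates the iterators used by the range-based for loop.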

Mytherin · 2024-05-31T11:46:47Z

src/planner/binder/expression/bind_star_expression.cpp

+			break;
+		}
+		default: {
+			throw BinderException("Unpacked columns (*COLUMNS(...)) are not allowed in this expression");


Maybe we want to provide the actual expression type here?
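
One way the suggestion could look, as a sketch only: `ToyExpressionClass`, `ToyClassName` and `UnpackNotAllowedMessage` are invented for illustration, and the enum values are not meant to say which expression classes the PR does or does not support.

```cpp
#include <string>

// Toy subset of expression classes, for illustration only.
enum class ToyExpressionClass { FUNCTION, OPERATOR, CASE, WINDOW };

// Hypothetical name lookup for the toy enum.
std::string ToyClassName(ToyExpressionClass expr_class) {
	switch (expr_class) {
	case ToyExpressionClass::FUNCTION:
		return "FUNCTION";
	case ToyExpressionClass::OPERATOR:
		return "OPERATOR";
	case ToyExpressionClass::CASE:
		return "CASE";
	case ToyExpressionClass::WINDOW:
		return "WINDOW";
	}
	return "UNKNOWN";
}

// Same wording as the error in the diff, with the offending class appended.
std::string UnpackNotAllowedMessage(ToyExpressionClass expr_class) {
	return "Unpacked columns (*COLUMNS(...)) are not allowed in a " + ToyClassName(expr_class) + " expression";
}
```

Naming the class in the message lets the user see which part of the query rejected the unpacked columns without having to bisect the statement.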

Mytherin · 2024-05-31T11:47:12Z

src/planner/binder/expression/bind_star_expression.cpp

+		}
+		auto unpacked_expressions = children.GetChild();
+		for (auto &unpacked_expr : unpacked_expressions) {
+			new_select_list.push_back(std::move(unpacked_expr));


We should not be able to use *COLUMNS(*) at the root level, so this should never result in more than 1 expression.

Mytherin · 2024-05-31T11:48:29Z

src/planner/binder/expression/bind_star_expression.cpp

 			expr = make_uniq<ConstantExpression>(Value::LIST(LogicalType::VARCHAR, values));
 			return true;
 		}
 		if (in_columns) {
-			throw BinderException("COLUMNS expression is not allowed inside another COLUMNS expression");
+			throw BinderException("(*)COLUMNS expression is not allowed inside another (*)COLUMNS expression");


Can we leave this error message in the original format? I don't think we need to treat *COLUMNS(*) as something separate from COLUMNS

Mytherin · 2024-05-31T11:48:34Z

src/planner/binder/expression/bind_star_expression.cpp

-				throw BinderException(*expr,
-				                      "Multiple different STAR/COLUMNS in the same expression are not supported");
+				throw BinderException(
+				    *expr, "Multiple different STAR/COLUMNS/*COLUMNS in the same expression are not supported");


Tishj added 8 commits March 2, 2024 11:23

*COLUMNS(...) parser support, working on logic for replacing children with unpacked columns

61f622c

very rough PoC

49bb57c

add first test

a240a19

add more behavior tests

a4475f1

add more examples and tests for *COLUMNS

5ae6515

Merge remote-tracking branch 'upstream/main' into unpacked_columns

682129f

(de)serialize of StarExpression

e19272a

IN in where clause

19852c9

Mytherin changed the base branch from main to feature May 3, 2024 14:43

Mytherin reviewed May 3, 2024

Mytherin added the Changes Requested label May 3, 2024

Tishj added 7 commits May 5, 2024 15:06

disallow *COLUMNS at root level

801dfc8

fix Copy, Equals and ToString for StarExpression

d7cc338

fix bug that allowed COLUMNS(COLUMNS(COLUMNS(*))) syntax

9c9d084

update error message to reference *COLUMNS

bf88b74

add more comments around "FindStarExpression"

107a331

add behavioral tests

3f2a740

Merge remote-tracking branch 'upstream/feature' into unpacked_columns

6dfa19f

duckdb-draftbot marked this pull request as draft May 22, 2024 08:03

Mytherin reviewed May 31, 2024
