Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Appender] Add AppendDefault #11905

Draft
wants to merge 27 commits into
base: feature
Choose a base branch
from
Draft

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented May 2, 2024

This feature has been requested many times, so I figured it was time to add it.

We add AppendDefault to the Appender class (not BaseAppender).
This method can be used in place of Append(Value val)

The behavior of AppendDefault is the same as calling INSERT INTO <name> VALUES (DEFAULT) through SQL

src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Great idea. Some comments:

test/appender/test_appender.cpp Outdated Show resolved Hide resolved
src/include/duckdb/main/appender.hpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
@Tishj Tishj marked this pull request as ready for review May 3, 2024 11:08
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the PR! These changes come in handy!

The main functionality I am missing in this PR is writing a default value directly to a vector in a data chunk. Any performance-focused appender implementation most likely uses duckdb_append_data_chunk instead of appending value-by-value.

From my understanding, in order to write default values to a vector in a data chunk, we need the context of the appender, as we need to know the context of the table. I.e., we cannot add this functionality to the data chunk API. So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

src/main/appender.cpp Show resolved Hide resolved
test/appender/test_appender.cpp Show resolved Hide resolved
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 7, 2024 08:23
@Tishj
Copy link
Contributor Author

Tishj commented May 7, 2024

So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

That makes sense, I think to be performant it makes more sense to take a list of integers instead of a single row index.
Since internally we use an ExpressionExecutor, running this expression on multiple rows at once is likely more performant.

@Tishj
Copy link
Contributor Author

Tishj commented May 7, 2024

So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

That makes sense, I think to be performant it makes more sense to take a list of integers instead of a single row index. Since internally we use an ExpressionExecutor, running this expression on multiple rows at once is likely more performant.

@taniabogatsch I've made a commit that starts this, taking a SelectionVector and a count for the amount of defaults that need to be appended (taking the indices from the SelectionVector)

I've had to make some adjustments to the ExpressionExecutor that make me think this is not it's intended purpose (even though it does already take a const SelectionVector *sel)
Please check out this commit first PoC for AppendDefaultToVector

src/main/appender.cpp Outdated Show resolved Hide resolved
@Giorgi
Copy link
Contributor

Giorgi commented May 7, 2024

Any performance-focused appender implementation most likely uses duckdb_append_data_chunk instead of appending value-by-value.

From my understanding, in order to write default values to a vector in a data chunk, we need the context of the appender, as we need to know the context of the table. I.e., we cannot add this functionality to the data chunk API. So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

Not only performance-focused, duckdb_append_data_chunk is needed to append more "complex" data types (uuid, decimal, enum, list, etc) so we definitely need a way to append a default value to a vector.

@Tishj
Copy link
Contributor Author

Tishj commented May 7, 2024

I didn't feel like the appender was the right place to add methods that query table information.
As such I've created a duckdb_table_description object, which wraps a TableDescription object, which is also present inside the Appender.

This can then be queried for information; duckdb_column_has_default(table, 0)
I imagine this can be extended in the future to include duckdb_column_is_generated, duckdb_column_get_type, duckdb_column_get_name, etc..

Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Tishj, the changes and improvements look nice! 😄 I've added mostly nits and also some comments.

src/include/duckdb.h Outdated Show resolved Hide resolved
src/execution/expression_executor/execute_constant.cpp Outdated Show resolved Hide resolved
src/execution/expression_executor.cpp Outdated Show resolved Hide resolved
src/include/duckdb/execution/expression_executor.hpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/include/duckdb/main/appender.hpp Outdated Show resolved Hide resolved
test/api/capi/test_capi_appender.cpp Outdated Show resolved Hide resolved
test/api/capi/test_capi_appender.cpp Show resolved Hide resolved
test/api/capi/test_capi_appender.cpp Outdated Show resolved Hide resolved
test/appender/test_appender.cpp Outdated Show resolved Hide resolved
@Giorgi
Copy link
Contributor

Giorgi commented May 13, 2024

@Tishj Will you expose AppendDefaultsToVector in the C API as part of this PR?

src/main/appender.cpp Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
Vector intermediate(type, count);
context->RunFunctionInTransaction([&]() { executor.ExecuteExpression(column, intermediate); });
for (idx_t i = 0; i < count; i++) {
auto temp_idx = i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can remove temp_idx and use i, no?

Comment on lines +8 to +11
// FIXME: should the table name be its own struct for extensibility?
// i.e duckdb_table_path
// duckdb_table_path_set_schema(...)
// duckdb_table_path_set_catalog(...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. We can eventually use it to add another appender creation function allowing non-default schema appenders. Should we add it to this PR and change the parameters of duckdb_table_description_create?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//! The qualifier contains a path into the database.
//! Must be destroyed with `duckdb_qualifier_destroy`.
typedef struct _duckdb_qualifier {
	void *__qual;
} * duckdb_qualifier;

I really hope it doesn't need a destroy method, but since it will contain std::string variables, it probably needs it right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mytherin how do you feel about this?

Like Tania said, we are already suffering from this issue in the appender, possibly other places.
This way we wouldn't need to make a breaking change to add a char *catalog to duckdb_appender_create and can instead deprecate it, in favor of duckdb_appender_create_with_qualifier ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, somewhat related, I think I'll separate out the duckdb_table_description addition into a separate PR, as they can now be made standalone
Then we can worry about whether we want the duckdb_qualifier on that PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just add a duckdb_appender_create_ext method that also takes the catalog, and leave the current method alone? I think adding the duckdb_table_description stuff seems overkill.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duckdb_table_description also serves other use cases such as getting the column count, has_default, column_name, etc. If a separate struct is overkill, we need to add this functionality to the appender. I.e., duckdb_appender_column_has_default and duckdb_appender_column_name.
cc @Mytherin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, there is already suggestion for duckdb_appender_column_name #11737

@taniabogatsch
Copy link
Contributor

@Giorgi, we plan to expose a way for the appender to write default values when using append_data_chunk in the C API. We'll expose it in a follow-up PR if it is not included in this PR.

@Giorgi
Copy link
Contributor

Giorgi commented May 13, 2024

@Giorgi, we plan to expose a way for the appender to write default values when using append_data_chunk in the C API. We'll expose it in a follow-up PR if it is not included in this PR.

Great! If we don't specify that we want to use default values for some of the columns, will the DuckDB appender automatically use default values for that column? For example, if I have a table with ID auto-increment, will we still need to tell DuckDB that we want to use default for that column?

@taniabogatsch
Copy link
Contributor

Great! If we don't specify that we want to use default values for some of the columns, will the DuckDB appender automatically use default values for that column? For example, if I have a table with ID auto-increment, will we still need to tell DuckDB that we want to use default for that column?

Currently, yes. In this first iteration, it is necessary to specify that a value is a default value (AppendDefault or by using AppendDefaultsToVector). For example, not writing any value (specific or default) to an auto-increment ID will result in an error.

@Tishj Tishj changed the base branch from main to feature May 16, 2024 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants