New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Appender] Add AppendDefault
#11905
base: feature
Are you sure you want to change the base?
[Appender] Add AppendDefault
#11905
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Great idea. Some comments:
…no DEFAULT expression | ConstantFold where possible | introduce ExpressionExecutor for non foldable expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for the PR! These changes come in handy!
The main functionality I am missing in this PR is writing a default value directly to a vector in a data chunk. Any performance-focused appender implementation most likely uses duckdb_append_data_chunk
instead of appending value-by-value.
From my understanding, in order to write default values to a vector in a data chunk, we need the context of the appender, as we need to know the context of the table. I.e., we cannot add this functionality to the data chunk API. So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector
, which takes a vector and a row index and writes the default value to the expected position... 🤔
That makes sense, I think to be performant it makes more sense to take a list of integers instead of a single row index. |
@taniabogatsch I've made a commit that starts this, taking a SelectionVector and a count for the amount of defaults that need to be appended (taking the indices from the SelectionVector) I've had to make some adjustments to the ExpressionExecutor that make me think this is not it's intended purpose (even though it does already take a |
Not only performance-focused, |
I didn't feel like the This can then be queried for information; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Tishj, the changes and improvements look nice! 😄 I've added mostly nits and also some comments.
…ts into the table
@Tishj Will you expose |
Vector intermediate(type, count); | ||
context->RunFunctionInTransaction([&]() { executor.ExecuteExpression(column, intermediate); }); | ||
for (idx_t i = 0; i < count; i++) { | ||
auto temp_idx = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we can remove temp_idx
and use i
, no?
// FIXME: should the table name be its own struct for extensibility? | ||
// i.e duckdb_table_path | ||
// duckdb_table_path_set_schema(...) | ||
// duckdb_table_path_set_catalog(...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. We can eventually use it to add another appender creation function allowing non-default schema appenders. Should we add it to this PR and change the parameters of duckdb_table_description_create
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! The qualifier contains a path into the database.
//! Must be destroyed with `duckdb_qualifier_destroy`.
typedef struct _duckdb_qualifier {
void *__qual;
} * duckdb_qualifier;
I really hope it doesn't need a destroy method, but since it will contain std::string
variables, it probably needs it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mytherin how do you feel about this?
Like Tania said, we are already suffering from this issue in the appender, possibly other places.
This way we wouldn't need to make a breaking change to add a char *catalog
to duckdb_appender_create
and can instead deprecate it, in favor of duckdb_appender_create_with_qualifier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, somewhat related, I think I'll separate out the duckdb_table_description
addition into a separate PR, as they can now be made standalone
Then we can worry about whether we want the duckdb_qualifier
on that PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just add a duckdb_appender_create_ext
method that also takes the catalog, and leave the current method alone? I think adding the duckdb_table_description
stuff seems overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duckdb_table_description
also serves other use cases such as getting the column count, has_default
, column_name
, etc. If a separate struct is overkill, we need to add this functionality to the appender. I.e., duckdb_appender_column_has_default
and duckdb_appender_column_name
.
cc @Mytherin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, there is already suggestion for duckdb_appender_column_name
#11737
…or to include the vector_size
@Giorgi, we plan to expose a way for the appender to write default values when using |
Great! If we don't specify that we want to use default values for some of the columns, will the DuckDB appender automatically use default values for that column? For example, if I have a table with ID auto-increment, will we still need to tell DuckDB that we want to use default for that column? |
Currently, yes. In this first iteration, it is necessary to specify that a value is a default value ( |
This feature has been requested many times, so I figured it was time to add it.
We add
AppendDefault
to theAppender
class (not BaseAppender).This method can be used in place of
Append(Value val)
The behavior of AppendDefault is the same as calling
INSERT INTO <name> VALUES (DEFAULT)
through SQL