Add more MultiFilereader features/hooks #11984
Merged
This PR adds some missing links for extending the MultiFileReader.

First, I added a field case_insensitive_map_t<Value> MultiFileReaderOptions::custom_options for passing custom options.

Second, I added the concept of a MultiFileReaderGlobalState. This is state that should generally be created in the InitGlobal of a table function using the MultiFileReader. Because it is created at that point, the MultiFileReader can set up and store state while already knowing which columns are in the projection.

A crucial part of the MultiFileReaderGlobalState is the extra_columns param. This parameter is set by the MultiFileReader to indicate that the scan will produce more columns than are actually projected. These columns are for internal use by the MultiFileReader during the FinalizeChunk step. This is crucial for the upcoming delta extension to be able to properly apply deletion vectors. To apply a deletion vector, we need to know which rows from the file are actually selected. This means the file_row_number column needs to be available in the FinalizeChunk step. However, this column should not be returned by the actual scan. The solution is very similar to what we currently do for filter pruning, where columns that are only used for pushed-down filters are removed during the scan.

@Mytherin I've managed to push most of the complexity into the delta extension for now to keep this PR simple. Eventually we may want to pull the logic for populating the extra_columns up into the default MultiFileReader, though.
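
For illustration, the extra_columns idea described above can be sketched in isolation. This is a simplified model and not the code in this PR: Chunk, ReaderGlobalState, and this standalone FinalizeChunk are hypothetical stand-ins rather than DuckDB's actual types or signatures, and the deletion vector is assumed to be a plain set of deleted file row numbers. It shows a scan that produces one extra file_row_number column, which is used to drop deleted rows and is then stripped so that only the projected columns are returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT DuckDB's real types.
using Column = std::vector<int64_t>;

struct Chunk {
    // Column-major storage; every column has the same number of rows.
    std::vector<Column> columns;
};

// Loosely modelled on the idea of a MultiFileReaderGlobalState (assumption):
// it records how many columns were projected and how many extra columns
// the scan appends after them for internal use.
struct ReaderGlobalState {
    std::size_t projected_count;     // columns the query asked for
    std::size_t extra_count;         // internal-only columns appended by the scan
    std::size_t file_row_number_idx; // index of file_row_number in the scanned chunk
};

// Drops rows whose file row number is in deleted_rows, then strips the
// extra columns so only the projected columns are returned.
Chunk FinalizeChunk(const Chunk &scanned, const ReaderGlobalState &state,
                    const std::unordered_set<int64_t> &deleted_rows) {
    assert(scanned.columns.size() == state.projected_count + state.extra_count);
    assert(state.file_row_number_idx < scanned.columns.size());

    const Column &row_numbers = scanned.columns[state.file_row_number_idx];

    Chunk result;
    result.columns.resize(state.projected_count);
    for (std::size_t row = 0; row < row_numbers.size(); row++) {
        if (deleted_rows.count(row_numbers[row]) > 0) {
            continue; // row is marked deleted by the deletion vector
        }
        // Copy only the projected columns; extra columns are never emitted.
        for (std::size_t col = 0; col < state.projected_count; col++) {
            result.columns[col].push_back(scanned.columns[col][row]);
        }
    }
    return result;
}
```

In this sketch, a chunk with one projected column and one extra file_row_number column comes out of FinalizeChunk with a single column containing only the rows that were not deleted. The real implementation works on DuckDB's DataChunk and would typically filter through a selection vector rather than copying values.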