
Custom extractor design

michaelcahill edited this page Sep 11, 2014 · 3 revisions

This page contains notes on the implementation of custom extractors (issue #1199).

Basic registration, and lookup when a table is opened, work the same way as for custom collators.
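As a minimal sketch of what registration and lookup could look like, assuming extractors are registered under a name (as collators are) and an index configuration refers to its extractor by that name, the following keeps a small name-to-callback table. Every identifier here (`extractor_register`, `extractor_lookup`, `extract_fn`, `example_extract`) is hypothetical, and the callback signature is a placeholder rather than a proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: a placeholder, not a proposed signature. */
typedef int (*extract_fn)(const char *key, const char *value, void *cookie);

/* One registered extractor, found by name (assumed to mirror collators). */
struct named_extractor {
    const char *name;
    extract_fn extract;
};

#define MAX_EXTRACTORS 8

static struct named_extractor registry[MAX_EXTRACTORS];
static size_t registry_count;

/* Register an extractor under a name. Returns 0, or -1 if full or duplicate. */
static int extractor_register(const char *name, extract_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (registry_count == MAX_EXTRACTORS)
        return -1;
    for (size_t i = 0; i < registry_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return -1;
    registry[registry_count].name = name;
    registry[registry_count].extract = fn;
    registry_count++;
    return 0;
}

/* Lookup on table open: resolve the name from the index configuration. */
static extract_fn extractor_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < registry_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].extract;
    return NULL; /* not registered: opening the index would fail */
}

/* Placeholder extractor used only to demonstrate registration. */
static int example_extract(const char *key, const char *value, void *cookie)
{
    (void)key;
    (void)value;
    (void)cookie;
    return 0;
}
```

Looking the name up when the table is opened, rather than when the index is created, means the application must register the extractor before opening any table whose indexes use it.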

If a custom extractor is set on an index:

  • key_format must be set for the index when it is created

  • all primary key columns are implicitly added to the index key

  • value_format must not be set (i.e., it must be the default "u"), and values will be empty as with built-in indexes.

  • if no key column names are supplied for an index, every index query will do a lookup in the primary table: projections are not possible without key column names.

  • the application may assign column names to the index keys, but if the index column names match the table column names, the contents of those columns must be identical (for covering index queries to work).
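The creation-time rules above could be checked along these lines. This is a hypothetical sketch: `struct index_config` and `check_extractor_index_config` are illustrative names, and the sketch assumes that explicitly passing value_format="u" is acceptable, since it matches the default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of an index's creation-time configuration. */
struct index_config {
    const char *extractor;    /* NULL when no custom extractor is set */
    const char *key_format;   /* NULL when key_format was not given */
    const char *value_format; /* NULL when value_format was not given */
};

/*
 * Returns 0 if the configuration follows the custom extractor rules,
 * -1 if key_format is missing, -2 if value_format is not the default.
 */
static int check_extractor_index_config(const struct index_config *cfg)
{
    assert(cfg != NULL);
    if (cfg->extractor == NULL)
        return 0; /* no custom extractor: these rules do not apply */
    if (cfg->key_format == NULL)
        return -1; /* key_format must be set during the create */
    if (cfg->value_format != NULL && strcmp(cfg->value_format, "u") != 0)
        return -2; /* values stay empty, so only the default "u" is allowed */
    return 0;
}
```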

Implementation details

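The specified key_format must be merged with the primary key_format. This needs care because of the optimization for a WT_ITEM ("u") in the last column, whose length is not stored: a "u" column that ends the application's key_format is no longer last once the primary key columns are appended, so its length must then be stored.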
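A minimal sketch of the merge, assuming simplified format strings with one type character per column (no repeat counts, sizes or byte-order prefix). `merge_key_formats` is a hypothetical helper: it only concatenates the formats and flags the case that needs care; it does not repack anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Merge an index key_format with the primary key_format (simplified
 * formats assumed: one type character per column). Sets *repack when the
 * index format ends in 'u': that column was packed without a length, so a
 * packed index key cannot simply have the packed primary key appended.
 * Returns 0, or -1 if out is too small.
 */
static int merge_key_formats(const char *idx_fmt, const char *pk_fmt,
    char *out, size_t outlen, bool *repack)
{
    assert(idx_fmt != NULL && pk_fmt != NULL);
    assert(out != NULL && repack != NULL);
    size_t idx_len = strlen(idx_fmt);
    int n = snprintf(out, outlen, "%s%s", idx_fmt, pk_fmt);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    *repack = idx_len > 0 && idx_fmt[idx_len - 1] == 'u';
    return 0;
}
```

For example, merging "Su" with a record-number primary key "r" gives "Sur" with the repack flag set, while merging "Si" with "S" gives "SiS" and needs no repacking.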

With a custom extractor, we still need WT_INDEX::idxkey_format to describe only the visible index columns, hiding the primary key columns that we append to the end.
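A sketch of how the visible and hidden parts of the stored key format could be separated, assuming simplified format strings with one type character per column (no repeat counts, sizes or byte-order prefix). `split_key_format` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Split a merged index key format into the visible columns (the role
 * idxkey_format plays) and the hidden primary key columns. visible and
 * hidden must each hold strlen(merged) + 1 bytes. Returns 0, or -1 if
 * n_visible exceeds the number of columns.
 */
static int split_key_format(const char *merged, size_t n_visible,
    char *visible, char *hidden)
{
    assert(merged != NULL && visible != NULL && hidden != NULL);
    size_t len = strlen(merged);
    if (n_visible > len)
        return -1;
    memcpy(visible, merged, n_visible);
    visible[n_visible] = '\0';
    /* Copy the remaining columns, including the terminating NUL. */
    memcpy(hidden, merged + n_visible, len - n_visible + 1);
    return 0;
}
```

If the visible part ends in "u", taking the prefix on its own would appear to describe that column as length-less, even though primary key columns follow it in the stored key, so a real idxkey_format would likely need to account for that.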

Index keys are currently generated by __wt_schema_project_merge in APPLY_IDX in cur_table.c. When a custom extractor is set, this loop needs another level of nesting (to deal with multiple index keys per record).
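The extra level of nesting can be sketched as follows. The sketch is hypothetical throughout: extractors are modelled as functions that fill an array with the keys derived from one record, `apply_indices` stands in for the APPLY_IDX loop, and `single_key`, `token_keys` and `count_insert` are example callbacks. A built-in index fits the same shape as an extractor that always returns exactly one key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYS_PER_RECORD 4
#define KEY_LEN 32

/* Hypothetical extractor: fills keys[], returns the count or -1. */
typedef int (*extract_keys_fn)(const char *pkey, const char *value,
    char keys[][KEY_LEN], int max);

/* Hypothetical insert of one entry into index number index_id. */
typedef int (*index_insert_fn)(size_t index_id, const char *index_key,
    const char *pkey, void *ctx);

static int apply_indices(const extract_keys_fn *extractors, size_t n_idx,
    const char *pkey, const char *value, index_insert_fn insert, void *ctx)
{
    char keys[MAX_KEYS_PER_RECORD][KEY_LEN];
    assert(extractors != NULL || n_idx == 0);
    for (size_t i = 0; i < n_idx; i++) {     /* existing level: each index */
        int n = extractors[i](pkey, value, keys, MAX_KEYS_PER_RECORD);
        if (n < 0)
            return -1;
        for (int k = 0; k < n; k++) {        /* new level: each key */
            int ret = insert(i, keys[k], pkey, ctx);
            if (ret != 0)
                return ret;
        }
    }
    return 0;
}

/* Built-in style: exactly one key (here, the whole value). */
static int single_key(const char *pkey, const char *value,
    char keys[][KEY_LEN], int max)
{
    (void)pkey;
    if (max < 1)
        return -1;
    snprintf(keys[0], KEY_LEN, "%s", value);
    return 1;
}

/* Custom extractor: one key per comma-separated token in the value. */
static int token_keys(const char *pkey, const char *value,
    char keys[][KEY_LEN], int max)
{
    int n = 0;
    const char *start = value;
    (void)pkey;
    for (;;) {
        const char *end = strchr(start, ',');
        size_t len = end != NULL ? (size_t)(end - start) : strlen(start);
        if (n == max || len >= KEY_LEN)
            return -1;
        memcpy(keys[n], start, len);
        keys[n][len] = '\0';
        n++;
        if (end == NULL)
            break;
        start = end + 1;
    }
    return n;
}

/* Example insert callback: counts the index entries written. */
static int count_insert(size_t index_id, const char *index_key,
    const char *pkey, void *ctx)
{
    (void)index_id;
    (void)index_key;
    (void)pkey;
    ++*(int *)ctx;
    return 0;
}
```

With one built-in index and one token index, inserting a record whose value is "x,y,z" writes four index entries: one for the built-in index and three from the extractor.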

We use "plans" to slice and dice columns to form keys and values. A plan is a string describing a sequence of steps through the columns that copies out the necessary columns in the required order. Custom extractors replace the use of WT_INDEX::key_plan when generating the index key from the table, but a key_plan is still needed to extract the primary key from the index. It might be simplest to generate this key_plan manually: skip the number of columns in the visible index key, then copy out each column of the primary key.
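A sketch of generating that key_plan by hand. The plan encoding here is invented for illustration (one character per stored index key column, 's' to skip and 'c' to copy) and is not WiredTiger's actual plan syntax; `make_pk_plan` and `apply_plan` are hypothetical, and the index key is assumed to be already split into columns.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build a plan (invented encoding) that skips the visible index columns
 * and copies the primary key columns. Returns 0, or -1 if plan is too small.
 */
static int make_pk_plan(size_t n_visible, size_t n_pk, char *plan,
    size_t planlen)
{
    assert(plan != NULL);
    if (n_visible + n_pk >= planlen)
        return -1;
    size_t i = 0;
    while (i < n_visible)
        plan[i++] = 's';
    while (i < n_visible + n_pk)
        plan[i++] = 'c';
    plan[i] = '\0';
    return 0;
}

/* Apply a plan to columns; returns how many columns were copied to out[]. */
static size_t apply_plan(const char *plan, const char *const cols[],
    size_t n_cols, const char *out[])
{
    size_t n_out = 0;
    assert(plan != NULL);
    for (size_t i = 0; plan[i] != '\0' && i < n_cols; i++)
        if (plan[i] == 'c')
            out[n_out++] = cols[i];
    return n_out;
}
```

For an index with two visible columns and a one-column primary key, the plan is "ssc", and applying it to the columns of a stored index key yields only the primary key column.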

They also still need WT_INDEX::value_plan, to find the matching table columns, and WT_CURSOR_INDEX::value_plan for use with projections. These should continue to work unchanged (with the note above that if the application names columns in the index key and the names overlap with columns in the table, they must contain identical values or the results will be undefined).
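If the overlap rule needs to be enforced in testing, a check like the following could compare named columns. `struct column` and `check_overlapping_columns` are hypothetical, and column values are assumed to be rendered as strings for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named column with its value rendered as a string. */
struct column {
    const char *name;
    const char *value;
};

/*
 * Every index key column whose name matches a table column must hold the
 * same value. Returns the position of the first offending index column,
 * or -1 if the columns are consistent.
 */
static ptrdiff_t check_overlapping_columns(const struct column *idx_cols,
    size_t n_idx, const struct column *tbl_cols, size_t n_tbl)
{
    assert(idx_cols != NULL || n_idx == 0);
    assert(tbl_cols != NULL || n_tbl == 0);
    for (size_t i = 0; i < n_idx; i++)
        for (size_t j = 0; j < n_tbl; j++)
            if (strcmp(idx_cols[i].name, tbl_cols[j].name) == 0 &&
                strcmp(idx_cols[i].value, tbl_cols[j].value) != 0)
                return (ptrdiff_t)i;
    return -1;
}
```

Index columns with names that do not appear in the table are not constrained by the check.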

Testing

test/format should be modified to optionally create an index with a custom extractor that creates 2 index keys for each primary record.
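One possible shape for the test/format extractor is sketched below, assuming text values. `tf_extract_two` and `tf_verify` are hypothetical names, and the choice of keys (a prefix of the value and its length) is only an example; what matters is that both keys are derived from the record alone, so the test can recompute them when verifying. Because different records can share a visible key, this should also exercise the implicitly appended primary key columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TF_KEY_LEN 64

/* Derive exactly two index keys from a record's (text) value. */
static int tf_extract_two(const char *value, size_t value_len,
    char keys[2][TF_KEY_LEN])
{
    assert(value != NULL);
    int prefix = value_len < 4 ? (int)value_len : 4;
    /* Tag each key so the two kinds can never collide. */
    int n1 = snprintf(keys[0], TF_KEY_LEN, "A:%.*s", prefix, value);
    int n2 = snprintf(keys[1], TF_KEY_LEN, "B:%zu", value_len);
    if (n1 < 0 || n1 >= TF_KEY_LEN || n2 < 0 || n2 >= TF_KEY_LEN)
        return -1;
    return 2;
}

/* Check that the entries found for one record are exactly the two keys. */
static int tf_verify(const char *value, size_t value_len,
    const char *found[], size_t n_found)
{
    char expect[2][TF_KEY_LEN];
    if (tf_extract_two(value, value_len, expect) != 2 || n_found != 2)
        return 0;
    return (strcmp(found[0], expect[0]) == 0 &&
               strcmp(found[1], expect[1]) == 0) ||
        (strcmp(found[0], expect[1]) == 0 &&
            strcmp(found[1], expect[0]) == 0);
}
```

For the value "hello", the two keys are "A:hell" and "B:5"; the verifier accepts them in either order.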

Estimate

There is probably about a week of work here: ~1 day for documentation, ~2 days for implementation and ~2 days for testing.