Describe what enhancement you'd like to have
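`distinct_retract` is an aggregate function combinator. `distinct` means filtering out duplicate data before it is passed to the next aggregation operator, and `retract` refers to our internal CDC implementation based on the changelog. (We use the virtual column `tp_delta`: +1 means a row is added, and -1 means a row is retracted.) In short, this issue aims to tackle two things, detailed below.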
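To make the `tp_delta` semantics concrete, here is a minimal sketch of distinct filtering over a changelog. It assumes that a value is forwarded downstream as +1 when its live count goes from 0 to 1, and as -1 when the count drops back to 0; all other rows are duplicates and are filtered out. `ChangelogRow` and `DistinctRetractFilter` are hypothetical names, and this is not proton's actual implementation.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical changelog row: delta mirrors tp_delta (+1 = row in, -1 = row retracted).
struct ChangelogRow
{
    std::string value;
    int delta;
};

// Hypothetical distinct-with-retract filter. It keeps a reference count per value and
// forwards a row only when the set of distinct values changes:
//  - a +1 that takes the count from 0 to 1 is forwarded as a new distinct value
//  - a -1 that takes the count from 1 to 0 is forwarded as a retraction
class DistinctRetractFilter
{
public:
    std::vector<ChangelogRow> process(const std::vector<ChangelogRow> & rows)
    {
        std::vector<ChangelogRow> out;
        for (const auto & row : rows)
        {
            if (row.delta > 0)
            {
                // operator[] value-initializes a missing count to 0 before the increment.
                if (++counts[row.value] == 1)
                    out.push_back(row);
            }
            else
            {
                auto it = counts.find(row.value);
                if (it == counts.end())
                    continue; // retracting an unknown value: ignored in this sketch
                if (--it->second == 0)
                {
                    counts.erase(it);
                    out.push_back(row);
                }
            }
        }
        return out;
    }

private:
    std::unordered_map<std::string, uint32_t> counts; // value -> live occurrences
};
```

For the input `a+1, a+1, b+1, a-1, a-1`, only `a+1`, `b+1`, and the final `a-1` reach the next aggregation.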
Since we have introduced `counted_value_hash_map` in #484, the distinct combinator could use it; however, it still uses an ordered map, even though it does not care about the order of the elements.
The related code can be found at:

```cpp
using Map = CountedValueMap<StringRef, false>; /// map<key(without delta_col), uint32>
using Self = AggregateFunctionDistinctRetractGenericData;
Map map;
```
`proton/src/AggregateFunctions/Streaming/AggregateFunctionDistinctRetract.h`, lines 19 to 25 in 00d0411
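Because the state only needs per-key lookups, increments, and erases, the container can be swapped without touching the logic. The sketch below is a hypothetical `CountedValues` template instantiated with `std::map` and `std::unordered_map`, used here as stand-ins for the current ordered map and for `counted_value_hash_map`; it is not the proton code. Nothing in it iterates in key order.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical counted-value state, parameterized on the container type.
// Only insert / find / erase by key are used, so key ordering is never observed.
template <typename Map>
struct CountedValues
{
    Map counts; // value -> number of live (not yet retracted) occurrences

    // Returns true if the value became newly distinct (count 0 -> 1).
    bool add(const std::string & v) { return ++counts[v] == 1; }

    // Returns true if the value disappeared entirely (count 1 -> 0).
    bool retract(const std::string & v)
    {
        auto it = counts.find(v);
        if (it == counts.end())
            return false;
        if (--it->second != 0)
            return false;
        counts.erase(it);
        return true;
    }

    std::size_t distinctCount() const { return counts.size(); }
};

// Same logic, two containers: ordered (tree) vs. unordered (hash).
using OrderedState = CountedValues<std::map<std::string, uint32_t>>;
using HashState = CountedValues<std::unordered_map<std::string, uint32_t>>;
```

Both instantiations produce identical results; the ordered map simply pays O(log n) comparisons per operation for an ordering nobody reads, whereas a hash map offers average O(1) lookups.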
Another thing we want to improve/refactor: currently we use the `StringRef` style for all inputs. Looking at the other distinct implementation, we notice that it dispatches on the input data type. That is, the current `distinct_retract` needs extra serialization/deserialization for numeric data, but when the input is numeric, we do not need this string serialization overhead. See `proton/src/AggregateFunctions/Streaming/AggregateFunctionDistinct.cpp`, lines 37 to 63 in 00d0411.
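Dispatching on the input data type can be sketched as follows, assuming a hypothetical `ArgType` tag in place of proton's data types: numeric arguments are keyed by their native value, while other types fall back to a byte-string key that stands in for the `StringRef` path. Only `add` is shown; retraction would decrement the counts in the same way. `ArgType`, `IDistinctState`, `NumericDistinctState`, `GenericDistinctState`, and `createDistinctState` are illustrative names, not proton's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical type tag standing in for the argument's data type.
enum class ArgType { Int64, Float64, String };

// Hypothetical common interface for the per-type distinct state.
struct IDistinctState
{
    virtual ~IDistinctState() = default;
    // data/size point at one raw input value; returns true if it is newly distinct.
    virtual bool add(const char * data, std::size_t size) = 0;
    virtual std::size_t distinctCount() const = 0;
};

// Numeric path: the value itself is the hash key, so no string is built.
template <typename T>
struct NumericDistinctState final : IDistinctState
{
    std::unordered_map<T, uint32_t> counts;
    bool add(const char * data, std::size_t) override
    {
        T value;
        std::memcpy(&value, data, sizeof(T)); // read the raw bytes as T
        return ++counts[value] == 1;
    }
    std::size_t distinctCount() const override { return counts.size(); }
};

// Generic path: the bytes are copied into a string key (stand-in for StringRef).
struct GenericDistinctState final : IDistinctState
{
    std::unordered_map<std::string, uint32_t> counts;
    bool add(const char * data, std::size_t size) override
    {
        return ++counts[std::string(data, size)] == 1;
    }
    std::size_t distinctCount() const override { return counts.size(); }
};

// Dispatch once, when the state is created, rather than once per row.
std::unique_ptr<IDistinctState> createDistinctState(ArgType type)
{
    switch (type)
    {
        case ArgType::Int64: return std::make_unique<NumericDistinctState<int64_t>>();
        case ArgType::Float64: return std::make_unique<NumericDistinctState<double>>();
        case ArgType::String: return std::make_unique<GenericDistinctState>();
    }
    return nullptr;
}
```

Since the type switch happens only at creation time, the per-row path for numeric input is a plain hash lookup on the value, with no serialization step.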
upstream #426