HSEARCH-5133 Support basic metrics aggregations #4144
base: main
Conversation
Sharing the same aggregation builder (72b75a3 to 58ebd31)
And support them in Elasticsearch backend
And implement for Elastic backend
Eventually it will be merged into something more structured, but for now I need something easy to run and understand...
Nice progress 😃, I've added some inline comments here and there. As for the ElasticsearchFieldCodec#convertFromDouble(Double value):
- it is probably fine for now, while we are iterating on the implementation.
- let's figure out what it means for non-numeric types.
- if we don't go with doubles for the Lucene backend and it stays only as an implementation detail for the Elasticsearch backend, we probably won't need to expose it for any user customization, so it can stay in the codec. Otherwise it may make more sense to have something in SearchIndexValueFieldTypeContext.
@@ -60,6 +60,11 @@ public T decodeAggregationKey(JsonElement key, JsonElement keyAsString) {
		return decode( keyAsString );
	}

	@Override
	public T convertFromDoubleNullSafe(Double value) {
		return null;
Have you tried date fields with the Elasticsearch backend? Is it not possible to do any metric aggregations with them? If not, this should probably fail with an assertion error, and the corresponding aggregations should just not be added to the ElasticsearchIndexValueFieldType.Builder
...
@@ -53,6 +53,11 @@ public F decode(JsonElement element) {
		return fromJsonArray( JsonElementTypes.ARRAY.fromElement( element ) );
	}

	@Override
	public F convertFromDoubleNullSafe(Double value) {
		return null;
It's unlikely that we'd want any aggregations for vectors, so maybe just an assertion error.
@@ -65,6 +65,11 @@ public BigDecimal decode(JsonElement element) {
		return JsonElementTypes.BIG_DECIMAL.fromElement( element );
	}

	@Override
	public BigDecimal convertFromDoubleNullSafe(Double value) {
		return BigDecimal.valueOf( value );
It's probably best to move these converter methods to the utils, similar to ParseUtils.
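A ParseUtils-style holder for these conversions might look like the sketch below. Everything here is illustrative: the class name `AggregationConvertUtils` and its method names are assumptions, not existing Hibernate Search API.

```java
import java.math.BigDecimal;

// Hypothetical utility class, analogous to ParseUtils, gathering the
// double-to-field-type conversions currently duplicated across codecs.
// All names are illustrative, not actual Hibernate Search API.
final class AggregationConvertUtils {

	private AggregationConvertUtils() {
		// not to be instantiated
	}

	static BigDecimal bigDecimalFromDoubleNullSafe(Double value) {
		return value == null ? null : BigDecimal.valueOf( value );
	}

	static Integer integerFromDoubleNullSafe(Double value) {
		return value == null ? null : (int) Math.round( value );
	}

	static Float floatFromDoubleNullSafe(Double value) {
		return value == null ? null : value.floatValue();
	}
}
```

Each codec's convertFromDoubleNullSafe could then delegate to one of these, keeping the null handling in a single place.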
builder.queryElementFactory( AggregationTypeKeys.SUM,
		new ElasticsearchMetricFieldAggregation.Factory<>( codec, "sum" ) );
builder.queryElementFactory( AggregationTypeKeys.MIN,
		new ElasticsearchMetricFieldAggregation.Factory<>( codec, "min" ) );
builder.queryElementFactory( AggregationTypeKeys.MAX,
		new ElasticsearchMetricFieldAggregation.Factory<>( codec, "max" ) );
builder.queryElementFactory( AggregationTypeKeys.COUNT,
		new ElasticsearchMetricLongAggregation.Factory<>( codec, "value_count" ) );
builder.queryElementFactory( AggregationTypeKeys.COUNT_DISTINCT,
		new ElasticsearchMetricLongAggregation.Factory<>( codec, "cardinality" ) );
builder.queryElementFactory( AggregationTypeKeys.AVG,
		new ElasticsearchMetricDoubleAggregation.Factory<>( codec, "avg" ) );
Maybe let's try testing these aggregations with other types as well and see which are supported (I'd hope that at least temporal types would work with some of these). When I say test, I mean just to create an index and run some simple aggregation HTTP request so that we don't need to add all the code if the backend does not even support it 🙈 😃.
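Such a quick manual check could be a single search request against a throwaway index; for a date field it might look like the fragment below (index and field names are made up for the example):

```json
POST /metrics-check/_search
{
  "size": 0,
  "aggs": {
    "earliest": { "min": { "field": "releaseDate" } },
    "latest":   { "max": { "field": "releaseDate" } },
    "howMany":  { "value_count": { "field": "releaseDate" } }
  }
}
```

If the backend supports the aggregation for that type, the response carries the values (for date fields, min/max typically come back as epoch millis plus a string representation); if not, it fails fast, before any codec code gets written.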
MetricDoubleAggregationBuilder builder = dslContext.scope()
		.fieldQueryElement( fieldPath, AggregationTypeKeys.AVG ).type();
A double average may not be the best for the temporal types. Any chance we can follow the same pattern as for the sum?
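One concrete reason a double is risky here: its mantissa is only 53 bits, so temporal values stored at nanosecond resolution (as epoch nanos) no longer round-trip through it. A minimal standalone illustration (the instant value is made up for the demo):

```java
public class DoubleAverageLossDemo {
	public static void main(String[] args) {
		// An instant with nanosecond resolution, encoded as epoch nanos.
		// Current epoch-nanos values are around 1.7e18, well above 2^53.
		long nanos = 1_700_000_000_000_000_123L;

		// Going through double, as a double-based average would,
		// drops the low bits of the value.
		double asDouble = (double) nanos;
		long roundTripped = (long) asDouble;

		System.out.println( nanos == roundTripped ); // false: precision lost
	}
}
```

Epoch millis still fit exactly in a double today, but an average of millis is generally fractional anyway, so a type-aware result (as with the sum) seems safer.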
.aggregation( sumIntegers, f -> f.sum().field( "integer", Integer.class ) )
.aggregation( sumConverted, f -> f.sum().field( "converted", String.class ) )
Since you have these new metric aggregations extend AggregationFilterStep, maybe let's have a test with nested objects and a filter to make sure that there are no surprises there 🙈 😃.
Also, maybe let's try to have some explicit ValueConvert.YES/NO?
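On the Elasticsearch side, the nested-plus-filter case would presumably have to translate to something like the request below (index and field names invented for the example), which is worth exercising in a test:

```json
POST /orders/_search
{
  "size": 0,
  "aggs": {
    "lines": {
      "nested": { "path": "lines" },
      "aggs": {
        "cheap": {
          "filter": { "range": { "lines.price": { "lte": 100 } } },
          "aggs": {
            "avgCheapPrice": { "avg": { "field": "lines.price" } }
          }
        }
      }
    }
  }
}
```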
https://hibernate.atlassian.net/browse/HSEARCH-5133
Draft (Preview):
1. we need to implement count, count distinct and avg (done!)
2. we need to implement the Lucene backend
3. write good tests covering all the field types
4. write some doc
Still, I prefer to release this draft so that we can get some feedback...