Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HSEARCH-5133 Support basic metrics aggregations #4144

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

fax4ever
Copy link
Contributor

@fax4ever fax4ever commented May 2, 2024

https://hibernate.atlassian.net/browse/HSEARCH-5133

Draft (Preview):

1. we need to implement count, count distinct and avg done!
2. we need to implement the Lucene backend
3. write good tests covering all the fields types
4. write some doc

Still I prefer to release this draft so that we can get some feedback...

@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 2 times, most recently from 72b75a3 to 58ebd31 Compare May 3, 2024 13:33
And support them in Elasticsearch backend
And implement for Elastic backend
Eventually it will be merged in something more structured, but now I need something easy to run and understand...
Copy link
Member

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress 😃,

I've added some inline comments here and there. As for the ElasticsearchFieldCodec#convertFromDouble(Double value):

  • it is probably fine for now, while we are iterating on the implementation.
  • let's figure out what it means for non-numeric types
  • if we don't go with doubles for the Lucene backend and it will stay only as an implementation detail for the Elasticsearch backend, we probably wouldn't need to expose it for any user customization so it can stay in the codec. otherwise maybe it'll make more sense to have something in SearchIndexValueFieldTypeContext

@@ -60,6 +60,11 @@ public T decodeAggregationKey(JsonElement key, JsonElement keyAsString) {
return decode( keyAsString );
}

@Override
public T convertFromDoubleNullSafe(Double value) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried date fields with the Elasticsearch backend? Is it not possible to do any metric aggregations with them? if not, this probably should fail with an assertion error and corresponding aggregations just not added to the ElasticsearchIndexValueFieldType.Builder...

@@ -53,6 +53,11 @@ public F decode(JsonElement element) {
return fromJsonArray( JsonElementTypes.ARRAY.fromElement( element ) );
}

@Override
public F convertFromDoubleNullSafe(Double value) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unlikely that we'd want any aggregations for vectors, so maybe just an assertion error.

@@ -65,6 +65,11 @@ public BigDecimal decode(JsonElement element) {
return JsonElementTypes.BIG_DECIMAL.fromElement( element );
}

@Override
public BigDecimal convertFromDoubleNullSafe(Double value) {
return BigDecimal.valueOf( value );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably best to move these converter methods to the utils similar to ParseUtils.

Comment on lines +71 to +83
builder.queryElementFactory( AggregationTypeKeys.SUM,
new ElasticsearchMetricFieldAggregation.Factory<>( codec, "sum" ) );
builder.queryElementFactory( AggregationTypeKeys.MIN,
new ElasticsearchMetricFieldAggregation.Factory<>( codec, "min" ) );
builder.queryElementFactory( AggregationTypeKeys.MAX,
new ElasticsearchMetricFieldAggregation.Factory<>( codec, "max" ) );
builder.queryElementFactory( AggregationTypeKeys.COUNT,
new ElasticsearchMetricLongAggregation.Factory<>( codec, "value_count" ) );
builder.queryElementFactory( AggregationTypeKeys.COUNT_DISTINCT,
new ElasticsearchMetricLongAggregation.Factory<>( codec, "cardinality" ) );
builder.queryElementFactory( AggregationTypeKeys.AVG,
new ElasticsearchMetricDoubleAggregation.Factory<>( codec, "avg" )
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's try testing these aggregations with other types as well and see which are supported (I'd hope that at least temporal types would work with some of these). When I say test, I mean just to create an index and run some simple aggregation HTTP request so that we don't need to add all the code if the backend does not even support it 🙈 😃.

Comment on lines +26 to +27
MetricDoubleAggregationBuilder builder = dslContext.scope()
.fieldQueryElement( fieldPath, AggregationTypeKeys.AVG ).type();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a double average may not be the best for the temporal types. any chance we can follow the same pattern as for the sum?

Comment on lines +57 to +58
.aggregation( sumIntegers, f -> f.sum().field( "integer", Integer.class ) )
.aggregation( sumConverted, f -> f.sum().field( "converted", String.class ) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have these new metric aggregations extend AggregationFilterStep maybe let's have a test with nested objects and a filter to make sure that there are no surprises there 🙈 😃.

Also, maybe let's try to have some explicit ValueConvert.YES/NO ?

Copy link

sonarcloud bot commented Jun 1, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants