Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ElasticSearch incorrecly serialize percolated queries #8003

Open
KondzioSSJ4 opened this issue Dec 18, 2023 · 8 comments
Open

ElasticSearch incorrecly serialize percolated queries #8003

KondzioSSJ4 opened this issue Dec 18, 2023 · 8 comments
Labels
8.x Relates to 8.x client version Category: Bug

Comments

@KondzioSSJ4
Copy link

Elastic.Clients.Elasticsearch version: 8.11 (latest)

Elasticsearch version: 8.11.3 (latest)

.NET runtime version: 6

Operating system version: Windows 11

Description of the problem including expected versus actual behavior:
Percolated queries in Elastic Search doesn't serialized to correct value

For example simple query like:

Query.Terms(new TermsQuery()
{
    Field = Item.GetExpression(x => x.Name1),
    Terms = new TermsQueryField(new[] { FieldValue.String("value") })
})

when converted to percolated query in index it will provide:

"query":{"terms":{"name1":{}}}

What is incorrect, because it should converted to:

"query":{"terms":{"name1":["value"]}}

Steps to reproduce:
Go to repository:
https://bitbucket.org/KondzioSSJ4/es_percolatedserialization/src/meh/

And then just run any of this projects
each of them returns error when try to index data

Expected behavior
Percolated queries would correctly indexed in elastic

Provide ConnectionSettings (if relevant):
localhost... any Docker would be helpfull

Provide DebugInformation (if relevant):

Invalid Elasticsearch response built from a unsuccessful (400) low level call on PUT: /percolated-index/_doc/3df093da-1f93-487f-b2bc-5616a33f2a56
 Exception: Request failed to execute. Call: Status code 400 from: PUT /percolated-index/_doc/3df093da-1f93-487f-b2bc-5616a33f2a56. ServerError: Type: document_parsing_exception Reason: "[1:79] failed to parse: Required [index, id, path]" CausedBy: "Type: illegal_argument_exception Reason: "Required [index, id, path]""

# Audit trail of this API call:
 - [1] BadResponse: Node: http://localhost:9200/ Took: 00:00:00.0266196
# OriginalException: Elastic.Transport.TransportException: Request failed to execute. Call: Status code 400 from: PUT /percolated-index/_doc/3df093da-1f93-487f-b2bc-5616a33f2a56. ServerError: Type: document_parsing_exception Reason: "[1:79] failed to parse: Required [index, id, path]" CausedBy: "Type: illegal_argument_exception Reason: "Required [index, id, path]""
# Request:
{"dingerId":"3df093da-1f93-487f-b2bc-5616a33f2a56","query":{"terms":{"name1":{}}}}
# Response:
{"error":{"root_cause":[{"type":"document_parsing_exception","reason":"[1:79] failed to parse: Required [index, id, path]"}],"type":"document_parsing_exception","reason":"[1:79] failed to parse: Required [index, id, path]","caused_by":{"type":"illegal_argument_exception","reason":"Required [index, id, path]"}},"status":400}
@KondzioSSJ4 KondzioSSJ4 added the 8.x Relates to 8.x client version label Dec 18, 2023
@KondzioSSJ4
Copy link
Author

My walkarround was to add the custom serializer
with added converter that supports Union conversion correctly

@flobernd flobernd added the bug label Dec 22, 2023
@alientourist
Copy link

alientourist commented Feb 21, 2024

Hi @KondzioSSJ4 ! I am having exactly the same problem but have trouble understanding how to add a custom serializer that supports union conversion correctly. Any chance you could share code, point to some documentation or just explain your solution in a bit more detail? Would be very grateful! Thanks in advance!

@KondzioSSJ4
Copy link
Author

@alientourist

Yes, sure

Adding converter to context:

var pool = new SingleNodePool(new Uri(_esConfig.Uri));
var connection = new ElasticsearchClientSettings(pool, CreateSerializer)
 .Authentication(....)
 ...;

var client = new ElasticsearchClient(_settings);

where most important is this method:

private static Serializer CreateSerializer(Serializer originalSerializer, IElasticsearchClientSettings settings)
{
    return new DefaultSourceSerializer(settings, x =>
    {
        x.Converters.Add(new TermsQueryFieldConverter());
    });
}

And converter is like:

internal sealed class TermsQueryFieldConverter : JsonConverter<TermsQueryField>
{
    public override TermsQueryField? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        if (reader.TokenType == JsonTokenType.StartArray)
        {
            var fields = JsonSerializer.Deserialize<FieldValue[]>(ref reader, options);
            return new TermsQueryField(fields ?? Array.Empty<FieldValue>());
        }
        else
        {
            var lookup = JsonSerializer.Deserialize<TermsLookup>(ref reader, options);
            return new TermsQueryField(lookup ?? new TermsLookup());
        }
    }

    public override void Write(Utf8JsonWriter writer, TermsQueryField value, JsonSerializerOptions options)
    {
        if (value is null)
        {
            writer.WriteNullValue();
            return;
        }

        value.Match(
            c =>
            {
                JsonSerializer.Serialize(writer, c, options);
            },
            tl =>
            {
                JsonSerializer.Serialize(writer, tl, options);
            });
    }
}

The good idea would be adding such converter inside the repository to fix issue for other user of ElasticSearch
But the "problem" is with way how it's good to add to repository
(probably because that problem would be for all Union then they probably would like to add similar converter and attributes to class for every single class that implement Union and... I don't have time to find the way to make it :D)

@alientourist
Copy link

Thanks @KondzioSSJ4 ! Now I understand a bit more! Have started adding more converters for other types that we need...

@IhnatKlimchuk
Copy link

IhnatKlimchuk commented Feb 24, 2024

So I decided to spend some time digging into that in order to solve. I found 2 serializers used in the client: DefaultRequestResponseSerializer for serializing complex internal types and DefaultSourceSerializer for serializing customer payload. First has many custom converters that allows to serialize and deserialize union types and handle TermsQuery. Ironically, there is even unit test for that: test and expected result

By default requests are serialized with DefaultRequestResponseSerializer, but CustomJsonWriterConverter<TDocument> converter detects IndexRequest, IndexRequestDescriptor or CreateRequest by searching ICustomJsonWriter interface and calls Document serialization with DefaultSourceSerializer instead of DefaultRequestResponseSerializer, resulting in dropping support of internal converters.

I guess nobody initially thought that percolate query feature will add internal types as part of the index/create request. Unfortunately, DefaultRequestResponseSerializer is marked as internal similar to converters used inside, making it impossible to re-use. IndexRequest, IndexRequestDescriptor or CreateRequest are sealed and you can't override ICustomJsonWriter interface implementation. Only option for now looks like adding custom converters into DefaultSourceSerializer as @KondzioSSJ4 mentioned.

Possible resolutions:
Quick: allow string/stream payload into IndexRequest, IndexRequestDescriptor or CreateRequest. There will be no need for struggle with DoRequestAsync, but allowing manual serialization.
Long term: create ability to switch back to DefaultRequestResponseSerializer in case field type is Query either through yet another custom converter or by extending CustomJsonWriterConverter related logic, and avoid switching to DefaultSourceSerializer in the first place.

This requires owners decision here, so I can't provide a PR for now here as most likely it will be throw away work. Waiting for maintainers discussion and decision...

@flobernd flobernd added Category: Bug and removed bug labels Apr 4, 2024
@IhnatKlimchuk
Copy link

Note: BoolQuery has MinimumShouldMatch as separate class that extends Union<int?, string> that means UnionConverter is missing in DefaultSourceSerializer

@IhnatKlimchuk
Copy link

In case there are too many issues with serializer and percolate query item is simple and has query and id only you can keep separate instance of ElasticsearchClient with next hack for settings setup:

var settings = new ElasticsearchClientSettings(...your config...);

var desiredSerializer = typeof(TransportConfigurationBase<ElasticsearchClientSettings>)
    .GetProperty("UseThisRequestResponseSerializer", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)
    .GetValue(settings);
typeof(ElasticsearchClientSettingsBase<ElasticsearchClientSettings>)
    .GetField("_sourceSerializer", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)
    .SetValue(settings, desiredSerializer);

This will allow to use same serializer for payload as for query in search, but you have to double check if payload serialized correctly.

@flobernd
Copy link
Member

flobernd commented Apr 5, 2024

Hi there! After the big 8.13 release I finally have some time to look into the open issues 🙂

Another possible solution would be to expose a [RequestResponseConverter] attribute similar to the [SourceConverter] one to allow switching from source- to request/response-serialization. However, this won't work if the DefaultSourceSerializer is completely replaced by the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.x Relates to 8.x client version Category: Bug
Projects
None yet
Development

No branches or pull requests

4 participants