Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store source for nested objects #108818

Merged
merged 9 commits into from
May 22, 2024

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented May 20, 2024

Covers nested objects with fields or arrays of objects. Adds support for tracking the source of arrays in disabled objects, too.

Related to #106825, #108417

@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es kkrik-es requested review from martijnvg and lkts May 20, 2024 12:42
@kkrik-es kkrik-es marked this pull request as ready for review May 20, 2024 12:42
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left two comments.

@@ -287,6 +287,19 @@ static void parseObjectOrNested(DocumentParserContext context) throws IOExceptio
}

if (context.parent().isNested()) {
if (context.mappingLookup().isSourceSynthetic() && context.getClonedSource() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about the added code here. Is the reason that this is needed in case there is an object field that is mapped as nested field? The array case is then covered by parseNonDynamicArray(...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, just in case there's an entry not using an array. Should be rare but it can happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case is the || mapper instanceof NestedObjectMapper addition at line 612 required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, that section covers the usual case where the nested object contains an array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a nested object has already been added to ignored source, then I think we don't need to do anything for that json object contains an array? Since it has already been added to ignored source as part of the object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we either get an array or an object, not an object wrapping an array. I verified this in the unit test, isn't that the valid syntax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see without || mapper instanceof NestedObjectMapper we only store the first element of an array. So subsequent objects in an array would not get recorded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add this is as a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 615 to 622
boolean isRoot = context.parent() instanceof RootObjectMapper;
context.addIgnoredField(
new IgnoredSourceFieldMapper.NameValue(
isRoot ? arrayFieldName : context.parent().name() + "." + arrayFieldName,
isRoot ? 0 : context.parent().fullPath().length() + 1,
XContentDataHelper.encodeXContentBuilder(tuple.v2())
)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe turn this logic into a helper method? I think this appears now several times in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, @lkts added one. Applied here.

context = tuple.v1();
token = context.parser().currentToken();
parser = context.parser();
}
context = context.createNestedContext((NestedObjectMapper) context.parent());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the DocumentParserContext constructor that is used eventually from createNestedContext(...) method also copy clonedSource field from provided context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this needs to be propagated.

@@ -287,6 +287,19 @@ static void parseObjectOrNested(DocumentParserContext context) throws IOExceptio
}

if (context.parent().isNested()) {
if (context.mappingLookup().isSourceSynthetic() && context.getClonedSource() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a nested object has already been added to ignored source, then I think we don't need to do anything for that json object contains an array? Since it has already been added to ignored source as part of the object?


boolean booleanValue = randomBoolean();
int intValue = randomInt();
var syntheticSource = syntheticSource(documentMapper, b -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also assert the Lucene document that get created here? I think it can be useful to assert the actual IndexableField instances for _ignored_source field. Maybe the syntheticSource(...) method can optionally accept a Consumer<LuceneDocument> parameter, that allows to the test to assert the field instances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm not sure what would be the benefit, the contents are encoded. The decoded version should show up in the printed source that we already assert on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, even if for some reason we end up with duplicate ignored stored fields. (duplicate array elements would be printed)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kkrik-es kkrik-es merged commit 7e09df7 into elastic:main May 22, 2024
15 checks passed
@kkrik-es kkrik-es deleted the fix/synthetic-source/nested-object branch May 23, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants