Add toXContent and fromXContent for DiscoveryNode and DiscoveryNodes #13576

shiv0408 · 2024-05-07T10:14:16Z

Description

We need to upload the DiscoveryNodes object to remote to enable the cluster state publication flow from remote. To do that we need to parse DiscoveryNodes object to and from XContent.

Related Issues

Resolves #13691
Meta issue #13683

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
Commits are signed per the DCO using --signoff
Commit changes are listed out in CHANGELOG.md file (See: Changelog)
Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2024-05-07T10:50:12Z

❌ Gradle check result for 5870c05: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-05-07T12:02:27Z

❌ Gradle check result for 7d1a599: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-05-07T13:29:29Z

❕ Gradle check result for 93d13e2: UNSTABLE

TEST FAILURES:

      1 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=search.aggregation/230_composite/composite aggregation date_histogram profile shows filter rewrite info}

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

shiv0408 · 2024-05-07T13:31:50Z

SmokeTestMultiNodeClientYamlTestSuiteIT test has been identified as flaky previously in #9332

dbwiddis

A few comments.

Also I've seen the method you call fromXContent called parse(). Since it's a static method and not enforced by the interface, in theory you can make it whatever you want... I see a lot of both throughout the codebase so either is probably fine, but I'd suggest making this consistent with other code around where it will be used... if that's fromXContent() that's fine, I haven't dug into the context.

libs/core/src/main/java/org/opensearch/core/common/transport/TransportAddress.java

dbwiddis · 2024-05-09T06:14:26Z

server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java

+        // throw new IllegalArgumentException("expected object but got a " + token);
+        // }
+        XContentParser.Token token;
+        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {


This doesn't seem to align with your implementation of ToXContentFragment (which doesn't require start/end object) as well as the code on lines 595-597 which implies the START_OBJECT is optional. You'd hit the end of the parser without getting the END_OBJECT. (Same comment on DiscoveryNodes class below.)

I believe there might be some misunderstanding here. I agree that DiscoveryNode object's XContent does not need to have a start and end object, but here we are checking for the end object which is added in line 581 above

But for DiscoveryNodes, this comment does make sense. I will modify the class to implement ToXContentObject.

I believe there might be some misunderstanding here. I agree that DiscoveryNode object's XContent does not need to have a start and end object

I'm not sure this is the desired state here. Why don't you want to extend ToXContentObject instead of ToXContentFragment?

but here we are checking for the end object which is added in line 581 above

You are also adding a start object. Which is fine as long as they're paired. But you can't guarantee (based on the interface type) that you're always receiving the same object you wrote. It's possible someone else may make a larger object and include this inline -- the purpose of ToXContentFragment.

You make the start object optional here:

if (parser.currentToken() == XContentParser.Token.START_OBJECT) { parser.nextToken(); }

So if someone else trusts your object inheritance and places the object inline without start and end objects then you'd never get to the end and would keep iterating parser.nextToken(). Eventually that will be null, and null won't equal end object, so inside the loop you'll throw an NPE on token.isValue().

So you need to either test for a null nextToken() in addition to the END_OBJECT as a loop stopping condition, or you need to implement/extend ToXContentObject which requires start and end objects. I suspect you probably want the latter.

Actually, DiscoveryNode is part of the DiscoveryNodes toXContent response and getting used as a fragment like you mentioned and I plan to use DiscoveryNodes toXContent in ClusterState going forward. That's the reason for keeping them Fragment. Anyway, I have pushed a commit modifying the toXContent method. If you still believe we can hit a NPE in the current method. Can you please provide a sample json, I would write a unit test based on that to ensure that NPE is never thrown from our function.

you are implementing ToXContentFragment but you really should implement ToXContentObject here.

OpenSearch/server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java

Line 69 in 9b9158e

public class DiscoveryNode implements Writeable, ToXContentFragment {

I am not changing the implementing ToXContentFragment interface in the DiscoveryNode. It was already an implementation of the this interface and changing it to a ToXContentObject might causing breaking changes in our multiple API responses. As this object is already getting used as a inline object in cluster state API, and other cat APIs.

No, it's being used as an object. Your example in a comment below shows how it's used inline, and it includes th e { (START_OBJECT) and } (END_OBJECT). It's a nested complete object, not a fragment.

Even a nested object is a fragment only as there is no start and end object around it. According to our documentation here, a fragment needs to have start and end object externally.
The example of you gave around fragment being used for creating array. My use case is also similar, where instead of create a field which has a START_ARRAY, our object has START_OBJECT.

The JSON you have provided is a invalid. And we don't expect to parse XContent not created from our method. Even if our object is getting used inline, like I am using in DiscoveryNodes class, the XContent is created from our toXContent method and object is parsed from our fromXContent method.

you must match a required end object to a required start object. You require one but make the other one optional

I am matching start and end object which are being added in our toXContent method. The line 590-592 you are referring, is to enable parsing of following object. This is a complete XContent object, but even if we don't receive the complete object (i.e. without start and end), we will be able to parse that inline object.

{ "node_1" : { "name" : "name_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } } }

I am not changing the implementing ToXContentFragment interface in the DiscoveryNode. It was already an implementation of the this interface and changing it to a ToXContentObject might causing breaking changes

Agreed, you shouldn't change this.

Even a nested object is a fragment only as there is no start and end object around it.

Agreed.... but then you check for the end object in the line prompting this comment chain:

while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT)

If there is no end object, this loop will never reach its stopping condition.

The JSON you have provided is a invalid.

I know. But the parsing error thrown would have been somewhere unrelated to the invalid JSON, in some follow-on code much more confusing to debug. I just threw up a quick example to show that in the case you do not require a start object, and thus shouldn't require an end object (because they should match) you end up with the code as it was written hitting some later end object. Or possibly getting a NPE if you're just parsing a small fragment with no start or end.

you must match a required end object to a required start object. You require one but make the other one optional

I am matching start and end object which are being added in our toXContent method.

When you create the JSON you can match them. If you are reading the JSON created elsewhere you do not control that. Someone else could be creating that JSON outside of core.

The code around the loop looks better because you're matching the start (lines 603-604) to the end (the stopping condition for the loop in 605).

But there is still an optional START_OBJECT present in lines 590-592 without a corresponding end object. I still think this is wrong.

It should be legal for anyone parsing this JSON to parse their own start/end outside of your parsing. SO taking your JSON above:

{ "node_1" : { "name" : "name_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } } }

it should be legal to pass this (valid JSON) string as a fragment which completely encapsulates all the information in the object, lacking only its "field name" which is only relevant for DiscoveryNodes.

{ "name" : "name_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } }

This fails, however, because your fragment behavior is not just the value, it's a key-value pair expected by DiscoveryNodes parsing and you require the format to be:

"node_1" : { "name" : "name_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } }

Essentially, the optional start object IS read when parsing a standalone object but is NOT read when parsing the specific key/value format you include in DiscoveryNodes which omits the end object.

So ultimately my chief complaint here is that having the "field name" as part of this object is wrong; I should be able to have a standalone object that works without needing DiscoveryNodes wrapped around it.

To be more clear/succinct, your representation of DiscoveryNode in JSON is this (optionally wrapped in curly braces) key value pair.

{ "node_1" : { "name" : "name_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } } }

or this because the object wrapper is optional, but requires the method caller to know that it's expecting a field name and object:

"node_1" : { "name" : "name_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } }

This is not the expectation I've seen anywhere for XContentFragments which are usually standalone values (which may or may not be objects).

I believe it should either be:

{ "name" : "name_1", "id" : "node_1", "ephemeral_id" : "3Q3xRwYKScWqBgVCrWmNCQ", "transport_address" : "0.0.0.0:2", "attributes" : { "custom" : "PKU" } }

or your original format with the object wrappers mandatory (even if you keep the fragment interface).

You're mixing-and-matching pairing of start object which relies heavily on implementation details of other classes.

@dbwiddis Thanks for being patient to write the detailed comment. Got your point on that we should be able to pass the inner nested object without the fieldname. But we need to create a parser which takes id as a parameter.

DiscoveryNode fromXContent(XContentParser parser, String nodeId)

I am assuming this will help address your concern.

OK, @shiv0408 thanks for the explanation. I see now that this format is expected. It's just very confusing for a maintainer. You've obviously spent more time digging into this and I agree that your implementation is the necessary way to do it.

So my only request now is that you write a javadoc for the fromXContent() method indicating the expected format, specifically either a key-object pair standalone, or a key-object pair wrapped inside start and end objects, so the next person to come along and look at this code won't be as confused as I was.

server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java

server/src/test/java/org/opensearch/cluster/node/DiscoveryNodeTests.java

server/src/test/java/org/opensearch/cluster/node/DiscoveryNodesTests.java

lukas-vlcek · 2024-05-13T11:30:14Z

@shiv0408 Would you mind modifying issue description and adding a ticket number that requires this change please? I would be interested in learning more about why exactly this is needed (I assume this is relevant to remote storage?).

github-actions · 2024-05-15T08:57:13Z

❌ Gradle check result for 9bc7716: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-05-15T09:30:02Z

❌ Gradle check result for 68add26: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-05-15T11:51:52Z

❌ Gradle check result for f151e00: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Shivansh Arora <hishiv@amazon.com>

shiv0408 · 2024-05-15T22:14:32Z

@lukas-vlcek apologies for late response, I have modified the description of the PR to include the issue for this. And yes, this effort is required for remote state publication.

shiv0408 · 2024-05-15T22:17:42Z

@dbwiddis Thanks for reviewing the changes. I have pushed a commit to address your concerns. Can you please take another look at the changes?

Also I've seen the method you call fromXContent called parse()

We are consistently using fromXContent method name for de-serializing the object for all the objects with are getting used in our remote cluster state flow.

github-actions · 2024-05-15T22:59:50Z

❌ Gradle check result for 6a54fb1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

- Add java docs for fromString method in TransportAddress - Cleaned up toXContent methods in DiscoveryNode, DiscoveryNodes Signed-off-by: Shivansh Arora <hishiv@amazon.com>

github-actions · 2024-05-16T01:13:41Z

❌ Gradle check result for 08a10cb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

soosinha · 2024-05-27T08:39:58Z

server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java

+                        ephemeralId = parser.text();
+                        break;
+                    case KEY_TRANSPORT_ADDRESS:
+                        transportAddress = TransportAddress.fromString(parser.text());


Would this be able to handle ipv6 addresses?

Don't think we will be able to do that now. I will modify this and add a test to ensure that TransportAddress.fromString can parse both ipv4 and ipv6 addresses.

shiv0408 requested review from anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, dbwiddis, dreamer-89, gbbafna, kotwanikunal, mch2, msfroh, nknize, owaiskazi19, reta, Rishikesh1159, sachinpkale, saratvemulapalli, shwetathareja, sohami, tlfeng and VachaShah as code owners May 7, 2024 10:14

shiv0408 added the skip-changelog label May 7, 2024

shiv0408 force-pushed the xcontent-discovery-node branch from 5870c05 to 7d1a599 Compare May 7, 2024 11:29

shiv0408 force-pushed the xcontent-discovery-node branch from 7d1a599 to 93d13e2 Compare May 7, 2024 12:29

dbwiddis reviewed May 9, 2024

View reviewed changes

shiv0408 force-pushed the xcontent-discovery-node branch 2 times, most recently from 9bc7716 to 68add26 Compare May 15, 2024 08:55

shiv0408 force-pushed the xcontent-discovery-node branch from 68add26 to f151e00 Compare May 15, 2024 11:21

lukas-vlcek mentioned this pull request May 15, 2024

[META] Cluster State Publication using Remote State #13683

Open

9 tasks

Add toXContent and fromXContent for DiscoveryNode and DiscoveryNodes

24d8ee7

Signed-off-by: Shivansh Arora <hishiv@amazon.com>

shiv0408 force-pushed the xcontent-discovery-node branch from f151e00 to 6a54fb1 Compare May 15, 2024 22:04

github-actions bot added Cluster Manager enhancement Enhancement or improvement to existing feature or request labels May 15, 2024

Address PR comments

08a10cb

- Add java docs for fromString method in TransportAddress - Cleaned up toXContent methods in DiscoveryNode, DiscoveryNodes Signed-off-by: Shivansh Arora <hishiv@amazon.com>

shiv0408 force-pushed the xcontent-discovery-node branch from 6a54fb1 to 08a10cb Compare May 16, 2024 00:22

soosinha reviewed May 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add toXContent and fromXContent for DiscoveryNode and DiscoveryNodes #13576

Add toXContent and fromXContent for DiscoveryNode and DiscoveryNodes #13576

shiv0408 commented May 7, 2024 •

edited

github-actions bot commented May 7, 2024

github-actions bot commented May 7, 2024

github-actions bot commented May 7, 2024

shiv0408 commented May 7, 2024

dbwiddis left a comment

dbwiddis May 9, 2024

shiv0408 May 13, 2024

shiv0408 May 13, 2024

dbwiddis May 13, 2024

shiv0408 May 15, 2024

shiv0408 May 22, 2024

dbwiddis Jun 3, 2024

dbwiddis Jun 3, 2024 •

edited

shiv0408 Jun 3, 2024

dbwiddis Jun 3, 2024

lukas-vlcek commented May 13, 2024

github-actions bot commented May 15, 2024

github-actions bot commented May 15, 2024

github-actions bot commented May 15, 2024

shiv0408 commented May 15, 2024

shiv0408 commented May 15, 2024 •

edited

github-actions bot commented May 15, 2024

github-actions bot commented May 16, 2024

soosinha May 27, 2024

shiv0408 May 29, 2024 •

edited

Add toXContent and fromXContent for DiscoveryNode and DiscoveryNodes #13576

Are you sure you want to change the base?

Add toXContent and fromXContent for DiscoveryNode and DiscoveryNodes #13576

Conversation

shiv0408 commented May 7, 2024 • edited

Description

Related Issues

Check List

github-actions bot commented May 7, 2024

github-actions bot commented May 7, 2024

github-actions bot commented May 7, 2024

shiv0408 commented May 7, 2024

dbwiddis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbwiddis Jun 3, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lukas-vlcek commented May 13, 2024

github-actions bot commented May 15, 2024

github-actions bot commented May 15, 2024

github-actions bot commented May 15, 2024

shiv0408 commented May 15, 2024

shiv0408 commented May 15, 2024 • edited

github-actions bot commented May 15, 2024

github-actions bot commented May 16, 2024

Choose a reason for hiding this comment

shiv0408 May 29, 2024 • edited

Choose a reason for hiding this comment

shiv0408 commented May 7, 2024 •

edited

dbwiddis Jun 3, 2024 •

edited

shiv0408 commented May 15, 2024 •

edited

shiv0408 May 29, 2024 •

edited