Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new baseclasses in opensearch for ccr classloader issue #13615

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

aggarwalShivani
Copy link
Contributor

@aggarwalShivani aggarwalShivani commented May 9, 2024

Description

(Apologies in advance, for a long description ahead ;) )

Background:

  • This change is related to a new feature being added in ism - unfollow-action #726 - integrating ccr and ism plugins.

  • It requires ism plugin to invoke stop-replication action from ccr. As both ccr and ism need to invoke some common code, the common code could be moved to common-utils, as this has been done previously for other plugins too. For ex. notifications plugin.

  • However, sharing of libraries also leads to a type-cast and class-loading issue - previously seen with notification plugin.
    Some highlights from that PR's description ->
    OpenSearch loads each plugin by different class loader separately. When two plugins need to communicate by transport action, it’s common to put request and response class into a common module shared by both.
    However, as per the forum post, this may cause problem because OpenSearch does "optimization to avoid serialization" for local request between plugins on same JVM.

  • As suggested by @bowenlan-amzn, this means as followed for notification and other plugins, the request object needs to be of a higher-level class from opensearch-core like ActionRequest and later be recreated into required StopIndexReplicationRequest type.
    To achieve this, TransportStopIndexReplicationAction's doExecute method needs to be overridden, to accept a request of different type (a class from opensearch-core) instead of StopIndexReplicationRequest.

Issue:

  • The method doExecute in TransportStopIndexReplicationAction is not overridable with change in type of request parameter.
  • Reason: TransportStopIndexReplicationAction extends class TransportClusterManagerNodeAction from opensearch-core - which mandates the request to be of a class that satisfies this condition
    extends ClusterManagerNodeRequest<Request>
    No class in opensearch-core meets this condition. So we cannot override the method with any existing class from the opensearch-core.

Change description:
Because the generics constraints are very strict for ClusterManagerNodeRequest and TransportClusterManagerNodeAction, I propose to introduce two new classes in Opensearch, similar to ClusterManagerNodeRequest and TransportClusterManagerNodeAction, but primarily change only in the generics.
This can be used in future, by other plugins too, that use TransportClusterManagerNodeAction and need to use a different request type.

Proposed solution:

  1. Opensearch: Add new baseclasses in Opensearch-core - with a more flexible generic type.
  2. Common-utils: Move common code of stop-replication from ccr to common-utils
  3. CCR: Modify ccr plugin to consume stop-replication libs from common-utils. Also, modify TransportStopIndexReplicationAction to inherit new baseclasses built in opensearch-core in point 1.
  4. ISM: Add new action in ism

This is an initial PR catering to point 1. I've not yet worked on the test coverage part as I would request reviewers to give feedback on this approach, so we can proceed accordingly.

Related Issues

It is a pre-req required for opensearch-project/index-management#726.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
  • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [ ] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Copy link
Contributor

github-actions bot commented May 9, 2024

❌ Gradle check result for 4aeb5fa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@aggarwalShivani aggarwalShivani marked this pull request as draft May 9, 2024 13:45
@aggarwalShivani
Copy link
Contributor Author

Hi @dblock and @bowenlan-amzn,
This is a follow-up to our discussion on chat. Request you to please have a look at this and share feedback.

Copy link
Contributor

github-actions bot commented May 9, 2024

❌ Gradle check result for 71852dd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn
Copy link
Member

First time saw such error

* What went wrong:
Execution failed for task ':server:filepermissions'.
> Found invalid file permissions:
  Source file is executable: /home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/action/support/clustermanager/ClusterManagerNodeAckRequest.java
  Source file is executable: /home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeExtendedAction.java

You want to change the file permission of these 2 files and push up

➜  clustermanager git:(ccr-classloader) ls -alh                                   
total 168
drwxr-xr-x@ 13 bowenlan  staff   416B May  9 15:51 .
drwxr-xr-x@ 34 bowenlan  staff   1.1K May  7 15:30 ..
-rwxr-xr-x@  1 bowenlan  staff   2.8K May  9 15:51 ClusterManagerNodeAckRequest.java        <--
-rw-r--r--@  1 bowenlan  staff   3.4K Jan  3 16:05 ClusterManagerNodeOperationRequestBuilder.java
-rw-r--r--@  1 bowenlan  staff   2.2K Jan  3 16:05 ClusterManagerNodeReadOperationRequestBuilder.java
-rw-r--r--@  1 bowenlan  staff   2.4K Jan  3 16:05 ClusterManagerNodeReadRequest.java
-rw-r--r--   1 bowenlan  staff   4.4K May  7 15:30 ClusterManagerNodeRequest.java
-rw-r--r--   1 bowenlan  staff    22K May  7 15:30 TransportClusterManagerNodeAction.java
-rwxr-xr-x@  1 bowenlan  staff    21K May  9 15:51 TransportClusterManagerNodeExtendedAction.java         <--
-rw-r--r--   1 bowenlan  staff   4.1K May  7 15:30 TransportClusterManagerNodeReadAction.java
drwxr-xr-x@  6 bowenlan  staff   192B Jan  3 16:05 info
-rw-r--r--@  1 bowenlan  staff   307B May 19  2023 package-info.java
drwxr-xr-x   7 bowenlan  staff   224B May  7 15:30 term

Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Copy link
Contributor

❌ Gradle check result for 48738a6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Copy link
Contributor

❕ Gradle check result for e35b20b: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.test.rest.ClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals}

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 162 lines in your changes are missing coverage. Please review.

Project coverage is 71.55%. Comparing base (b15cb0c) to head (e35b20b).
Report is 275 commits behind head on main.

❗ Current head e35b20b differs from pull request most recent head 1bf2e04. Consider uploading reports for the commit 1bf2e04 to get more accurate results

Files Patch % Lines
...ger/TransportClusterManagerNodeExtendedAction.java 0.00% 142 Missing ⚠️
...t/clustermanager/ClusterManagerNodeAckRequest.java 0.00% 20 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13615      +/-   ##
============================================
+ Coverage     71.42%   71.55%   +0.13%     
- Complexity    59978    61145    +1167     
============================================
  Files          4985     5054      +69     
  Lines        282275   287301    +5026     
  Branches      40946    41619     +673     
============================================
+ Hits         201603   205582    +3979     
- Misses        63999    64748     +749     
- Partials      16673    16971     +298     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

❌ Gradle check result for 1bf2e04: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants