Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Remote Routing Table] Initial commit for index routing table manifest #13577

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

himshikha
Copy link

Description

This PR is based off on the PR by @Bukhtawar for index routing stream setup #13255
This fixes some of the bugs in input stream creation in the original PR. Additionally, this moves checksum creation to end of file and we will create single checksum for complete file rather than having individual checksum.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Bukhtawar and others added 6 commits May 7, 2024 16:18
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Himshikha Gupta <himshikh@amazon.com>
Copy link
Contributor

github-actions bot commented May 7, 2024

❌ Gradle check result for 441f520: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines +70 to +71
this.bytesStreamOutput = new BytesStreamOutput();
this.out = new BufferedChecksumStreamOutput(bytesStreamOutput);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we require them to be properties of the class. All that we should care about is the byte array that acts as a buffer to get reused across operations

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BytesStreamOutput already uses a ByteArray reference which kind of defeats the the purpose of having buf

Copy link
Author

@himshikha himshikha May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the checksum to be added for complete file, for this we need to ensure we reuse the same BufferedChecksumStreamOutput throughout the write operations.
We are resetting the BytesStreamOutput once we copy the content in buf, this ensures the BytesStreamOutput doesn't accumulate the whole data and we manage the buffer size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly so BufferedChecksumStreamOutput should be used only within the context of a method or specific operation as a temp storage rather than exposing it as a class property

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with keeping BufferedChecksumStreamOutput as a temp storage is we will then need to retain and update checksum independently.
The usage of this class is something like:
inputStream = new IndexRoutingTableInputStream() -> which calls initialFill where we put some content in buffer.
Post this inputStream.read() will be called in a loop till we have content in buffer.
In both these methods we need to retain the checksum and add to it with the next content read. BufferedChecksumStreamOutput does not currently expose any methods to add checksum to an initial value.

The way I see we have 2 options here:

  1. Keep using BufferedChecksumStreamOutput as class property which takes care of updating checksum across reads.
  2. Retain checksum from one read and update BufferedChecksumStreamOutput to be able to use an initial value in the next read.

Please suggest if there is a better way of handling checksum or if I am missing something here.

@@ -30,7 +30,7 @@
* GitHub history for details.
*/

package org.opensearch.index.translog;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split package issue (Java package org.opensearch.comm being used by both :server and :libs/common) is a problem and a blocker for adopting JPMS. I think we should avoid exacerbating that problem by adding more things in org.opensearch.common in the :server module. I suggest either not moving it, moving this elsewhere :server, or move it to org.opensearch.core.common in :libs/core.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't all commons go into libs/common?

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Copy link
Contributor

github-actions bot commented May 8, 2024

❌ Gradle check result for ee70dca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 9efd85b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Himshikha Gupta <himshikh@amazon.com>
@himshikha himshikha force-pushed the remote_routing_input_stream branch from 9efd85b to 7aeecc8 Compare May 14, 2024 06:04
Copy link
Contributor

❌ Gradle check result for 7aeecc8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Copy link
Contributor

❌ Gradle check result for 7b2bc79: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Copy link
Contributor

❌ Gradle check result for 88d266f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Himshikha Gupta <himshikh@amazon.com>
@himshikha himshikha force-pushed the remote_routing_input_stream branch from 1ef13ae to 4c97869 Compare May 14, 2024 10:56
Copy link
Contributor

❌ Gradle check result for 1ef13ae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4c97869: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants