Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-19180. EC: Fix calculation errors caused by special index order #6813

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

zhengchenyu
Copy link
Contributor

Description of PR

I found that if the erasedIndexes distribution is such that the parity index is in front of the data index, ec will produce wrong results when decoding.

In fact, HDFS-15186 has described this problem, but does not fundamentally solve it.

The reason is that the code assumes that erasedIndexes is preceded by the data index and followed by parity index. If there is a parity index placed in front of the data index in the incoming code, a calculation error will occur.

How was this patch tested?

The TestErasureCodingEncodeAndDecode unit test and the erasure_code_test binary were executed on different machines. The test machines include those with isa-l installed and those without isa-l installed.

For code changes:

  • Make erasedIndexes support arbitrary index distribution.

@slfan1989
Copy link
Contributor

slfan1989 commented May 14, 2024

@zhangshuyan0 @haiyang1987 Could you help review this PR? I'm not very familiar with EC, but I've noticed that you have submitted quite a few improvements related to EC. Thank you very much! @zhengchenyu Thank for the contribution!

Copy link
Contributor

@zhangshuyan0 zhangshuyan0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhengchenyu Thanks for your report! If there is a parity index smaller than data index, there will be a bug in decoding calculation. Is this what you mean?
Firstly, I am curious when parity index will be smaller?
Secondly, if this situation occurs, the reading of EC files may also be affected. So, I think we won't actively modify the code to make this happen.

@zhengchenyu
Copy link
Contributor Author

zhengchenyu commented May 14, 2024

@zhangshuyan0
Thanks for your review!
I don't means that parity index is smaller. Parity index and data index is fixed number, we can't update it.
The reproduce case:
When we call RawErasureDecoder::decode, and if the parameter erasedIndexes is in special order. The special order is that the parity index precedes the data index. For example, if erasedIndexes is [8,0], will reproduce this problem.
And you can run unit tests directly and reproduce this easily.
I printed erasedIndexes for all errors in the unit test in wrongindex.txt, all meet this characteristic.

@zhangshuyan0
Copy link
Contributor

@zhengchenyu Thanks for your explanation! I got it. This PR LGTM.
I'm just curious about one thing now: when does the parity index precede the data index in current code? I haven't found one.

@zhengchenyu
Copy link
Contributor Author

zhengchenyu commented May 16, 2024

@zhengchenyu Thanks for your explanation! I got it. This PR LGTM. I'm just curious about one thing now: when does the parity index precede the data index in current code? I haven't found one.

HDFS-15186 has described this problem, but does not fundamentally solve it.
I think we should not expect to find this scenario, and if it occurs, it will cause problems with unrecoverable data.
In fact, as mentioned in HDFS-15186, there is no fundamental reason to find out, which also leads to some data developer not trusting to put a large amount of data into the EC.

@zhangshuyan0
Copy link
Contributor

@zhengchenyu Thanks for your reply. I agree.

Copy link
Contributor

@ZanderXu ZanderXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

Maybe you should move this ticket to HADOOP and move the UT to common.

@zhengchenyu
Copy link
Contributor Author

zhengchenyu commented May 17, 2024

@ZanderXu
Yes, I agree! This PR should be move to common. Should I create another HADOOP JIAR? At least I can't change HDFS jira to Hadoop jira.

@zhengchenyu zhengchenyu changed the title HDFS-17521. Erasure Coding: Fix calculation errors caused by special index order HDFS-17521. EC: Fix calculation errors caused by special index order May 17, 2024
@haiyang1987
Copy link
Contributor

@ZanderXu Yes, I agree! This PR should be move to common. Should I create another HADOOP JIAR? At least I can't change HDFS jira to Hadoop jira.

you can move hdfs to common

image

@zhengchenyu zhengchenyu changed the title HDFS-17521. EC: Fix calculation errors caused by special index order HADOOP-19180. EC: Fix calculation errors caused by special index order May 17, 2024
@github-actions github-actions bot removed the HDFS label May 17, 2024
@zhengchenyu
Copy link
Contributor Author

@haiyang1987 thank you for your reminder!
I have move this to hadoop common!

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 37m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 47m 59s trunk passed
+1 💚 compile 16m 4s trunk passed
+1 💚 checkstyle 1m 18s trunk passed
+1 💚 mvnsite 1m 44s trunk passed
+1 💚 javadoc 1m 4s trunk passed
+1 💚 spotbugs 2m 40s trunk passed
+1 💚 shadedclient 35m 0s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 35m 25s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 58s the patch passed
+1 💚 compile 15m 29s the patch passed
+1 💚 cc 15m 29s the patch passed
+1 💚 golang 15m 29s the patch passed
+1 💚 javac 15m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 16s the patch passed
+1 💚 mvnsite 1m 48s the patch passed
+1 💚 javadoc 0m 58s the patch passed
+1 💚 spotbugs 2m 48s the patch passed
+1 💚 shadedclient 35m 26s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 21m 8s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 asflicense 1m 7s The patch does not generate ASF License warnings.
227m 11s
Reason Tests
Failed junit tests hadoop.service.launcher.TestServiceInterruptHandling
hadoop.crypto.TestCryptoCodec
hadoop.crypto.TestCryptoStreamsWithOpensslSm4CtrCryptoCodec
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6813/2/artifact/out/Dockerfile
GITHUB PR #6813
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc golang
uname Linux 262f84e34d74 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 757cc1f
Default Java Red Hat, Inc.-1.8.0_412-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6813/2/testReport/
Max. process+thread count 3151 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6813/2/console
versions git=2.9.5 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@zhengchenyu
Copy link
Contributor Author

The reported error "Doesn't support SM4 CTR." seems to be not related to this PR. Perhaps the compiler environment has changed?

@zhengchenyu zhengchenyu requested a review from ZanderXu May 20, 2024 02:17
@zhengchenyu
Copy link
Contributor Author

@ZanderXu Can you please review this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants