TEZ-4540: Reading proto data more than 2GB from multiple splits fails #334

Open
wants to merge 2 commits into
base: master

Aggarwal-Raghav

Refer to this: HIVE-28026 and apache/hive#5033

@tez-yetus

This comment was marked as outdated.

Comment on lines 99 to 101
if (din.in != in) {
  cin.resetSizeCounter();
}
Contributor

The javadoc of CodedInputStream#setSizeLimit says the following:

If you want to read several messages from a single CodedInputStream, you could call resetSizeCounter() after each one to avoid hitting the size limit.

Based on that, I would be inclined to reset the counter after every single message; otherwise, it still seems feasible to hit the same error if the DataInput is sufficiently large.
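
To make the concern concrete, here is a minimal sketch built on a toy model rather than on protobuf itself. `SizeCounterModel` and `ResetPerMessageDemo` are hypothetical names, and the model only imitates the idea of a cumulative byte counter checked against a limit; it is not how CodedInputStream is implemented. The limit of `Integer.MAX_VALUE` and the 64 MiB message size are assumptions chosen to mirror the 2GB figure in the PR title.

```java
import java.io.IOException;

/** Toy stand-in for a decoder with a cumulative size counter (hypothetical, not protobuf code). */
class SizeCounterModel {
    private final long sizeLimit;
    private long totalBytesRead = 0;

    SizeCounterModel(long sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    /** Accounts for one message; fails once the running total would pass the limit. */
    void readMessage(long messageBytes) throws IOException {
        if (totalBytesRead + messageBytes > sizeLimit) {
            throw new IOException("size limit exceeded after " + totalBytesRead + " bytes");
        }
        totalBytesRead += messageBytes;
    }

    void resetSizeCounter() {
        totalBytesRead = 0;
    }
}

class ResetPerMessageDemo {
    /** Tries to read `count` messages of `size` bytes; returns how many succeeded. */
    static int readAll(SizeCounterModel in, int count, long size, boolean resetEachMessage) {
        int read = 0;
        try {
            for (int i = 0; i < count; i++) {
                in.readMessage(size);
                read++;
                if (resetEachMessage) {
                    in.resetSizeCounter();
                }
            }
        } catch (IOException e) {
            // The limit was hit; stop and report how far we got.
        }
        return read;
    }

    public static void main(String[] args) {
        long limit = Integer.MAX_VALUE;   // assumed limit, just under 2 GiB
        long size = 64L * 1024 * 1024;    // assumed message size, 64 MiB
        // No reset within one large input: the running total eventually passes the limit.
        System.out.println(readAll(new SizeCounterModel(limit), 40, size, false)); // prints 31
        // Reset after every message: the total never accumulates.
        System.out.println(readAll(new SizeCounterModel(limit), 40, size, true));  // prints 40
    }
}
```

In this model, a reset that happens only when the underlying input changes behaves like the first call for any single input that is large enough, which is the case the Javadoc quote is warning about.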

Author

Thanks for the review @zabetak.
I missed this Javadoc statement. I suspected that resetting totalBytesRetired after every message read might have an unexpected impact, so I reset it after every HDFS split read instead. But based on the Javadoc, I think we can reset the counter after every message read. I will modify the patch.

Thanks.

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 15s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 17m 26s master passed
+1 💚 compile 0m 29s master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04
+1 💚 compile 0m 28s master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08
+1 💚 checkstyle 1m 17s master passed
+1 💚 javadoc 0m 35s master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04
+1 💚 javadoc 0m 22s master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08
+0 🆗 spotbugs 1m 13s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 17s the patch passed
+1 💚 compile 0m 17s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04
+1 💚 javac 0m 17s the patch passed
+1 💚 compile 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08
+1 💚 javac 0m 16s the patch passed
+1 💚 checkstyle 0m 8s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 8s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04
+1 💚 javadoc 0m 9s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08
+1 💚 findbugs 0m 38s the patch passed
_ Other Tests _
+1 💚 unit 0m 32s tez-protobuf-history-plugin in the patch passed.
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
25m 42s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/artifact/out/Dockerfile
GITHUB PR #334
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 006060d13f5e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 5e1cdee
Default Java Private Build-1.8.0_392-8u392-ga-1~22.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/testReport/
Max. process+thread count 105 (vs. ulimit of 5500)
modules C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/console
versions git=2.34.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@zabetak zabetak left a comment

LGTM, thanks for pushing this forward, @Aggarwal-Raghav! My approval is not binding, so you will have to ping a Tez committer to merge this.

@Aggarwal-Raghav
Author

@abstractdog @harishjp, can you please help get this into Tez 0.10.3?

@abstractdog
Contributor

> @abstractdog @harishjp, can you please help get this into Tez 0.10.3?

Thanks @Aggarwal-Raghav for the patch, let me check soon.
I'm really sorry, but Tez 0.10.3 RC1 is currently being released, so we cannot add this.

@abstractdog
Contributor

CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called?
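
For illustration only, the shape of such a test might look like the sketch below. It does not use ProtoMessageWritable or protobuf; `CountingRecordReader` and `ResetCounterCheck` are hypothetical stand-ins that read length-prefixed records with plain `java.io` streams and keep their own byte counter. The point is just the assertion pattern: read twice through the same reader and check that the counter is back at zero after each read, which would not hold if the reset were missing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical reader that keeps one byte counter across reads (not Tez or protobuf code). */
class CountingRecordReader {
    private final DataInputStream in;
    private final boolean resetAfterEachRecord;
    private int totalBytesRead = 0;

    CountingRecordReader(DataInputStream in, boolean resetAfterEachRecord) {
        this.in = in;
        this.resetAfterEachRecord = resetAfterEachRecord;
    }

    /** Reads one record: a 4-byte length followed by that many payload bytes. */
    byte[] readRecord() throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        totalBytesRead += 4 + length;
        if (resetAfterEachRecord) {
            totalBytesRead = 0;
        }
        return payload;
    }

    int getTotalBytesRead() {
        return totalBytesRead;
    }
}

class ResetCounterCheck {
    /** Writes two records, reads them back, and returns the counter value seen after each read. */
    static int[] countersAfterTwoReads(boolean resetAfterEachRecord) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String s : new String[] {"first", "second"}) {
                byte[] payload = s.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
            out.flush();

            CountingRecordReader reader = new CountingRecordReader(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())),
                resetAfterEachRecord);
            int[] counters = new int[2];
            reader.readRecord();
            counters[0] = reader.getTotalBytesRead();
            reader.readRecord();
            counters[1] = reader.getTotalBytesRead();
            return counters;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countersAfterTwoReads(true)));  // prints [0, 0]
        System.out.println(Arrays.toString(countersAfterTwoReads(false))); // prints [9, 19]
    }
}
```

Reading at least twice matters here: a single read cannot distinguish a counter that was reset from one that simply has not accumulated yet.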

@Aggarwal-Raghav
Author

> CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called?

I have added a basic unit test that checks that cin.resetSizeCounter() is called.

@tez-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 14m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 15m 10s master passed
+1 💚 compile 0m 20s master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 compile 0m 20s master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+1 💚 checkstyle 1m 8s master passed
+1 💚 javadoc 0m 30s master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 javadoc 0m 15s master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+0 🆗 spotbugs 1m 4s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 2s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 11s the patch passed
+1 💚 compile 0m 12s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 javac 0m 12s the patch passed
+1 💚 compile 0m 10s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+1 💚 javac 0m 10s the patch passed
-0 ⚠️ checkstyle 0m 5s tez-plugins/tez-protobuf-history-plugin: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 7s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 javadoc 0m 7s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+1 💚 findbugs 0m 27s the patch passed
_ Other Tests _
+1 💚 unit 0m 27s tez-protobuf-history-plugin in the patch passed.
+1 💚 asflicense 0m 14s The patch does not generate ASF License warnings.
35m 49s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/Dockerfile
GITHUB PR #334
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 012dcf99c519 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / b5b6226
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/diff-checkstyle-tez-plugins_tez-protobuf-history-plugin.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/testReport/
Max. process+thread count 107 (vs. ulimit of 5500)
modules C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/console
versions git=2.34.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Aggarwal-Raghav
Author

@abstractdog, can you please help with the review?
