
HADOOP-19167 Bug Fix: Change of Codec configuration does not work #6807

Merged — 2 commits merged into apache:trunk on May 17, 2024

Conversation

@skyskyhu (Contributor) commented May 9, 2024

HADOOP-19167 Change of Codec configuration does not work

Description of PR

In one of my projects, I need to dynamically adjust the compression level for different files.
However, I found that in most cases the new compression level does not take effect as expected; the old compression level continues to be used.
Here is the relevant code snippet:

```java
ZStandardCodec zStandardCodec = new ZStandardCodec();
zStandardCodec.setConf(conf);
conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
writer = SequenceFile.createWriter(conf, SequenceFile.Writer.file(sequenceFilePath),
                                   SequenceFile.Writer.keyClass(LongWritable.class),
                                   SequenceFile.Writer.valueClass(BytesWritable.class),
                                   SequenceFile.Writer.compression(CompressionType.BLOCK));
```

Take my unit test as another example:

```java
DefaultCodec codec1 = new DefaultCodec();
Configuration conf = new Configuration();
ZlibFactory.setCompressionLevel(conf, CompressionLevel.TWO);
codec1.setConf(conf);
Compressor comp1 = CodecPool.getCompressor(codec1);
CodecPool.returnCompressor(comp1);

DefaultCodec codec2 = new DefaultCodec();
Configuration conf2 = new Configuration();
CompressionLevel newCompressionLevel = CompressionLevel.THREE;
ZlibFactory.setCompressionLevel(conf2, newCompressionLevel);
codec2.setConf(conf2);
Compressor comp2 = CodecPool.getCompressor(codec2);
```

In the current code, the compression level of comp2 is 2 rather than the intended level of 3.

The reason is that SequenceFile.Writer.init() calls CodecPool.getCompressor(codec), which eventually calls CodecPool.getCompressor(codec, null).
If the compressor is a reused instance, the configuration is not applied because it is passed as null:

```java
public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
  Compressor compressor = borrow(compressorPool, codec.getCompressorType());
  if (compressor == null) {
    compressor = codec.createCompressor();
    LOG.info("Got brand-new compressor ["+codec.getDefaultExtension()+"]");
  } else {
    compressor.reinit(conf);   // conf is null here
    ......
```

Please also refer to my unit test to reproduce the bug.
To address it, I modified the code so that the configuration is read back from the codec when a reused compressor is handed out.
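The pooling behavior and the fix can be sketched with a self-contained model (not the actual Hadoop classes — Conf, Codec, Compressor and the pool here are simplified stand-ins for Configuration, CompressionCodec, Compressor and CodecPool):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CodecPoolSketch {
    // Stand-in for Hadoop's Configuration: reduced to a compression level.
    static class Conf {
        int level;
        Conf(int level) { this.level = level; }
    }

    // Stand-in for a Compressor that keeps the level it was initialized with.
    static class Compressor {
        int level;
        void reinit(Conf conf) { if (conf != null) { level = conf.level; } }
    }

    // Stand-in for a codec carrying its own configuration, as set via setConf().
    static class Codec {
        Conf conf;
        Codec(Conf conf) { this.conf = conf; }
        Conf getConf() { return conf; }
    }

    static final Deque<Compressor> pool = new ArrayDeque<>();

    // Buggy path: a reused instance is reinitialized with a null conf,
    // so the previous compression level silently sticks.
    static Compressor getCompressorBuggy(Codec codec) {
        Compressor c = pool.poll();
        if (c == null) { c = new Compressor(); }
        c.reinit(null); // conf is null here, as in the report
        return c;
    }

    // Fixed path: read the configuration back from the codec on reuse.
    static Compressor getCompressorFixed(Codec codec) {
        Compressor c = pool.poll();
        if (c == null) { c = new Compressor(); }
        c.reinit(codec.getConf()); // fall back to the codec's own conf
        return c;
    }

    public static void main(String[] args) {
        Compressor first = getCompressorFixed(new Codec(new Conf(2)));
        pool.push(first); // return it, as CodecPool.returnCompressor does

        Compressor reused = getCompressorBuggy(new Codec(new Conf(3)));
        System.out.println("buggy reuse level = " + reused.level);  // stays 2
        pool.push(reused);

        Compressor fixed = getCompressorFixed(new Codec(new Conf(3)));
        System.out.println("fixed reuse level = " + fixed.level);   // now 3
    }
}
```

This mirrors the failure mode in the unit test above: the pooled compressor keeps level 2 until the reuse path consults the codec's configuration.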

How was this patch tested?

Unit test.

HDFS-17510 Change of Codec configuration does not work
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 48s trunk passed
+1 💚 compile 17m 24s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 15m 54s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 1m 18s trunk passed
+1 💚 mvnsite 1m 42s trunk passed
+1 💚 javadoc 1m 16s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 54s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 2m 36s trunk passed
+1 💚 shadedclient 35m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 57s the patch passed
+1 💚 compile 16m 36s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 16m 36s the patch passed
+1 💚 compile 15m 46s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 15m 46s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 14s the patch passed
+1 💚 mvnsite 1m 40s the patch passed
+1 💚 javadoc 1m 10s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 53s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 2m 46s the patch passed
+1 💚 shadedclient 35m 39s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 39s hadoop-common in the patch passed.
+1 💚 asflicense 1m 3s The patch does not generate ASF License warnings.
224m 13s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6807/2/artifact/out/Dockerfile
GITHUB PR #6807
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 00683afafac5 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / cbd328a
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6807/2/testReport/
Max. process+thread count 1429 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6807/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ZanderXu (Contributor) left a comment

LGTM. Thanks @skyskyhu for your report.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 01s No case conflicting files found.
+0 🆗 spotbugs 0m 00s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 85m 54s trunk passed
+1 💚 compile 37m 46s trunk passed
+1 💚 checkstyle 4m 25s trunk passed
-1 ❌ mvnsite 4m 14s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 4m 43s trunk passed
+1 💚 shadedclient 143m 33s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 40s the patch passed
+1 💚 compile 36m 18s the patch passed
+1 💚 javac 36m 17s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 4m 38s the patch passed
-1 ❌ mvnsite 4m 30s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 4m 35s the patch passed
+1 💚 shadedclient 149m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 5m 19s The patch does not generate ASF License warnings.
471m 10s
Subsystem Report/Notes
GITHUB PR #6807
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname MINGW64_NT-10.0-17763 a388fdfcc139 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / cbd328a
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6807/2/testReport/
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6807/2/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@skyskyhu (Contributor, Author)

@ZanderXu I do not have permission to merge my commit. Could you help merge the commit? Thanks!

@skyskyhu (Contributor, Author)

@steveloughran, @jojochuang, @Hexiaoqiao, can you help review and merge the commit when you have free time? Thanks a lot.

@steveloughran (Contributor)

@ZanderXu should have the right permissions

@skyskyhu (Contributor, Author)

@ZanderXu Could you help merge the commit when you have time? Thanks a lot.

@ZanderXu ZanderXu merged commit 3c00093 into apache:trunk May 17, 2024
3 of 6 checks passed
@ZanderXu (Contributor)

Merged. Thanks @skyskyhu for your contribution.

@skyskyhu (Contributor, Author)

Thanks to @ZanderXu for your kind help!
