Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom configuration property reader for segment metadata files #12440

Open
wants to merge 36 commits into
base: master
Choose a base branch
from

Conversation

abhioncbr
Copy link
Contributor

@abhioncbr abhioncbr commented Feb 18, 2024

This is the first set of changes for the issue.

In this PR, we introduce a new custom property reader for the segment metadata files. This custom property reader is based on the commons-configuration2 custom property reader/writer(here is the link). We are introducing this custom property reader to avoid unescaping operation happen for the key in default implementation.

cc: @klsince @Jackie-Jiang @itschrispeck

@abhioncbr abhioncbr changed the title Added custom configuration property reader for segment metadata. Custom configuration property reader for segment metadata files Feb 18, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 18, 2024

Codecov Report

Attention: Patch coverage is 86.36364% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 62.16%. Comparing base (59551e4) to head (fbd5450).
Report is 522 commits behind head on master.

Files Patch % Lines
...pache/pinot/spi/env/CommonsConfigurationUtils.java 85.71% 4 Missing and 1 partial ⚠️
...rg/apache/pinot/spi/env/PropertyIOFactoryKind.java 70.00% 3 Missing ⚠️
.../apache/pinot/spi/env/VersionedPropertyReader.java 91.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12440      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2540     +104     
  Lines        133233   139455    +6222     
  Branches      20636    21554     +918     
============================================
+ Hits          82274    86697    +4423     
- Misses        44911    46265    +1354     
- Partials       6048     6493     +445     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.12% <86.36%> (+0.41%) ⬆️
java-21 62.05% <86.36%> (+0.42%) ⬆️
skip-bytebuffers-false 62.15% <86.36%> (+0.41%) ⬆️
skip-bytebuffers-true 62.02% <86.36%> (+34.29%) ⬆️
temurin 62.16% <86.36%> (+0.41%) ⬆️
unittests 62.16% <86.36%> (+0.41%) ⬆️
unittests1 46.75% <86.36%> (-0.14%) ⬇️
unittests2 27.72% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to keep backward compatibility, we will need to introduce versioning into the segment metadata file. Imagine we use the new property reader to read a old metadata file, it will fail to unescape the values, thus reading the wrong values.

We should introduce a new segment metadata version, and ensure there is no escaping in the new version, then we can use this new property reader for the new version. For the old version, we'll need to fall back to the old property reader.

@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from a67b71a to 15601b7 Compare March 1, 2024 02:53
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test to ensure it can read the old format properly? We can put a old format file into the resource folder, and use the new reader to read it

@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from 157b780 to 86a2fff Compare March 11, 2024 11:02
@abhioncbr
Copy link
Contributor Author

Can we add a test to ensure it can read the old format properly? We can put a old format file into the resource folder, and use the new reader to read it

Added the required tests.

@abhioncbr
Copy link
Contributor Author

@Jackie-Jiang / @klsince, any further comments on the implementation side. I am going to start work on it's usage based on the cluster configuration or table configuration.

@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from c6ae01b to 1bc252a Compare March 15, 2024 06:05
@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from d8104f6 to 2b0357a Compare March 21, 2024 19:01
@abhioncbr
Copy link
Contributor Author

Based on the offline discussion, we decided to have pair of separate custom segment metadata reader/writer. We will decide to use custom reader/writer based on the version of the segment metadata.

@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from 2b0357a to 0f1dec9 Compare March 22, 2024 05:07
@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from 0f1dec9 to 7afcc03 Compare April 5, 2024 00:13
@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from 4a6b4c3 to 18e562b Compare April 9, 2024 19:48
@abhioncbr
Copy link
Contributor Author

@Jackie-Jiang / @klsince, please review it once more. Thanks

Copy link
Contributor

@klsince klsince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, w/ a few more questions and nits.

@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from 18e562b to 5d89c55 Compare April 19, 2024 20:03
@abhioncbr abhioncbr force-pushed the 12433-segement-metadata-custom-reader branch from 0552e9e to 3bdbae3 Compare May 24, 2024 00:23
CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR));

String[] keyValue = line.split(getPropertySeparator());
Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check might fail if the value happen to have the separator

The line.split() does regex matching internally, so better use StringUtils.split(), and StringUtils.split() can take a max param to return just two parts to handle the issue said above.

And since you have get separator above, you can save the calls of getPropertySeparator() on L48 and L52.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants