Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hotfix] Upsert compaction doesn't acknowledge maxLength info and trims string values #13157

Merged

Conversation

tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented May 15, 2024

label:
bugfix

Recently, we found that post-compaction, string fields with length greater than default 512 characters got trimmed. On debugging, I found that we were using schema generated from metadata.properties file and not passing schema from ZK to SegmentGenerator class. This hotfix is related to passing ZK schema to this.

One of the other root-causes is metadata.properties file missing schema max length information for columns. Also while creating fieldSpec from metadata properties we don't pass maxlength anywhere --

fieldSpec = new DimensionFieldSpec(column, dataType, isSingleValue, defaultNullValueString);

This leads to falling back to default 512 for string fields.

Sample metadata.properties values for a column:

column.Attr.cardinality = -2147483648
column.Attr.totalDocs = 1322868
column.Attr.dataType = STRING
column.Attr.bitsPerElement = 31
column.Attr.lengthOfEachEntry = 0
column.Attr.columnType = DIMENSION
column.Attr.isSorted = false
column.Attr.hasDictionary = false
column.Attr.isSingleValues = true
column.Attr.maxNumberOfMultiValues = 0
column.Attr.totalNumberOfEntries = 1322868
column.Attr.isAutoGenerated = false
column.Attr.minValue = null
column.Attr.minMaxValueInvalid = true
column.Attr.defaultNullValue = null

No schemaMaxLength field present here^^.

Longer term fix I am planning as a follow-up is to persist maxLength info in metadata properties file and parsing that in the ColumnMetadataImpl file while creating FieldSpec.

cc @snleee @Jackie-Jiang

@codecov-commenter
Copy link

codecov-commenter commented May 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 62.13%. Comparing base (59551e4) to head (75a3fb1).
Report is 444 commits behind head on master.

Files Patch % Lines
...upsertcompaction/UpsertCompactionTaskExecutor.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13157      +/-   ##
============================================
+ Coverage     61.75%   62.13%   +0.37%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2515      +79     
  Lines        133233   137861    +4628     
  Branches      20636    21335     +699     
============================================
+ Hits          82274    85654    +3380     
- Misses        44911    45814     +903     
- Partials       6048     6393     +345     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <0.00%> (ø)
java-11 62.11% <0.00%> (+0.40%) ⬆️
java-21 27.79% <0.00%> (-33.83%) ⬇️
skip-bytebuffers-false 62.11% <0.00%> (+0.37%) ⬆️
skip-bytebuffers-true 27.78% <0.00%> (+0.05%) ⬆️
temurin 62.13% <0.00%> (+0.37%) ⬆️
unittests 62.12% <0.00%> (+0.37%) ⬆️
unittests1 46.68% <ø> (-0.21%) ⬇️
unittests2 27.79% <0.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang Jackie-Jiang merged commit 8206ce8 into apache:master May 15, 2024
18 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants