Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add protobuf codegen decoder #12980

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

rseetham
Copy link
Contributor

@rseetham rseetham commented Apr 22, 2024

resolves #12679

Design doc: https://docs.google.com/document/d/186mVLISgYEmljla3-IHrFNJ2dGMJ21qJ-MWFtsx-ayI/

Test resources have the generated code we expect for the different test proto files.

@codecov-commenter
Copy link

codecov-commenter commented Apr 22, 2024

Codecov Report

Attention: Patch coverage is 84.50292% with 53 lines in your changes are missing coverage. Please review.

Project coverage is 62.14%. Comparing base (59551e4) to head (2ccb920).
Report is 440 commits behind head on master.

Files Patch % Lines
...not/plugin/inputformat/protobuf/ProtoBufUtils.java 59.72% 22 Missing and 7 partials ⚠️
...n/inputformat/protobuf/codegen/MessageCodeGen.java 94.94% 3 Missing and 6 partials ⚠️
...format/protobuf/ProtoBufCodeGenMessageDecoder.java 78.94% 8 Missing ⚠️
...ugin/inputformat/protobuf/ProtoBufSchemaUtils.java 87.03% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12980      +/-   ##
============================================
+ Coverage     61.75%   62.14%   +0.39%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2518      +82     
  Lines        133233   138194    +4961     
  Branches      20636    21371     +735     
============================================
+ Hits          82274    85887    +3613     
- Misses        44911    45892     +981     
- Partials       6048     6415     +367     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <0.00%> (ø)
java-11 62.15% <84.50%> (+0.44%) ⬆️
java-21 0.00% <0.00%> (-61.63%) ⬇️
skip-bytebuffers-false 62.14% <84.50%> (+0.39%) ⬆️
skip-bytebuffers-true 0.00% <0.00%> (-27.73%) ⬇️
temurin 62.14% <84.50%> (+0.39%) ⬆️
unittests 62.14% <84.50%> (+0.39%) ⬆️
unittests1 46.68% <ø> (-0.21%) ⬇️
unittests2 27.90% <84.50%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rseetham rseetham force-pushed the protbuf_codegen branch 2 times, most recently from 4615a6c to b64cfa1 Compare April 22, 2024 15:08
@rseetham rseetham marked this pull request as ready for review April 22, 2024 16:57
@rseetham rseetham changed the title Add protobuf codegen decoder with unit tests Add protobuf codegen decoder Apr 24, 2024
@rseetham
Copy link
Contributor Author

@KKcorps Could you take a look.

@chenboat
Copy link
Contributor

chenboat commented May 1, 2024

Can you link the overall design doc for this in the summary section?

// This is needed since the generated class name is not always the same as the proto file name.
// The descriptor that we get from the jar drops the first prefix of the proto class name.
// For example, insead of com.data.example.ExampleProto we get data.example.ExampleProto.
// Copied from Flink codebase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we import/use Flink method instead of copy? Does Flink allow such copy of their code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can import the flink code. We would have to add a dependency to flink's protobuf module just for these utils functions. They are just needed to get the full package name from a classLoader object. It's pretty straightforward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for copied code you can add comments like

/**
 * Copied from the presto TimeZoneKey. It basically caches the Joda Chronologies corresponding to each of the
 * timezones listed in the zone-index.properties
 * The zone-index.properties is kept in sync with the presto zone index properties.
 */ 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenboat There's other instances in Pinot where code was copied:

* =================================================================================================================
* THIS CLASS IS COPIED FROM Calcite's {@link ChainedSqlOperatorTable} and modified the function lookup to terminate
* early once found from ordered SqlOperatorTable list. This is to avoid some hard-coded casting assuming all Sql
* identifier looked-up are of the same SqlOperator type.
* =================================================================================================================

// This is needed since the generated class name is not always the same as the proto file name.
// The descriptor that we get from the jar drops the first prefix of the proto class name.
// For example, insead of com.data.example.ExampleProto we get data.example.ExampleProto.
// Copied from Flink codebase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for copied code you can add comments like

/**
 * Copied from the presto TimeZoneKey. It basically caches the Joda Chronologies corresponding to each of the
 * timezones listed in the zone-index.properties
 * The zone-index.properties is kept in sync with the presto zone index properties.
 */ 

public static final String NULLABLE_DOUBLE_FIELD = "nullable_double_field";
public static final String NULLABLE_FLOAT_FIELD = "nullable_float_field";
public static final String NULLABLE_BOOL_FIELD = "nullable_bool_field";
public static final String NULLABLE_BYTES_FIELD = "nullable_bytes_field";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the supported data types if pinot doesn't support the type natively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: We support all the protobuf descriptor types at present in the version we are using. If in the future new types get added, the codegen will skip that filed and log an error.
https://github.com/apache/pinot/pull/12980/files#diff-84564c2ff3bb09e23abe31143340f4e5a2aea0613588e1131311cb3ea17d4a5eR159-R177

@rseetham rseetham force-pushed the protbuf_codegen branch 2 times, most recently from 228811e to 8cadb65 Compare May 8, 2024 21:40
@rseetham rseetham force-pushed the protbuf_codegen branch 2 times, most recently from 003bf3f to 8bf1d6d Compare May 10, 2024 18:34
Descriptors.FieldDescriptor keyFd =
fd.getMessageType().findFieldByName("key");
Descriptors.FieldDescriptor valueFd =
fd.getMessageType().findFieldByName("value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

}
}

private static void addFieldToPinotSchema(Schema pinotSchema, FieldSpec.DataType dataType, String name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this code re-used from somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is from AvroUtils and JsonUtils. They both have this function:
AvroUtils
JsonUtils

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add it to a common package and use it. I can do that in a follow up pr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support decoding protobuf input using code generation to call native methods
5 participants