Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery/storage/managedwriter/adapt): add NormalizeDescriptor #4681

Merged
merged 15 commits into from Aug 27, 2021

Conversation

shollyman
Copy link
Contributor

@shollyman shollyman commented Aug 26, 2021

This functionality supports the "bring your own proto" case for writing
data.

Differences from the java equivalent:

  • NormalizeDescriptor return a DescriptorProto rather than ProtoSchema, to adhere to the option builder pattern (e.g. managedwriter.WithSchemaDescriptor).

Towards: #4366

@shollyman shollyman requested a review from a team August 26, 2021 22:01
@shollyman shollyman requested a review from a team as a code owner August 26, 2021 22:01
@shollyman shollyman requested a review from tswast August 26, 2021 22:01
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Aug 26, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 26, 2021
@shollyman shollyman requested a review from codyoss August 27, 2021 00:17
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but a few questions

var normalizationSkipList = []string{
".google.protobuf.DoubleValue",
".google.protobuf.FloatValue",
".google.protobuf.Int64Value",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support well known types now? Or is this to make sure they're supported when the backend adds support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're well known, so the backend shouldn't have issues resolving them without repacking. Getting them to behave as expected for the proto3 case is still the ongoing discussion. I could do some more integration testing on this front, but I wanted to get some other changes in to the main client before adding them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are skipped, what is happening? Maybe it is good to have a e2e test to show the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added integration testing, turns out the backend won't do the resolution:

integration_test.go:617: error in response: rpc error: code = InvalidArgument desc = Invalid proto schema: BqMessage.proto: testdata_WithWellKnownTypes.wrapped_int64: ".google.protobuf.Int64Value" is not defined.
        BqMessage.proto: testdata_WithWellKnownTypes.wrapped_string: ".google.protobuf.StringValue" is not defined. Entity: projects/shollyman-demo-test/datasets/managedwriter_test_dataset_20210827_73498187205915_0001/tables/table_20210827_73498187245922_0002/_default

Yiru, you want a bug for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you want is a direct support on wrappers, which we currently don't have. So it should be a dup to the same bug. Since Pavan suggested a way of not needing an annotation, we could make the decision based on the BQ table schema (which means I can fix it sooner). But again, such behavior needs to be consistent across client libs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want direct support on wrappers, but that's tied up in the other proto3 topics. I did expect that the well known types would just work out of the box, even if the api treated them as record types with a single value field.

return nil, fmt.Errorf("error converting message %s: %v", inField.FullName(), err)
}
root.NestedType = append(root.NestedType, dp)
visitedTypes.delete(msgFullName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we delete this? I guess because "visited" is more about detecting recursive types? I wonder if there's a more descriptive name we could use? (Ancestor types?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the names to keep it aligned with the other implementations, but its definitely something I think we could consider. Currently the internal C++ impl is the baseline, so we'd want to start there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. IMO the C++ impl could have better names, then. :-)

enumFullName := string(inField.Enum().FullName())
enclosingTypeName := normalizeName(enumFullName) + "_E"
enumName := string(inField.Enum().Name())
actualFullName := fmt.Sprintf("%s.%s", enclosingTypeName, enumName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just enums that get the dots? Why is that? Oh, is it because we actually generated the enclosing struct so we know it's properly nested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, gets deep into enum internals.

From the proto guide:

You can also use an enum type declared in one message as the type of a field in a different message, using the syntax MessageType.EnumType.

It's basically ensuring the enums don't collide by being explicit about the struct wrapping.

@yirutang
Copy link
Contributor

yirutang commented Aug 27, 2021 via email

@yirutang
Copy link
Contributor

yirutang commented Aug 27, 2021 via email

@shollyman shollyman added the automerge Merge the pull request once unit tests and other checks pass. label Aug 27, 2021
@shollyman shollyman merged commit c54aa74 into googleapis:master Aug 27, 2021
@shollyman shollyman deleted the proto-normalizer branch August 27, 2021 22:50
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Aug 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants