Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery/storage/managedwriter): improve protobuf support #4589

Merged
merged 14 commits into from Aug 16, 2021

Conversation

shollyman
Copy link
Contributor

@shollyman shollyman commented Aug 10, 2021

After some ongoing discussions with the Storage API team, this PR
improves support for proto2/proto3 syntax in protocol buffer code.

  • Updates testdata so that we have a proto2 and proto3 form of
    our SimpleMessage data.

  • Add reference schemas to testdata.

  • Updates proto conversion code in the adapt package so it's creating
    proto2 messages by default, but supports generation of proto3 style
    descriptors.

  • Adds benchmarks for dynamic schema generation and static
    serialization, to aid in some internal discussions. The GithubArchive schema
    has all fields fully nullable, so it should provide a good comparison when having
    to represent data using well known wrapper types.

Towards: #4366

Benchmark results thus far:

goos: linux
goarch: amd64
pkg: cloud.google.com/go/bigquery/storage/managedwriter/benchmarks
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkStorageSchemaToDescriptor
BenchmarkStorageSchemaToDescriptor/SingleField-proto2
BenchmarkStorageSchemaToDescriptor/SingleField-proto2-8                    21765             54810 ns/op
BenchmarkStorageSchemaToDescriptor/SingleField-proto3
BenchmarkStorageSchemaToDescriptor/SingleField-proto3-8                    21514             55778 ns/op
BenchmarkStorageSchemaToDescriptor/NestedRecord-proto2
BenchmarkStorageSchemaToDescriptor/NestedRecord-proto2-8                    8998            135923 ns/op
BenchmarkStorageSchemaToDescriptor/NestedRecord-proto3
BenchmarkStorageSchemaToDescriptor/NestedRecord-proto3-8                    8666            137683 ns/op
BenchmarkStorageSchemaToDescriptor/SimpleMessage-proto2
BenchmarkStorageSchemaToDescriptor/SimpleMessage-proto2-8                  20750             57315 ns/op
BenchmarkStorageSchemaToDescriptor/SimpleMessage-proto3
BenchmarkStorageSchemaToDescriptor/SimpleMessage-proto3-8                  20738             58077 ns/op
BenchmarkStorageSchemaToDescriptor/GithubArchiveSchema-proto2
BenchmarkStorageSchemaToDescriptor/GithubArchiveSchema-proto2-8             5151            241598 ns/op
BenchmarkStorageSchemaToDescriptor/GithubArchiveSchema-proto3
BenchmarkStorageSchemaToDescriptor/GithubArchiveSchema-proto3-8             4711            252906 ns/op
BenchmarkStaticProtoSerialization
BenchmarkStaticProtoSerialization/SimpleMessageProto2
    protobenchmarks_test.go:216: N=1, avg bytes/message: 36
    protobenchmarks_test.go:216: N=100, avg bytes/message: 36
    protobenchmarks_test.go:216: N=10000, avg bytes/message: 36
    protobenchmarks_test.go:216: N=1000000, avg bytes/message: 36
    protobenchmarks_test.go:216: N=1780552, avg bytes/message: 36
BenchmarkStaticProtoSerialization/SimpleMessageProto2-8                  1780552               669.1 ns/op
BenchmarkStaticProtoSerialization/SimpleMessageProto3
    protobenchmarks_test.go:216: N=1, avg bytes/message: 38
    protobenchmarks_test.go:216: N=100, avg bytes/message: 38
    protobenchmarks_test.go:216: N=10000, avg bytes/message: 38
    protobenchmarks_test.go:216: N=1000000, avg bytes/message: 38
    protobenchmarks_test.go:216: N=1605812, avg bytes/message: 38
BenchmarkStaticProtoSerialization/SimpleMessageProto3-8                  1605812               751.4 ns/op
BenchmarkStaticProtoSerialization/GithubArchiveProto2
    protobenchmarks_test.go:216: N=1, avg bytes/message: 331
    protobenchmarks_test.go:216: N=100, avg bytes/message: 330
    protobenchmarks_test.go:216: N=10000, avg bytes/message: 329
    protobenchmarks_test.go:216: N=278527, avg bytes/message: 329
BenchmarkStaticProtoSerialization/GithubArchiveProto2-8                   278527              4258 ns/op
BenchmarkStaticProtoSerialization/GithubArchiveProto2_Sparse
    protobenchmarks_test.go:216: N=1, avg bytes/message: 23
    protobenchmarks_test.go:216: N=100, avg bytes/message: 23
    protobenchmarks_test.go:216: N=10000, avg bytes/message: 23
    protobenchmarks_test.go:216: N=1000000, avg bytes/message: 23
    protobenchmarks_test.go:216: N=1955377, avg bytes/message: 23
BenchmarkStaticProtoSerialization/GithubArchiveProto2_Sparse-8           1955377               615.5 ns/op
BenchmarkStaticProtoSerialization/GithubArchiveProto3
    protobenchmarks_test.go:216: N=1, avg bytes/message: 363
    protobenchmarks_test.go:216: N=100, avg bytes/message: 367
    protobenchmarks_test.go:216: N=10000, avg bytes/message: 366
    protobenchmarks_test.go:216: N=206368, avg bytes/message: 366
BenchmarkStaticProtoSerialization/GithubArchiveProto3-8                   206368              6052 ns/op
BenchmarkStaticProtoSerialization/GithubArchiveProto3_Sparse
    protobenchmarks_test.go:216: N=1, avg bytes/message: 25
    protobenchmarks_test.go:216: N=100, avg bytes/message: 25
    protobenchmarks_test.go:216: N=10000, avg bytes/message: 25
    protobenchmarks_test.go:216: N=1000000, avg bytes/message: 25
    protobenchmarks_test.go:216: N=1655194, avg bytes/message: 25
BenchmarkStaticProtoSerialization/GithubArchiveProto3_Sparse-8           1655194               725.6 ns/op

After some ongoing discussions with the Storage API team, this PR
improves support for proto2/proto3 syntax in protocol buffer code.

* Updates testdata so that we have a proto2 and proto3 form of
  our SimpleMessage data.

* Add reference schemas to testdata.

* Updates proto conversion code in the adapt package so it's creating
  proto2 messages by default.  The code paths for doing proto3
  conversions are present, but not exported yet as the storage API
  doesn't properly handle proto3 expectations for conversion. Namely,
  conversion doesn't properly account for default values and use of
  wrapper types.

* Adds benchmarks for dynamic schema generation and static
  serialization, to aid in some internal discussions.
@shollyman shollyman requested a review from a team August 10, 2021 22:54
@shollyman shollyman requested a review from a team as a code owner August 10, 2021 22:54
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Aug 10, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 10, 2021
@shollyman shollyman added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Aug 10, 2021
@shollyman shollyman requested review from tswast and removed request for loferris August 12, 2021 17:31
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good to see that proto3 isn't all that much slower than proto2.

bigquery/storage/managedwriter/adapt/protoconversion.go Outdated Show resolved Hide resolved
{Name: "four", Value: 1},
{Name: "five", Value: 2},
var testSimpleData = []*testdata.SimpleMessageProto2{
{Name: proto.String("one"), Value: proto.Int64(1)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these proto.String and proto.Int64 wrapper types in proto2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structs take pointers to strings rather than strings directly, the proto.String, proto.Int64 etc funcs are just wrappers to take a value and return a pointer to the value.

@shollyman shollyman removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Aug 12, 2021
@shollyman shollyman requested a review from tswast August 12, 2021 20:15
Copy link
Member

@codyoss codyoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one small suggestion.

@shollyman shollyman merged commit a455082 into googleapis:master Aug 16, 2021
@shollyman shollyman deleted the protocompare branch August 16, 2021 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants