Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it a well-known issue that MessageSaver by default will just ignore the zero values? #354

Open
liufuyang opened this issue Oct 23, 2023 · 3 comments

Comments

@liufuyang
Copy link

So for example, if I have some proto Message/Struct that has some zero values (such as an int=0 or a boolean = false), then those will not be able to store in BQ side but instead null values will be written in?

image

This is a demo PR shows the error:
#353

@liufuyang
Copy link
Author

Adding more info:

Seeing from https://cloud.google.com/bigquery/docs/write-api#go_client, and testing with Google's write-api directly by using such helper code here, we can see that:

With such a proto:

message Row {
	string Name = 1;
	int32 Age = 2;
	int64 LastSeen = 3;  // timestamp in bigquery
}

Plus such golang code to creating a message:

         var rows = []*Row{
		{
			Name:     "John",
			Age:      104,
			LastSeen: time.Now().UnixMicro(),
		},
		{
			Name:     "Jane",
			Age:      0,
			LastSeen: time.Now().UnixMicro(),
		},
		{
			Name: "Adam",
		},
	}

By using the official write API, we get this written into the table:
image

and the table schema is defined as:
image

So perhaps we might want to keep the same behavior as the official package does here.

@liufuyang
Copy link
Author

Double checked with our implementation:

package main

import (
	"cloud.google.com/go/bigquery"
	"context"
	"go.einride.tech/protobuf-bigquery/encoding/protobq"
)

const (
	project = "whatever"
	dataset = "fuyang_test"
	table   = "fuyang_table"
)

func main() {
	ctx := context.Background()
	// Connect to BigQuery.
	client, err := bigquery.NewClient(ctx, project)
	if err != nil {
		panic(err)
	}
	table := client.Dataset(dataset).Table(table)
	// Insert the protobuf messages.
	inserter := table.Inserter()

	// the data we will stream to bigquery
	var rows = []*Row{
		{
			Name:     "John - written by go.einride.tech/protobuf-bigquery",
			Age:      104,
			// LastSeen: time.Now().UnixMicro(),
		},
		{
			Name: "Jane - written by go.einride.tech/protobuf-bigquery",
			Age:  0,
			// LastSeen: time.Now().UnixMicro(),
		},
		{
			Name: "Adam - written by go.einride.tech/protobuf-bigquery",
			// Age:      0,
			// LastSeen: time.Now().UnixMicro(),
		},
	}

	for _, r := range rows {
		if err := inserter.Put(ctx, &protobq.MessageSaver{
			Message: r,
			// InsertID: strconv.Itoa(i), // include an optional insert ID
		}); err != nil {
			panic(err)
		}
	}
}

Got this result (row 4 to 6) in the same table:

image

And seems the int type input of time_stamp gives some error (off-topic though), so if I include the lines of LastSeen: time.Now().UnixMicro(), above, it outputs:

panic: 1 row insertion failed (insertion of row [insertID: "S0tgCBWG45JQaNyeJR4CAgkDuuO"; insertIndex: 0] failed with error: {Location: "lastseen"; Message: "Epoch second value 1698072348178947 out of range"; Reason: "invalid"})

@cchatfield
Copy link

@liufuyang
Adding optional to the proto definition seems to work for proto3 - https://protobuf.dev/reference/go/go-generated/#singular-scalar-proto3

optional int32 offset = 2;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants