Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner/spannertest): Support generated columns #4742

Merged
merged 17 commits into from Sep 20, 2021
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 3 additions & 1 deletion spanner/spannertest/README.md
Expand Up @@ -22,7 +22,9 @@ by ascending esotericism:
- more aggregation functions
- SELECT HAVING
- more literal types
- generated columns
- expressions that return null for generated columns
- generated columns referencing other generated columns
olavloite marked this conversation as resolved.
Show resolved Hide resolved
- checking dependencies on a generated column before deleting a column
- expression type casting, coercion
- multiple joins
- subselects
Expand Down
47 changes: 42 additions & 5 deletions spanner/spannertest/db.go
Expand Up @@ -66,11 +66,12 @@ type table struct {

// colInfo represents information about a column in a table or result set.
type colInfo struct {
Name spansql.ID
Type spansql.Type
NotNull bool // only set for table columns
AggIndex int // Index+1 of SELECT list for which this is an aggregate value.
Alias spansql.PathExp // an alternate name for this column (result sets only)
Name spansql.ID
Type spansql.Type
Generated spansql.Expr
NotNull bool // only set for table columns
AggIndex int // Index+1 of SELECT list for which this is an aggregate value.
Alias spansql.PathExp // an alternate name for this column (result sets only)
}

// commitTimestampSentinel is a sentinel value for TIMESTAMP fields with allow_commit_timestamp=true.
Expand Down Expand Up @@ -395,6 +396,9 @@ func (d *database) writeValues(tx *transaction, tbl spansql.ID, cols []spansql.I
for j, v := range vs.Values {
i := colIndexes[j]

if t.cols[i].Generated != nil {
return status.Error(codes.InvalidArgument, "values can't be written to a generated column")
}
x, err := valForType(v, t.cols[i].Type)
if err != nil {
return err
Expand All @@ -414,6 +418,32 @@ func (d *database) writeValues(tx *transaction, tbl spansql.ID, cols []spansql.I
if err := f(t, colIndexes, r); err != nil {
return err
}

// Get row again after potential update merge to ensure we compute
// generated columns with fresh data.
pk := r[:t.pkCols]
rowNum, found := t.rowForPK(pk)
// this should never fail as the row was just inserted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this -> This

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the missing period at the end ... inserted.

if !found {
return status.Error(codes.Internal, "row failed to be inserted")
}
row := t.rows[rowNum]
ec := evalContext{
cols: t.cols,
row: row,
}

// TODO: We would need to do a topological sort on dependencies (i.e. what other columns the expression references)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a convention to limit the line length of comments to 80 characters.

// to ensure we can handle generated columns which reference other generated columns
for i, col := range t.cols {
if col.Generated != nil {
res, err := ec.evalExpr(col.Generated)
if err != nil {
return err
}
row[i] = res
}
}
}

return nil
Expand Down Expand Up @@ -634,6 +664,10 @@ func (t *table) addColumn(cd spansql.ColumnDef, newTable bool) *status.Status {
// TODO: what happens in this case?
return status.Newf(codes.Unimplemented, "can't add NOT NULL columns to non-empty tables yet")
}
if cd.Generated != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This check should be extended to check whether the table is empty or not. Adding a generated column to an empty tables should be safe in this case.

This PR should also add a check to dropColumn to prevent a column that is a dependent of a generated column from being dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checking the table is non empty, as for checking if its a dependency of a generated column do you still think its worth adding that as we talked about dropping all dependency calculations above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that it was already checking it.

As for the dependency checking:

  1. I don't think you need any dependency checking here.
  2. We should preferably use the dependency checking in dropColumn to prevent columns that are used by a generated column from being dropped. But preferably that dependency checking can be done using the method that also does the evaluation, so that we don't have to duplicate that code. Would it for example be possible to use default values for the evaluation context when we only want to check the dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that definitely sounds like the right approach, having a way to run the evaluator in with default values in some "complete" way (i.e. we'd have to check in an OR what is required to evaluate the RHS even if the LHS evaluates to true). It feels out of scope for me right now, given it will be a fair bit of extra work to get going well and I'm not overly confident with the evaluation code and typing. For now I'll just come up with the solution for adding columns as any other features should be able to be built on top without changes so doesn't feel wasteful

// TODO: should backfill the data to maintain behaviour with real spanner
return status.Newf(codes.Unimplemented, "can't add generated columns to non-empty tables yet")
}
for i := range t.rows {
t.rows[i] = append(t.rows[i], nil)
}
Expand All @@ -643,6 +677,9 @@ func (t *table) addColumn(cd spansql.ColumnDef, newTable bool) *status.Status {
Name: cd.Name,
Type: cd.Type,
NotNull: cd.NotNull,
// TODO: We should figure out what columns the Generator expression relies on and check validate it at this time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please well format the comment.

// currently it will just fail when writing data instead.
Generated: cd.Generated,
})
t.colIndex[cd.Name] = len(t.cols) - 1
if !newTable {
Expand Down
19 changes: 19 additions & 0 deletions spanner/spannertest/db_eval.go
Expand Up @@ -237,12 +237,16 @@ func (ec evalContext) evalBoolExpr(be spansql.BoolExpr) (*bool, error) {
}

func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
// TODO: Better NULL handling
switch e.Op {
case spansql.Neg:
rhs, err := ec.evalExpr(e.RHS)
if err != nil {
return nil, err
}
if rhs == nil {
return nil, nil
}
switch rhs := rhs.(type) {
case float64:
return -rhs, nil
Expand All @@ -255,6 +259,9 @@ func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
if err != nil {
return nil, err
}
if rhs == nil {
return nil, nil
}
switch rhs := rhs.(type) {
case int64:
return ^rhs, nil
Expand Down Expand Up @@ -285,10 +292,16 @@ func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
if err != nil {
return nil, err
}
if lhs == nil {
return nil, nil
}
rhs, err := ec.evalExpr(e.RHS)
if err != nil {
return nil, err
}
if rhs == nil {
return nil, nil
}
i1, ok1 := lhs.(int64)
i2, ok2 := rhs.(int64)
if ok1 && ok2 {
Expand Down Expand Up @@ -322,10 +335,16 @@ func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
if err != nil {
return nil, err
}
if lhs == nil {
return nil, nil
}
rhs, err := ec.evalExpr(e.RHS)
if err != nil {
return nil, err
}
if rhs == nil {
return nil, nil
}
i1, ok1 := lhs.(int64)
i2, ok2 := rhs.(int64)
if ok1 && ok2 {
Expand Down
86 changes: 86 additions & 0 deletions spanner/spannertest/db_test.go
Expand Up @@ -22,10 +22,12 @@ import (
"fmt"
"io"
"reflect"
"strings"
"sync"
"testing"

"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

structpb "github.com/golang/protobuf/ptypes/struct"

Expand Down Expand Up @@ -361,6 +363,90 @@ func TestConcurrentReadInsert(t *testing.T) {
}
}

func TestGeneratedColumn(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add tests for:

  1. Adding a generated column to an existing table.
  2. Altering a generated column.
  3. Dropping a column that is a dependant of a generated column.

sql := `CREATE TABLE Songwriters (
Id INT64 NOT NULL,
Name STRING(20),
Age INT64,
Over18 BOOL AS (Age > 18) STORED,
) PRIMARY KEY (Id);`
var db database

ddl, err := spansql.ParseDDL("filename", sql)
if err != nil {
t.Fatalf("%s: Bad DDL", err)
}
for _, stmt := range ddl.List {
if st := db.ApplyDDL(stmt); st.Code() != codes.OK {
t.Fatal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add error message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the error message.

}
}

addColSQL := `ALTER TABLE Songwriters ADD COLUMN CanonicalName STRING(20) AS (LOWER(Name)) STORED;`
ddl, err = spansql.ParseDDL("filename", addColSQL)
if err != nil {
t.Fatalf("%s: Bad DDL", err)
}
if st := db.ApplyDDL(ddl.List[0]); st.Code() != codes.OK {
t.Fatalf("Should have been able to add a generated column to empty table\n status: %v", st)
}

tx := db.NewTransaction()
err = db.Insert(tx, "Songwriters",
[]spansql.ID{"Id", "Over18"},
[]*structpb.ListValue{
listV(stringV("3"), boolV(true)),
})
if err == nil || status.Code(err) != codes.InvalidArgument {
t.Fatal("Should have failed to insert to generated column")
}

err = db.Insert(tx, "Songwriters",
[]spansql.ID{"Id"},
[]*structpb.ListValue{
listV(stringV("1")),
})
if err != nil {
t.Fatalf("Should have succeeded to insert to with no dependent columns: %v", err)
}

name := "Famous Writer"
err = db.Insert(tx, "Songwriters",
[]spansql.ID{"Id", "Name", "Age"},
[]*structpb.ListValue{
listV(stringV("3"), stringV(name), stringV("40")),
})
if err != nil {
t.Fatalf("Should have succeeded to insert to without generated column: %v", err)
}

var kr keyRangeList
iter, err := db.Read("Songwriters", []spansql.ID{"Id", "CanonicalName", "Over18"},
[]*structpb.ListValue{
listV(stringV("3")),
}, kr, 0)
if err != nil {
t.Fatalf("failed to read: %v", err)
}
rows := slurp(t, iter)
if rows[0][1].(string) != strings.ToLower(name) {
t.Fatalf("Generated value for CanonicalName mismatch\n Got: %v\n Want: %v", rows[0][1].(string), strings.ToLower(name))
}
if !rows[0][2].(bool) {
t.Fatalf("Generated value for Over18 mismatch\n Got: %v\n Want: true", rows[0][2].(bool))
}

addColSQL = `ALTER TABLE Songwriters ADD COLUMN Under18 BOOL AS (Age < 18) STORED;`
ddl, err = spansql.ParseDDL("filename", addColSQL)
if err != nil {
t.Fatalf("%s: Bad DDL", err)
}
if st := db.ApplyDDL(ddl.List[0]); st.Code() == codes.OK {
t.Fatalf("Should have failed to add a generated column to non-empty table\n status: %v", st)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add tests for update and delete?

}

func slurp(t *testing.T, ri rowIter) (all [][]interface{}) {
t.Helper()
for {
Expand Down
14 changes: 14 additions & 0 deletions spanner/spannertest/funcs.go
Expand Up @@ -52,6 +52,20 @@ var functions = map[string]function{
return strings.HasPrefix(s, prefix), spansql.Type{Base: spansql.Bool}, nil
},
},
"LOWER": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change belongs in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move it to a separate one, just smuggled it in here as I was testing with it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I would prefer to have it in a separate PR, unless it is actually used in one of the test cases (I don't think it is at this moment, right?)

Eval: func(values []interface{}) (interface{}, spansql.Type, error) {
if len(values) != 1 {
return nil, spansql.Type{}, status.Error(codes.InvalidArgument, "No matching signature for function LOWER for the given argument types")
}
if values[0] == nil {
return nil, spansql.Type{Base: spansql.String}, nil
}
if _, ok := values[0].(string); !ok {
return nil, spansql.Type{}, status.Error(codes.InvalidArgument, "No matching signature for function LOWER for the given argument types")
}
return strings.ToLower(values[0].(string)), spansql.Type{Base: spansql.String}, nil
},
},
}

type aggregateFunc struct {
Expand Down
100 changes: 100 additions & 0 deletions spanner/spannertest/integration_test.go
Expand Up @@ -1262,6 +1262,106 @@ func TestIntegration_ReadsAndQueries(t *testing.T) {
}
}

func TestIntegration_GeneratedColumns(t *testing.T) {
client, adminClient, _, cleanup := makeClient(t)
defer cleanup()

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

tableName := "SongWriters"

err := updateDDL(t, adminClient,
`CREATE TABLE `+tableName+` (
Name STRING(50) NOT NULL,
NumSongs INT64,
EstimatedSales INT64 NOT NULL,
CanonicalName STRING(50) AS (LOWER(Name)) STORED,
) PRIMARY KEY (Name)`)
if err != nil {
t.Fatalf("Setting up fresh table: %v", err)
}
err = updateDDL(t, adminClient,
`ALTER TABLE `+tableName+` ADD COLUMN TotalSales INT64 AS (NumSongs * EstimatedSales) STORED`)
if err != nil {
t.Fatalf("Adding new column: %v", err)
}

// Insert some data.
_, err = client.Apply(ctx, []*spanner.Mutation{
spanner.Insert(tableName,
[]string{"Name", "EstimatedSales", "NumSongs"},
[]interface{}{"Average Writer", 10, 10}),
spanner.Insert(tableName,
[]string{"Name", "EstimatedSales"},
[]interface{}{"Great Writer", 100}),
spanner.Insert(tableName,
[]string{"Name", "EstimatedSales", "NumSongs"},
[]interface{}{"Poor Writer", 1, 50}),
})
if err != nil {
t.Fatalf("Applying mutations: %v", err)
}

err = updateDDL(t, adminClient,
`ALTER TABLE `+tableName+` ADD COLUMN TotalSales2 INT64 AS (NumSongs * EstimatedSales) STORED`)
if err == nil {
t.Fatalf("Should have failed to add a generated column to non empty table")
}

ri := client.Single().Query(ctx, spanner.NewStatement(
`SELECT CanonicalName, TotalSales FROM `+tableName+` ORDER BY Name`,
))
all, err := slurpRows(t, ri)
if err != nil {
t.Errorf("Read rows failed: %v", err)
}

// Great writer has nil because NumSongs is nil
want := [][]interface{}{
{"average writer", int64(100)},
{"great writer", nil},
{"poor writer", int64(50)},
}
if !reflect.DeepEqual(all, want) {
t.Errorf("Expected values are wrong.\n got %v\nwant %v", all, want)
}

// Test modifying the generated values and nulling one
_, err = client.Apply(ctx, []*spanner.Mutation{
spanner.Update(tableName,
[]string{"Name", "NumSongs"},
[]interface{}{"Average Writer", 50}),
spanner.Update(tableName,
[]string{"Name", "NumSongs"},
[]interface{}{"Great Writer", 10}),
spanner.Update(tableName,
[]string{"Name", "NumSongs"},
[]interface{}{"Poor Writer", nil}),
})
if err != nil {
t.Fatalf("Applying mutations: %v", err)
}

ri = client.Single().Query(ctx, spanner.NewStatement(
`SELECT CanonicalName, TotalSales FROM `+tableName+` ORDER BY Name`,
))
all, err = slurpRows(t, ri)
if err != nil {
t.Errorf("Read rows failed: %v", err)
}

// Great writer has nil because NumSongs is nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should be Poor writer, right?

want = [][]interface{}{
{"average writer", int64(500)},
{"great writer", int64(1000)},
{"poor writer", nil},
}
if !reflect.DeepEqual(all, want) {
t.Errorf("Expected values are wrong.\n got %v\nwant %v", all, want)
}
}

func dropTable(t *testing.T, adminClient *dbadmin.DatabaseAdminClient, table string) error {
t.Helper()
err := updateDDL(t, adminClient, "DROP TABLE "+table)
Expand Down