Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner/spannertest): Support generated columns #4742

Merged
merged 17 commits into from Sep 20, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 3 additions & 1 deletion spanner/spannertest/README.md
Expand Up @@ -22,7 +22,9 @@ by ascending esotericism:
- more aggregation functions
- SELECT HAVING
- more literal types
- generated columns
- expressions that return null for generated columns
- generated columns referencing other generated columns
olavloite marked this conversation as resolved.
Show resolved Hide resolved
- checking dependencies on a generated column before deleting a column
- expression type casting, coercion
- multiple joins
- subselects
Expand Down
56 changes: 51 additions & 5 deletions spanner/spannertest/db.go
Expand Up @@ -24,6 +24,7 @@ package spannertest
import (
"bytes"
"encoding/base64"
"errors"
"fmt"
"sort"
"strconv"
Expand Down Expand Up @@ -66,11 +67,12 @@ type table struct {

// colInfo represents information about a column in a table or result set.
type colInfo struct {
Name spansql.ID
Type spansql.Type
NotNull bool // only set for table columns
AggIndex int // Index+1 of SELECT list for which this is an aggregate value.
Alias spansql.PathExp // an alternate name for this column (result sets only)
Name spansql.ID
Type spansql.Type
Generated spansql.Expr
NotNull bool // only set for table columns
AggIndex int // Index+1 of SELECT list for which this is an aggregate value.
Alias spansql.PathExp // an alternate name for this column (result sets only)
}

// commitTimestampSentinel is a sentinel value for TIMESTAMP fields with allow_commit_timestamp=true.
Expand Down Expand Up @@ -395,6 +397,9 @@ func (d *database) writeValues(tx *transaction, tbl spansql.ID, cols []spansql.I
for j, v := range vs.Values {
i := colIndexes[j]

if t.cols[i].Generated != nil {
return status.Error(codes.InvalidArgument, "values can't be written to a generated column")
}
x, err := valForType(v, t.cols[i].Type)
if err != nil {
return err
Expand All @@ -414,6 +419,40 @@ func (d *database) writeValues(tx *transaction, tbl spansql.ID, cols []spansql.I
if err := f(t, colIndexes, r); err != nil {
return err
}

// Get row again after potential update merge to ensure we compute
// generated columns with fresh data.
pk := r[:t.pkCols]
rowNum, found := t.rowForPK(pk)
// this should never fail as the row was just inserted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this -> This

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the missing period at the end ... inserted.

if !found {
return status.Error(codes.Internal, "row failed to be inserted")
}
row := t.rows[rowNum]
ec := evalContext{
cols: t.cols,
row: row,
}

// TODO: We would need to do a topological sort on dependencies (i.e. what other columns the expression references)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a convention to limit the line length of comments to 80 characters.

// to ensure we can handle generated columns which reference other generated columns
for i, col := range t.cols {
if col.Generated != nil {
res, err := ec.evalExpr(col.Generated)
if err != nil {
// We assume that if the expression ended up with a type error a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into the evaluation of the different expressions, and the problem that you are running into is that those evaluations do not take into account that some of the input parameters could be NULL. So I think that this is a reasonable workaround for this for now, but we should probably try to fix the evaluation of expressions that may return null values and then remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly, its very much a best effort but dumb approach. Cheers

// dependent column is null when it should not be so we can skip
// calculating this value for now. This is only valid if the column
// is nullable
if !col.NotNull && errors.Is(err, errIncorrectType) {
row[i] = nil
continue
}
return status.Errorf(codes.Internal, "evaluating generator expression failed: %v", err)
}
row[i] = res
}
}
}

return nil
Expand Down Expand Up @@ -634,6 +673,10 @@ func (t *table) addColumn(cd spansql.ColumnDef, newTable bool) *status.Status {
// TODO: what happens in this case?
return status.Newf(codes.Unimplemented, "can't add NOT NULL columns to non-empty tables yet")
}
if cd.Generated != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This check should be extended to check whether the table is empty or not. Adding a generated column to an empty tables should be safe in this case.

This PR should also add a check to dropColumn to prevent a column that is a dependent of a generated column from being dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checking the table is non empty, as for checking if its a dependency of a generated column do you still think its worth adding that as we talked about dropping all dependency calculations above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that it was already checking it.

As for the dependency checking:

  1. I don't think you need any dependency checking here.
  2. We should preferably use the dependency checking in dropColumn to prevent columns that are used by a generated column from being dropped. But preferably that dependency checking can be done using the method that also does the evaluation, so that we don't have to duplicate that code. Would it for example be possible to use default values for the evaluation context when we only want to check the dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that definitely sounds like the right approach, having a way to run the evaluator in with default values in some "complete" way (i.e. we'd have to check in an OR what is required to evaluate the RHS even if the LHS evaluates to true). It feels out of scope for me right now, given it will be a fair bit of extra work to get going well and I'm not overly confident with the evaluation code and typing. For now I'll just come up with the solution for adding columns as any other features should be able to be built on top without changes so doesn't feel wasteful

// TODO: should backfill the data to maintain behaviour with real spanner
return status.Newf(codes.Unimplemented, "can't add generated columns to non-empty tables yet")
}
for i := range t.rows {
t.rows[i] = append(t.rows[i], nil)
}
Expand All @@ -643,6 +686,9 @@ func (t *table) addColumn(cd spansql.ColumnDef, newTable bool) *status.Status {
Name: cd.Name,
Type: cd.Type,
NotNull: cd.NotNull,
// TODO: We should figure out what columns the Generator expression relies on and check validate it at this time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please well format the comment.

// currently it will just fail when writing data instead.
Generated: cd.Generated,
})
t.colIndex[cd.Name] = len(t.cols) - 1
if !newTable {
Expand Down
17 changes: 12 additions & 5 deletions spanner/spannertest/db_eval.go
Expand Up @@ -20,6 +20,7 @@ package spannertest

import (
"bytes"
"errors"
"fmt"
"regexp"
"strconv"
Expand Down Expand Up @@ -53,6 +54,11 @@ type coercedValue struct {
orig spansql.Expr
}

var (
// errIncorrectType represent when an expression evaluates to the wrong type.
errIncorrectType = errors.New("incorrect type")
)

func (cv coercedValue) SQL() string { return cv.orig.SQL() }

func (ec evalContext) evalExprList(list []spansql.Expr) ([]interface{}, error) {
Expand Down Expand Up @@ -237,6 +243,7 @@ func (ec evalContext) evalBoolExpr(be spansql.BoolExpr) (*bool, error) {
}

func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
// TODO: Better NULL handling
switch e.Op {
case spansql.Neg:
rhs, err := ec.evalExpr(e.RHS)
Expand All @@ -249,7 +256,7 @@ func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
case int64:
return -rhs, nil
}
return nil, fmt.Errorf("RHS of %s evaluates to %T, want FLOAT64 or INT64", e.SQL(), rhs)
return nil, fmt.Errorf("%w: RHS of %s evaluates to %T, want FLOAT64 or INT64", errIncorrectType, e.SQL(), rhs)
case spansql.BitNot:
rhs, err := ec.evalExpr(e.RHS)
if err != nil {
Expand All @@ -265,7 +272,7 @@ func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
}
return b, nil
}
return nil, fmt.Errorf("RHS of %s evaluates to %T, want INT64 or BYTES", e.SQL(), rhs)
return nil, fmt.Errorf("%w: RHS of %s evaluates to %T, want INT64 or BYTES", errIncorrectType, e.SQL(), rhs)
case spansql.Div:
lhs, err := ec.evalFloat64(e.LHS)
if err != nil {
Expand Down Expand Up @@ -341,7 +348,7 @@ func (ec evalContext) evalArithOp(e spansql.ArithOp) (interface{}, error) {
b1, ok1 := lhs.([]byte)
b2, ok2 := rhs.([]byte)
if !ok1 || !ok2 {
return nil, fmt.Errorf("arguments of %s evaluate to (%T, %T), want (INT64, INT64) or (BYTES, BYTES)", e.SQL(), lhs, rhs)
return nil, fmt.Errorf("%w: arguments of %s evaluate to (%T, %T), want (INT64, INT64) or (BYTES, BYTES)", errIncorrectType, e.SQL(), lhs, rhs)
}
if len(b1) != len(b2) {
return nil, fmt.Errorf("arguments of %s evaluate to BYTES of unequal lengths (%d vs %d)", e.SQL(), len(b1), len(b2))
Expand Down Expand Up @@ -393,7 +400,7 @@ func (ec evalContext) evalFloat64(e spansql.Expr) (float64, error) {
func asFloat64(e spansql.Expr, v interface{}) (float64, error) {
switch v := v.(type) {
default:
return 0, fmt.Errorf("expression %s evaluates to %T, want FLOAT64 or INT64", e.SQL(), v)
return 0, fmt.Errorf("%w: expression %s evaluates to %T, want FLOAT64 or INT64", errIncorrectType, e.SQL(), v)
case float64:
return v, nil
case int64:
Expand Down Expand Up @@ -503,7 +510,7 @@ func (ec evalContext) evalExpr(e spansql.Expr) (interface{}, error) {
}
arr, ok := rhs.([]interface{})
if !ok {
return nil, fmt.Errorf("UNNEST argument evaluated as %T, want array", rhs)
return nil, fmt.Errorf("%w: UNNEST argument evaluated as %T, want array", errIncorrectType, rhs)
}
for _, rhs := range arr {
// == isn't okay here.
Expand Down
77 changes: 77 additions & 0 deletions spanner/spannertest/db_test.go
Expand Up @@ -22,10 +22,12 @@ import (
"fmt"
"io"
"reflect"
"strings"
"sync"
"testing"

"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

structpb "github.com/golang/protobuf/ptypes/struct"

Expand Down Expand Up @@ -361,6 +363,81 @@ func TestConcurrentReadInsert(t *testing.T) {
}
}

func TestGeneratedColumn(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add tests for:

  1. Adding a generated column to an existing table.
  2. Altering a generated column.
  3. Dropping a column that is a dependant of a generated column.

sql := `CREATE TABLE Songwriters (
Id INT64 NOT NULL,
Name STRING(20),
Age INT64,
Over18 BOOL AS (Age > 18) STORED,
) PRIMARY KEY (Id);`
var db database

ddl, err := spansql.ParseDDL("filename", sql)
if err != nil {
t.Fatalf("%s: Bad DDL", err)
}
for _, stmt := range ddl.List {
if st := db.ApplyDDL(stmt); st.Code() != codes.OK {
t.Fatal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add error message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the error message.

}
}

addColSql := `ALTER TABLE Songwriters ADD COLUMN CanonicalName STRING(20) AS (LOWER(Name)) STORED;`
ddl, err = spansql.ParseDDL("filename", addColSql)
if err != nil {
t.Fatalf("%s: Bad DDL", err)
}
if st := db.ApplyDDL(ddl.List[0]); st.Code() != codes.OK {
t.Fatalf("Should have been able to add a generated column to empty table\n status: %v", st)
}

tx := db.NewTransaction()
err = db.Insert(tx, "Songwriters",
[]spansql.ID{"Id", "Over18"},
[]*structpb.ListValue{
listV(stringV("3"), boolV(true)),
})
if err == nil || status.Code(err) != codes.InvalidArgument {
t.Fatal("Should have failed to insert to generated column")
}

name := "Famous Writer"
err = db.Insert(tx, "Songwriters",
[]spansql.ID{"Id", "Name", "Age"},
[]*structpb.ListValue{
listV(stringV("3"), stringV(name), stringV("40")),
})
if err != nil {
t.Fatalf("Should have succeeded to insert to without generated column: %v", err)
}

var kr keyRangeList
iter, err := db.Read("Songwriters", []spansql.ID{"Id", "CanonicalName", "Over18"},
[]*structpb.ListValue{
listV(stringV("3")),
}, kr, 0)
if err != nil {
t.Fatalf("failed to read: %v", err)
}
rows := slurp(t, iter)
if rows[0][1].(string) != strings.ToLower(name) {
t.Fatalf("Generated value for CanonicalName mismatch\n Got: %v\n Want: %v", rows[0][1].(string), strings.ToLower(name))
}
if !rows[0][2].(bool) {
t.Fatalf("Generated value for Over18 mismatch\n Got: %v\n Want: true", rows[0][2].(bool))
}

addColSql = `ALTER TABLE Songwriters ADD COLUMN Under18 BOOL AS (Age < 18) STORED;`
ddl, err = spansql.ParseDDL("filename", addColSql)
if err != nil {
t.Fatalf("%s: Bad DDL", err)
}
if st := db.ApplyDDL(ddl.List[0]); st.Code() == codes.OK {
t.Fatalf("Should have failed to add a generated column to non-empty table\n status: %v", st)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add tests for update and delete?

}

func slurp(t *testing.T, ri rowIter) (all [][]interface{}) {
t.Helper()
for {
Expand Down
8 changes: 8 additions & 0 deletions spanner/spannertest/funcs.go
Expand Up @@ -52,6 +52,14 @@ var functions = map[string]function{
return strings.HasPrefix(s, prefix), spansql.Type{Base: spansql.Bool}, nil
},
},
"LOWER": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change belongs in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move it to a separate one, just smuggled it in here as I was testing with it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I would prefer to have it in a separate PR, unless it is actually used in one of the test cases (I don't think it is at this moment, right?)

Eval: func(values []interface{}) (interface{}, spansql.Type, error) {
if len(values) != 1 {
return nil, spansql.Type{}, status.Error(codes.InvalidArgument, "No matching signature for function LOWER for the given argument types")
}
return strings.ToLower(values[0].(string)), spansql.Type{Base: spansql.String}, nil
},
},
}

type aggregateFunc struct {
Expand Down
100 changes: 100 additions & 0 deletions spanner/spannertest/integration_test.go
Expand Up @@ -1262,6 +1262,106 @@ func TestIntegration_ReadsAndQueries(t *testing.T) {
}
}

func TestIntegration_GeneratedColumns(t *testing.T) {
client, adminClient, _, cleanup := makeClient(t)
defer cleanup()

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

tableName := "SongWriters"

err := updateDDL(t, adminClient,
`CREATE TABLE `+tableName+` (
Name STRING(50) NOT NULL,
NumSongs INT64,
EstimatedSales INT64 NOT NULL,
CanonicalName STRING(50) AS (LOWER(Name)) STORED,
) PRIMARY KEY (Name)`)
if err != nil {
t.Fatalf("Setting up fresh table: %v", err)
}
err = updateDDL(t, adminClient,
`ALTER TABLE `+tableName+` ADD COLUMN TotalSales INT64 AS (NumSongs * EstimatedSales) STORED`)
if err != nil {
t.Fatalf("Adding new column: %v", err)
}

// Insert some data.
_, err = client.Apply(ctx, []*spanner.Mutation{
spanner.Insert(tableName,
[]string{"Name", "EstimatedSales", "NumSongs"},
[]interface{}{"Average Writer", 10, 10}),
spanner.Insert(tableName,
[]string{"Name", "EstimatedSales"},
[]interface{}{"Great Writer", 100}),
spanner.Insert(tableName,
[]string{"Name", "EstimatedSales", "NumSongs"},
[]interface{}{"Poor Writer", 1, 50}),
})
if err != nil {
t.Fatalf("Applying mutations: %v", err)
}

err = updateDDL(t, adminClient,
`ALTER TABLE `+tableName+` ADD COLUMN TotalSales2 INT64 AS (NumSongs * EstimatedSales) STORED`)
if err == nil {
t.Fatalf("Should have failed to add a generated column to non empty table")
}

ri := client.Single().Query(ctx, spanner.NewStatement(
`SELECT CanonicalName, TotalSales FROM `+tableName+` ORDER BY Name`,
))
all, err := slurpRows(t, ri)
if err != nil {
t.Errorf("Read rows failed: %v", err)
}

// Great writer has nil because NumSongs is nil
want := [][]interface{}{
{"average writer", int64(100)},
{"great writer", nil},
{"poor writer", int64(50)},
}
if !reflect.DeepEqual(all, want) {
t.Errorf("Expected values are wrong.\n got %v\nwant %v", all, want)
}

// Test modifying the generated values and nulling one
_, err = client.Apply(ctx, []*spanner.Mutation{
spanner.Update(tableName,
[]string{"Name", "NumSongs"},
[]interface{}{"Average Writer", 50}),
spanner.Update(tableName,
[]string{"Name", "NumSongs"},
[]interface{}{"Great Writer", 10}),
spanner.Update(tableName,
[]string{"Name", "NumSongs"},
[]interface{}{"Poor Writer", nil}),
})
if err != nil {
t.Fatalf("Applying mutations: %v", err)
}

ri = client.Single().Query(ctx, spanner.NewStatement(
`SELECT CanonicalName, TotalSales FROM `+tableName+` ORDER BY Name`,
))
all, err = slurpRows(t, ri)
if err != nil {
t.Errorf("Read rows failed: %v", err)
}

// Great writer has nil because NumSongs is nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should be Poor writer, right?

want = [][]interface{}{
{"average writer", int64(500)},
{"great writer", int64(1000)},
{"poor writer", nil},
}
if !reflect.DeepEqual(all, want) {
t.Errorf("Expected values are wrong.\n got %v\nwant %v", all, want)
}
}

func dropTable(t *testing.T, adminClient *dbadmin.DatabaseAdminClient, table string) error {
t.Helper()
err := updateDDL(t, adminClient, "DROP TABLE "+table)
Expand Down