Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema fingerprint verifier #292

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Manan007224
Copy link
Contributor

@Manan007224 Manan007224 commented Jul 4, 2021

Introduction

Introducing Schema Verifier

This componnent currently collects source and target schema fingerprints periodically (1 minute by default) and verifies that the schema has not been changed on the source and target while the migration is still going on.

Technical details

I don't think there is anything special about the code written, most if it is self-explanotary.

Performace

Collecting Schema fingerprint every 1 minute should not affect performance in any sort for the migration as the interval is just too high, although I did some performance testing by setting the PeriodicallyVerifySchemaFingerprintInterval = 50ms, just for curiosity on how would the performance affect.

I collected for 30s cpuprofile (using pprof ) and offcpuprofile using (offcputime-bpfcc), but didn't notice any performance issues at all.

offcpu profile

offcpuprofile

cpu profile

profile001

@Manan007224 Manan007224 force-pushed the schema-fingerprinting-inlineverifier branch from 4399e6d to acf14fa Compare July 8, 2021 04:12
@Manan007224 Manan007224 force-pushed the schema-fingerprinting-inlineverifier branch from aaa1229 to d83ec42 Compare July 8, 2021 07:30
@Manan007224 Manan007224 changed the title continously check databasse schema from inline_verifier schema fingerprint verifier Jul 8, 2021
@Manan007224 Manan007224 force-pushed the schema-fingerprinting-inlineverifier branch from 8192736 to 031bd4c Compare July 8, 2021 12:01
@shuhaowu
Copy link
Contributor

shuhaowu commented Jul 8, 2021

You imported some svg and png files. Is this intended?

@Manan007224
Copy link
Contributor Author

You imported some svg and png files. Is this intended?

Yeah, these files depict the performance graphs for offcpu and and cpu profiles in case anyone wants to take a look. Will delete it once the PR is ready to merge.

@shuhaowu
Copy link
Contributor

shuhaowu commented Jul 8, 2021

You can just paste it directly in either a comment here or the PR description, would make it easier for people to see.

ferry.go Outdated
@@ -686,6 +716,13 @@ func (f *Ferry) Run() {
}()
}

schemaFingerVerifierPrintWg := &sync.WaitGroup{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could probably just reuse the supportingServicesWg instead of having another one here.

ferry.go Outdated
Comment on lines 950 to 952
var (
binlogVerifyStore *BinlogVerifyStore = nil
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for my own knowledge: what prompted this change? A linting error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems like it. Will change it back.


type SchemaFingerPrintVerifier struct {
SourceDB *sql.DB
TableRewrites map[string]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need DatabaseRewrites as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit confused as to what the purpose of TableRewrites is. Actually I only DatabaseRewrites.

return schemaFingerPrints, err
}

_, isIgnored := table.IgnoredColumnsForVerification[string(rowData[3].([]byte))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. We ignore certain columns for verification for a very specific reason (see below). We shouldn't ignore it here as we can corrupt data if that column definition changed.

ghostferry/config.go

Lines 688 to 699 in e605f11

// This config is also for inline verification for the same special case of
// Ghostferry as documented with the CompressedColumnsForVerification option:
//
// - If you're copying a table where the data is partially already on the
// the target through some other means.
// - A difference in a particular column could be acceptable.
// - An example would be a table with a data field and a created_at field.
// Maybe the created_at field is not important for data integrity as long
// as the data field is correct.
// - Putting the column in this config will cause the InlineVerifier to skip
// this column for verification.
IgnoredColumnsForVerification ColumnIgnoreConfig


schemaData := [][]interface{}{}
for rows.Next() {
rowData, err := ScanGenericRow(rows, 21)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you write a comment about this magic 21? I think this is the number of columns in information_schema.columns, but if you have a comment here guiding the reader, then the reader doesn't have to go look on their own.

dbSet[table.Schema] = struct{}{}

query := fmt.Sprintf("SELECT * FROM information_schema.columns WHERE table_schema = '%s' ORDER BY table_name, column_name", table.Schema)
rows, err := sf.SourceDB.Query(query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do this for the target DB too, if we want to be generic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, we should check both source and target.

}

hash := md5.Sum([]byte(schemaDataInBytes))
schemaFingerPrints[table.Schema] = hex.EncodeToString(hash[:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we store a single hash, instead of an array of hash? This map is stored into the state dump. If we have a lot of tables, it can be pretty big. A single hash is most likely good enough and save a lot of space.

ferry.go Outdated
Comment on lines 251 to 254
fingerPrint := map[string]string{}
if f.StateToResumeFrom != nil && f.StateToResumeFrom.SchemaFingerPrint != nil {
fingerPrint = f.StateToResumeFrom.SchemaFingerPrint
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this code gives me a bit of pause. There's currently a race condition I think:

  • Ghostferry starts copying rows
  • within 1 second, the table schema changed
  • The schema finger print verifier reads the schema and thinks the schema is unchanged.

Did I miss something and this situation is not possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for clarity currently the following happens when you run the migration with copydb

Also just to avoid confusion by table schema I mean the information we retrieve from information_schema.columns table from mysql and not the TableSchema struct we use internally in ghostferry.

I am not sure I understand your point about after ghostferry starts copying rows why would the table schema would have changed. Copying rows shouldn't alter the schema structure of the table. Here is what an example output is when we try to retrieve the schema of a table.

mysql> SELECT * FROM information_schema.columns WHERE table_schema = 'sakila' ORDER BY table_name, column_name LIMIT 1\G
*************************** 1. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: sakila
              TABLE_NAME: actor
             COLUMN_NAME: actor_id
        ORDINAL_POSITION: 1
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: NO
               DATA_TYPE: smallint
CHARACTER_MAXIMUM_LENGTH: NULL
  CHARACTER_OCTET_LENGTH: NULL
       NUMERIC_PRECISION: 5
           NUMERIC_SCALE: 0
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: NULL
          COLLATION_NAME: NULL
             COLUMN_TYPE: smallint unsigned
              COLUMN_KEY: PRI
                   EXTRA: auto_increment
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT:
   GENERATION_EXPRESSION:
                  SRS_ID: NULL
1 row in set (0.00 sec)

@Manan007224
Copy link
Contributor Author

You can just paste it directly in either a comment here or the PR description, would make it easier for people to see.

Have posted the CPU profile, but since the off CPU profile (basically flamegraph) is in svg format, I had to attach it as a file.

@shuhaowu
Copy link
Contributor

in svg format, I had to attach it as a file.

What I usually do is either convert it into a rasterized format like png, or take a screenshot of the SVG and paste it in.

fixes

more fixes
@Manan007224 Manan007224 force-pushed the schema-fingerprinting-inlineverifier branch from 7fa52dc to 2afb0c3 Compare July 14, 2021 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants