Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigtable/cbt): cbt 'import' cmd to parse a .csv file and write to CBT #5072

Merged
merged 16 commits into from Nov 8, 2021

Conversation

markduffett
Copy link
Contributor

Add 'import' command to 'cbt' and unit/emulator tests

  • parses and imports .csv files only for now
  • should be accompanied/shortly followed by a tutorial + sample.csv
  • csv file formatting assumes a column family row is present unless flag 'column-family' is set, then it will use that family for all columns
  • configurable batch-size(500 rows) and worker threads(1) defaults

Runtime expectations:
2MB file with 12588 rows

  • 1s with 20 workers, 5s with the default 1

190MB file in 4s @ 47.5MB/s when using 'batch-size=5' 'workers=20'

  • 50%+ of 1 Node/tabletservers' CPU
  • 1288 rows, 1000 cols, 5kB data populated in 3% of cells
  • 200MB/s+ depending on mutation size and number of CBT nodes

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 3, 2021
@crwilcox crwilcox changed the title cbt 'import' cmd to parse a .csv file and write to CBT feat(bigtable/cbt): cbt 'import' cmd to parse a .csv file and write to CBT Nov 3, 2021
@crwilcox crwilcox added the kokoro:run Add this label to force Kokoro to re-run the tests. label Nov 3, 2021
@kokoro-team kokoro-team removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Nov 3, 2021
@markduffett markduffett force-pushed the csv-to-cbt-importer branch 4 times, most recently from 9d8f2d3 to f767730 Compare November 3, 2021 17:39
@crwilcox crwilcox added the kokoro:run Add this label to force Kokoro to re-run the tests. label Nov 3, 2021
@kokoro-team kokoro-team removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Nov 3, 2021
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the Bigtable API. label Nov 4, 2021
@crwilcox
Copy link
Contributor

crwilcox commented Nov 4, 2021

Looks like another linter error showed up.

bigtable/cmd/cbt/cbt_test.go:384:10: if block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary)

Also, if you'd like, the script that is running is /internal/kokoro/vet.sh. You can get pretty close by running golint ./..., gofmt, goimports, though there are a few linter instances where we ignore failures (linting was added later)

bigtable/cmd/cbt/cbt.go Show resolved Hide resolved
bigtable/cmd/cbt/cbt.go Outdated Show resolved Hide resolved
bigtable/cmd/cbt/cbt_test.go Outdated Show resolved Hide resolved
bigtable/cmd/cbt/cbt_test.go Outdated Show resolved Hide resolved
bigtable/cmd/cbt/cbt_test.go Show resolved Hide resolved
Comment on lines 279 to 288
if !tc.fail && err != nil {
t.Errorf("parseCsvHeaders() failed. input:%+v, error:%s", tc, err)
continue
}
if tc.fail && err == nil {
t.Errorf("parseImportArgs() did not fail. input:%+v, error:%s", tc, err)
continue
}
if tc.fail {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think more typically the tests just catch this via an OR or two separate conditionals, collapsing this a bit? I may be missing nuance here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case would allow continuation since the fail is expected, but we don't want to do any comparisons below. It can't be paired with any of the above because they add errors were as this is expected.

I can put the if err == nil {t.Error} inside the tc.fail block, just wasn't sure if go-style preferred the nested ifs.

if tc.fail {
    if err == nil {
	t.Errorf("parseImportArgs() did not fail. input:%+v, error:%s", tc, err)
    }
    continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this to me seems like we are trying to check if fail end err match. I don't see why they would have different messages. Also one of these has 'parseImportArgs' which I assume is leftover from a copy paste?

Are we in essence checking this

if tc.fail != (err != nil)

Alternatively, rather than storing tc.fail as a bool, store the err string to do a deeper check: possibly kind of like https://github.com/googleapis/google-cloud-go/blob/master/datastore/datastore_test.go#L1987

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to error matching pattern.

@crwilcox crwilcox merged commit 5a2ed6b into googleapis:master Nov 8, 2021
telpirion pushed a commit that referenced this pull request Jan 10, 2022
…o CBT (#5072)

Co-authored-by: Christopher Wilcox <crwilcox@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the Bigtable API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants