New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(bigquery): Add support for ML model export #7451
Conversation
* Add model support to Project#extract and #extract_job * Add ExtractJob#model? * Add ExtractJob#ml_tf_saved_model? * Add ExtractJob#ml_xgboost_booster? * Add Model#extract and #extract_job closes: googleapis#7061
3e1312d
to
eabcbfe
Compare
93961be
to
de92305
Compare
Tempfile.open "temp_extract_model" do |tmp| | ||
extract_url = "gs://#{bucket.name}/#{model_id}" | ||
|
||
# sut |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just shorthand for the method being tested
# | ||
# @return [Boolean] `true` when `CSV`, `false` otherwise. | ||
# | ||
def csv? | ||
val = @gapi.configuration.extract.destination_format | ||
return true if val.nil? | ||
return true if table? && val.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make explicit that false
is returned when extracting models? (i.e. documenting it, and maybe the code could say return false unless table?
instead which would be more clear than this which assumes that val is nil
when extracting models).
Similar comment for the similar methods surrounding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -253,7 +319,7 @@ def location= value | |||
end | |||
|
|||
## | |||
# Sets the compression type. | |||
# Sets the compression type. Not applicable when extracting models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it cause any problem if set when extracting models? (Should we just leave it be, or throw an exception?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our usual policy is to pass it through to the backend service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a good example of letting the backend sort it out, as they may add support for this down the line.
## | ||
# Exports the model to a Google Cloud Storage file using | ||
# an asynchronous method. In this method, an {ExtractJob} is immediately | ||
# returned. The caller may poll the service by repeatedly calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe: Exports the model to a Google CloudStorage file asynchronously, immediately returning an {ExtractJob} that can be used to track the progress of the export job. The caller may poll the service...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much improved, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
# The geographic location for the job ("US", "EU", etc.) can be set via | ||
# {ExtractJob::Updater#location=} in a block passed to this method. If | ||
# the model is a full resource representation (see {#resource_full?}), | ||
# the location of the job will be automatically set to the location of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ...will automatically be set...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
# be used. | ||
# @param [Hash] labels A hash of user-provided labels associated with | ||
# the job. You can use these to organize and group your jobs. Label | ||
# keys and values can be no longer than 63 characters, can only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ...63 characters, and can only...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
# @param [Hash] labels A hash of user-provided labels associated with | ||
# the job. You can use these to organize and group your jobs. Label | ||
# keys and values can be no longer than 63 characters, can only | ||
# contain lowercase letters, numeric characters, underscores and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other param descriptions you include letters (a-z)
, etc. examples. Include them here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to copy in the entirety of the latest version of the label requirements from https://cloud.google.com/bigquery/docs/labels-intro#requirements
# the job. You can use these to organize and group your jobs. Label | ||
# keys and values can be no longer than 63 characters, can only | ||
# contain lowercase letters, numeric characters, underscores and | ||
# dashes. International characters are allowed. Label values are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does international characters mean? (e.g. does this mean only "lower case" international characters? Does that exclude alphabets that do not have the concept of upper and lower case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, I'll see if the official documentation offers anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out the link below is outdated and now goes to the wrong page. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to copy in the entirety of the latest version of the label requirements from https://cloud.google.com/bigquery/docs/labels-intro#requirements, and omit the link.
# labels](https://cloud.google.com/bigquery/docs/creating-managing-labels#requirements). | ||
# @param [Boolean] dryrun If set, don't actually run this job. Behavior | ||
# is undefined however for non-query jobs and may result in an error. | ||
# Deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is a new one being added. Given that, can we just omit this argument rather than creating it for the first time as already deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@shollyman PTAL, I believe this PR is ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this all together. The testing scenarios have great coverage, checking for the the TF saved model etc.
I feel somehow guilty for how the level of repetition this request induced. So many rewrites of the label validation rule comments. :)
@@ -253,7 +319,7 @@ def location= value | |||
end | |||
|
|||
## | |||
# Sets the compression type. | |||
# Sets the compression type. Not applicable when extracting models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a good example of letting the backend sort it out, as they may add support for this down the line.
closes: #7061
Project#extract
andProject#extract_job
ExtractJob#model?
ExtractJob#table?
ExtractJob#ml_tf_saved_model?
ExtractJob#ml_xgboost_booster?
Model#extract
Model#extract_job