This repository has been archived by the owner on Dec 31, 2023. It is now read-only.

docs: add samples from tables/automl #54

Merged
merged 45 commits into master from add-tables-samples on Aug 13, 2020
Commits (45)
057a325
Tables Notebooks [(#2090)](https://github.com/GoogleCloudPlatform/pyt…
sirtorry Apr 8, 2019
6c8a34f
remove the reference to a bug [(#2100)](https://github.com/GoogleClou…
merla18 Apr 8, 2019
ac5f06a
delete this file. [(#2102)](https://github.com/GoogleCloudPlatform/py…
merla18 Apr 8, 2019
dad4ebf
rename file name [(#2103)](https://github.com/GoogleCloudPlatform/pyt…
merla18 Apr 8, 2019
be272ea
trying to fix images [(#2101)](https://github.com/GoogleCloudPlatform…
merla18 Apr 8, 2019
3e1eae6
remove typo in installation [(#2110)](https://github.com/GoogleCloudP…
merla18 Apr 13, 2019
eed69e3
Rename census_income_prediction.ipynb to getting_started_notebook.ipy…
merla18 May 1, 2019
bef66e7
added back missing file package import [(#2150)](https://github.com/G…
merla18 May 20, 2019
d7498ca
added back missing file import [(#2145)](https://github.com/GoogleClo…
merla18 May 20, 2019
4e29670
remove incorrect reference to Iris dataset [(#2203)](https://github.c…
emmby Jun 10, 2019
81f2a34
conversion to jupyter/colab [(#2340)](https://github.com/GoogleCloudP…
merla18 Sep 5, 2019
6d7ec03
updated for the Jupyter support [(#2337)](https://github.com/GoogleCl…
merla18 Sep 5, 2019
482211a
updated readme for support Jupyter [(#2336)](https://github.com/Googl…
merla18 Sep 5, 2019
cbb9685
conversion to jupyer/colab [(#2339)](https://github.com/GoogleCloudPl…
merla18 Sep 5, 2019
aaea837
conversion of notebook for jupyter/Colab [(#2338)](https://github.com…
merla18 Sep 5, 2019
7c23e1d
[BLOCKED] AutoML Tables: Docs samples updated to use new (pending) cl…
lwander Sep 6, 2019
48bc7d2
add product recommendation for automl tables notebook [(#2257)](https…
TheMichaelHu Sep 18, 2019
142261e
AutoML Tables: Notebook samples updated to use new tables client [(#2…
lwander Oct 5, 2019
af31274
fix users bug and emphasize kernal restart [(#2407)](https://github.c…
TheMichaelHu Oct 7, 2019
fe2e911
fix problems with automl docs [(#2501)](https://github.com/GoogleClou…
alefhsousa Nov 19, 2019
47e5801
Fix typo in GCS URI parameter [(#2459)](https://github.com/GoogleClou…
lwander Nov 20, 2019
d0a2d74
fix: fix tables notebook links and bugs [(#2601)](https://github.com/…
sirtorry Dec 12, 2019
aa86fbc
feat(tables): update samples to show explainability [(#2523)](https:/…
sirtorry Dec 18, 2019
59bd0cb
Auto-update dependencies. [(#2005)](https://github.com/GoogleCloudPla…
dpebot Dec 21, 2019
12e24d4
Update dependency google-cloud-automl to v0.10.0 [(#3033)](https://gi…
renovate-bot Mar 6, 2020
b119f72
Simplify noxfile setup. [(#2806)](https://github.com/GoogleCloudPlatf…
kurtisvg Apr 2, 2020
184930a
chore: some lint fixes [(#3750)](https://github.com/GoogleCloudPlatfo…
May 13, 2020
f87fc01
automl: tables code sample clean-up [(#3571)](https://github.com/Goog…
Strykrol May 13, 2020
1224e5e
add example of creating AutoML Tables client with non-default endpoin…
amygdala Jun 5, 2020
a13cdb2
Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT. [(#4022)](https://g…
kurtisvg Jun 9, 2020
9b4d162
chore(deps): update dependency google-cloud-automl to v1 [(#4127)](ht…
renovate-bot Jun 19, 2020
7bea599
[tables/automl] fix: update the csv file and the dataset name [(#4188…
Jun 26, 2020
a690cba
samples: Automl table batch test [(#4267)](https://github.com/GoogleC…
munkhuushmgl Jul 9, 2020
aa48046
samples: fixed wrong format on GCS input Uri [(#4270)](https://github…
munkhuushmgl Jul 10, 2020
4f6f978
chore(deps): update dependency pytest to v5.4.3 [(#4279)](https://git…
renovate-bot Jul 12, 2020
784d0cc
Update automl_tables_predict.py with batch_predict_bq sample [(#4142)…
evil-shrike Jul 17, 2020
cab6955
Update dependency pytest to v6 [(#4390)](https://github.com/GoogleClo…
renovate-bot Aug 1, 2020
b6a236d
chore: exclude notebooks
busunkim96 Aug 7, 2020
c5720e8
chore: update templates
busunkim96 Aug 7, 2020
c398641
chore: add codeowners and fix tests
busunkim96 Aug 13, 2020
f0362bb
chore: ignore warnings from sphinx
busunkim96 Aug 13, 2020
b83ea49
chore: fix tables client
busunkim96 Aug 13, 2020
d0e251d
Merge branch 'master' into add-tables-samples
busunkim96 Aug 13, 2020
ebb30b0
test: fix unit tests
busunkim96 Aug 13, 2020
d0efc69
Merge branch 'add-tables-samples' of github.com:busunkim96/python-aut…
busunkim96 Aug 13, 2020
8 changes: 8 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,8 @@
# Code owners file.
# This file controls who is tagged for review for any given pull request.
#
# For syntax help see:
# https://help.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-syntax


/samples/**/*.py @telpirion @sirtorry @googleapis/python-samples-owners
6 changes: 5 additions & 1 deletion google/cloud/automl_v1beta1/tables/tables_client.py
@@ -2762,6 +2762,7 @@ def batch_predict(
region=None,
credentials=None,
inputs=None,
params={},
**kwargs
):
"""Makes a batch prediction on a model. This does _not_ require the
@@ -2828,6 +2829,9 @@
The `model` instance you want to predict with . This must be
supplied if `model_display_name` or `model_name` are not
supplied.
params (Optional[dict]):
Additional domain-specific parameters for the predictions,
any string must be up to 25000 characters long.

Returns:
google.api_core.operation.Operation:
@@ -2886,7 +2890,7 @@
)

op = self.prediction_client.batch_predict(
model_name, input_request, output_request, **kwargs
model_name, input_request, output_request, params, **kwargs
)
self.__log_operation_info("Batch predict", op)
return op
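
For reference, a minimal sketch of how a caller might pass the new `params` argument once this change is in place. The project, model, and bucket values below are placeholders, and `feature_importance` is shown only as an example of a domain-specific prediction option.

```python
# Hypothetical usage of the new `params` argument (not part of this diff).
from google.cloud import automl_v1beta1 as automl

# Placeholder project and region.
client = automl.TablesClient(project="my-project", region="us-central1")

response = client.batch_predict(
    model_display_name="my_model",                   # placeholder model
    gcs_input_uris=["gs://my-bucket/input.csv"],     # placeholder input CSV
    gcs_output_uri_prefix="gs://my-bucket/output/",  # placeholder output prefix
    params={"feature_importance": "true"},           # example prediction option
)
response.result()  # wait for the batch prediction operation to finish
```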
306 changes: 306 additions & 0 deletions samples/tables/automl_tables_dataset.py
@@ -0,0 +1,306 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This application demonstrates how to perform basic operations on dataset
with the Google AutoML Tables API.

For more information, the documentation at
https://cloud.google.com/automl-tables/docs.
"""

import argparse
import os


def create_dataset(project_id, compute_region, dataset_display_name):
"""Create a dataset."""
# [START automl_tables_create_dataset]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Create a dataset with the given display name
dataset = client.create_dataset(dataset_display_name)

# Display the dataset information.
print("Dataset name: {}".format(dataset.name))
print("Dataset id: {}".format(dataset.name.split("/")[-1]))
print("Dataset display name: {}".format(dataset.display_name))
print("Dataset metadata:")
print("\t{}".format(dataset.tables_dataset_metadata))
print("Dataset example count: {}".format(dataset.example_count))
print("Dataset create time:")
print("\tseconds: {}".format(dataset.create_time.seconds))
print("\tnanos: {}".format(dataset.create_time.nanos))

# [END automl_tables_create_dataset]

return dataset


def list_datasets(project_id, compute_region, filter_=None):
"""List all datasets."""
result = []
# [START automl_tables_list_datasets]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# filter_ = 'filter expression here'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# List all the datasets available in the region by applying filter.
response = client.list_datasets(filter_=filter_)

print("List of datasets:")
for dataset in response:
# Display the dataset information.
print("Dataset name: {}".format(dataset.name))
print("Dataset id: {}".format(dataset.name.split("/")[-1]))
print("Dataset display name: {}".format(dataset.display_name))
metadata = dataset.tables_dataset_metadata
print(
"Dataset primary table spec id: {}".format(
metadata.primary_table_spec_id
)
)
print(
"Dataset target column spec id: {}".format(
metadata.target_column_spec_id
)
)
print(
"Dataset weight column spec id: {}".format(
metadata.weight_column_spec_id
)
)
print(
"Dataset ml use column spec id: {}".format(
metadata.ml_use_column_spec_id
)
)
print("Dataset example count: {}".format(dataset.example_count))
print("Dataset create time:")
print("\tseconds: {}".format(dataset.create_time.seconds))
print("\tnanos: {}".format(dataset.create_time.nanos))
print("\n")

# [END automl_tables_list_datasets]
result.append(dataset)

return result


def get_dataset(project_id, compute_region, dataset_display_name):
"""Get the dataset."""
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Get complete detail of the dataset.
dataset = client.get_dataset(dataset_display_name=dataset_display_name)

# Display the dataset information.
print("Dataset name: {}".format(dataset.name))
print("Dataset id: {}".format(dataset.name.split("/")[-1]))
print("Dataset display name: {}".format(dataset.display_name))
print("Dataset metadata:")
print("\t{}".format(dataset.tables_dataset_metadata))
print("Dataset example count: {}".format(dataset.example_count))
print("Dataset create time:")
print("\tseconds: {}".format(dataset.create_time.seconds))
print("\tnanos: {}".format(dataset.create_time.nanos))

return dataset


def import_data(project_id, compute_region, dataset_display_name, path):
"""Import structured data."""
# [START automl_tables_import_data]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME'
# path = 'gs://path/to/file.csv' or 'bq://project_id.dataset.table_id'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

response = None
if path.startswith("bq"):
response = client.import_data(
dataset_display_name=dataset_display_name, bigquery_input_uri=path
)
else:
# Get the multiple Google Cloud Storage URIs.
input_uris = path.split(",")
response = client.import_data(
dataset_display_name=dataset_display_name,
gcs_input_uris=input_uris,
)

print("Processing import...")
# synchronous check of operation status.
print("Data imported. {}".format(response.result()))

# [END automl_tables_import_data]


def update_dataset(
project_id,
compute_region,
dataset_display_name,
target_column_spec_name=None,
weight_column_spec_name=None,
test_train_column_spec_name=None,
):
"""Update dataset."""
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'
# target_column_spec_name = 'TARGET_COLUMN_SPEC_NAME_HERE' or None
# weight_column_spec_name = 'WEIGHT_COLUMN_SPEC_NAME_HERE' or None
# test_train_column_spec_name = 'TEST_TRAIN_COLUMN_SPEC_NAME_HERE' or None

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

if target_column_spec_name is not None:
response = client.set_target_column(
dataset_display_name=dataset_display_name,
column_spec_display_name=target_column_spec_name,
)
print("Target column updated. {}".format(response))
if weight_column_spec_name is not None:
response = client.set_weight_column(
dataset_display_name=dataset_display_name,
column_spec_display_name=weight_column_spec_name,
)
print("Weight column updated. {}".format(response))
if test_train_column_spec_name is not None:
response = client.set_test_train_column(
dataset_display_name=dataset_display_name,
column_spec_display_name=test_train_column_spec_name,
)
print("Test/train column updated. {}".format(response))


def delete_dataset(project_id, compute_region, dataset_display_name):
"""Delete a dataset"""
# [START automl_tables_delete_dataset]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Delete a dataset.
response = client.delete_dataset(dataset_display_name=dataset_display_name)

# synchronous check of operation status.
print("Dataset deleted. {}".format(response.result()))
# [END automl_tables_delete_dataset]


if __name__ == "__main__":
parser = argparse.ArgumentParser(
description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter,
)
subparsers = parser.add_subparsers(dest="command")

create_dataset_parser = subparsers.add_parser(
"create_dataset", help=create_dataset.__doc__
)
create_dataset_parser.add_argument("--dataset_name")

list_datasets_parser = subparsers.add_parser(
"list_datasets", help=list_datasets.__doc__
)
list_datasets_parser.add_argument("--filter_")

get_dataset_parser = subparsers.add_parser(
"get_dataset", help=get_dataset.__doc__
)
get_dataset_parser.add_argument("--dataset_display_name")

import_data_parser = subparsers.add_parser(
"import_data", help=import_data.__doc__
)
import_data_parser.add_argument("--dataset_display_name")
import_data_parser.add_argument("--path")

update_dataset_parser = subparsers.add_parser(
"update_dataset", help=update_dataset.__doc__
)
update_dataset_parser.add_argument("--dataset_display_name")
update_dataset_parser.add_argument("--target_column_spec_name")
update_dataset_parser.add_argument("--weight_column_spec_name")
update_dataset_parser.add_argument("--ml_use_column_spec_name")

delete_dataset_parser = subparsers.add_parser(
"delete_dataset", help=delete_dataset.__doc__
)
delete_dataset_parser.add_argument("--dataset_display_name")

project_id = os.environ["PROJECT_ID"]
compute_region = os.environ["REGION_NAME"]

args = parser.parse_args()
if args.command == "create_dataset":
create_dataset(project_id, compute_region, args.dataset_name)
if args.command == "list_datasets":
list_datasets(project_id, compute_region, args.filter_)
if args.command == "get_dataset":
get_dataset(project_id, compute_region, args.dataset_display_name)
if args.command == "import_data":
import_data(
project_id, compute_region, args.dataset_display_name, args.path
)
if args.command == "update_dataset":
update_dataset(
project_id,
compute_region,
args.dataset_display_name,
args.target_column_spec_name,
args.weight_column_spec_name,
args.ml_use_column_spec_name,
)
if args.command == "delete_dataset":
delete_dataset(project_id, compute_region, args.dataset_display_name)
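
As a usage note, a minimal sketch of calling these helpers directly from Python instead of through the argparse entry point, assuming the file is importable as `automl_tables_dataset` and the `PROJECT_ID` and `REGION_NAME` environment variables are set; the dataset name and GCS URI are placeholders.

```python
# Hypothetical driver script (not part of this sample).
import os

from automl_tables_dataset import create_dataset, import_data, delete_dataset

project_id = os.environ["PROJECT_ID"]
compute_region = os.environ["REGION_NAME"]  # e.g. "us-central1"

create_dataset(project_id, compute_region, "my_dataset")
import_data(project_id, compute_region, "my_dataset", "gs://my-bucket/data.csv")
delete_dataset(project_id, compute_region, "my_dataset")
```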