Repo for cloud image processing workflow codelab (uses Google Drive, Cloud Storage, Cloud Vision, Sheets)

googlecodelabs/analyze_gsimg

Cloud image processing workflow:

Image archive, analysis, and report generation with Google Workspace (formerly G Suite) & GCP

In this intermediate codelab tutorial, developers build a cloud-based image processing workflow in Python using Google Cloud REST APIs from GCP and Google Workspace (formerly G Suite). The exercise imagines an enterprise scenario where an organization can back up data (image files, for example) to the cloud, analyze it with machine learning, and report results formatted for consumption by management. This repo provides code solutions for each step of the tutorial plus alternate versions featuring other libraries and/or authorization schemes.

This is an intermediate codelab. If you're new to using Google APIs, specifically Google Workspace (formerly G Suite) and/or GCP APIs, we recommend completing the introductory codelabs (listed at the bottom of this page) first. You can read more about this code sample and codelab in this Google Developers blog post or this equivalent post on the Google Cloud blog.

Prerequisites

  • A Google account (Google Workspace/G Suite accounts may require administrator approval)
  • A Google Cloud Platform project with an active billing account
  • Familiarity with operating system terminal/shell commands
  • Basic skills in Python 2 or 3 (other languages supported)
  • Experience using Google APIs may be helpful but not required

NOTE for GCP developers: The codelab uses neither GCP product client libraries nor service account authorization. Instead, it uses the lower-level platform client libraries (because non-Cloud APIs don't have product libraries yet) and user account authorization (because the target file starts out in Google Drive). However, solutions featuring GCP product client libraries as well as service accounts are available as alternatives in the alt folder.

Description

The primary objective is to analyze Google Workspace images; everything else (archiving, report generation) is a bonus. The workflow starts with the image file on Google Drive, archives it to Google Cloud Storage, analyzes it with Cloud Vision, and writes a "results" row into a Google Sheet. Each step of the tutorial builds on the previous one, adding one feature at a time. Each of the step* directories represents the state the application should be in upon successful completion of the corresponding step in the codelab, culminating with a refactor step to arrive at the final version.

  1. Download image from Google Drive: The first step uses the Google Drive API to search for the image file and download the first match. Along with the filename and binary payload, the file's MIME type, last modification timestamp, and size in bytes are also returned.

  2. Back up image to Google Cloud Storage: The next step uploads the image as a "blob" object to Google Cloud Storage (GCS), performing an "insert" to the given bucket. Once data is in GCS, it can be used by other GCP tools. GCS also offers cheaper, "colder" storage classes: the less often you access objects, the lower the cost, as described on the storage class page. NOTE: "/" in GCS filenames is merely a visual cue, as GCS doesn't support "folders." Our solution features an optional PARENT folder to help organize images in the destination bucket. (The GCP product client libraries prepare the data for GCS themselves; because this solution uses the platform client library instead, it needs the MediaIoBaseUpload convenience object to help with the upload.)

  3. Send image to Cloud Vision for analysis: Since we already have the image's binary data, we also send it to Cloud Vision for analysis. Using its API, request object detection/identification (called label annotation), but ask only for the top 5 labels for a faster response. Each label returned includes a confidence score indicating how likely it is that the label applies to the image.

  4. Add results to Google Sheets: The last new feature is report generation: add a spreadsheet row via the Google Sheets API to visualize the results. The row includes the Cloud Vision output and a hyperlink to the file's archived location in GCS.

  5. Refactor: The final, optional step involves refactoring with best practices: moving the "main" body into a separate function and supporting command-line options to provide user flexibility.
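
As a rough illustration of the steps above, the following sketch builds the data each API call needs without contacting any service. It is not the codelab's solution code: the helper names, the command-line flags, and the `storage.cloud.google.com` link format are assumptions made for this example. The request shapes follow the public REST APIs (Drive v3 `files.list` query syntax, Vision v1 `images.annotate`, Sheets v4 `values.append`).

```python
import argparse
import base64

TOP = 5  # number of labels to request from Cloud Vision (step 3)


def drive_query(filename):
    """Step 1: Drive v3 search query matching a file by exact name."""
    safe = filename.replace('\\', '\\\\').replace("'", "\\'")
    return "name = '%s'" % safe


def gcs_object_name(filename, parent=None):
    """Step 2: object name in the bucket; '/' is only a visual cue."""
    return '%s/%s' % (parent, filename) if parent else filename


def vision_request_body(img_bytes, top=TOP):
    """Step 3: body for images.annotate asking for the top labels only."""
    return {'requests': [{
        'image': {'content': base64.b64encode(img_bytes).decode('utf-8')},
        'features': [{'type': 'LABEL_DETECTION', 'maxResults': top}],
    }]}


def format_labels(vision_response):
    """Step 3: render labelAnnotations as '(score%) description' text."""
    labels = vision_response['responses'][0].get('labelAnnotations', [])
    return ', '.join('(%.2f%%) %s' % (label['score'] * 100.0,
                                      label['description'])
                     for label in labels)


def sheet_row(parent, filename, mimetype, modified, size, bucket, labels):
    """Step 4: one report row; the link URL format is an assumption."""
    url = 'https://storage.cloud.google.com/%s/%s' % (
        bucket, gcs_object_name(filename, parent))
    link = '=HYPERLINK("%s", "%s")' % (url, filename)
    return [parent or '', link, mimetype, modified, size, labels]


def parse_args(argv=None):
    """Step 5: hypothetical command-line options for the refactored app."""
    parser = argparse.ArgumentParser(description='Image workflow sketch')
    parser.add_argument('-i', '--imgfile', required=True)
    parser.add_argument('-b', '--bucket', required=True)
    parser.add_argument('-f', '--folder', default=None)
    parser.add_argument('-s', '--sheet', default=None)
    parser.add_argument('-t', '--top', type=int, default=TOP)
    return parser.parse_args(argv)
```

With real service objects, `drive_query()` would feed the `q` parameter of `files().list()`, `vision_request_body()` the `body` of `images().annotate()`, and `{'values': [sheet_row(...)]}` the `body` of `spreadsheets().values().append()`. That last call needs `valueInputOption='USER_ENTERED'` so that Sheets evaluates the `HYPERLINK` formula instead of storing it as plain text.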

Authorization scheme and alternative versions

We chose user account authorization (instead of service account authorization), platform client libraries (instead of product client libraries, since those aren't available for Google Workspace (formerly G Suite) APIs), and older auth libraries for readability, consistency, greater Python 2-3 compatibility, and automated OAuth2 token management. We hope this provides the least complex user experience. Alternative versions of the final application, using service accounts, product client libraries, and newer, currently supported auth libraries, are found in the alt subdirectory. See its README for more information.
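
For orientation, user account authorization with the older auth library and the platform client library might look roughly like the sketch below. It assumes the `oauth2client`, `httplib2`, and `google-api-python-client` packages are installed. The file names `storage.json` and `client_secret.json`, the function name `build_services`, and the exact scope selection are assumptions for this example, not necessarily what the repo uses.

```python
# One OAuth2 scope per API used in the workflow (assumed selection)
SCOPES = (
    'https://www.googleapis.com/auth/drive.readonly',
    'https://www.googleapis.com/auth/devstorage.full_control',
    'https://www.googleapis.com/auth/cloud-vision',
    'https://www.googleapis.com/auth/spreadsheets',
)


def build_services(token_file='storage.json',
                   secrets_file='client_secret.json'):
    """Return platform-library service endpoints for all four APIs."""
    # Third-party imports stay local so this module loads without them
    from googleapiclient import discovery
    from httplib2 import Http
    from oauth2client import client, file, tools

    store = file.Storage(token_file)   # cached OAuth2 tokens, if any
    creds = store.get()
    if not creds or creds.invalid:
        # No valid tokens yet: run the browser-based consent flow once
        flow = client.flow_from_clientsecrets(secrets_file, SCOPES)
        # Pass empty flags so run_flow() does not parse the app's own argv
        flags = tools.argparser.parse_args([])
        creds = tools.run_flow(flow, store, flags)
    http = creds.authorize(Http())     # HTTP client that signs requests
    return {
        'drive': discovery.build('drive', 'v3', http=http),
        'gcs': discovery.build('storage', 'v1', http=http),
        'vision': discovery.build('vision', 'v1', http=http),
        'sheets': discovery.build('sheets', 'v4', http=http),
    }
```

The token store lets later runs skip the consent screen until the tokens expire or are revoked, which is the "automated OAuth2 token management" referred to above.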

Summary and further study

The goal of the codelab sample app is to help developers envision possible business scenarios. A secondary goal is showing how to use GCP and Google Workspace (formerly G Suite) APIs together for one solution. Problems with either the codelab or code in this repo? File an issue (do a search first).

References