Skip to content

Latest commit

 

History

History

sql-pdf-python

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 

SQL over Docs

The goal of this demo is to run a BigQuery SQL and extract information from documents.

Requirements

  • Ensure the GCP user is allowed to create service accounts and assign roles
  • BQ object tables need to be enabled (as of 12/5/2022 they were in private preview), if you do not have access to enable object tables manually create and load the BQ tables

Architecture

SQL Doc

Setting up the demo

1) In Cloud Shell or other environment where you have the gcloud SDK installed, execute the following commands:

gcloud components update
cd $HOME

git clone https://github.com/GoogleCloudPlatform/document-ai-samples.git
cd ~/document-ai-samples/sql-pdf-python
chmod +x *.sh

2) Edit config.sh - In your editor of choice update the variables in config.sh to reflect your desired gcp project.

3) Next execute the command below

sh setup_sa.sh

4) Next execute the command below

sh deploy_cf.sh

If the shell script has executed successfully,have a dataset docai and and a BQ object table repos should be created under your project in BigQuery along with a function doc_extractor

Note: Your script will fail in creation of the BQ table project is not enabled to use object tables. Then you need to manually create the table and load pointers to the PDFs in GCS



Congrats! You just executed BigQuery SQL over Documents