
Default values not initialized when job loading data with the Python API #1495

Open
blieusong opened this issue Feb 11, 2023 · 4 comments
Labels
api: bigquery — Issues related to the googleapis/python-bigquery API.
priority: p3 — Desirable enhancement or fix. May not be included in next release.
type: bug — Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@blieusong

When issuing a job with load_table_from_json, fields in the schema that have a default value are not populated.

I use the following table for testing purposes:

schema = [
    bigquery.SchemaField("client_id", "INTEGER"),
    bigquery.SchemaField(
        "create_date", "DATETIME", default_value_expression="CURRENT_DATETIME()"
    ),
    bigquery.SchemaField("number", "INTEGER", default_value_expression=2, mode="NULLABLE"),
]

Note that I also tried mode="REQUIRED", but in that case the load simply fails.

I have also tried both creating the table before launching the job and letting the Python load job create it: neither worked.

I don't have this issue when streaming instead of using jobs.
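Until load jobs honor column defaults, one client-side workaround is to fill in the missing fields yourself before calling load_table_from_json. This is only a sketch; fill_defaults and the per-column callables below are hypothetical helpers, not part of the library:

```python
from datetime import datetime, timezone

def fill_defaults(rows, defaults):
    """Return copies of the JSON rows with missing keys filled in.

    `defaults` maps a column name to a zero-argument callable, so an
    expression like CURRENT_DATETIME() can be emulated client-side.
    """
    filled = []
    for row in rows:
        out = dict(row)
        for col, make_value in defaults.items():
            if col not in out:
                out[col] = make_value()
        filled.append(out)
    return filled

# Hypothetical client-side equivalents of the schema defaults above.
defaults = {
    "create_date": lambda: datetime.now(timezone.utc).isoformat(),
    "number": lambda: 2,
}

rows = fill_defaults([{"client_id": 1}, {"client_id": 2}], defaults)
```

The filled rows can then be passed to load_table_from_json unchanged.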

Environment details

  • macOS 10.15.7
  • Python version: 3.11.1
  • pip version: 23.0
  • google-cloud-bigquery version: 3.5.0

Steps to reproduce

  1. Run code below
  2. Query the table. Columns create_date and number are empty

Code example

from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("client_id", "INTEGER"),
    bigquery.SchemaField(
        "create_date", "DATETIME", default_value_expression="CURRENT_DATETIME()"
    ),
    bigquery.SchemaField("number", "INTEGER", default_value_expression=2, mode="NULLABLE"),
]

lines = [
    {"client_id": 1},
    {"client_id": 2},
    {"client_id": 3},
    {"client_id": 4},
]

job_config = bigquery.LoadJobConfig()
job_config.schema = schema
job_config.create_disposition = "CREATE_IF_NEEDED"
job_config.write_disposition = "WRITE_TRUNCATE"

table = "whatever_table"

try:
    load_job = client.load_table_from_json(
        lines,
        destination=table,
        location="europe-west2",
        job_config=job_config,
    )
    print("Load job: %s [%s]" % (load_job.job_id, table))
    load_job.result()
except Exception as e:
    print("Oops: %s" % (e))

Stack trace

None. Insertion happens but without the default columns populated.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 11, 2023
@chalmerlowe
Contributor

I attempted to run this with google-cloud-bigquery=3.4.0 and 3.10.0.
In both cases, I was able to replicate the error on my system.
Thanks for a detailed code example to work from.

@meredithslota meredithslota added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jun 3, 2023
@meredithslota meredithslota added priority: p3 Desirable enhancement or fix. May not be included in next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jun 12, 2023
@H-Max

H-Max commented Oct 6, 2023

Encountered the issue too, and this goes against what the documentation says here.

Tried to insert a few lines with an INSERT INTO statement, and the default value was correctly used.

On the other hand, trying the same thing with a LOAD DATA SQL statement does not work either: the field missing from the JSON file is left as NULL instead of being set to the default value.

@leonardomira

leonardomira commented Nov 21, 2023

Facing the same issue using Parquet files from Cloud Storage and the load_table_from_uri function.

Can provide a code sample, if needed


EDIT:
In our case, to make this scenario work, we created the table explicitly with the full schema first.
Then we built a second schema WITHOUT the default-valued columns and passed that schema to the load job.

Code snippet

table_schema = [
    bigquery.SchemaField("store_id", "INTEGER"),
    bigquery.SchemaField("seller_id", "INTEGER"),
    bigquery.SchemaField("rec_cre_tms", "TIMESTAMP", default_value_expression="CURRENT_TIMESTAMP()"),
]
table_id = f'{project_id}.{dataset}.{table_name}'
table = bigquery.Table(table_id, schema=table_schema)
bigquery_client.create_table(table)
...
job_schema = [
    bigquery.SchemaField("store_id", "INTEGER"),
    bigquery.SchemaField("seller_id", "INTEGER"),
]

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema=job_schema,
)

@BenK-RF

BenK-RF commented Mar 14, 2024

same issue here!
any updates?
