Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

append to table with DATETIME column with generated schema #450

Open
tswast opened this issue Dec 22, 2021 · 3 comments
Open

append to table with DATETIME column with generated schema #450

tswast opened this issue Dec 22, 2021 · 3 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Copy link
Collaborator

tswast commented Dec 22, 2021

Steps to reproduce

  1. Create a table with a DATETIME (not TIMESTAMP) column outside of pandas(-gbq).
  2. Attempt to append a dataframe with datetime data to that table.

Code example

import datetime

import google.cloud.bigquery
import pandas
import pandas_gbq


destination = "swast-scratch.my_dataset.pandas_datetime_20211222"
bqclient = google.cloud.bigquery.Client()
table = google.cloud.bigquery.Table(
    destination,
    schema=[
        google.cloud.bigquery.SchemaField("row_num", "INTEGER"),
        google.cloud.bigquery.SchemaField("some_datetime", "DATETIME"),
    ]
)
table.expires = (
    datetime.datetime.now(tz=datetime.timezone.utc)
    + datetime.timedelta(days=1)
)
bqclient.create_table(table, exists_ok=True)

df = pandas.DataFrame(
    {
        "row_num": [1, 2, 3],
        "some_datetime": [
            datetime.datetime(2021, 12, 22, 10, 11, 12),
            datetime.datetime(2011, 11, 11, 10, 11, 12),
            datetime.datetime(2000, 1, 2, 3, 4, 5),
        ]
    }
)
pandas_gbq.to_gbq(df, destination, if_exists="append")

Stack trace

(dev-3.9) ➜  scratch python 2021/12-pandas-gbq-datetime/upload_datetime.py
Traceback (most recent call last):
  File "/Users/swast/src/scratch/2021/12-pandas-gbq-datetime/upload_datetime.py", line 33, in <module>
    pandas_gbq.to_gbq(df, destination, if_exists="append")
  File "/Users/swast/src/github.com/googleapis/python-bigquery-pandas/pandas_gbq/gbq.py", line 1129, in to_gbq
    raise InvalidSchema(
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
@tswast tswast added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Dec 22, 2021
@tswast tswast self-assigned this Dec 22, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Dec 22, 2021
@tswast
Copy link
Collaborator Author

tswast commented Dec 22, 2021

Currently, this failure happens locally due to generated schema != server-side schema. Even without that client-side check, I think this failure could still happen with api_method="load_csv", as it also uses the generated schema if one is not provided.

@tswast
Copy link
Collaborator Author

tswast commented Dec 22, 2021

Actually, there is a workaround. The user can manually specify a schema if they want DATETIME instead of TIMESTAMP.

...
pandas_gbq.to_gbq(
    df,
    destination,
    if_exists="append",
    table_schema=[
        {"name": "row_num", "type": "INTEGER"},
        {"name": "some_datetime", "type": "DATETIME"},
    ]
)

I might change this to a Feature Request, as I believe this is a known way of dealing with the ambiguity between TIMESTAMP and DATETIME.

@tswast tswast changed the title Cannot append to table with DATETIME column append to table with DATETIME column with generated schema Dec 22, 2021
@tswast tswast added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Dec 22, 2021
@tswast tswast removed their assignment Dec 22, 2021
@jlynchMicron
Copy link

+1 for DateTime support instead of just Timestamp. Currently all my DateTime columns are being uploaded as Timestamps with the to_gbq function unless I specify the column is of type 'DATETIME' with the table_schema argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

2 participants