
Provide json.dumps kwargs to load_table_from_json #1564

Open
fingoldo opened this issue May 4, 2023 · 3 comments · May be fixed by #1634
Labels
api: bigquery — Issues related to the googleapis/python-bigquery API.
priority: p3 — Desirable enhancement or fix. May not be included in next release.
type: feature request — ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments


fingoldo commented May 4, 2023

Hi, we are currently facing problems submitting JSON objects that contain date fields. It would make sense for load_table_from_json in python-bigquery/google/cloud/bigquery/client.py to accept custom json.dumps kwargs and forward them to the json.dumps call inside load_table_from_json. We could then provide, for example, a default parameter to that json.dumps call.
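One possible shape of the ask, sketched as a standalone function (the json_dumps_kwargs forwarding is hypothetical, not part of the library; it mirrors the `"\n".join(json.dumps(item, ensure_ascii=False) ...)` line in client.py):

```python
import datetime as dt
import json


def serialize_rows(json_rows, **json_dumps_kwargs):
    # Hypothetical: same newline-delimited JSON serialization that
    # load_table_from_json performs internally, but with caller-supplied
    # keyword arguments forwarded to json.dumps.
    return "\n".join(
        json.dumps(item, ensure_ascii=False, **json_dumps_kwargs)
        for item in json_rows
    )


rows = [{"created": dt.datetime(2023, 5, 4, 12, 0)}]
# default=str makes json.dumps stringify otherwise-unserializable values.
print(serialize_rows(rows, default=str))  # → {"created": "2023-05-04 12:00:00"}
```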

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label May 4, 2023
@chalmerlowe
Contributor

I am not sure I am following what the ask is in this issue.
Providing a simple concrete example and/or code snippet would be useful.

Please elaborate on

  • what the problem is
  • what the result currently is
  • what the desired result would be/would look like


HCA97 commented Jul 31, 2023

The problem is that load_table_from_json passes each row dictionary to json.dumps, and json.dumps fails when a value is a datetime object, so the whole call fails.

A viable workaround is to convert each datetime object to an isoformat string, which BigQuery correctly interprets as a date/datetime type. However, this solution may not be the most elegant one.
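That workaround can be made generic with a small recursive helper (the helper name is illustrative) that pre-converts date/datetime values before the rows ever reach json.dumps:

```python
import datetime as dt


def isoformat_values(obj):
    # Recursively replace date/datetime values with ISO 8601 strings so the
    # row dicts become JSON-serializable. datetime is a subclass of date,
    # so one isinstance check covers both.
    if isinstance(obj, (dt.datetime, dt.date)):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {k: isoformat_values(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [isoformat_values(v) for v in obj]
    return obj


rows = [
    {
        "datetime": dt.datetime(2012, 11, 1, 12, 9, 10),
        "date": dt.date(2020, 11, 1),
    }
]
print(isoformat_values(rows))
# → [{'datetime': '2012-11-01T12:09:10', 'date': '2020-11-01'}]
```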

To avoid this problem altogether, load_table_from_json could accept additional arguments for json.dumps, as demonstrated in this Stack Overflow post.

Below is a basic example illustrating the issue and the successful workaround:

import datetime as dt
from google.cloud.bigquery import Client, LoadJobConfig, SchemaField

project_id = "PROJECT_ID"
client = Client(project=project_id)

table_schema = [SchemaField("datetime", "DATETIME"), SchemaField("date", "DATE")]

job_config = LoadJobConfig(write_disposition="WRITE_APPEND", schema=table_schema)

table_id = f"{project_id}.test.test"

# Successful insertion using isoformat strings
data = [
    {
        "datetime": dt.datetime(2012, 11, 1, 12, 9, 10).isoformat(),  # Using isoformat() to convert datetime to string
        "date": dt.date(2020, 11, 1).isoformat(),  # Using isoformat() to convert date to string
    }
]

job = client.load_table_from_json(data, table_id, job_config=job_config)
job.result()

However, if we simply pass the datetime object without converting to isoformat strings:

data = [
    {
        "datetime": dt.datetime(2012, 11, 1, 12, 9, 10),  # datetime object without isoformat conversion
        "date": dt.date(2020, 11, 1),  # date object
    }
]

job = client.load_table_from_json(data, table_id, job_config=job_config)
job.result()

We get the following error:

Traceback (most recent call last):
  File "/example.py", line 23, in <module>
    job = client.load_table_from_json(data, table_id, job_config=job_config)
  File "/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 2845, in load_table_from_json
    data_str = "\n".join(json.dumps(item, ensure_ascii=False) for item in json_rows)
  File "lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 2845, in <genexpr>
    data_str = "\n".join(json.dumps(item, ensure_ascii=False) for item in json_rows)
  File "lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable
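For reference, the failure reproduces with json.dumps alone, and the default= keyword from the Stack Overflow suggestion is enough to resolve it:

```python
import datetime as dt
import json

row = {"datetime": dt.datetime(2012, 11, 1, 12, 9, 10)}

try:
    json.dumps(row)  # same failure load_table_from_json hits internally
except TypeError as e:
    print(e)  # → Object of type datetime is not JSON serializable

# default= tells json.dumps how to handle non-serializable values.
print(json.dumps(row, default=str))  # → {"datetime": "2012-11-01 12:09:10"}
```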

Setup:

  • Python 3.9.17
  • google-cloud-bigquery==3.11.3
  • Ubuntu 22.04.2 LTS

Contributor

Linchin commented Apr 9, 2024

Thanks @HCA97. If you are interested in loading JSON with non-string data types such as datetime, Client.insert_rows() and Client.insert_rows_json() are capable of doing it; they use internal conversion functions that ensure the strings have the correct formats.

As to adding json.dumps kwargs, I think a better option would be to use that same group of conversion functions in load_table_from_json() too. A major limitation is that with insert_rows() the table already exists, so there is a schema; but with load_table_from_json() there's a chance the table doesn't exist yet, and we may not even have a schema (when autodetect=True). Combined with the other possibilities with load_table_from_json() — table exists or not, schema provided or not, autodetect or not, etc. — this addition would make the logic of the method even more complicated than it already is. So I think it would be better to restrict load_table_from_json() to JSON-serializable values only.

If insert_rows()'s quota isn't sufficient for you, a workaround for now is to use DataFrame.from_dict() and then load_table_from_dataframe(). It may even be a good idea to add load_table_from_dict() in the future.
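A minimal sketch of that DataFrame route, assuming the same records as the earlier example (pandas infers a datetime64 dtype on construction, which load_table_from_dataframe can map to DATETIME; the load step itself needs real credentials, so it is shown commented out):

```python
import datetime as dt

import pandas as pd

records = [
    {
        "datetime": dt.datetime(2012, 11, 1, 12, 9, 10),
        "date": dt.date(2020, 11, 1),
    }
]

# pandas converts the Python datetime objects to a datetime64 column.
df = pd.DataFrame.from_records(records)
print(df.dtypes)

# With a real client, table_id, and job_config (as in the example above):
# client.load_table_from_dataframe(df, table_id, job_config=job_config).result()
```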

@Linchin Linchin added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Apr 9, 2024