bigquery load_table_from_dataframe from string type will show null values #1737

Closed
superbeer opened this issue Nov 25, 2023 · 3 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@superbeer

superbeer commented Nov 25, 2023

Environment details

  • OS type and version: macOS
  • Python version: python --version Python 3.11.6
  • pip version: pip --version pip 23.2.1

Name: google-cloud-bigquery
Version: 3.13.0

Steps to reproduce

Code example

from google.cloud import bigquery
import pandas as pd

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "{project_id}.{dataset_id}.{table_id}"


# Read the CSV; empty fields become NaN by default.
df = pd.read_csv("data.csv")
print(df.dtypes)
# Cast every column to str.
df = df.astype(str)
print(df.dtypes)
print(df)


job_config = bigquery.LoadJobConfig(
    write_disposition="WRITE_TRUNCATE",
)


job = client.load_table_from_dataframe(
    df, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.

Stack trace

data.csv

# example
col_a,col_b,col_c
a,,
aa,bb,cc
aaa,,ccc

Printing the DataFrame shows:

 col_a col_b col_c
0     a   nan   nan
1    aa    bb    cc
2   aaa   nan   ccc

BigQuery shows:

col_a col_b col_c
a     nan   nan
aa    bb    cc
aaa   nan   ccc

I expected BigQuery to show:

col_a col_b col_c
a     null  null
aa    bb    cc
aaa   null  ccc

P.S. If the columns keep dtype object and the values are NaN, it works.
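The difference described above can be checked without BigQuery. The following is a minimal sketch, assuming pandas is installed and using an in-memory copy of the data.csv sample; the names CSV_TEXT, missing, and as_text are illustrative and do not appear in the original report. It records which cells were empty, casts to strings, and then restores the missing markers so that no cell holds the literal text "nan".

```python
import io

import pandas as pd

# In-memory copy of the data.csv sample from the report (illustrative name).
CSV_TEXT = "col_a,col_b,col_c\na,,\naa,bb,cc\naaa,,ccc\n"

df = pd.read_csv(io.StringIO(CSV_TEXT))

# True wherever the CSV field was empty and pandas read it as missing.
missing = df.isna()

# Cast to strings, then put the missing markers back in the same cells.
# Depending on the pandas version, astype(str) may turn missing values into
# the text "nan"; mask() undoes that where it happened.
as_text = df.astype(str).mask(missing)

print(as_text)
print(as_text.isna())
```

According to the P.S. above, columns that still hold real missing values are the case that loads as null, so a frame like as_text is the shape the reporter describes as working. This has not been verified against BigQuery here.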

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Nov 25, 2023
@Linchin
Contributor

Linchin commented Feb 2, 2024

Hi @superbeer, thank you for raising the issue. Could you clarify which value you are expecting - NaN or null?

@superbeer
Author

> Hi @superbeer, thank you for raising the issue. Could you clarify which value you are expecting - NaN or null?

In BigQuery, I am expecting a null value.

@Linchin Linchin self-assigned this May 7, 2024
@Linchin Linchin added type: question Request for information or clarification. Not an issue. priority: p3 Desirable enhancement or fix. May not be included in next release. labels May 7, 2024
@Linchin
Contributor

Linchin commented May 7, 2024

I think this is a problem with pandas.read_csv(). The DataFrame loaded by pandas already uses NaN for empty strings, so BigQuery is working as intended. Many people seem to hit the same problem when loading empty strings from CSV, and adding a parameter can resolve it, like this: pandas.read_csv("data.csv", keep_default_na=False). (See Stack Overflow.)
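The effect of that parameter can be inspected locally. This is a minimal sketch, assuming pandas is installed and using an in-memory copy of the reporter's data.csv; CSV_TEXT is an illustrative name. With keep_default_na=False, pandas no longer treats empty fields as missing and returns them as empty strings instead.

```python
import io

import pandas as pd

# In-memory copy of the data.csv sample from the report (illustrative name).
CSV_TEXT = "col_a,col_b,col_c\na,,\naa,bb,cc\naaa,,ccc\n"

# Disable the default NA markers so empty fields are kept as text.
df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

# No cell is flagged as missing any more.
print(df.isna().any().any())

# The empty fields are now empty strings.
print(repr(df.loc[0, "col_b"]))
```

An empty string is a value, not a missing value, so a load from this frame would presumably store empty strings rather than null. It is worth checking the result against the expected output in the original report.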

I'm closing the issue, but feel free to leave a comment or open a new issue, if you have any further questions. :)

@Linchin Linchin closed this as completed May 7, 2024