Does the emulator work with Pandas GBQ? #288

Open
jitendrawbd opened this issue Apr 3, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

jitendrawbd commented Apr 3, 2024

What happened?

My Python code uses pandas-gbq's to_gbq function to write to a BigQuery table. It works as expected when the code runs normally, but when I use the BigQuery emulator in a unit test, it throws the error below:

GenericGBQException("Reason: {0}".format(ex)) from ex
pandas_gbq.exceptions.GenericGBQException: Reason: 400 POST http://localhost:9050/bigquery/v2/projects/local-project/jobs?prettyPrint=false: unspecified job configuration query

What did you expect to happen?

I expected the to_gbq function to write the data to the bigquery emulator table

How can we reproduce it (as minimally and precisely as possible)?

I am calling the function as follows:

from pandas_gbq import to_gbq
to_gbq(df, destination_table=f"{dataset_id}.{table_name}", project_id=project_id, if_exists='append')

I created the dataset and table in the BigQuery emulator. When I exercise this code in a unit test, I get the following error:

GenericGBQException("Reason: {0}".format(ex)) from ex
pandas_gbq.exceptions.GenericGBQException: Reason: 400 POST http://localhost:9050/bigquery/v2/projects/local-project/jobs?prettyPrint=false: unspecified job configuration query

Anything else we need to know?

No response

jitendrawbd added the bug label Apr 3, 2024
jitendrawbd (Author) commented

Even the simplest load doesn't seem to work:

import pandas as pd
from pandas_gbq import to_gbq

def testapi():
    df = pd.DataFrame(
        {
            "my_string": ["a", "b", "c"],
            "my_int64": [1, 2, 3],
            "my_float64": [4.0, 5.0, 6.0],
            "my_bool1": [True, False, True],
            "my_bool2": [False, True, False]
        }
    )
    to_gbq(df, 'test_dataset.test_table', project_id='gcp-cap-dsml-core-dev')

Running the above against the BigQuery emulator results in this error:

GenericGBQException("Reason: {0}".format(ex)) from ex
pandas_gbq.exceptions.GenericGBQException: Reason: 400 POST http://localhost:9050/bigquery/v2/projects/local-project/jobs?prettyPrint=false: unspecified job configuration query

jitendrawbd changed the title from "Does this work with Pandas GBQ?" to "Does the emulator work with Pandas GBQ?" Apr 3, 2024
ohaibbq (Contributor) commented Apr 3, 2024

It looks like the jobs insert handler currently handles only three kinds of job: query jobs, imports from GCS, and extracts to GCS:
https://github.com/goccy/bigquery-emulator/blob/main/server/handler.go#L1372-L1391

to_gbq uses the google.cloud.bigquery.Client.load_table_from_dataframe method, which POSTs a CSV or Parquet file to the API as a load job.
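
To illustrate why the request is rejected, here is a rough Python sketch of that kind of dispatch. It is not the emulator's Go code: the acceptance rule is an assumption drawn only from the description above, and `classify_job` and `emulator_would_accept` are hypothetical names. The `configuration` keys (`query`, `load`, `extract`, `copy`) and the `sourceUris` field come from the BigQuery REST API's job resource.

```python
from typing import Any, Dict, Optional


def classify_job(body: Dict[str, Any]) -> Optional[str]:
    """Return which job type a jobs.insert request body describes, if any."""
    config = body.get("configuration", {})
    for kind in ("query", "load", "extract", "copy"):
        if kind in config:
            return kind
    return None


def emulator_would_accept(body: Dict[str, Any]) -> bool:
    """Assumed rule: query jobs, loads from GCS, and extracts are handled."""
    config = body.get("configuration", {})
    kind = classify_job(body)
    if kind in ("query", "extract"):
        return True
    if kind == "load":
        # A load from GCS names its files in sourceUris. An uploaded
        # DataFrame has none, because the data travels in the request itself.
        return bool(config["load"].get("sourceUris"))
    return False
```

Under this assumed rule, a load job carrying an uploaded file falls through every supported branch, which would be consistent with the 400 response reported in this issue.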

In our project, we use google.cloud.bigquery.Client.insert_rows to populate tables.
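
For anyone looking for a workaround along these lines, a minimal sketch of the `insert_rows` approach might look like the following. Assumptions: the emulator listens on `http://localhost:9050` with project `local-project` (both taken from the error message in this issue), the destination table already exists with a schema, and the helper names `make_emulator_client`, `columns_to_rows`, and `stream_rows` are made up for illustration.

```python
from typing import Any, Dict, List


def make_emulator_client(project_id: str = "local-project",
                         endpoint: str = "http://localhost:9050"):
    """Build a BigQuery client that talks to the emulator instead of GCP."""
    # Imported lazily so the helpers below stay importable without the SDK.
    from google.api_core.client_options import ClientOptions
    from google.auth.credentials import AnonymousCredentials
    from google.cloud import bigquery

    return bigquery.Client(
        project=project_id,
        client_options=ClientOptions(api_endpoint=endpoint),
        credentials=AnonymousCredentials(),
    )


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {"col": [v1, v2]} into [{"col": v1}, {"col": v2}]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def stream_rows(client, table_id: str, rows: List[Dict[str, Any]]) -> None:
    """Insert rows through insert_rows rather than a load job."""
    table = client.get_table(table_id)  # fetches the schema insert_rows needs
    errors = client.insert_rows(table, rows)
    if errors:
        raise RuntimeError(f"insert_rows reported errors: {errors}")
```

With a DataFrame, `df.to_dict(orient="records")` produces the same list-of-dicts shape as `columns_to_rows`, so a call such as `stream_rows(make_emulator_client(), "test_dataset.test_table", df.to_dict(orient="records"))` could stand in for `to_gbq` in tests.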

jitendrawbd (Author) commented Apr 4, 2024

Ah, got it. Are there any plans to support the load_table_from_dataframe method in the future? For now, I will look for a workaround.
