
ArrowTypeError: Expected a string or bytes dtype, got uint8 when running to_gbq with uint8 #616

Open
wnojopra opened this issue Mar 2, 2023 · 4 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

wnojopra commented Mar 2, 2023

Environment details

  • OS type and version: Ubuntu 20.04.3 LTS
  • Python version: 3.7.12
  • pip version: 22.3.1
  • pandas-gbq version: 0.17.9

Steps to reproduce

  1. Create a DataFrame with a column of dtype uint8 (for example, the default dtype of the columns produced by pandas.get_dummies before pandas 2.0)
  2. Call to_gbq on that DataFrame and observe ArrowTypeError: Expected a string or bytes dtype, got uint8

Code example

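import pandas as pd
my_df = pd.DataFrame({'col': [0, 1]}, dtype="uint8")
# FULL_BQ_NAME ("dataset.table") and GOOGLE_PROJECT are placeholders for your destination table and project
my_df.to_gbq(FULL_BQ_NAME, project_id=GOOGLE_PROJECT, if_exists='replace')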

Stack trace

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py in bq_to_arrow_array(series, bq_field)
    288     if field_type_upper in schema._STRUCT_TYPES:
    289         return pyarrow.StructArray.from_pandas(series, type=arrow_type)
--> 290     return pyarrow.Array.from_pandas(series, type=arrow_type)
    291 
    292 

/opt/conda/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.from_pandas()

/opt/conda/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

/opt/conda/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()

/opt/conda/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: Expected a string or bytes dtype, got uint8

product-auto-label bot added the api: bigquery label Mar 2, 2023
tswast (Collaborator) commented Mar 28, 2023

Thanks for the report! Thankfully uint8 fits inside int64, so it seems we should be using BigQuery INT64 columns for these types.
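Until the type mapping changes, one possible client-side workaround along the lines of the comment above is to cast unsigned integer columns to int64 before calling to_gbq. NumPy's `np.can_cast` tells which casts are lossless: uint8, uint16 and uint32 fit in int64, but uint64 does not, so this sketch leaves uint64 columns untouched. `cast_unsigned_to_int64` is a hypothetical helper, not part of pandas or pandas-gbq.

```python
import numpy as np
import pandas as pd


def cast_unsigned_to_int64(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with losslessly castable unsigned int columns as int64."""
    casts = {
        col: "int64"
        for col, dtype in df.dtypes.items()
        # Only plain NumPy dtypes; nullable extension dtypes such as "UInt8"
        # are left alone in this sketch. kind "u" means unsigned integer, and
        # can_cast (default "safe" rule) is True for uint8/16/32 -> int64
        # but False for uint64 -> int64.
        if isinstance(dtype, np.dtype)
        and dtype.kind == "u"
        and np.can_cast(dtype, np.int64)
    }
    return df.astype(casts)


df = pd.DataFrame(
    {
        "flag": pd.Series([0, 1], dtype="uint8"),
        "big": pd.Series([0, 1], dtype="uint64"),
    }
)
fixed = cast_unsigned_to_int64(df)
print(fixed.dtypes.astype(str).to_dict())  # {'flag': 'int64', 'big': 'uint64'}
```

With this in place, the reproduction would become cast_unsigned_to_int64(my_df).to_gbq(...) with the same arguments.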

tswast added the type: feature request label Mar 28, 2023
anujsh61 commented

I'm seeing a similar kind of issue: Expected bytes, got a 'int' object

tswast (Collaborator) commented Nov 20, 2023

@anujsh61 Can you confirm whether you're creating a new table or writing to one that already exists?

tswast (Collaborator) commented Nov 20, 2023

I think the fix for this issue needs to happen here:

I suspect uint8 is hitting our "string" fallback dtype.
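To illustrate that suspicion: if the DataFrame-to-BigQuery type lookup is keyed on NumPy's dtype.kind with STRING as the default (an assumption here, not confirmed in this thread), then uint8, whose kind is "u", falls through to STRING, which matches the string target in the stack trace. The sketch below shows such a lookup and one possible fix that maps unsigned kinds to INTEGER only when every value fits in INT64. `KIND_TO_BQ` and `bq_type_for` are hypothetical names, not pandas-gbq's API.

```python
import numpy as np

# Hypothetical lookup keyed on numpy dtype.kind; any kind not listed here
# falls back to STRING, which is how an unsigned dtype could end up as STRING.
KIND_TO_BQ = {
    "i": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",
}


def bq_type_for(dtype, map_unsigned: bool = True) -> str:
    """Return a BigQuery type name for a numpy dtype (illustrative only)."""
    dtype = np.dtype(dtype)
    # Possible fix: uint8/16/32 values always fit in BigQuery's INT64.
    # uint64 can exceed INT64, so it is left to the fallback here; how to
    # handle it is a separate design decision.
    if map_unsigned and dtype.kind == "u" and np.can_cast(dtype, np.int64):
        return "INTEGER"
    return KIND_TO_BQ.get(dtype.kind, "STRING")


for name in ["uint8", "uint32", "uint64", "int64"]:
    print(name, bq_type_for(name, map_unsigned=False), "->", bq_type_for(name))
```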

Aside: I see we're always hitting the "table already exists" case in the google-cloud-bigquery library. Now that we're using BigQuery load jobs, I think we can try removing all of our type inference logic from this library, as well as the following logic, to solve this issue:

table_ref = TableReference(
    DatasetReference(self.project_id, self.dataset_id), table_id
)
table = Table(table_ref)
table.schema = pandas_gbq.schema.to_google_cloud_bigquery(schema)
try:
    self.client.create_table(table)
except self.http_error as ex:
    self.process_http_error(ex)

and

try:
    # Try to get the table
    table = bqclient.get_table(destination_table_ref)
except google_exceptions.NotFound:
    # If the table doesn't already exist, create it
    table_connector = _Table(
        project_id_table,
        dataset_id,
        location=location,
        credentials=connector.credentials,
    )
    table_connector.create(table_id, table_schema)
else:
    if if_exists == "append":
        # Convert original schema (the schema that already exists) to pandas-gbq API format
        original_schema = pandas_gbq.schema.to_pandas_gbq(table.schema)
        # Update the local `table_schema` so mode (NULLABLE/REQUIRED)
        # matches. See: https://github.com/pydata/pandas-gbq/issues/315
        table_schema = pandas_gbq.schema.update_schema(
            table_schema, original_schema
        )


3 participants