load_table_from_dataframe -- support alternative serialization formats #383

Closed · tswast opened this issue Nov 11, 2020 · 3 comments · Fixed by #399

Assignees: cguardia
Labels: api: bigquery (Issues related to the googleapis/python-bigquery API.) · type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)

Comments

tswast (Contributor) commented Nov 11, 2020

Currently, pandas-gbq doesn't use load_table_from_dataframe because pandas-gbq serializes to CSV, which has better support for TIME, DATE, and DATETIME than Parquet. See: #56, #382

Struct / Array support could also be improved (though this is partly due to issues in the pyarrow library). See #19.

I propose supporting the CSV serialization type.

product-auto-label bot added the api: bigquery label on Nov 11, 2020
tswast self-assigned this on Nov 11, 2020
tswast added the type: feature request label on Nov 11, 2020
tswast assigned cguardia and unassigned tswast on Nov 17, 2020
tswast (Contributor, Author) commented Nov 17, 2020

To go from DataFrame to CSV, we can use the same tempfile logic we already have for Parquet, but call to_csv instead, with a few parameters to ensure proper data types:

https://github.com/pydata/pandas-gbq/blob/ac2d2fe4ac0025109f8df3723e3f03a337face94/pandas_gbq/load.py#L16-L30
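
A minimal sketch of what that could look like, assuming a hypothetical _load_dataframe_as_csv helper that mirrors the existing Parquet tempfile flow in load_table_from_dataframe (the helper name and the to_csv parameters are illustrative, not final):

import os
import tempfile

from google.cloud import bigquery


def _load_dataframe_as_csv(client, dataframe, destination, job_config):
    # Hypothetical helper: serialize via DataFrame.to_csv into a temporary
    # file and hand it to load_table_from_file, mirroring the tempfile flow
    # already used for Parquet.
    job_config.source_format = bigquery.SourceFormat.CSV

    fd, path = tempfile.mkstemp(suffix="_job.csv")
    os.close(fd)
    try:
        dataframe.to_csv(
            path,
            index=False,
            # header=False assumes the destination table schema defines the
            # column order, so no header row is needed.
            header=False,
            # ISO 8601 timestamps keep DATETIME / TIMESTAMP columns parseable
            # by the BigQuery CSV loader.
            date_format="%Y-%m-%dT%H:%M:%S.%f",
        )
        with open(path, "rb") as csv_file:
            return client.load_table_from_file(
                csv_file, destination, job_config=job_config
            )
    finally:
        os.remove(path)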

tswast (Contributor, Author) commented Nov 17, 2020

This would also require adding CSV to the allowed serialization formats here:

if job_config.source_format != job.SourceFormat.PARQUET:
    raise ValueError(
        "Got unexpected source_format: '{}'. Currently, only PARQUET is supported".format(
            job_config.source_format
        )
    )
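
For illustration, one way the check could be relaxed (the tuple name and error wording here are assumptions, not necessarily what #399 ends up doing):

# Accept both CSV and Parquet as DataFrame serialization formats.
supported_formats = (job.SourceFormat.CSV, job.SourceFormat.PARQUET)

if job_config.source_format not in supported_formats:
    raise ValueError(
        "Got unexpected source_format: '{}'. Currently, only CSV and PARQUET are supported.".format(
            job_config.source_format
        )
    )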

cguardia (Contributor) commented
@tswast I have the initial work for this in #399. I'm looking for comments on which parameters are best to use in the to_csv call, and on whether this is going in the right direction.
