Provided Schema does not match Table #75

Open

ThinkUser opened this issue Jun 6, 2021 · 11 comments

Comments

@ThinkUser

Hey,

I am trying to download app data from BigQuery and I'm getting the above error.
Firebase is connected to BQ and I have the "events_" tables there.

I am trying to download the data using this function:
bqr_partition("events_2021", "ga_partition", my_project, my_dataset)

But I am getting this error and I don't understand how to solve it:

Job had error:

Provided Schema does not match Table xxxx-native-app-33f08:analytics_153017681.ga_partition$20210601. Cannot add fields (field: event_params)
[1] "Provided Schema does not match Table xxxx-native-app-33f08:analytics_153017681.ga_partition$20210601. Cannot add fields (field: event_params)"

Do I need to specify the schema for this function?
I tried narrowing the table prefix even more:
bqr_partition("events_202106", "ga_partition", my_project, my_dataset)
Still getting the same error.

Thanks

@MarkEdmondson1234
Collaborator

bqr_partition() doesn't download data; it creates a partitioned table out of sharded tables. Do you need bqr_query()?
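
Roughly, the difference is this (sketch, table names are placeholders):

# merges sharded events_YYYYMMDD tables into one partitioned table; returns no data
bqr_partition("events_", "ga_partition", my_project, my_dataset)

# runs SQL and returns the result to R as a data.frame
df <- bqr_query(my_project, my_dataset,
                "SELECT * FROM [my-project:my_dataset.events_20210606]")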

@ThinkUser
Author

Hey,
Thanks!

I have my app connected to BigQuery and I want to download all the events data to R and manipulate it there.
I know that the data is saved in daily events tables and I want to get the data from all of them, so I thought I needed bqr_partition to combine them into one :-)

Do I need to use bqr_query() to download each events_YYYYMMDD table?

Something like this?
bqr_query(my_project, my_dataset, "SELECT * FROM [native-app-33f08:analytics_153017681.events_]")

Or do I need to change the events_ suffix on a daily basis?
bqr_query(my_project, my_dataset, "SELECT * FROM [native-app-33f08:analytics_153017681.events_20210606]")
bqr_query(my_project, my_dataset, "SELECT * FROM [native-app-33f08:analytics_153017681.events_20210607]")

etc...

Thank you so much!

@MarkEdmondson1234
Collaborator

Yes, I would try out the queries in the web console first; then something like the examples above will work. If it's a lot of data you may also want to run it asynchronously via bqr_query_asynch().

You could also use googleAnalyticsR and the Data API to download the events, which is free.
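
Something along these lines (untested sketch; the property ID is a placeholder, and this assumes a GA4 property is available):

library(googleAnalyticsR)
ga_auth()

# 123456789 is a hypothetical GA4 propertyId
ga_data(123456789,
        metrics = "eventCount",
        dimensions = c("date", "eventName"),
        date_range = c("2021-06-01", "2021-06-07"))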

@ThinkUser
Author

Thank you.
It worked in the console and in R; now I just need to make it go over all the events_YYYYMMDD tables.

Not sure what you meant by the Data API, this account is not an App + Web (GA4) property.
It's a Firebase account linked to BQ.
How can I get the events data using this API?

I tried searching for it using these functions and it's not there:

ga4 <- ga_account_list("ga4")
data_accounts <- ga_account_list("data")

@MarkEdmondson1234
Collaborator

Ah OK, fair enough then. I suggest using standard SQL (i.e. useLegacySql = FALSE in the function); then you can use wildcards to query many tables at once: FROM dataset.table_*
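
For example (sketch, reusing the dataset from above; the _TABLE_SUFFIX filter is optional and limits which daily shards get scanned):

bqr_query(my_project, my_dataset,
          "SELECT event_date, event_name
           FROM `native-app-33f08.analytics_153017681.events_*`
           WHERE _TABLE_SUFFIX BETWEEN '20210601' AND '20210607'",
          useLegacySql = FALSE)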

@ThinkUser
Author

ThinkUser commented Jun 7, 2021

Thanks for the great help and feedback

# run the wildcard query asynchronously, writing results to a new BQ table
job <- bqr_query_asynch(projectId = my_project,
                        datasetId = my_dataset,
                        query = "SELECT * FROM `native-app-33f08.analytics_153017681.events_*`",
                        destinationTableId = "bigResultTable",
                        useLegacySql = FALSE
                        # ,writeDisposition = c("WRITE_EMPTY", "WRITE_TRUNCATE", "WRITE_APPEND")
                        )

This works fine, and I get the data in BQ in a new table.
But the next step is to extract the data to Google Cloud Storage and then get it from there back into R?

# extract the result table to a Cloud Storage bucket as CSV
job_extract <- bqr_extract_data(projectId = my_project,
                                datasetId = my_dataset,
                                tableId = "bigResultTable",
                                cloudStorageBucket = "bigquery_r",
                                filename = paste0("big-query-extract-", gsub(" |:|-", "", Sys.time()), "-*.csv")
                                )

jobtoextract <- bqr_get_job(projectId = my_project, jobId = job_extract$jobReference$jobId)
bqr_wait_for_job(jobtoextract, wait = 5)

The extraction doesn't work for some reason:

Job had error:

Operation cannot be performed on a nested schema. Field: event_params
[1] "Operation cannot be performed on a nested schema. Field: event_params"
Warning message:
In bqr_wait_for_job(jobtest, wait = 5) :
Operation cannot be performed on a nested schema. Field: event_params

@MarkEdmondson1234
Collaborator

The extract is expecting a flat table from your query; I guess your SQL is writing back nested, JSON-style records.
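
If you want to keep the nested structure, the extract can be done as newline-delimited JSON instead of CSV, something like this (untested sketch):

job_extract <- bqr_extract_data(projectId = my_project,
                                datasetId = my_dataset,
                                tableId = "bigResultTable",
                                cloudStorageBucket = "bigquery_r",
                                filename = "big-query-extract-*.json",
                                destinationFormat = "NEWLINE_DELIMITED_JSON")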

@ThinkUser
Author

The Firebase events_ tables will be nested by default, no?
Because of event_params and user_properties.

[screenshot of the nested events_ table schema]

@MarkEdmondson1234
Collaborator

Yes, to make flat tables you typically use UNNEST() to extract out the rows.
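
e.g. something along these lines (sketch; field names are from the Firebase export schema):

sql <- "SELECT event_date, event_name,
               param.key AS param_key,
               param.value.string_value AS param_string
        FROM `native-app-33f08.analytics_153017681.events_*`,
        UNNEST(event_params) AS param"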

@ThinkUser
Author

Oh OK,
I'm not that familiar with SQL; I'll try to look online for some resources on how to do it.

Thank you so much for all your help.

@ThinkUser
Author

job <- bqr_query_asynch(projectId = my_project,
                        datasetId = my_dataset,
                        query = "SELECT event_name, param, event_date FROM `native-app-33f08.analytics_153017681.events_*`, UNNEST(event_params) as param",
                        destinationTableId = "bigResultTable6",
                        useLegacySql = FALSE,
                        writeDisposition = "WRITE_TRUNCATE")

job_extract <- bqr_extract_data(projectId = my_project,
                                datasetId = my_dataset,
                                tableId = "bigResultTable6",
                                cloudStorageBucket = "gs://bigquery_r",
                                filename = paste0("big-query-extract-", gsub(" |:|-", "", Sys.time()), "-*.csv"))

It still gives me: Operation cannot be performed on a nested schema. Field: param

I'm kind of lost here; if you can guide me I would highly appreciate it.
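
Edit: I think I see it now. Selecting param keeps the whole struct, which is still nested; selecting the scalar fields inside it should give a flat table that can be extracted to CSV (untested sketch):

job <- bqr_query_asynch(projectId = my_project,
                        datasetId = my_dataset,
                        query = "SELECT event_date, event_name,
                                        param.key AS param_key,
                                        param.value.string_value AS param_string,
                                        param.value.int_value AS param_int
                                 FROM `native-app-33f08.analytics_153017681.events_*`,
                                 UNNEST(event_params) AS param",
                        destinationTableId = "bigResultTable6",
                        useLegacySql = FALSE,
                        writeDisposition = "WRITE_TRUNCATE")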
