Use Emulator with PySpark #264

Open
lvijnck opened this issue Jan 29, 2024 · 1 comment

lvijnck commented Jan 29, 2024

What happened?

Hi all,

I'm trying to read from the emulator using PySpark (no Scala), but I can't figure out how to set up anonymous credentials.

Any ideas?

Reading the dataframe as follows:

    # Load dataset
    return session.read.format("bigquery") \
        .option("parentProject", "test") \
        .option("table", "test.test") \
        .option("proxyAddress", "0.0.0.0:9060") \
        .load().show()

This gives the following error:

POST https://oauth2.googleapis.com/token
{
  "error": "invalid_grant",
  "error_description": "Bad Request"
}

lvijnck added the "bug" label Jan 29, 2024
totem3 added the "question" label and removed the "bug" label Jan 29, 2024

totem3 commented Jan 30, 2024

Hi there

I am not familiar with PySpark or spark-bigquery-connector, but as I understand it, the bigquery-emulator does not check permissions or provide any authentication features. So this is unlikely to be a problem in the bigquery-emulator itself and more likely a problem on the client side.
From what I can see in the spark-bigquery-connector's README and the error message, the spark-bigquery-connector appears to require some form of valid access token. When using the Java SDK without authentication, I suppose NoCredentials is typically used. However, judging by the connector's configuration interface, it doesn't seem possible to use that here.

Separately, you seem to have set proxyAddress. According to the README and the following PR, that option is intended for connecting to BigQuery through a forward proxy such as squid, so pointing it at the bigquery-emulator's address looks incorrect. (I haven't used it myself, so I might not be completely accurate.)

If you want to point the connector at the emulator, perhaps you should look at bigQueryHttpEndpoint or bigQueryStorageGrpcEndpoint instead.
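
For illustration, here is a rough, untested sketch of what using those two endpoint options from PySpark might look like. Several things in it are assumptions rather than anything confirmed in this thread: the helper names emulator_options and read_table are hypothetical, the ports (9050 for HTTP, 9060 for gRPC) and the endpoint string formats are guesses based on the emulator's usual defaults, and passing a placeholder value through the connector's gcpAccessToken option to avoid the OAuth token request may or may not work against the emulator.

```python
from typing import Dict, Optional


def emulator_options(
    host: str = "localhost",
    http_port: int = 9050,   # assumption: emulator REST port
    grpc_port: int = 9060,   # assumption: emulator gRPC (Storage API) port
    project: str = "test",
    access_token: Optional[str] = None,
) -> Dict[str, str]:
    """Build connector options that target the emulator instead of a proxy."""
    opts = {
        "parentProject": project,
        # Option names taken from the comment above; value formats are assumed.
        "bigQueryHttpEndpoint": f"http://{host}:{http_port}",
        "bigQueryStorageGrpcEndpoint": f"{host}:{grpc_port}",
    }
    if access_token is not None:
        # Assumption: a placeholder token might stop the connector from
        # requesting real credentials; not verified against the emulator.
        opts["gcpAccessToken"] = access_token
    return opts


def read_table(session, table: str, options: Dict[str, str]):
    """Load a table through the bigquery data source on an existing SparkSession."""
    return (
        session.read.format("bigquery")
        .options(**options)
        .option("table", table)
        .load()
    )
```

With an existing SparkSession, this would be called as read_table(session, "test.test", emulator_options(access_token="placeholder")).show(). Note that proxyAddress is deliberately left out, since it is meant for forward proxies.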
