Delta Lake S3 connector parallelism and network speed falls to 0 #21872

Closed
dishkakrauch opened this issue May 8, 2024 · 1 comment

Comments

@dishkakrauch

dishkakrauch commented May 8, 2024

Hello everybody.
We migrated our Trino K8s Helm deployment to a private cloud K8s cluster and ran into a really strange problem. We now use self-hosted S3 instead of on-prem HDFS for the Delta Lake connector.
Following these docs, https://trino.io/docs/current/object-storage/file-formats.html and https://trino.io/docs/current/object-storage/file-system-s3.html, we found that the native S3 configuration does not work for us.
This is our delta lake connector config:

additionalCatalogs:
    delta: |-
        connector.name=delta_lake
        hive.metastore.uri=thrift://hive-metastore.our.cloud.net
        hive.s3.aws-access-key=*****
        hive.s3.aws-secret-key=*****
        hive.s3.endpoint=https://storage.our.cloud.net
        hive.s3.path-style-access=true
        hive.s3.max-connections=1000
        delta.enable-non-concurrent-writes=true

After a few minutes, any CTAS query drops to zero parallelism and zero network throughput. The query then fails with this error:

io.trino.spi.TrinoException: Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes.

Can anybody help with a suggestion or an idea?

@dishkakrauch
Author

Solved by changing the endpoint.
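
For anyone hitting something similar: the thread does not say what was wrong with the original endpoint or what it was changed to, so the following is only a rough sketch of a sanity check on the endpoint value in the catalog properties. The helper names `parse_properties` and `check_endpoint` are hypothetical and not part of Trino, and the check only looks at the URL's shape (scheme and hostname), not at whether the storage service is reachable or performs well.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value catalog properties into a dict (hypothetical helper)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_endpoint(props, key="hive.s3.endpoint"):
    """Return a list of problems with the endpoint value (empty list if none found)."""
    endpoint = props.get(key)
    if not endpoint:
        return [f"{key} is not set"]
    problems = []
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("no hostname in endpoint")
    return problems


# Catalog fragment taken from the config in this issue (secrets omitted).
CATALOG = """
connector.name=delta_lake
hive.s3.endpoint=https://storage.our.cloud.net
hive.s3.path-style-access=true
"""

if __name__ == "__main__":
    print(check_endpoint(parse_properties(CATALOG)))
```

An empty list only means the value looks like a well-formed HTTP(S) URL; it says nothing about whether that endpoint is the right one for the cluster.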
