Push-down query API #241

Open
martin-traverse opened this issue Dec 16, 2022 · 0 comments

Push ad-hoc queries down to the storage layer.
For large datasets, this is considerably more efficient than fetching the entire dataset.
For small datasets, fetching is still cheaper, so ideally a heuristic should choose between fetching and push-down.
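
The heuristic could be as simple as a size threshold. The following is a minimal sketch, assuming the dataset size in bytes is available from metadata. The names `QueryStrategy` and `choose_strategy` and the 64 MiB cut-off are hypothetical and not part of any existing API.

```python
from enum import Enum


class QueryStrategy(Enum):
    FETCH = "fetch"          # pull the whole dataset and query it client-side
    PUSH_DOWN = "push_down"  # run the query next to the storage layer


# Hypothetical cut-off; a real value would need to be tuned by measurement
DEFAULT_THRESHOLD_BYTES = 64 * 1024 * 1024


def choose_strategy(dataset_size_bytes: int,
                    threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> QueryStrategy:
    """Pick a strategy from the dataset size alone (illustrative heuristic)."""
    if dataset_size_bytes < 0:
        raise ValueError("dataset_size_bytes must be non-negative")
    # Small datasets: fetching is cheaper than setting up a pushed-down query
    if dataset_size_bytes < threshold_bytes:
        return QueryStrategy.FETCH
    # Large datasets: avoid moving the data, query it in place
    return QueryStrategy.PUSH_DOWN
```

A size-only rule ignores query selectivity and network cost, so it is best read as a starting point rather than a finished design.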

The base implementation uses a long-lived instance of the runtime running as a service, which executes queries as SQL models. This has the advantage of being portable to deployments where native query capabilities are not easily available. Push-down is achieved using Spark for large datasets and Arrow for small datasets.

Optimised implementations can be added for the cloud providers and Hadoop, which all have technologies for creating query interfaces over files held in storage. Push-down is achieved by converting standard SQL into a query on the underlying data technology. The data service creates and destroys queryable tables in the infrastructure on demand and tracks them using updates to the storage definition.
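
The on-demand table lifecycle described above might look something like the sketch below. It is illustrative only: `TableRegistry` and its methods are hypothetical names, and a plain dictionary stands in for updates to the storage definition. A real implementation would call the provider's query service where the comments indicate.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class TableRegistry:
    """Tracks queryable tables created on demand (illustrative only)."""

    def __init__(self) -> None:
        # Stand-in for the storage definition: table name -> storage path
        self.tables: Dict[str, str] = {}

    def create_table(self, name: str, storage_path: str) -> None:
        if name in self.tables:
            raise ValueError(f"table already exists: {name}")
        # A real implementation would create the table in the infrastructure here
        self.tables[name] = storage_path

    def destroy_table(self, name: str) -> None:
        # A real implementation would also drop the table in the infrastructure
        self.tables.pop(name, None)

    @contextmanager
    def temporary_table(self, name: str, storage_path: str) -> Iterator[str]:
        """Create a table for the duration of one query, then destroy it."""
        self.create_table(name, storage_path)
        try:
            yield name
        finally:
            # Runs even if the query fails, so no table is left behind
            self.destroy_table(name)
```

Wrapping a query in `with registry.temporary_table(...) as table:` keeps creation and clean-up paired, so the tracking record cannot drift out of step with the tables that exist.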

martin-traverse added this to the 0.8 milestone Dec 16, 2022