Push-down query API #241

martin-traverse · 2022-12-16T00:04:55Z

Push ad-hoc queries down to the storage layer.
For large datasets, this is considerably more efficient than fetching the entire dataset.
For small datasets, fetching is still cheaper, so ideally a heuristic should choose between the two.
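
A minimal sketch of what such a heuristic could look like, assuming the dataset size in bytes is known up front. The names `DatasetStats`, `choose_strategy` and the 64 MiB threshold are all hypothetical; the issue does not specify a threshold or which statistic to use.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the issue does not say where "small" ends.
FETCH_THRESHOLD_BYTES = 64 * 1024 * 1024


@dataclass(frozen=True)
class DatasetStats:
    """Illustrative stand-in for whatever size metadata is available."""
    size_bytes: int


def choose_strategy(stats: DatasetStats,
                    threshold: int = FETCH_THRESHOLD_BYTES) -> str:
    """Return "fetch" for small datasets and "push_down" for large ones."""
    return "fetch" if stats.size_bytes <= threshold else "push_down"
```

A real heuristic would probably also weigh query selectivity and network cost; size alone is just the simplest signal to start from.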

The base implementation uses a long-lived instance of the runtime running as a service, which executes queries as SQL models. This has the advantage of being portable to deployments where native query capabilities are not readily available. Push-down is achieved using Spark for large datasets and Arrow for small datasets.
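
As a rough illustration of the routing described above, the sketch below models a long-lived service that sends each SQL query to one of two engines depending on dataset size. `QueryService`, the `Engine` callable type and the stub engines are hypothetical; they stand in for Spark and Arrow and do not use either library's API.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]
# Hypothetical engine signature: (sql, dataset_id) -> result rows.
Engine = Callable[[str, str], Rows]


class QueryService:
    """Long-lived service that routes each query to an engine by dataset size."""

    def __init__(self, small_engine: Engine, large_engine: Engine,
                 large_threshold_bytes: int) -> None:
        self._small = small_engine
        self._large = large_engine
        self._threshold = large_threshold_bytes

    def execute(self, sql: str, dataset_id: str, size_bytes: int) -> Rows:
        # Datasets above the threshold go to the large-data engine.
        engine = self._large if size_bytes > self._threshold else self._small
        return engine(sql, dataset_id)


# Stub engines for illustration only; real ones would wrap Arrow and Spark.
def arrow_stub(sql: str, dataset_id: str) -> Rows:
    return [{"engine": "arrow", "dataset": dataset_id}]


def spark_stub(sql: str, dataset_id: str) -> Rows:
    return [{"engine": "spark", "dataset": dataset_id}]


service = QueryService(arrow_stub, spark_stub, large_threshold_bytes=1_000_000)
```

Keeping the engines behind a plain callable means the service itself does not depend on either technology, which fits the portability goal stated above.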

Optimised implementations can be added for the cloud providers and Hadoop, all of which have technologies for creating query interfaces over files held in storage. Push-down is achieved by converting standard SQL into a query on the underlying data technology. The data service creates and destroys queryable tables in the infrastructure on demand and tracks them using updates to the storage definition.
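
The on-demand table lifecycle could be sketched roughly as follows. Everything here is hypothetical: `StorageDefinition` is a simplified stand-in for the real storage definition, `InMemoryBackend` replaces a cloud or Hadoop query service, and the `q_<dataset_id>` naming scheme is invented for the example.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set


@dataclass
class StorageDefinition:
    """Simplified stand-in: maps dataset id to its live queryable table."""
    query_tables: Dict[str, str] = field(default_factory=dict)


class InMemoryBackend:
    """Fake query infrastructure that only remembers which tables exist."""

    def __init__(self) -> None:
        self.tables: Set[str] = set()

    def create_table(self, name: str, storage_path: str) -> None:
        self.tables.add(name)

    def drop_table(self, name: str) -> None:
        self.tables.discard(name)


@contextmanager
def queryable_table(backend: InMemoryBackend, storage_def: StorageDefinition,
                    dataset_id: str, storage_path: str) -> Iterator[str]:
    """Create a table over stored files, track it, and drop it afterwards."""
    name = f"q_{dataset_id}"  # hypothetical naming scheme
    backend.create_table(name, storage_path)
    storage_def.query_tables[dataset_id] = name
    try:
        yield name
    finally:
        # Clean up even if the query fails, so no orphan tables remain.
        backend.drop_table(name)
        storage_def.query_tables.pop(dataset_id, None)
```

Usage would look like `with queryable_table(backend, storage_def, "sales", "data/sales") as table: ...`, with the translated query run against `table` inside the block.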

martin-traverse added this to the 0.8 milestone Dec 16, 2022
