Allow individual shards to be targeted during query execution [FEATURE] #1478

akuzin1 · 2023-03-27T20:41:15Z

Is your feature request related to a problem?

In MySQL one can retrieve partition information about a table, which can later be used to target specific partitions during query execution.

The following is an example of a query that can be used to retrieve partition information of a specific table.

"SELECT DISTINCT partition_name FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = <table_name> AND TABLE_SCHEMA = <table_schema> " +
            "AND partition_name IS NOT NULL"

Next is an example of a query targeting a specific partition in a table.

SELECT * FROM table PARTITION (partitionName);

Applied to OpenSearch, shards could be treated as the equivalent of partitions for our use case.

What solution would you like?
It would be great to treat OpenSearch shards as the equivalent of MySQL partitions and to be able to query individual shards.

What alternatives have you considered?
We've considered generating splits by hashing a key or tuple of keys and taking the result modulo the fixed number of splits that we want to generate.

For example, for 3 splits:

split 1:
SELECT *
FROM some_unpartitioned_table
WHERE hash(col1, col2, col3) % 3 = 0

split 2:
SELECT *
FROM some_unpartitioned_table
WHERE hash(col1, col2, col3) % 3 = 1

split 3:
SELECT *
FROM some_unpartitioned_table
WHERE hash(col1, col2, col3) % 3 = 2

However, no such hashing function appears to be available.

Therefore, the shard-targeting solution described above would be a great addition for all SQL users, and would more closely mimic the behavior and syntax of MySQL.
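To make the hash-and-modulo alternative concrete, here is a minimal sketch that generates one predicate per split. It assumes a SQL function named `hash(...)`, which is hypothetical (as noted, nothing like it seems to be available today); the class and method names are made up for illustration. The modulus equals the number of splits, so every row falls into exactly one split.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPredicates {
    // Builds one WHERE-clause predicate per split.
    // NOTE: hash(...) is a hypothetical SQL function, not an existing one.
    static List<String> build(List<String> columns, int splitCount) {
        if (splitCount <= 0) {
            throw new IllegalArgumentException("splitCount must be positive");
        }
        String args = String.join(", ", columns);
        List<String> predicates = new ArrayList<>();
        for (int i = 0; i < splitCount; i++) {
            // Remainders 0..splitCount-1 cover every row exactly once.
            predicates.add("hash(" + args + ") % " + splitCount + " = " + i);
        }
        return predicates;
    }

    public static void main(String[] args) {
        for (String predicate : build(List.of("col1", "col2", "col3"), 3)) {
            System.out.println("SELECT * FROM some_unpartitioned_table WHERE " + predicate);
        }
    }
}
```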


dai-chen · 2023-03-27T22:18:35Z

@akuzin1 Thanks for the feature request! This recalls an earlier request: opendistro-for-elasticsearch/sql#1151. @acarbonetto is adding OpenSearch meta field support to our SQL engine. I'm wondering whether we can support this as part of the meta field work.

akuzin1 · 2023-03-29T16:17:03Z

Sounds good, that would be great to see. Is there an estimated timeline for when this feature would be released?
Thank you

dai-chen · 2023-03-29T16:46:52Z

@acarbonetto @Yury-Fridlyand @MaxKsyunz ^^

acarbonetto · 2023-03-30T21:05:27Z

This will be a dependency for #1441
The API on the OpenSearch side is _routing, which we should consider using instead of _shard (or both!).
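For context, the OpenSearch search API exposes both mechanisms as query parameters: `routing` limits a search to the shard(s) that the given routing values map to, while `preference=_shards:...` names shards by number. The sketch below only builds the request paths to show the difference; the class name, index name and values are illustrative assumptions, not part of the plugin.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SearchTargeting {
    // Path for a search limited to the shard(s) the routing values hash to.
    static String byRouting(String index, String... routingValues) {
        List<String> encoded = new ArrayList<>();
        for (String value : routingValues) {
            // Encode each value on its own so the separating commas stay literal.
            encoded.add(URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return "/" + index + "/_search?routing=" + String.join(",", encoded);
    }

    // Path for a search limited to explicit shard numbers.
    static String byShards(String index, int... shardIds) {
        List<String> ids = new ArrayList<>();
        for (int id : shardIds) {
            ids.add(Integer.toString(id));
        }
        return "/" + index + "/_search?preference=_shards:" + String.join(",", ids);
    }
}
```

With routing, the caller does not need to know which shard number a value lands on; with `_shards`, the caller picks shard numbers directly, which is closer to the MySQL partition analogy.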

acarbonetto · 2023-05-25T15:48:09Z

related: #339

akuzin1 · 2023-06-09T15:01:42Z

Hi! Saw that the label was changed to "Priority-High", which is great to see, so I wanted to check in and see if there is an estimated date for delivering this feature. Thank you.

acarbonetto · 2023-06-16T21:59:56Z

> Hi! Saw that the label was changed to "Priority-High", which is great to see, so I wanted to check in and see if there is an estimated date for delivering this feature. Thank you.

Target release is 2.10 at the moment. Hoping to get this in a little sooner.

acarbonetto · 2023-06-30T21:22:22Z

Proposal for setting the routing field on the search request. There are two ways forward that I'd like to propose. But first, a little context:

Goal

The objective of the partition/routing work is to include the routing ID in the SearchRequest builder.

Example:

new SearchRequest()
        .indices(indexName.getIndexNames())
        .routing(routingId.getIndexNames())
        .scroll(scrollTimeout)
        .source(sourceBuilder);

Getting the routing id(s) from the initial query into pushdown can take one of two obvious routes.

Proposal 1: Request Parameter ("routing")

This is a syntax addition: a new JSON parameter that is only available in the V2 engine.

{
    "query": "select _id, _index, _routing, int0 from calcs_routing limit 5",
    "routing": "FURNITURE"
}

This provides an OpenSearch-like API for targeting an individual shard or a list of shards. The string accepts a comma-separated list of shard ID targets.
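As a rough sketch of how the comma-separated string could be turned into individual targets before being handed to the search request, assuming surrounding whitespace and empty entries should be dropped (the class and method names are hypothetical, and this is not the PoC code):

```java
import java.util.ArrayList;
import java.util.List;

class RoutingParameter {
    // Splits the raw "routing" value from the request body into separate targets.
    // Assumption: blanks around entries are trimmed and empty entries are ignored.
    static List<String> parse(String raw) {
        List<String> values = new ArrayList<>();
        if (raw == null) {
            return values;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }
}
```

The resulting list could then be converted with `toArray(new String[0])` for a builder that takes a varargs routing argument.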

PoC Architecture Change

Getting the routing parameter into OpenSearchRequest as part of pushdown requires that we create AbstractPlan, LogicalPlan and PhysicalPlan operators, like the Paginate and LogicalPaginate operators. This would allow us to push a routingId string down into the OpenSearchIndexScanQueryBuilder during execution. We would create Partition operators in a similar manner to the Pagination operators, without much business logic. For reference on how paginate works, see: https://github.com/opensearch-project/sql/blob/main/docs/dev/Pagination-v2.md#unresolved-query-plan

Considerations: we may consider combining Paginate and Partition into a single set of operators, and call them something like PushDownConfiguration and push all the parameters at once. This would allow us to scale by configuration values without needing to add more operators.
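One possible shape for such a combined object is sketched below. Everything here is hypothetical: `PushDownConfiguration` is only the name floated above, and the fields (page size, routing values) are simply the two parameters discussed in this thread. New settings would become new fields rather than new operators.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical container for everything pushed down to the storage engine at once.
final class PushDownConfiguration {
    private final Integer pageSize;            // null when pagination was not requested
    private final List<String> routingValues;  // empty when no routing was requested

    private PushDownConfiguration(Integer pageSize, List<String> routingValues) {
        this.pageSize = pageSize;
        this.routingValues = List.copyOf(routingValues);
    }

    static PushDownConfiguration empty() {
        return new PushDownConfiguration(null, List.of());
    }

    // Each "with" method returns a new immutable copy with one value changed.
    PushDownConfiguration withPageSize(int size) {
        return new PushDownConfiguration(size, routingValues);
    }

    PushDownConfiguration withRouting(List<String> values) {
        return new PushDownConfiguration(pageSize, values);
    }

    Optional<Integer> pageSize() {
        return Optional.ofNullable(pageSize);
    }

    List<String> routingValues() {
        return routingValues;
    }
}
```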

Pros

OpenSearch-specific query syntax is favourable to OpenSearch users adopting the SQL language, who can use existing documentation to understand the usage of the routing query parameter.
No specific change required for PPL vs SQL.

Cons

This syntax differs from other SQL syntax and will cause a disruption in the JDBC driver as the option is not built-in for SQL users.
Architecture changes are quite disruptive.
It is not obvious what the benefit is for other storage options (e.g. Prometheus), since this is an OpenSearch-specific query syntax.
Won't work (initially) with the SQL-CLI

Proposal 2: SQL `FROM table PARTITION (key, key, ...)` syntax

As defined in https://dev.mysql.com/doc/refman/8.0/en/partitioning-selection.html, MySQL (and other SQL engines) allows a query to target a specific partition by id. This also allows for multiple ids in the PARTITION function.

{
    "query": "select _id, _index, _routing, int0 from calcs_routing PARTITION(\"FURNITURE, OFFICE SPACE\") limit 5"
}

PoC Architecture Change

Update the parser syntax to allow for the PARTITION function in the FROM clause.

Ultimately, to get the routing ID(s) into the SearchRequest, we need to add the PARTITION keys to the IndexScan (which extends Table). Then, in the IndexScanBuilder, send the routing ID(s) to the SearchRequest.

To accomplish this, we could attach the partition keys to the Table (OpenSearchIndexScan) during the analysis phase. On pushdown, the partition keys would then be part of the IndexScan and could be pushed down semi-easily to the OpenSearchRequest as part of the Index.
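The flow described here can be sketched with stand-in types. `Relation` and `ScanRequest` below are simplified, hypothetical placeholders for the analyzed table and the storage-engine request; they are not the plugin's real classes, and the real change would live in the analyzer and the index scan builder.

```java
import java.util.List;

class PartitionPushDownSketch {
    // Stand-in for the table relation after analysis, carrying its partition keys.
    static final class Relation {
        final String indexName;
        final List<String> partitionKeys;

        Relation(String indexName, List<String> partitionKeys) {
            this.indexName = indexName;
            this.partitionKeys = List.copyOf(partitionKeys);
        }
    }

    // Stand-in for the request that is eventually sent to the storage engine.
    static final class ScanRequest {
        final String indexName;
        final String[] routing;

        ScanRequest(String indexName, String[] routing) {
            this.indexName = indexName;
            this.routing = routing;
        }
    }

    // On pushdown, the partition keys travel with the index name into the request.
    static ScanRequest pushDown(Relation relation) {
        return new ScanRequest(relation.indexName, relation.partitionKeys.toArray(new String[0]));
    }
}
```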

OpenSearchIndex already has the concept of an Index, and Routing Keys would have the same architecture:

Pros

SQL PARTITION is a MySQL concept that is simple to adopt and expose to users. No specific JDBC change is required.
May also be useful for other storage options, or ignored by those engines.

Cons

PPL and SQL syntax is different and will require that we add PPL syntax to push the routing ids into the storage engine.
Architecture changes are less disruptive but also not very usable.

macohen · 2024-02-08T20:38:09Z

Will there be a new projected release target for this feature, @acarbonetto? I see it was targeted for 2.11 and 2.12 is being released soon. Can this make 2.13, perhaps?

akuzin1 added enhancement New feature or request untriaged labels Mar 27, 2023

dai-chen removed the untriaged label Mar 27, 2023

dai-chen mentioned this issue Mar 29, 2023

#639: allow metadata fields and score opensearch function (#228) #1456

Merged

6 tasks

acarbonetto added the Priority-High label Jun 1, 2023

acarbonetto added this to On Deck in SQL/PPL Epic Roadmap Jun 9, 2023

acarbonetto moved this from On Deck to In Release 2.9 in SQL/PPL Epic Roadmap Jun 13, 2023

acarbonetto moved this from In Release 2.9 to 2.10 in SQL/PPL Epic Roadmap Jun 13, 2023

acarbonetto mentioned this issue Jun 16, 2023

Add _routing to SQL includes list Bit-Quill/opensearch-project-sql#277

Merged

6 tasks

This was referenced Jun 21, 2023

Add _routing to SQL includes list (#277) #1762

Closed

Add _routing to SQL includes list (#277) #1771

Merged

This was referenced Jul 4, 2023

[FEATURE] Metafield support in PPL #1805

Open

[PoC] Set routing shard in SQL via the relation partition #1808

Closed

MaxKsyunz closed this as completed in #1771 Jul 11, 2023

acarbonetto reopened this Jul 12, 2023

github-actions bot added the untriaged label Jul 12, 2023

acarbonetto removed the untriaged label Jul 12, 2023

acarbonetto mentioned this issue Jul 26, 2023

Set target routing shard by partition key Bit-Quill/opensearch-project-sql#316

Merged

9 tasks

acarbonetto moved this from 2.10 to 2.11 in SQL/PPL Epic Roadmap Aug 14, 2023

acarbonetto linked a pull request Aug 23, 2023 that will close this issue

Set target routing shard by partition key (#316) #2030

Open

9 tasks

akuzin1 mentioned this issue Feb 5, 2024

added new connector for OpenSearch data source awslabs/aws-athena-query-federation#1335

Open
