Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature]: support sub query/join #882

Open
stone1100 opened this issue Jan 15, 2023 · 3 comments
Open

[feature]: support sub query/join #882

stone1100 opened this issue Jan 15, 2023 · 3 comments
Labels
feature New feature

Comments

@stone1100
Copy link
Member

Is your feature request related to a problem? Please describe.
support sub query/join

Describe the solution you'd like

  • support sub query
  • join with the metric's tags

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@stone1100 stone1100 added the feature New feature label Jan 15, 2023
@joyant
Copy link
Contributor

joyant commented Dec 1, 2023

If I make any mistakes, please feel free to interrupt me.

Joining different databases is not allowed unless there are sufficient reasons and a well-considered time alignment scheme.

Although the join condition involves only tags, having the same timestamp for both left and right metrics is an implicit condition. In other words:

select metric_a a, metric_b b, metric_c c where a.tag1 = b.tag1
and c.tag2 = b.tag2;

Is equivalent to:

select metric_a a, metric_b b, metric_c c where a.tag1 = b.tag1
and c.tag2 = b.tag2
and a.timestamp = b.timestamp -- No need to include this sentence; it's just an explanation.
and b.timestamp = c.timestamp;

Query amplification may be the most challenging aspect of join operations in a distributed scenario, and there are already many distributed join solutions. Fortunately, Lindb's CalcQueryInterval function has limited the interval to a certain range, and its minimum unit is seconds, determining that the data for a single query will not be too large (unless a metric has a large number of fields). The query result for a single metric will have at most a few thousand records.

Therefore, we can adopt a relatively simple approach to address this issue:

  1. The root node of the broker specifies N (N is the number of tables to be joined) execution plans, and each execution plan corresponds to a rewritten simple SQL query.
  2. The root node concurrently executes the query plans and waits for the results.
  3. After all query results are returned, perform merge join or hash join and return the results to the client.

Perhaps, before these steps, querying all tag keys and tag values for each metric can be considered, and each execution plan also includes tag conditions. However, this is not mandatory.

As for subqueries:

select sum(c.f1+c.f2) from (
select a.field1 as f1, b.field2 as f2
from metric_a a, metric_b b 
where a.tag1 = b.tag1
and time > time()-1M
) c where c.f1 > 100;

It must be a simple subquery, and subquery statements cannot be joined with external statements.

Subqueries may also contain subqueries, but the root must extract the innermost subquery statement, specify the query plan, and, similar to implementing join operations, after multiple execution plans return results, the root performs further filtering and aggregation before finally returning the data to the client.

@stone1100
Copy link
Member Author

Yes, just do time series join(metric name + tags) not include timestamp, because all time series must have same time range in all sub queries.

  1. root node plans statement and rewrites each sub query;
  2. root node picks execution nodes(intermediate/leaf nodes), sends request and waits for all result;
  3. after all results are returned, do time series join;

@joyant
Copy link
Contributor

joyant commented Feb 1, 2024

I'm marking this issue for now. We'll revisit and aim to implement this feature once task #1015 is completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature
Projects
None yet
Development

No branches or pull requests

2 participants