R2 Server "Rollup" Feature request/help. #4119

Open
alxn opened this issue Jun 8, 2022 · 0 comments

We have some very high-cardinality data that we represent as an undirected graph.

To reduce the burden on M3, we emit only the edges of the graph, tagging the two endpoints of each edge as "lo" and "hi". This drastically reduces the cardinality in the system.

For example, with the following graph:

    A
   / \
4 /   \ 5
 /     \
B-------C
    7

We would emit:

node-lo:A node-hi:B => 4
node-lo:A node-hi:C => 5
node-lo:B node-hi:C => 7
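
As a rough illustration of the tagging scheme, the sketch below canonicalizes each undirected edge so that the same pair of nodes always yields the same tag set. It assumes "lo" and "hi" are assigned by lexical order of the node names, which matches the example above but is not stated explicitly. `edge_tags` is a hypothetical helper, not part of any M3 client.

```python
def edge_tags(u: str, v: str) -> dict[str, str]:
    """Return canonical tags for the undirected edge {u, v}.

    Assumption: the lexically smaller node name becomes node-lo.
    """
    lo, hi = sorted((u, v))
    return {"node-lo": lo, "node-hi": hi}


# Edge weights from the example graph; endpoint order is deliberately mixed.
edges = {("A", "B"): 4, ("C", "A"): 5, ("B", "C"): 7}

emitted = [(edge_tags(u, v), value) for (u, v), value in edges.items()]
for tags, value in emitted:
    print(f"node-lo:{tags['node-lo']} node-hi:{tags['node-hi']} => {value}")
```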

Note that B appears as both a "lo" node and a "hi" node in the emitted edges. This is the problem.

In addition to the edges, we want to calculate per-node "rollups". We currently do this as follows:

fLO = exec(fetch name:failures node-lo:* ... | mapKey node-lo node);
fHI = exec(fetch name:failures node-hi:* ... | mapKey node-hi node);

sLO = exec(fetch name:successes node-lo:* ... | mapKey node-lo node);
sHI = exec(fetch name:successes node-hi:* ... | mapKey node-hi node);

f = exec(fLO | fHI);
t = exec(fLO | fHI | sLO | sHI);

(f | sum node) | asPercent (t | sum node)

If you squint, you will see that this query actually executes six fetches, which leads to abysmal performance and very quickly runs us into QoS limits.
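
To make the intent of the query concrete, here is a small Python model of the same computation outside of M3: every edge series is counted once under its "lo" endpoint and once under its "hi" endpoint, and the values are then summed per node. The per-edge failure and success counts are made-up numbers for illustration only, and `sum_by_node` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical per-edge counts keyed by (node-lo, node-hi).
failures = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 3}
successes = {("A", "B"): 3, ("A", "C"): 3, ("B", "C"): 4}


def sum_by_node(*series):
    """Mimic `mapKey node-lo node`, `mapKey node-hi node`, then `sum node`."""
    totals = defaultdict(int)
    for edge_values in series:
        for (lo, hi), value in edge_values.items():
            totals[lo] += value  # contribution via the node-lo fetch
            totals[hi] += value  # contribution via the node-hi fetch
    return dict(totals)


f = sum_by_node(failures)
t = sum_by_node(failures, successes)
percent = {node: 100 * f[node] / t[node] for node in t}
print(percent)
```

Each edge contributes to two nodes, which is why the query needs both a "lo" and a "hi" fetch for every metric name.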

The ideal query would look like:

f = fetch name:failures  node:* ... | sum node;
t = fetch name:{failures,successes} node:* ... | sum node;

(f) | asPercent (t)

The reason "rollup" is in quotes is that the ideal query effectively operates on the following per-node data:

node:A => 9
node:B => 11
node:C => 12
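
These values follow directly from the example graph: each node's value is the sum of the weights of the edges that touch it. A minimal check in Python, using only the edge weights given above:

```python
from collections import Counter

# (node-lo, node-hi, value) for each edge in the example graph.
edges = [("A", "B", 4), ("A", "C", 5), ("B", "C", 7)]

rollup = Counter()
for lo, hi, value in edges:
    rollup[lo] += value
    rollup[hi] += value

for node in sorted(rollup):
    print(f"node:{node} => {rollup[node]}")
# node:A => 9
# node:B => 11
# node:C => 12
```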

We know this is not possible with R2 today, but we would like to ask the experts whether it could ever be possible, and for some hints on how we could submit a pull request for such a change.

The workaround we currently use is triple emission:

node-lo:A node-hi:B => 4
node:A              => 4
node:B              => 4
node-lo:A node-hi:C => 5
node:A              => 5
node:C              => 5
node-lo:B node-hi:C => 7
node:B              => 7
node:C              => 7

This is not ideal.
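
The triple-emission workaround can be sketched as follows: for each edge, one data point carries the "lo"/"hi" tags and two more carry a plain `node` tag, one per endpoint. `triple_emit` is a hypothetical helper; a real emitter would go through whatever metrics client is in use.

```python
def triple_emit(lo: str, hi: str, value: int) -> list[tuple[dict[str, str], int]]:
    """Return the three tagged data points emitted for one edge."""
    return [
        ({"node-lo": lo, "node-hi": hi}, value),  # the edge itself
        ({"node": lo}, value),                    # per-node copy for the lo endpoint
        ({"node": hi}, value),                    # per-node copy for the hi endpoint
    ]


edges = [("A", "B", 4), ("A", "C", 5), ("B", "C", 7)]
points = [p for lo, hi, value in edges for p in triple_emit(lo, hi, value)]

# Three edges become nine data points.
print(len(points))
```

Summing the `node`-tagged points per node gives the per-node values directly, at the cost of writing three points for every edge.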

Full credit to @NonLogicalDev for the write-up.
