Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

groupByTags tag parsing adds bad tags #1917

Open
shanson7 opened this issue Oct 7, 2020 · 4 comments
Open

groupByTags tag parsing adds bad tags #1917

shanson7 opened this issue Oct 7, 2020 · 4 comments
Assignees
Labels
Milestone

Comments

@shanson7
Copy link
Collaborator

shanson7 commented Oct 7, 2020

Describe the bug

groupByTags builds a buffer of ; delimited tags to form the key and then groups all series with the same key together. To get the individual tag values back, it calls SetTags() on the result series.

Given input from a "complex" pipeline like:

groupByTags(
  divideSeriesLists(
    sortByName(
      seriesByTag('namespace=os', 'cluster=clusterName', 'name=cpu.percent.user'),
    false), 
    removeBelowValue(
      sortByName(
        seriesByTag('namespace=os', 'cluster=clusterName', 'name=cpu.percent.idle'),
      false),
    10)
  ),
'sum', 'name')

and let's say the input series just has the cluster tag. After going through SetTags() we get tags like:
name=divideSeries(cpu.percent.user.g
cluster=clusterName
namespace=os, 10))

groupByTags should have just produced the name tag but because it calls SetTags() and the target after divideSeriesLists had values like divideSeries(cpu.percent.user;cluster=clusterName;namespace=os,removeBelowValue(cpu.percent.idle;cluster=clusterName;namespace=os, 10)) it just naively splits on ; and adds some bogus tags.

@shanson7 shanson7 added the bug label Oct 7, 2020
@shanson7
Copy link
Collaborator Author

shanson7 commented Oct 7, 2020

Perhaps groupByTags should be "smarter" in building the key to prevent this issue instead of relying on string encoding/parsing?

@Dieterbe Dieterbe added this to the sprint-18 milestone Oct 12, 2020
@fkaleo fkaleo assigned robert-milan and fkaleo and unassigned robert-milan Oct 28, 2020
@fkaleo
Copy link
Contributor

fkaleo commented Nov 14, 2020

I believe that the following query, a little simpler perhaps, exhibits the same issue:

groupByTags(divideSeriesLists(seriesByTag('name=random1'), seriesByTag('name=random2')), 'sum', 'name')

image

@Dieterbe
Copy link
Contributor

Dieterbe commented Jan 18, 2021

@shanson7 do you feel the following 2 tests adequately cover the problem (if not, can you clarify):

func TestGroupByTagsSingleSeriesWithFuncs(t *testing.T) {
        in := []models.Series{
                getModel("foo(name1;t1=v1,bar(name1;t1=v1, 123))", a),
                getModel("foo(name1;t1=v1,bar(name1;t1=v1, 123))", a),
        }
        out := []models.Series{
                getModel("name1;t1=v1", sumab),
        }

        out[0].Datapoints = out[0].Datapoints[:0]
        aggFunc := getCrossSeriesAggFunc("sum")
        aggFunc(in, &out[0].Datapoints)

        testGroupByTags("SingleSeriesSum", in, out, "sum", []string{"t1"}, nil, t)
}
func TestGroupByTagsSingleSeriesWithFuncs2Tags(t *testing.T) {
        in := []models.Series{
                getModel("foo(name1;t1=v1;t2=v2,bar(name1;t1=v1;t2=v2, 123))", a),
                getModel("foo(name1;t1=v1;t2=v2,bar(name1;t1=v1;t2=v2, 123))", a),
        }
        out := []models.Series{
                getModel("name1;t1=v1;t2=v2", sumab),
        }

        out[0].Datapoints = out[0].Datapoints[:0]
        aggFunc := getCrossSeriesAggFunc("sum")
        aggFunc(in, &out[0].Datapoints)

        testGroupByTags("SingleSeriesSum", in, out, "sum", []string{"t1"}, nil, t)
}

I think we're going about this the wrong way.

What's not clear to me is how to generalize this. if any series target can be any arbitrary mixture of function calls and various series (each with possibly different sets of tags, both in terms of the tag names as well as values), then what are the tags of said series? @fkaleo's PR mentions " SetTags should consider as tags only the tag expressions (ie. ;tag=value) that are found after the end of the last function of the target.". Why those particular tags? this feels completely arbitrary. I don't think we can really assume how this should be reconciled ?

Furthermore, I think we need to be mindful that groupByTag not only needs a proper set of tags, but also a proper name.
I think a "name" like foo(name1;t1=v1,bar(name1;t1=v1, 123)) (which is what serie.Tags["name"] is in one of the tests above) is not useful, at least not if we're going to cleanly reconcile the tags already (see below), then we can do the same for the name.

If by the time groupByTags runs, we then need to do string parsing to figure out what the tags are of the input (and name), that feels wrong. rather, we should probably have each intermediate step somehow set tags and names appropriately, such that groupByTags doesn't have to call SetTags to begin with.

So perhaps we need to figure out/implement graphite-project/graphite-web#2639 first.

@shanson7
Copy link
Collaborator Author

@shanson7 do you feel the following 2 tests adequately cover the problem

Maybe? I'd like to see those tests using the name tag and a case using another tag as well.

if any series target can be any arbitrary mixture of function calls and various series (each with possibly different sets of tags, both in terms of the tag names as well as values), then what are the tags of said series?

The tags of the series are well known, they are in the Tags member. Assuming we already have tags, we shouldn't need to parse tags out of anything. AFAIK, the only issue is with the name tag, which can be made complex with function calls and combinations. Other tags either exist or don't (AFAIK, there are no other tag manipulation functions like aliasTag).

So, I'm not sure that I would consider this a bug in SetTags (which seems to be what #1948 seems to be treating it as) but rather improper usage of SetTags.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants