groupByTags tag parsing adds bad tags #1917

shanson7 · 2020-10-07T13:36:55Z

Describe the bug

groupByTags builds a buffer of ; delimited tags to form the key and then groups all series with the same key together. To get the individual tag values back, it calls SetTags() on the result series.

Given input from a "complex" pipeline like:

groupByTags(
  divideSeriesLists(
    sortByName(
      seriesByTag('namespace=os', 'cluster=clusterName', 'name=cpu.percent.user'),
    false), 
    removeBelowValue(
      sortByName(
        seriesByTag('namespace=os', 'cluster=clusterName', 'name=cpu.percent.idle'),
      false),
    10)
  ),
'sum', 'name')

and let's say the input series just has the cluster tag. After going through SetTags() we get tags like:
name=divideSeries(cpu.percent.user.g
cluster=clusterName
namespace=os, 10))

groupByTags should have just produced the name tag but because it calls SetTags() and the target after divideSeriesLists had values like divideSeries(cpu.percent.user;cluster=clusterName;namespace=os,removeBelowValue(cpu.percent.idle;cluster=clusterName;namespace=os, 10)) it just naively splits on ; and adds some bogus tags.

The text was updated successfully, but these errors were encountered:

shanson7 · 2020-10-07T13:39:05Z

Perhaps groupByTags should be "smarter" in building the key to prevent this issue instead of relying on string encoding/parsing?

fkaleo · 2020-11-14T00:46:04Z

I believe that the following query, a little simpler perhaps, exhibits the same issue:

groupByTags(divideSeriesLists(seriesByTag('name=random1'), seriesByTag('name=random2')), 'sum', 'name')

Dieterbe · 2021-01-18T14:32:55Z

@shanson7 do you feel the following 2 tests adequately cover the problem (if not, can you clarify):

func TestGroupByTagsSingleSeriesWithFuncs(t *testing.T) {
        in := []models.Series{
                getModel("foo(name1;t1=v1,bar(name1;t1=v1, 123))", a),
                getModel("foo(name1;t1=v1,bar(name1;t1=v1, 123))", a),
        }
        out := []models.Series{
                getModel("name1;t1=v1", sumab),
        }

        out[0].Datapoints = out[0].Datapoints[:0]
        aggFunc := getCrossSeriesAggFunc("sum")
        aggFunc(in, &out[0].Datapoints)

        testGroupByTags("SingleSeriesSum", in, out, "sum", []string{"t1"}, nil, t)
}
func TestGroupByTagsSingleSeriesWithFuncs2Tags(t *testing.T) {
        in := []models.Series{
                getModel("foo(name1;t1=v1;t2=v2,bar(name1;t1=v1;t2=v2, 123))", a),
                getModel("foo(name1;t1=v1;t2=v2,bar(name1;t1=v1;t2=v2, 123))", a),
        }
        out := []models.Series{
                getModel("name1;t1=v1;t2=v2", sumab),
        }

        out[0].Datapoints = out[0].Datapoints[:0]
        aggFunc := getCrossSeriesAggFunc("sum")
        aggFunc(in, &out[0].Datapoints)

        testGroupByTags("SingleSeriesSum", in, out, "sum", []string{"t1"}, nil, t)
}

I think we're going about this the wrong way.

What's not clear to me is how to generalize this. if any series target can be any arbitrary mixture of function calls and various series (each with possibly different sets of tags, both in terms of the tag names as well as values), then what are the tags of said series? @fkaleo's PR mentions " SetTags should consider as tags only the tag expressions (ie. ;tag=value) that are found after the end of the last function of the target.". Why those particular tags? this feels completely arbitrary. I don't think we can really assume how this should be reconciled ?

Furthermore, I think we need to be mindful that groupByTag not only needs a proper set of tags, but also a proper name.
I think a "name" like foo(name1;t1=v1,bar(name1;t1=v1, 123)) (which is what serie.Tags["name"] is in one of the tests above) is not useful, at least not if we're going to cleanly reconcile the tags already (see below), then we can do the same for the name.

If by the time groupByTags runs, we then need to do string parsing to figure out what the tags are of the input (and name), that feels wrong. rather, we should probably have each intermediate step somehow set tags and names appropriately, such that groupByTags doesn't have to call SetTags to begin with.

So perhaps we need to figure out/implement graphite-project/graphite-web#2639 first.

shanson7 · 2021-01-18T15:01:03Z

@shanson7 do you feel the following 2 tests adequately cover the problem

Maybe? I'd like to see those tests using the name tag and a case using another tag as well.

if any series target can be any arbitrary mixture of function calls and various series (each with possibly different sets of tags, both in terms of the tag names as well as values), then what are the tags of said series?

The tags of the series are well known, they are in the Tags member. Assuming we already have tags, we shouldn't need to parse tags out of anything. AFAIK, the only issue is with the name tag, which can be made complex with function calls and combinations. Other tags either exist or don't (AFAIK, there are no other tag manipulation functions like aliasTag).

So, I'm not sure that I would consider this a bug in SetTags (which seems to be what #1948 seems to be treating it as) but rather improper usage of SetTags.

shanson7 added the bug label Oct 7, 2020

Dieterbe added this to the sprint-18 milestone Oct 12, 2020

fkaleo assigned robert-milan and fkaleo and unassigned robert-milan Oct 28, 2020

fkaleo mentioned this issue Nov 14, 2020

Fix set tags for complex targets #1948

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

groupByTags tag parsing adds bad tags #1917

groupByTags tag parsing adds bad tags #1917

shanson7 commented Oct 7, 2020

shanson7 commented Oct 7, 2020

fkaleo commented Nov 14, 2020

Dieterbe commented Jan 18, 2021 •

edited

shanson7 commented Jan 18, 2021

groupByTags tag parsing adds bad tags #1917

groupByTags tag parsing adds bad tags #1917

Comments

shanson7 commented Oct 7, 2020

shanson7 commented Oct 7, 2020

fkaleo commented Nov 14, 2020

Dieterbe commented Jan 18, 2021 • edited

shanson7 commented Jan 18, 2021

Dieterbe commented Jan 18, 2021 •

edited