fix(core) fix count_values query. #1545

yu-shipit · 2023-03-30T16:43:42Z

The column for count_values aggregator is StringColumn.

Pull Request checklist

The commit(s) message(s) follows the contribution guidelines ?
Tests for the changes have been added (for bug fixes / features) ?
Docs have been added / updated (for bug fixes / features) ?

The column for count_values aggregator is StringColumn.

amolnayak311 · 2023-03-30T17:00:16Z

query/src/main/scala/filodb/query/exec/aggregator/RowAggregator.scala

@@ -130,7 +130,7 @@ object RowAggregator {
          + s"or calculate the rate and histogram_quantile before applying the aggregation. "
          + s"If you have a genuine use case for this query, please get in touch.")
      }
-    } else if (valColType == ColumnType.DoubleColumn) {
+    } else if (valColType == ColumnType.DoubleColumn || valColType == ColumnType.StringColumn) {


Why do we need else if? Original intent of the PR is to handle histograms which is already done just let everything else follow the same path as before. By restricting the types we don't know additional regression we may see.

good point. The patch removed the default case so that compile would fail if missing any cases.

@vishramachandran , requesting your feedback.

The current patch doesn't work for topk/bottomk
topk(2, foo{..}) fails with The TopK operation is not supported for StringColumn
thus I strongly feel as I originally recommended, minimize the blast radius and restrict your changes to only histograms, which was the original intention. Everything else should take the same code path as before.

you are right.

I think the problem may be caused by this function
def valueColumnType(schema: ResultSchema): ColumnType = { require(schema.isTimeSeries, s"Schema $schema is not time series based, cannot continue query") require(schema.columns.size >= 2, s"Schema $schema has less than 2 columns, cannot continue query") schema.columns(1).colType }
For schema more than 2 columns schema.columns(1).colType may be always be the valueColumnType

amolnayak311 · 2023-03-31T06:16:13Z

query/src/main/scala/filodb/query/exec/aggregator/RowAggregator.scala

+      case BottomK => new TopBottomKRowAggregator(params(0).asInstanceOf[Double].toInt, true)
+      case Quantile => new QuantileRowAggregator(params(0).asInstanceOf[Double])
+      case Stdvar if valColType == ColumnType.DoubleColumn => StdvarRowAggregator
+      case Stddev if valColType == ColumnType.DoubleColumn => StddevRowAggregator


@vishramachandran I see Group, Avg, StdVar and StdDev are now restricted to column type double only. Would that be ok? I mentioned a couple of times here that we should try and minimize the blast radius by just fixing the count and sum for histograms only (the intent behind the original PR) and let everything else continue to work the way it is. We already found a regression in count and luckily I found regression in topk/bottomk during my local testing. I would like to get your views and I'll be holding off my approval for the PR till then.Thanks

cc @kvpetrov @sandeep6189 @sherali42

@amolnayak311 Thanks for the help in identifying the regression !

I agree on containing the blast radius and minimizing regressions. At the same time we need to send out a clearer error message that helps the user esp if the query is not meaningful. I don't see us supporting some of the aggregations (like min, max, stddev, stdvar etc) on histograms. Do you see that possibility ?

I think we are seeing this regression because of absence of unit tests esp for this function. Improving unit tests by running this function through various types could make it more solid, but it comes at the cost of increasing this PR scope.

amolnayak311

As discussed over the call, this looks good now.

fix(core) fix count_values query.

f0b3808

The column for count_values aggregator is StringColumn.

amolnayak311 requested changes Mar 30, 2023

View reviewed changes

Yu Zhang added 2 commits March 30, 2023 12:11

let count values accept all types.

35466e6

fix the cases.

a1f1e0c

yu-shipit requested a review from vishramachandran March 30, 2023 22:51

amolnayak311 reviewed Mar 31, 2023

View reviewed changes

relax cases to not histogram

2644f14

amolnayak311 approved these changes Mar 31, 2023

View reviewed changes

yu-shipit merged commit 59ffa04 into filodb:develop Mar 31, 2023

yu-shipit deleted the fix_count_values branch March 31, 2023 18:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core) fix count_values query. #1545

fix(core) fix count_values query. #1545

yu-shipit commented Mar 30, 2023 •

edited by amolnayak311

amolnayak311 Mar 30, 2023

yu-shipit Mar 30, 2023

amolnayak311 Mar 30, 2023

amolnayak311 Mar 30, 2023 •

edited

yu-shipit Mar 30, 2023

amolnayak311 Mar 31, 2023 •

edited

vishramachandran Mar 31, 2023 •

edited

amolnayak311 left a comment

fix(core) fix count_values query. #1545

fix(core) fix count_values query. #1545

Conversation

yu-shipit commented Mar 30, 2023 • edited by amolnayak311

amolnayak311 Mar 30, 2023

Choose a reason for hiding this comment

yu-shipit Mar 30, 2023

Choose a reason for hiding this comment

amolnayak311 Mar 30, 2023

Choose a reason for hiding this comment

amolnayak311 Mar 30, 2023 • edited

Choose a reason for hiding this comment

yu-shipit Mar 30, 2023

Choose a reason for hiding this comment

amolnayak311 Mar 31, 2023 • edited

Choose a reason for hiding this comment

vishramachandran Mar 31, 2023 • edited

Choose a reason for hiding this comment

amolnayak311 left a comment

Choose a reason for hiding this comment

yu-shipit commented Mar 30, 2023 •

edited by amolnayak311

amolnayak311 Mar 30, 2023 •

edited

amolnayak311 Mar 31, 2023 •

edited

vishramachandran Mar 31, 2023 •

edited