Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. #16366

Open: 8 commits to merge into base: master

gianm (Contributor) commented May 1, 2024

This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr. As a result, all such expressions can now participate in vectorized queries.
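The adaptation idea can be illustrated with a minimal sketch. This is not the code in this patch: the class `FallbackVectorSketch`, the `RowFunction` interface, and the `evalVector` and `concat` helpers are hypothetical names, and the real processor works with Druid's own expression and vector types. The sketch only shows the general shape, which is to gather one row of arguments from the input vectors, call the row-at-a-time function, and write the result into an output vector.

```java
// Hypothetical sketch; names and types are assumptions, not Druid's API.
final class FallbackVectorSketch {
    /** A row-at-a-time function over boxed arguments (assumed interface). */
    interface RowFunction {
        Object apply(Object[] args);
    }

    /**
     * Evaluates fn once per row over column-oriented inputs.
     * inputVectors[arg][row] holds the value of argument `arg` for row `row`.
     */
    static Object[] evalVector(RowFunction fn, Object[][] inputVectors, int vectorSize) {
        Object[] out = new Object[vectorSize];
        // Scratch array for one row of arguments, reused across rows.
        Object[] rowArgs = new Object[inputVectors.length];
        for (int row = 0; row < vectorSize; row++) {
            for (int arg = 0; arg < inputVectors.length; arg++) {
                rowArgs[arg] = inputVectors[arg][row];
            }
            out[row] = fn.apply(rowArgs);
        }
        return out;
    }

    /** Simplified stand-in for a non-vectorized CONCAT; ignores SQL null semantics. */
    static String concat(Object[] args) {
        StringBuilder sb = new StringBuilder();
        for (Object arg : args) {
            sb.append(arg);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object[][] inputs = {
            {"a", "b"},   // string2
            {"-", "-"},   // literal '-'
            {1L, 2L}      // long2
        };
        for (Object value : evalVector(FallbackVectorSketch::concat, inputs, 2)) {
            System.out.println(value);
        }
    }
}
```

The function is still invoked once per row, so a wrapper like this does not make the function itself faster; the gain comes from letting the surrounding query run on the vectorized path.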

In addition:

  • Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well. Identifiers already returned true from canVectorize, so this change lets them live up to that claim.

  • In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize.

  • Fixes a null-matcher bug in StringObjectVectorValueMatcher that was uncovered by certain newly-vectorizable test cases.
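The stricter cannotVectorize behavior in SQL tests can be pictured with the following sketch. It is not the test framework's actual code: `CannotVectorizeSketch` and `assertCannotVectorize` are hypothetical names, and a plain `Runnable` stands in for running a query with vectorization forced. The point is only that a test marked as non-vectorizable now fails when the vectorized run succeeds.

```java
// Hypothetical sketch of an "expected to fail vectorization" check.
final class CannotVectorizeSketch {
    /**
     * Runs the given vectorized query and fails if it does NOT throw.
     * The Runnable is an assumed stand-in for executing a query with
     * vectorization forced on.
     */
    static void assertCannotVectorize(Runnable vectorizedQuery) {
        boolean threw = false;
        try {
            vectorizedQuery.run();
        } catch (RuntimeException e) {
            // Expected: the query could not be vectorized.
            threw = true;
        }
        if (!threw) {
            throw new AssertionError(
                "Query was marked cannotVectorize but vectorized successfully");
        }
    }

    public static void main(String[] args) {
        // Passes: the stand-in query throws, as a non-vectorizable query would.
        assertCannotVectorize(() -> {
            throw new UnsupportedOperationException("cannot vectorize");
        });
        System.out.println("ok");
    }
}
```

With a check of this kind, any test that still carries the marker after its query becomes vectorizable fails loudly, which is what makes newly vectorizable tests easy to find.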

Benchmarks follow for SqlExpressionBenchmark queries 26 and 27. These two queries are:

-- 26: group by string expr with non-expr agg
SELECT CONCAT(string2, '-', long2), SUM(double1) FROM foo GROUP BY 1 ORDER BY 2

-- 27: group by string expr with expr agg
SELECT CONCAT(string2, '-', long2), SUM(long1 * double4) FROM foo GROUP BY 1 ORDER BY 2

In these cases fallback vectorization is not as compelling as proper (native) vectorization, but it is still faster than unvectorized execution. The relative benefit is greater for query 27, likely because fallback vectorization for CONCAT allows long1 * double4 to vectorize as well. In general I would expect the benefit to grow with query complexity, due to this effect.

Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score    Error  Units

SqlExpressionBenchmark.querySql       26            500000      auto        false  avgt    5  259.288 ±  4.863  ms/op
SqlExpressionBenchmark.querySql       26            500000      auto       native  avgt    5  194.405 ± 10.185  ms/op
SqlExpressionBenchmark.querySql       26            500000      auto     fallback  avgt    5  244.807 ±  6.851  ms/op

SqlExpressionBenchmark.querySql       27            500000      auto        false  avgt    5  289.065 ± 15.253  ms/op
SqlExpressionBenchmark.querySql       27            500000      auto       native  avgt    5  194.829 ±  7.732  ms/op
SqlExpressionBenchmark.querySql       27            500000      auto     fallback  avgt    5  248.331 ±  4.732  ms/op
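
To put the scores on a common scale, the reduction in time per operation relative to the unvectorized run can be computed directly from the table. The helper below is illustrative only (`SpeedupSketch` and `percentReduction` are made-up names) and uses the mean scores while ignoring the error bars. By this measure fallback saves roughly 6% on query 26 and 14% on query 27, against roughly 25% and 33% for native vectorization.

```java
// Illustrative arithmetic on the benchmark means above; error bars are ignored.
final class SpeedupSketch {
    /** Percent reduction in ms/op relative to a baseline. */
    static double percentReduction(double baselineMs, double candidateMs) {
        return 100.0 * (1.0 - candidateMs / baselineMs);
    }

    public static void main(String[] args) {
        // Baseline is the vectorize=false row for each query.
        System.out.printf("q26 native:   %.1f%%%n", percentReduction(259.288, 194.405));
        System.out.printf("q26 fallback: %.1f%%%n", percentReduction(259.288, 244.807));
        System.out.printf("q27 native:   %.1f%%%n", percentReduction(289.065, 194.829));
        System.out.printf("q27 fallback: %.1f%%%n", percentReduction(289.065, 248.331));
    }
}
```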
