[VL] Refactor data filter in scan transformer #5812

Merged

Contributor

@gaoyangxiaozhu gaoyangxiaozhu commented May 20, 2024

What changes were proposed in this pull request?

This PR makes a few small changes:

  1. Add an isRowIndexMetadataColumn function to the shim layer to check whether a column is the row index column.
  2. Adjust the Spark 3.5 data filters in the native scan to support filtering on file metadata columns, since the Velox backend supports that.
  3. Do not push filters on the row index column down to the scan, since Velox does not support that.
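The filter split described in the list above can be sketched roughly as follows. This is a language-neutral model in Python, not the Scala code from this PR: the `Filter` class, `split_data_filters`, and the example column names are hypothetical, and the row index column name `_tmp_metadata_row_index` is an assumption about Spark's temporary row index column that should be checked against the Spark version in use. What happens to the filters that are not pushed down is also an assumption here; the PR only says they are not pushed to the scan.

```python
from dataclasses import dataclass

# Assumed name of Spark's temporary row index column (not taken from this PR).
ROW_INDEX_COLUMN = "_tmp_metadata_row_index"


@dataclass(frozen=True)
class Filter:
    """Hypothetical stand-in for a Spark filter expression."""
    description: str
    columns: frozenset  # names of the columns the filter references


def is_row_index_metadata_column(name: str) -> bool:
    # Models the shim-layer check from item 1.
    return name == ROW_INDEX_COLUMN


def split_data_filters(filters):
    """Return (pushed_down, not_pushed_down).

    Filters that reference the row index column are held back (item 3);
    all other filters, including ones on file metadata columns, are
    pushed down to the native scan (item 2).
    """
    pushed, kept = [], []
    for f in filters:
        if any(is_row_index_metadata_column(c) for c in f.columns):
            kept.append(f)
        else:
            pushed.append(f)
    return pushed, kept


filters = [
    Filter("id > 10", frozenset({"id"})),
    Filter("file_name = 'a.parquet'", frozenset({"file_name"})),
    Filter("_tmp_metadata_row_index < 100", frozenset({ROW_INDEX_COLUMN})),
]
pushed, kept = split_data_filters(filters)
print([f.description for f in pushed])
print([f.description for f in kept])
```

In this model a filter is held back as soon as any referenced column is the row index column, which is a conservative choice for filters that mix several columns.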

(Fixes: #5047)

This PR is a prerequisite for #5351.

How was this patch tested?

Unit tests and manual runs.


Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:


Run Gluten Clickhouse CI

@gaoyangxiaozhu
Contributor Author

@zhouyuan / @JkSelf could you help review?

@gaoyangxiaozhu
Contributor Author

@zhouyuan / @JkSelf could you help review?

Pinging again!

@gaoyangxiaozhu
Contributor Author

@zhli1142015 could you help merge?

@zhli1142015 zhli1142015 merged commit 3bef312 into apache:main May 21, 2024
45 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

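| query | log/native_5812_time.csv | log/native_master_05_13_2024_33f993554_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 36.70 | 36.42 | -0.279 | 99.24% |
| q2 | 24.37 | 23.70 | -0.671 | 97.25% |
| q3 | 38.06 | 36.61 | -1.451 | 96.19% |
| q4 | 38.92 | 37.83 | -1.098 | 97.18% |
| q5 | 68.78 | 70.62 | 1.836 | 102.67% |
| q6 | 9.86 | 7.47 | -2.394 | 75.73% |
| q7 | 81.17 | 85.08 | 3.905 | 104.81% |
| q8 | 85.44 | 84.30 | -1.143 | 98.66% |
| q9 | 121.50 | 121.00 | -0.497 | 99.59% |
| q10 | 45.58 | 45.37 | -0.214 | 99.53% |
| q11 | 20.37 | 19.49 | -0.880 | 95.68% |
| q12 | 26.56 | 26.52 | -0.039 | 99.85% |
| q13 | 57.34 | 55.08 | -2.259 | 96.06% |
| q14 | 19.86 | 21.53 | 1.669 | 108.41% |
| q15 | 29.24 | 29.02 | -0.218 | 99.26% |
| q16 | 12.97 | 14.00 | 1.032 | 107.95% |
| q17 | 102.17 | 103.62 | 1.448 | 101.42% |
| q18 | 146.60 | 146.24 | -0.366 | 99.75% |
| q19 | 13.49 | 13.35 | -0.144 | 98.93% |
| q20 | 29.99 | 29.06 | -0.932 | 96.89% |
| q21 | 281.10 | 282.54 | 1.446 | 100.51% |
| q22 | 14.80 | 14.61 | -0.190 | 98.72% |
| total | 1304.88 | 1303.44 | -1.437 | 99.89% |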

Development

Successfully merging this pull request may close these issues.

[VL] Row Index Metadata Column support for Parquet Scan
4 participants