[BUG] Issues found by Spark UT Framework on RapidsRegexpExpressionsSuite #10774

Closed
4 tasks done
binmahone opened this issue May 7, 2024 · 6 comments · Fixed by #10861
@binmahone
Collaborator

binmahone commented May 7, 2024

Describe the bug

Spark UT Framework enabled RapidsRegexpExpressionsSuite (#10743), with the following test cases explicitly excluded:

  • RegexReplace
  • RegexExtract
  • RegexExtractAll
  • SPLIT

These excluded test cases need further investigation.
Note: other test cases in this suite may pass only because they fall back to the CPU.

Steps/Code to reproduce bug

  1. Compile everything with mvn -Dbuildver=330 install -DskipTests
  2. Pick a test case name from the list above
  3. Go to RapidsTestSettings and find the line starting with ".exclude" and containing the test case name, comment it out
  4. Run the suite; you should see one failed test case. E.g. mvn -nsu -Dbuildver=330 -pl tests -Dsuites="org.apache.spark.sql.rapids.suites.RapidsXXXSuite" test (replace RapidsXXXSuite with the suite name from the issue title). Always double-check that the suite name matches the one in the source code, as the issue title may contain typos.
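
As a rough sketch of steps 3 and 4 for this particular suite, the helpers below locate candidate ".exclude" lines and print the Maven command with the suite name filled in. The function names, the BUILDVER variable, and the assumption that the test name appears as a double-quoted string on the ".exclude" line are all hypothetical; the path to RapidsTestSettings is passed in by the caller because it is not given above.

```shell
#!/bin/sh
# Step 3 helper (hypothetical): print the numbered lines in a settings file
# that contain ".exclude" and the quoted test name.
# ASSUMPTION: the test name is written as a double-quoted string literal,
# which keeps "RegexExtract" from also matching "RegexExtractAll".
find_exclude_lines() {
    file="$1"
    name="$2"
    grep -n -F '.exclude' "$file" | grep -F -- "\"$name\""
}

# Step 4 helper (hypothetical): print, but do not run, the Maven command.
# BUILDVER defaults to 330, the value used in the steps above.
suite_test_cmd() {
    suite="$1"
    buildver="${BUILDVER:-330}"
    printf 'mvn -nsu -Dbuildver=%s -pl tests -Dsuites="org.apache.spark.sql.rapids.suites.%s" test\n' \
        "$buildver" "$suite"
}

suite_test_cmd RapidsRegexpExpressionsSuite
```

Printing the command rather than running it makes it easy to check the suite name against the source before starting a long Maven run.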

Expected behavior
The suite can pass without excluding any test case.

@binmahone binmahone added bug Something isn't working ? - Needs Triage Need team to review and classify labels May 7, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label May 7, 2024
@mattahrens
Collaborator

Initial scope is triaging unit test failures to determine priorities of individual issues.

@NVnavkumar
Collaborator

So the 4 tests pass if I disable Whole-stage codegen in RapidsTestTrait.scala

...        
   .config("spark.sql.codegen.wholeStage", "false")
...

I would recommend disabling it and retrying the whole suite of tests to check up on this issue and #10775 and maybe others.

@NVnavkumar
Collaborator

So most of these tests are falling back to the CPU (the ignored test is a codegen verification test):

RapidsRegexpExpressionsSuite:
'LIKE ALL' NOT use RAPIDS
- LIKE ALL
'LIKE ANY' NOT use RAPIDS
- LIKE ANY
'LIKE Pattern' NOT use RAPIDS
- LIKE Pattern
'LIKE Pattern ESCAPE '/'' NOT use RAPIDS
- LIKE Pattern ESCAPE '/'
'LIKE Pattern ESCAPE '#'' NOT use RAPIDS
- LIKE Pattern ESCAPE '#'
'LIKE Pattern ESCAPE '"'' NOT use RAPIDS
- LIKE Pattern ESCAPE '"'
'RLIKE Regular Expression' NOT use RAPIDS
- RLIKE Regular Expression
'RegexReplace' NOT use RAPIDS
- RegexReplace
- SPARK-22570: RegExpReplace should not create a lot of global variables !!! IGNORED !!!
'RegexExtract' NOT use RAPIDS
- RegexExtract
'RegexExtractAll' NOT use RAPIDS
- RegexExtractAll
'SPLIT' NOT use RAPIDS
- SPLIT
'SPARK-30759: cache initialization for literal patterns' offload to RAPIDS
- SPARK-30759: cache initialization for literal patterns
'SPARK-34814: LikeSimplification should handle NULL' NOT use RAPIDS
- SPARK-34814: LikeSimplification should handle NULL
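
For a longer run, the listing above could be summarized with a small filter. This is only a sketch that assumes the marker lines keep exactly the format shown ('TEST NAME' NOT use RAPIDS / 'TEST NAME' offload to RAPIDS); the function name cpu_fallback_tests is made up for illustration.

```shell
#!/bin/sh
# Read test output on stdin and print the names of tests whose marker line
# says they did NOT use RAPIDS (i.e. fell back to the CPU).
# The greedy match keeps quotes inside a name, e.g. LIKE Pattern ESCAPE '/'.
cpu_fallback_tests() {
    sed -n "s/^'\(.*\)' NOT use RAPIDS\$/\1/p"
}

cpu_fallback_tests <<'EOF'
'RegexReplace' NOT use RAPIDS
- RegexReplace
'SPARK-30759: cache initialization for literal patterns' offload to RAPIDS
- SPARK-30759: cache initialization for literal patterns
EOF
# prints: RegexReplace
```

Piping the result through wc -l gives a quick count of how many tests in a suite never reached the GPU.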

These tests are all falling back to CPU for a couple of reasons:

Use of non-literal patterns:

 !Expression <StringSplit> split(input[0, string, true], input[1, string, true], -1) cannot run on GPU because regexp only supports StringType if it is a literal value; Only literal delimiter patterns are supported

ConstantFolding is excluded:

Expression <Contains> Contains(foo, foo) cannot run on GPU because Cannot run on GPU. Is ConstantFolding excluded? Expression Contains(foo, foo) is foldable and operates on non literals
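
To group fallbacks by cause when triaging, the two messages above can be reduced to "expression: reason" pairs. The sketch below assumes every message follows the "Expression <Name> ... cannot run on GPU because ..." shape seen in these two examples; other message shapes would simply be skipped. The function name fallback_reasons is hypothetical.

```shell
#!/bin/sh
# Read plugin fallback messages on stdin and print "<ExpressionName>: <reason>".
# Lines that do not match the assumed message shape produce no output.
fallback_reasons() {
    sed -n 's/.*Expression <\([^>]*\)> .* cannot run on GPU because \(.*\)$/\1: \2/p'
}

fallback_reasons <<'EOF'
 !Expression <StringSplit> split(input[0, string, true], input[1, string, true], -1) cannot run on GPU because regexp only supports StringType if it is a literal value; Only literal delimiter patterns are supported
EOF
# prints: StringSplit: regexp only supports StringType if it is a literal value; Only literal delimiter patterns are supported
```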

I recommend disabling this suite until the tests can be rewritten to avoid these fallbacks. Otherwise, this suite is just running the same CPU tests that Apache Spark already runs.

@binmahone
Collaborator Author

This will be fixed by #10851; I suggest closing this issue.

@jlowe jlowe linked a pull request May 21, 2024 that will close this issue
@jlowe
Member

jlowe commented May 21, 2024

If we want to close this issue when #10851 is merged, then we can simply link that PR as closing this issue. Then it will be closed iff that PR is merged. I've updated this issue accordingly.

@GaryShen2008
Collaborator

Closed by #10861
