h2o.exportFile NPE with parquet "string"s #16161

Closed
hutch3232 opened this issue Apr 17, 2024 · 1 comment · Fixed by #16175 or #16195

Comments

@hutch3232

H2O version, Operating System and Environment

packageVersion("h2o")
# [1] ‘3.46.0.1’

Actual behavior
When exporting a frame to Parquet, h2o usually throws a NullPointerException if any column is of type "string". The export occasionally succeeds, particularly when the frame has few rows, but the example below fails with only 100 rows.

Expected behavior
h2o should be able to export "string" columns to Parquet. If that isn't technically possible, it should catch the problem and tell the user which columns are troublesome.

Steps to reproduce

library(h2o)

h2o.init()

df <- h2o.createFrame(rows = 100,
                      cols = 10,
                      string_fraction = 0.1, # create one string column
                      seed = 5,
                      seed_for_column_types = 25)

h2o.describe(df)
#    Label   Type Missing Zeros PosInf NegInf        Min      Max        Mean      Sigma Cardinality
# 1     C1   real       1     0      0      0  -96.26020 98.72199 10.34758802 60.8078989          NA
# 2     C2   real       1     0      0      0  -99.25493 97.81689 -3.39471050 58.8410834          NA
# 3     C3 string       2     0      0      0        NaN      NaN          NA         NA          NA
# 4     C4   enum       2     1      0      0    0.00000 99.00000          NA         NA         100
# 5     C5   real       4     0      0      0  -96.08773 98.90815  7.21620958 58.6954135          NA
# 6     C6    int       1    96      0      0    0.00000  1.00000  0.03030303  0.1722922          NA
# 7     C7    int       0     0      0      0  -98.00000 96.00000 -0.87000000 56.4286722          NA
# 8     C8    int       1     0      0      0 -100.00000 98.00000 -6.07070707 58.9779019          NA
# 9     C9   real       1     0      0      0  -98.43982 97.66665 -7.07826544 55.5486277          NA
# 10   C10   enum       0     2      0      0    0.00000 98.00000          NA         NA         100

h2o.exportFile(data = df,
               path = "df",
               format = "parquet",
               write_checksum = FALSE)

# java.lang.NullPointerException
#
# java.lang.NullPointerException
# at water.parser.parquet.FrameParquetExporter$PartExportParquetTask.map(FrameParquetExporter.java:115)
# at water.MRTask.compute2(MRTask.java:819)
# at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
# at water.parser.parquet.FrameParquetExporter$PartExportParquetTask$Icer.compute1(FrameParquetExporter$PartExportParquetTask$Icer.java)
# at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)
# at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
# at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
# at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
# at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
# at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
#
# Error: java.lang.NullPointerException

unlink("~/df", recursive = TRUE)

# delete the string column
df$C3 <- NULL

# success
h2o.exportFile(data = df,
               path = "df",
               format = "parquet",
               write_checksum = FALSE)
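
Until a fix is available, one possible workaround is to find the "string" columns up front and drop them before exporting, since the export above succeeds once C3 is removed. The following is only a minimal sketch: `string_columns` is a hypothetical helper (not part of h2o), and the commented-out part assumes `h2o.getTypes()` returns one type per column in column order.

```r
# Hypothetical helper (not part of h2o): return the names of columns whose
# H2O type is "string", given parallel vectors of column names and types.
string_columns <- function(col_names, col_types) {
  col_names[col_types == "string"]
}

# Column types copied from the h2o.describe(df) output above.
types <- c("real", "real", "string", "enum", "real",
           "int", "int", "int", "real", "enum")
bad <- string_columns(paste0("C", 1:10), types)
print(bad)

# With a live cluster, the same check could be applied to the frame itself
# (assumption: h2o.getTypes() yields one type per column, in column order):
# bad <- string_columns(names(df), unlist(h2o.getTypes(df)))
# if (length(bad) > 0) {
#   warning("Dropping string columns before parquet export: ",
#           paste(bad, collapse = ", "))
#   df <- df[, setdiff(names(df), bad)]
# }
```

Dropping the columns loses data, so this only helps when the string columns are not needed in the exported file.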
@hutch3232 hutch3232 added the bug label Apr 17, 2024
@krasinski krasinski added this to the 3.46.0.2 milestone Apr 23, 2024
krasinski added a commit that referenced this issue Apr 23, 2024
krasinski added a commit that referenced this issue Apr 23, 2024
@krasinski krasinski linked a pull request Apr 23, 2024 that will close this issue
@krasinski
Member

@hutch3232 Thank you for reporting this issue! The fix will probably go into the next minor release.
