`h2o.exportFile` NPE with parquet "string"s #16161

hutch3232 · 2024-04-17T16:24:19Z

H2O version, Operating System and Environment

packageVersion("h2o")
# [1] ‘3.46.0.1’

Actual behavior
When exporting to Parquet, if any column is of type "string", h2o will generally throw a NullPointerException. Occasionally the export works, particularly when there are few rows, but as my example shows it can fail even with just 100 rows.

Expected behavior
h2o should be able to export "string" columns. If that isn't technically possible, it should at least catch the problem and tell the user which columns are troublesome.
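To illustrate the kind of check described above, here is a minimal sketch of a client-side guard that names the "string" columns before an export is attempted. The helpers `string_columns` and `check_parquet_exportable` are hypothetical and not part of h2o; the commented usage line assumes `h2o.getTypes()` returns one type per column in column order.

```r
# Hypothetical helper (not part of h2o): given column names and their H2O
# types, return the names of the columns whose type is "string".
string_columns <- function(col_names, col_types) {
  stopifnot(length(col_names) == length(col_types))
  col_names[unlist(col_types) == "string"]
}

# Hypothetical guard: fail with a readable message that lists the affected
# columns, instead of surfacing a bare NullPointerException from the export.
check_parquet_exportable <- function(col_names, col_types) {
  bad <- string_columns(col_names, col_types)
  if (length(bad) > 0) {
    stop("string columns may fail parquet export: ",
         paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage against a running cluster (assumption: h2o.getTypes() yields one
# type per column, in the same order as colnames()):
# check_parquet_exportable(colnames(df), h2o.getTypes(df))
```

This only reports the columns; it does not fix the export. Dropping the reported columns, as done in the reproduction below, is what let the export succeed in this report.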

Steps to reproduce

library(h2o)

h2o.init()

df <- h2o.createFrame(rows = 100,
                      cols = 10,
                      string_fraction = 0.1, # create one string column
                      seed = 5,
                      seed_for_column_types = 25)

h2o.describe(df)
#    Label   Type Missing Zeros PosInf NegInf        Min      Max        Mean      Sigma Cardinality
# 1     C1   real       1     0      0      0  -96.26020 98.72199 10.34758802 60.8078989          NA
# 2     C2   real       1     0      0      0  -99.25493 97.81689 -3.39471050 58.8410834          NA
# 3     C3 string       2     0      0      0        NaN      NaN          NA         NA          NA
# 4     C4   enum       2     1      0      0    0.00000 99.00000          NA         NA         100
# 5     C5   real       4     0      0      0  -96.08773 98.90815  7.21620958 58.6954135          NA
# 6     C6    int       1    96      0      0    0.00000  1.00000  0.03030303  0.1722922          NA
# 7     C7    int       0     0      0      0  -98.00000 96.00000 -0.87000000 56.4286722          NA
# 8     C8    int       1     0      0      0 -100.00000 98.00000 -6.07070707 58.9779019          NA
# 9     C9   real       1     0      0      0  -98.43982 97.66665 -7.07826544 55.5486277          NA
# 10   C10   enum       0     2      0      0    0.00000 98.00000          NA         NA         100

h2o.exportFile(data = df,
               path = "df",
               format = "parquet",
               write_checksum = FALSE)

# java.lang.NullPointerException
#
# java.lang.NullPointerException
# at water.parser.parquet.FrameParquetExporter$PartExportParquetTask.map(FrameParquetExporter.java:115)
# at water.MRTask.compute2(MRTask.java:819)
# at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
# at water.parser.parquet.FrameParquetExporter$PartExportParquetTask$Icer.compute1(FrameParquetExporter$PartExportParquetTask$Icer.java)
# at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)
# at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
# at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
# at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
# at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
# at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
#
# Error: java.lang.NullPointerException

unlink("~/df", recursive = TRUE)

# delete the string column
df$C3 <- NULL

# success
h2o.exportFile(data = df,
               path = "df",
               format = "parquet",
               write_checksum = FALSE)


krasinski · 2024-04-23T19:40:13Z

@hutch3232 Thank you for reporting this issue! The fix will probably go into the next minor release.

hutch3232 added the bug label Apr 17, 2024

wendycwong assigned krasinski Apr 22, 2024

krasinski added this to the 3.46.0.2 milestone Apr 23, 2024

krasinski added a commit that referenced this issue Apr 23, 2024

GH-16161 Fix parquet export NPE

120c3b2

krasinski added a commit that referenced this issue Apr 23, 2024

GH-16161 Fix parquet export NPE

6879ffd

krasinski mentioned this issue Apr 23, 2024

GH-16161 Fix parquet export NPE #16175

Merged

krasinski linked a pull request Apr 23, 2024 that will close this issue

GH-16161 Fix parquet export NPE #16175

Merged

valenad1 pushed a commit that referenced this issue May 3, 2024

GH-16161 Fix parquet export NPE

73bf877

valenad1 pushed a commit that referenced this issue May 7, 2024

GH-16161 Fix parquet export NPE (#16175)

e1ab25a

valenad1 closed this as completed May 7, 2024

valenad1 mentioned this issue May 9, 2024

GH-16161 add comparison to the test & dont write NA #16195

Merged

valenad1 linked a pull request May 9, 2024 that will close this issue

GH-16161 add comparison to the test & dont write NA #16195

Merged

valenad1 pushed a commit that referenced this issue May 11, 2024

GH-16161 - add comparison to the test & dont write NA (#16195)

a4b9436
