Actual behavior
When exporting a frame to Parquet, if any of the columns are of type "string", h2o will generally throw a NullPointerException. Occasionally the export works, in particular when there aren't very many rows, but my example fails even with just 100 rows.
Expected behavior
h2o should be able to export "string" columns. If that isn't technically possible, it should catch the problem and tell the user which columns are troublesome.
Steps to reproduce
library(h2o)
h2o.init()
df<- h2o.createFrame(rows=100,
cols=10,
string_fraction=0.1, # create one string column
seed=5,
seed_for_column_types=25)
h2o.describe(df)
#    Label   Type Missing Zeros PosInf NegInf        Min      Max        Mean      Sigma Cardinality
#  1    C1   real       1     0      0      0  -96.26020 98.72199 10.34758802 60.8078989          NA
#  2    C2   real       1     0      0      0  -99.25493 97.81689 -3.39471050 58.8410834          NA
#  3    C3 string       2     0      0      0        NaN      NaN          NA         NA          NA
#  4    C4   enum       2     1      0      0    0.00000 99.00000          NA         NA         100
#  5    C5   real       4     0      0      0  -96.08773 98.90815  7.21620958 58.6954135          NA
#  6    C6    int       1    96      0      0    0.00000  1.00000  0.03030303  0.1722922          NA
#  7    C7    int       0     0      0      0  -98.00000 96.00000 -0.87000000 56.4286722          NA
#  8    C8    int       1     0      0      0 -100.00000 98.00000 -6.07070707 58.9779019          NA
#  9    C9   real       1     0      0      0  -98.43982 97.66665 -7.07826544 55.5486277          NA
# 10   C10   enum       0     2      0      0    0.00000 98.00000          NA         NA         100
h2o.exportFile(data=df,
path="df",
format="parquet",
write_checksum=FALSE)
# java.lang.NullPointerException
#
# java.lang.NullPointerException
#   at water.parser.parquet.FrameParquetExporter$PartExportParquetTask.map(FrameParquetExporter.java:115)
#   at water.MRTask.compute2(MRTask.java:819)
#   at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
#   at water.parser.parquet.FrameParquetExporter$PartExportParquetTask$Icer.compute1(FrameParquetExporter$PartExportParquetTask$Icer.java)
#   at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)
#   at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
#   at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
#   at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
#   at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
#   at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
#
# Error: java.lang.NullPointerException
unlink("~/df", recursive=TRUE)
# delete the string column
df$C3<-NULL
# success
h2o.exportFile(data=df,
path="df",
format="parquet",
write_checksum=FALSE)
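Until this is fixed, the "let the user know which columns are troublesome" idea can be approximated on the client side. The following is only a rough sketch, not anything h2o provides: `string_columns` and `export_parquet_without_strings` are hypothetical helper names, and the approach assumes (based on the successful export above after removing C3) that dropping every "string" column is enough to avoid the exception. It relies on `h2o.getTypes()` and `h2o.colnames()` from the h2o R package.

```r
# Hypothetical helper: names of the columns whose H2O type is "string".
# `types` is a list such as the one returned by h2o.getTypes();
# `col_names` is a character vector of the matching column names.
string_columns <- function(types, col_names) {
  col_names[unlist(types) == "string"]
}

# Hypothetical wrapper: warn about string columns and drop them
# before exporting to Parquet. Assumes a running H2O cluster.
export_parquet_without_strings <- function(frame, path) {
  all_cols <- h2o::h2o.colnames(frame)
  bad <- string_columns(h2o::h2o.getTypes(frame), all_cols)
  if (length(bad) > 0) {
    warning("Skipping string columns: ", paste(bad, collapse = ", "))
    frame <- frame[, setdiff(all_cols, bad)]
  }
  h2o::h2o.exportFile(data = frame,
                      path = path,
                      format = "parquet",
                      write_checksum = FALSE)
}
```

With the frame from the reproduction, `export_parquet_without_strings(df, "df")` would be expected to warn about C3 and then export the remaining nine columns. The data in the string column is lost in the export, so this is a stopgap rather than a fix.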
H2O version, Operating System and Environment