Cannot use append mode when writing spark dataframe on Watson Studio #197

Open

charles2588 opened this issue Jun 20, 2018 · 1 comment

Write the file once:

df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
              .option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
              .mode("append")\
              .save(cos.url('TESTAPPEND/CARS', 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'))

The first append-mode write succeeds. Then write again in append mode, and it fails:

df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
              .option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
              .mode("append")\
              .save(cos.url('TESTAPPEND/CARS', 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'))

Py4JJavaError: An error occurred while calling o161.save.
: org.apache.hadoop.fs.FileAlreadyExistsException: mkdir on existing directory cos://catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a.os_a9bbfb9f99684afe9ec11076b75f1831_configs/TESTAPPEND/CARS
at com.ibm.stocator.fs.ObjectStoreFileSystem.mkdirs(ObjectStoreFileSystem.java:453)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:118)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp

Full notebook:
https://dataplatform.ibm.com/analytics/notebooks/v2/ec6f5fd0-6141-493c-b2cc-979a9b312393/view?access_token=3b6130f2206249bd03795f932ca3ad30f110321a3086dac07ee4d3eb4d4cbe56

Looking at the append method in the connector code, I see that append is not supported:
https://github.com/CODAIT/stocator/blob/0866ef099c838efbfe46e7ad6a036ecfbed2012d/src/main/java/com/ibm/stocator/fs/ObjectStoreFileSystem.java

public FSDataOutputStream append(Path f, int bufferSize,
    Progressable progress) throws IOException {
  throw new IOException("Append is not supported in the object storage");
}

If append is not supported, is there a workaround? Or maybe the connector should report clearly that append is not supported, rather than throwing the error above.
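
For context, a pattern that is sometimes used instead of append on object storage is to write each batch to its own sub-prefix and read the whole parent prefix back when the combined data is needed. Below is a minimal sketch of that idea, reusing the cos.url() helper, bucket name, and df_data_1 from the notebook above; the spark session name, the uuid-based sub-path, and the wildcard read are illustrative assumptions, not something verified against Stocator.

import uuid

bucket = 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'

# Write this batch under its own unique sub-prefix instead of appending in place.
batch_path = cos.url('TESTAPPEND/CARS/batch-{}'.format(uuid.uuid4().hex), bucket)
df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
              .option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
              .save(batch_path)

# Read every batch written so far back as a single dataframe.
df_all = spark.read.csv(cos.url('TESTAPPEND/CARS/*', bucket))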

@gilv (Contributor) commented Jun 21, 2018

@charles2588 thanks for reporting this. In general append + object storage is usually a bad idea, no matter which connector you use. I will review the issue you observed to better understand the root cause and to propose the best solution to resolve it.
