Stocator should create folder names with a trailing '/' in IBM COS #210

mariobriggs · 2019-06-10T09:30:48Z

I am using Stocator via Spark to write a DataFrame to IBM COS:

df.write.parquet("cos://mybucket.service/tpcds/call_center")

In the call above, Stocator creates the folder 'call_center' in IBM COS. However, Stocator does not create the folder name with a trailing '/', and as a result reading these IBM COS folders breaks when using other tools such as Alluxio, CyberDuck, etc.

In the CyberDuck UI, for example, the folder 'call_center' is also listed as a 0-byte file.

Browsing through the Stocator code, I see that the code to create the folder name with a trailing '/' is commented out, and using a build where it is uncommented solved the issue.
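A minimal sketch of why the missing '/' matters, assuming a browser-style client that lists keys with '/' as the delimiter and treats a key ending in '/' as a folder marker. The helper `browse` and the key names are hypothetical, and real tools such as CyberDuck or Alluxio may apply different rules:

```python
def browse(keys, prefix=""):
    """Return the entries a '/'-delimited listing would show directly under prefix.

    Each entry is (name, kind), where kind is "file" or "folder".
    Assumption: a key ending in '/' is a folder marker, and any deeper key
    (e.g. "name/part") implies that "name" is a folder.
    """
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the prefix object itself, nothing to show
        head, sep, _ = rest.partition("/")
        if sep:
            # Either a trailing-slash marker ("name/") or a deeper key ("name/x").
            entries.add((head, "folder"))
        else:
            # A plain key such as Stocator's 0-byte "name" object.
            entries.add((head, "file"))
    return sorted(entries)


# Layout as written by Stocator: a 0-byte "call_center" object with no trailing '/'.
stocator_keys = [
    "tpcds/call_center",
    "tpcds/call_center/_SUCCESS",
    "tpcds/call_center/part-00000.parquet",
]
# Same layout, but with a "call_center/" folder marker instead.
slash_keys = [
    "tpcds/call_center/",
    "tpcds/call_center/_SUCCESS",
    "tpcds/call_center/part-00000.parquet",
]
print(browse(stocator_keys, "tpcds/"))  # [('call_center', 'file'), ('call_center', 'folder')]
print(browse(slash_keys, "tpcds/"))     # [('call_center', 'folder')]
```

Under these assumptions, the marker without '/' is what produces the extra 0-byte "file" entry next to the folder.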

Looking forward to a fix.


gilv · 2019-06-12T09:42:59Z

@mariobriggs I will handle this. Thanks

kozchris · 2019-08-15T17:09:41Z

This issue is also breaking our Apache Spark reads of part files. The Apache Spark writes of the part files create a 0-byte directory file with no trailing slash. When we add the trailing slash to the directory file that gets created, the reads work again.
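The workaround described above can be sketched as a helper that finds which trailing-slash marker keys are missing, assuming the listing is available as dicts with "Key" and "Size" (the shape of the "Contents" entries in an S3 ListObjectsV2 response). `missing_slash_markers` is hypothetical; it only computes the keys, leaves Stocator's own 0-byte object untouched, and does not upload anything:

```python
def missing_slash_markers(objects):
    """Return the "name/" marker keys that a '/'-based reader would expect but that do not exist.

    objects: iterable of dicts with "Key" and "Size".
    A 0-byte object "name" is treated as a folder marker only when other
    keys live under "name/".
    """
    objects = list(objects)
    keys = {obj["Key"] for obj in objects}

    # Collect every '/'-terminated prefix of every key, e.g. "a/b/c" -> "a/", "a/b/".
    prefixes = set()
    for key in keys:
        parts = key.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:i]) + "/")

    missing = []
    for obj in objects:
        key = obj["Key"]
        if obj["Size"] == 0 and not key.endswith("/"):
            marker = key + "/"
            # Only a real "folder" (has children) that lacks its slash marker.
            if marker in prefixes and marker not in keys:
                missing.append(marker)
    return sorted(missing)


listing = [
    {"Key": "foo", "Size": 0},
    {"Key": "foo/_SUCCESS", "Size": 0},
    {"Key": "foo/part-1-xx", "Size": 1024},
]
print(missing_slash_markers(listing))  # ['foo/']
```

A standalone empty file with nothing under it is not reported, since it has no children.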

kozchris · 2019-08-15T17:10:29Z

@gilv how is the progress coming on a fix?

rpatel17 · 2019-08-15T17:23:26Z

I am also seeing this as a problem in our project. Thanks @gilv for looking into it.

robin-sun · 2020-12-14T17:07:39Z

Is this issue fixed now after 16 months? I am still seeing an empty file being created.

gilv · 2020-12-15T05:46:57Z

@robin-sun, why is an empty file a problem? If you write "foo" with Stocator via Spark, the result will be:

foo
foo/_SUCCESS
foo/part-1-xx
foo/part-2-xx
etc.

You can then use Spark to read "foo" again and everything works. If you list the object storage via the CLI, you will see the empty objects "foo" and "foo/_SUCCESS". Why is this a problem?

robin-sun · 2020-12-15T10:30:31Z

Hi Gil,
This causes errors when downloading the whole parent folder to Windows, because Windows doesn't support a file and a folder with the same name. I have to download the output folders one by one.

But I guess the real question is: why do we need an empty file if it is not used or useful at all?

mariobriggs · 2020-12-16T03:56:29Z

I think the real problem is this: if you write to COS using Stocator, then all of your reader clients are forced to use Stocator as well. The latter is not under your control and is therefore problematic.

Thanks,
Mario

robin-sun · 2020-12-16T09:46:47Z

Hi Mario/Gil

Could you help me understand why we need an empty file there?

gilv · 2020-12-22T17:06:25Z

@mariobriggs @robin-sun Using an empty object to simulate a folder in object storage was not invented by Stocator; other Big Data systems use it too. It is the easiest way for the Hadoop ecosystem to mark a "folder". So yes, this approach does have compatibility issues with Windows. We need the empty object because it carries Stocator-specific metadata. If you just need to download all data created by Stocator to Windows, write a script that ignores empty objects.
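Such a script could look roughly like this minimal sketch, which skips 0-byte objects and maps the remaining keys to local Windows paths. `plan_downloads` and the local root are hypothetical, and the listing is assumed to be dicts with "Key" and "Size". Note that skipping every empty object also skips "_SUCCESS" and any genuinely empty data file:

```python
from pathlib import PurePosixPath, PureWindowsPath


def plan_downloads(objects, local_root):
    """Map non-empty object keys to local Windows paths, skipping 0-byte objects.

    Dropping the empty objects removes Stocator's "foo" marker, so it can no
    longer collide with the local "foo" directory on Windows.
    """
    plan = []
    for obj in objects:
        if obj["Size"] == 0:
            continue  # folder markers, _SUCCESS, and any other empty object
        parts = PurePosixPath(obj["Key"]).parts  # "foo/part-1-xx" -> ("foo", "part-1-xx")
        plan.append((obj["Key"], PureWindowsPath(local_root, *parts)))
    return plan


listing = [
    {"Key": "foo", "Size": 0},
    {"Key": "foo/_SUCCESS", "Size": 0},
    {"Key": "foo/part-1-xx", "Size": 2048},
]
for key, path in plan_downloads(listing, "C:\\data"):
    print(key, "->", path)  # foo/part-1-xx -> C:\data\foo\part-1-xx
```

To fetch the listing and the files, one option (an assumption, not something discussed in this thread) is boto3 pointed at the COS S3 endpoint: page through `client.get_paginator("list_objects_v2").paginate(Bucket=..., Prefix=...)`, pass each page's `"Contents"` to `plan_downloads`, create each parent directory locally, and call `client.download_file(bucket, key, str(path))`.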

mariobriggs changed the title ~~Stocator should create folder names with a trailing '/'~~ Stocator should create folder names with a trailing '/' in COS Jun 10, 2019

mariobriggs changed the title ~~Stocator should create folder names with a trailing '/' in COS~~ Stocator should create folder names with a trailing '/' in IBM COS Jun 10, 2019

gilv self-assigned this Jun 12, 2019
