Spark standalone write permissions failure. #3376

Open
trhallam opened this issue Sep 7, 2023 · 2 comments
trhallam commented Sep 7, 2023

My issue is exactly the same as #3284.

To expand on the previous issue, which received no response, my workflow is as follows.

I have HostA with users and RStudio; this host is used to connect to a cluster on HostB running a Spark standalone deployment.

The standalone deployment is started by the user spark, so the Spark processes run as spark on the master and worker nodes.

A user can connect to the host fine, start the sparklyr application, and load data from disk, as long as the spark user has read permissions on the data location.

When spark_write_csv is called, Spark first creates a folder to hold the output files with permissions rwxr-x---, owned by the user who initiated the app on HostA. The spark user therefore cannot write into that folder, and spark_write_csv fails due to inadequate permissions.
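
For reference, a minimal sketch of the failing call (the master URL is a placeholder; the output path matches the error below):

library(sparklyr)
sc <- spark_connect(master = "spark://hostb:7077")           # placeholder master URL
airlines <- copy_to(sc, nycflights13::airlines, "airlines")  # small demo table
spark_write_csv(airlines, "file:///work/shared/airlines.csv")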

From what I understand, you cannot use --proxy-user with spark-submit on a standalone deployment, so the solution seems to be to make the output directory writable by the user that runs the Spark cluster; a sketch of that idea follows.
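
A possible (untested) workaround along those lines: relax the umask inherited by the driver so the directories it creates are group-writable, and give the shared output location a group that includes the spark user. The group name and paths here are assumptions:

Sys.umask("002")                              # dirs created by the driver become rwxrwxr-x
system2("chgrp", c("spark", "/work/shared"))  # assumes a group 'spark' containing the spark user
system2("chmod", c("g+ws", "/work/shared"))   # setgid so subdirectories inherit the group
sc <- spark_connect(master = "spark://hostb:7077")  # placeholder master URL

Whether Hadoop's local filesystem preserves these permissions when it creates the _temporary directories is something I haven't verified.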

Error msg:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 15) (10.1.0.175 executor 0): java.io.IOException: Mkdirs failed to create file:/work/shared/airlines.csv/_temporary/0/_temporary/attempt_202309071137551599550320799742752_0006_m_000000_15 (exists=false, cwd=file:/opt/spark/spark-3.4.1-bin-hadoop3/work/app-20230907112814-0015/0)

Session Info:

> sparklyr::spark_version(sc)
[1] ‘3.4.1’
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.4.3  dplyr_1.1.2    sparklyr_1.8.2

loaded via a namespace (and not attached):
 [1] pillar_1.9.0       compiler_4.2.3     dbplyr_2.3.3       base64enc_0.1-3    tools_4.2.3        digest_0.6.33     
 [7] uuid_1.1-1         jsonlite_1.8.7     lifecycle_1.0.3    tibble_3.2.1       gtable_0.3.4       pkgconfig_2.0.3   
[13] rlang_1.1.1        DBI_1.1.3          cli_3.6.1          rstudioapi_0.15.0  yaml_2.3.7         parallel_4.2.3    
[19] withr_2.5.0        httr_1.4.7         generics_0.1.3     vctrs_0.6.3        askpass_1.1        grid_4.2.3        
[25] tidyselect_1.2.0   glue_1.6.2         R6_2.5.1           fansi_1.0.4        tidyr_1.3.0        purrr_1.0.2       
[31] magrittr_2.0.3     scales_1.2.1       ellipsis_0.3.2     nycflights13_1.0.2 colorspace_2.1-0   config_0.3.1      
[37] utf8_1.2.3         openssl_2.1.0      munsell_0.5.0  
@edgararuiz (Collaborator) commented
Hi @trhallam, thank you for the clarification. The only thing I'm not sure of is whether there is something for me to do in sparklyr to help. It seems like a current limitation of Spark. Is that right?

trhallam (Author) commented Sep 9, 2023

To be honest, I'm not entirely sure. I'm hoping there is a way I can better manage this, or a setting I can pass to Spark via sparklyr. It may be, as you say, that I need to push this issue upstream to Spark itself, but it's not clear from the error which part of the Spark code base is causing the issue. My best guess is some of the Java routines that sparklyr calls indirectly.
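
In case it helps, the mechanism I had in mind for passing a setting through sparklyr is spark_config(); spark.hadoop.* entries are forwarded to the Hadoop configuration. Whether fs.permissions.umask-mode has any effect on local file:// writes (rather than HDFS) is untested:

library(sparklyr)
config <- spark_config()
# forwarded into Hadoop's Configuration; may only apply to HDFS, not file://
config[["spark.hadoop.fs.permissions.umask-mode"]] <- "002"
sc <- spark_connect(master = "spark://hostb:7077", config = config)  # placeholder URL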
