
Warning message: In quit... system call failed: Cannot allocate memory #322

Open
ctlamb opened this issue Nov 6, 2018 · 15 comments

@ctlamb

ctlamb commented Nov 6, 2018

Some of my nodes are failing with this error:

Warning message:
In quit(save = "yes", status = workerErrorStatus, runLast = FALSE) :
system call failed: Cannot allocate memory

Does this mean I need a VM with more memory?

@brnleehng
Collaborator

Hi @ctlamb

Yes, this means you will need a VM with more memory. I suggest measuring the memory usage of each task so you have a benchmark for which Azure VM size to use.

Thanks
Brian
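
For reference, one way to get that benchmark is to profile a single task locally before choosing a VM size. A minimal sketch, assuming the peakRAM package (whose output format matches the readout posted later in this thread) and placeholder file names:

library(peakRAM)   # assumption: peakRAM is the profiler used here

mem <- peakRAM(
  brt   <- readRDS("model.rds"),                                                 # placeholder model file
  STACK <- raster::stack(list.files(".", pattern = ".tif$", full.names = TRUE)), # placeholder rasters
  pred  <- dismo::predict(STACK, brt,
                          n.trees = brt$gbm.call$best.trees,
                          type = "response")
)
print(mem)   # Peak_RAM_Used_MiB indicates roughly how much memory one task needs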

@ctlamb
Author

ctlamb commented Nov 7, 2018

Excellent, will do. In the meantime, I tried a machine with slightly more memory ("vmSize" = "Standard_E4_v3"), but I am running into the following error after I run foreach (this error doesn't occur with "vmSize" = "Standard_DS12_v2"):

##Error: No automatic parser available for 7b/.
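
For reference, switching VM sizes only changes the vmSize entry of the cluster configuration posted further down in this thread; a minimal, hypothetical sketch of the swap:

# swap the VM size in the cluster configuration (shown in full further down this thread)
clusterConfig$vmSize <- "Standard_E4_v3"   # "Standard_DS12_v2" does not trigger the parser error here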

@brnleehng
Collaborator

What region are you in? It's possible that Standard_E4_v3 isn't available in your region.
Is this happening during makeCluster?

@ctlamb
Author

ctlamb commented Nov 7, 2018

I'm in West US. The error is thrown in foreach.

It looks like my tasks use a maximum of about 8 GB of RAM, so the 28 GB of RAM in the "Standard_DS12_v2" should have been plenty. Hmmm, not sure what's going on here.

Memory usage readout


    Elapsed_Time_sec  Total_RAM_Used_MiB  Peak_RAM_Used_MiB  Function_Call
 1             0.005                 0.0                0.0  doAzureParallel::setCredentials(credentials)
 2             0.000                 0.0                0.0  mod<-mod.files$FilePath[bp$model[i]]
 3             0.000                 0.0                0.0  tile<-r.files$FilePath[bp$tile[i]]
 4            49.721               190.6              190.6  doAzureParallel::getStorageFile(container="occmodels",blobPath=paste0(mod),downloadPath=paste0(mod),overwrite=TRUE)
 5            10.233               665.8              665.8  brt<-readRDS(paste0(mod))
 6           496.358              1996.0             1996.0  doAzureParallel::getStorageFile(container="rastertiles",blobPath=paste0(tile),downloadPath=paste0(tile),overwrite=TRUE)
 7            27.612                 0.0                0.0  unzip(paste0(tile),exdir=here::here(),junkpaths=TRUE,overwrite=TRUE)
 8             0.150                 0.0                0.0  raster_data<-list.files(here::here(),pattern=".tif$",full.names=TRUE)
 9             2.337                 0.3                6.0  STACK<-raster::stack(raster_data)
10             5.092                 0.0             1161.7  STACK[["CutBlock_Occurrence"]]<-ratify(STACK[["CutBlock_Occurrence"]])
11             5.012                 0.0             1161.7  STACK[["Fire_Occ"]]<-ratify(STACK[["Fire_Occ"]])
12             5.132                 0.0             1161.7  STACK[["CRDP_LC"]]<-ratify(STACK[["CRDP_LC"]])
13             4.990                 0.0             1161.7  STACK[["MODIS_LC"]]<-ratify(STACK[["MODIS_LC"]])
14         22156.271               387.8             8056.5  pred<-dismo::predict(STACK,brt,n.trees=brt$gbm.call$best.trees,type="response")
15             0.000                 0.0                0.0  return(pred)

@brnleehng
Collaborator

Are you setting maxTasksPerNode greater than 1 in your cluster configuration?

@ctlamb
Author

ctlamb commented Nov 8, 2018

No, it's set to 1:

clusterConfig <- list(
  "name" = "LambRaster",
  "vmSize" = "Standard_DS12_v2",
  "maxTasksPerNode" = 1,
  "poolSize" = list(
    "dedicatedNodes" = list("min" = 1, "max" = 200),
    "lowPriorityNodes" = list("min" = 0, "max" = 0),
    "autoscaleFormula" = "QUEUE"
  ),
  "containerImage" = "rocker/geospatial:latest",
  "rPackages" = list(
    "cran" = c("doParallel", "here", "dismo", "gbm", "snow"),
    "github" = c("Azure/doAzureParallel"),
    "bioconductor" = c()
  ),
  "commandLine" = list()
)
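
For reference, a minimal sketch of how this configuration is typically wired into a job (the credentials file name is a placeholder and the foreach body is a trivial stand-in; if your version of makeCluster doesn't accept the list directly, write it out to a cluster.json file first):

library(doAzureParallel)
library(foreach)

setCredentials("credentials.json")      # placeholder credentials file
cluster <- makeCluster(clusterConfig)   # failures at this point concern the pool/VMs, not the tasks
registerDoAzureParallel(cluster)

# failures that only appear here happen while tasks run or while results are collected
results <- foreach(i = 1:10) %dopar% sqrt(i)

stopCluster(cluster)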

@ctlamb
Author

ctlamb commented Nov 8, 2018

Is there a better/preferred package I could use to measure the memory usage?

@ctlamb
Author

ctlamb commented Nov 8, 2018

Now I'm getting Error: No automatic parser available for 7b/. even when I use the DS12 machine. Ugh, it's always hard to troubleshoot one issue (memory) when another pops up. Any thoughts? I could start a new thread if it's easier.

@brnleehng
Collaborator

I don't have a preferred package for measuring memory usage.
Where exactly is this error occurring? Is it when foreach is retrieving the results?

If you can share a cluster configuration file and a reproducible sample, I will work on identifying the issue.
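
For anyone putting such a sample together, a hypothetical skeleton modelled on the workflow in this thread (container, blob, and file names are placeholders):

library(doAzureParallel)
library(foreach)

setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")   # the configuration posted above
registerDoAzureParallel(cluster)

results <- foreach(i = 1:4) %dopar% {
  # credential setup inside the task is omitted here; the real workflow calls
  # doAzureParallel::setCredentials(credentials) first, as row 1 of the readout shows
  doAzureParallel::getStorageFile(
    container    = "rastertiles",                # placeholder container
    blobPath     = paste0("tile_", i, ".zip"),   # placeholder blobs
    downloadPath = paste0("tile_", i, ".zip"),
    overwrite    = TRUE
  )
  file.size(paste0("tile_", i, ".zip"))          # return something small to the client
}

stopCluster(cluster)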

@simon-tarr

simon-tarr commented Nov 16, 2018

This is the same as issue #315.
I've spent many an hour pulling my hair out over this issue, and I have no idea what's causing it. I've provided a lot of qualitative information in #315 but haven't had time to build a fully reproducible example at the scale that I think is generating the error.

@ctlamb, is your workflow using resource files uploaded to Azure storage? Mine is, and I haven't been able to determine whether the 7b error still occurs when not using resource files. I'd like to rule out whether resource files could be contributing in some way.

@ctlamb
Author

ctlamb commented Nov 16, 2018

Yes, I am uploading and downloading data to/from Azure storage in my workflow. I do wonder if this was an internet issue. My internet speed was recently upgraded, and I haven't gotten the 7b error since... but that's only based on 5-10 tries so far. Will update if anything changes.

@simon-tarr

> Yes, I am uploading and downloading data to/from Azure storage in my workflow. I do wonder if this was an internet issue. My internet speed was recently upgraded, and I haven't gotten the 7b error since... but that's only based on 5-10 tries so far. Will update if anything changes.

Thanks for the extra information. My latest post at #315 documents the return of the dreaded 7b error.

I considered your idea here as well. However, my university network is a gigabit connection and it's rock solid. My home internet is a 100 Mb fibre connection, which is also super reliable (for the most part).

I wonder if there's a limit to the number of connections Batch/httr can accept from a single IP address? I'm currently running two pools on my laptop (home network) and three on my uni workstation, and they've all been stable all day. If I try to run any more pools than this on either machine, the 7b error returns almost instantly. It's very strange...

@brnleehng
Collaborator

Are all of your workflows in interactive mode? (Waiting for the job to be done)

Thanks,
Brian

@simon-tarr

Mine is, yes.

@simon-tarr

Any news on the status of this error? It's still happening to me with frustrating regularity.

Thanks!
