This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

object 'results' not found #334

Open
5 tasks done
ericchansen opened this issue Nov 28, 2018 · 8 comments

Comments

@ericchansen

ericchansen commented Nov 28, 2018

Before submitting a bug please check the following:

  • Start a new R session
  • Check your credentials file
  • Install the latest doAzureParallel package
  • Submit a minimal, reproducible example
  • run sessionInfo()

Updates

EDIT (11/29/2018) - Added additional examples, corrected a typo and improved formatting.

Description

Can someone please explain why I get the error object 'results' not found? Full code below.

Ideally, I need to return two objects from inside the loop: a data frame that gets row-bound, and a list that needs to become a list of lists. In the example below, I'm only returning the data frame (I'll add the list as additional output once this issue is resolved).
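As an aside, the two-object return described above can be handled by returning a named list from each iteration and separating the pieces afterwards. A minimal base-R sketch of that post-processing (no Azure calls; all names here are illustrative, not from the package):

```r
# Each iteration returns a named list holding both outputs.
iteration <- function(t) {
  list(
    df  = data.frame(x = runif(2), trial = rep(t, 2)),
    lst = runif(3)
  )
}

# Simulate three iterations locally; a foreach job with
# enableCloudCombine = FALSE would hand back the same list shape.
raw <- lapply(1:3, iteration)

# Separate the combined result into the two desired objects.
combined_df   <- do.call(rbind, lapply(raw, `[[`, "df"))  # row-bound data frame
list_of_lists <- lapply(raw, `[[`, "lst")                 # list of lists
```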

Instructions to reproduce the problem (if applicable)

Example 1

remove.packages("rAzureBatch")
remove.packages("doAzureParallel")
devtools::install_github("Azure/rAzureBatch", force = TRUE)
devtools::install_github("Azure/doAzureParallel", force = TRUE)

library(doAzureParallel)
setVerbose(TRUE)

setCredentials(file.path(getwd(), "credentials.json"))
cluster <- makeCluster(file.path(getwd(), "cluster.json"), fullName=TRUE)
registerDoAzureParallel(cluster)
getDoParWorkers()

# my_results <- foreach(t = 1:3) %do% { # Works.
# my_results <- foreach(t = 1:3, .combine = 'rbind') %do% { # Works.
# my_results <- foreach(t = 1:3, .combine = 'rbind', .options.azure = list(autoDeleteJob = FALSE)) %dopar% { # Works.
# my_results <- foreach(t = 1:3, .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found
my_results <- foreach(t = 1:3, .combine = 'rbind', .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found
  my_results_df <- data.frame("x" = runif(2), "trial" = replicate(2, t))
  my_results_list <- runif(3)
  return(my_results_df)
}

sessionInfo()

Example 2 featuring superfluous use of setAutoDeleteJob(FALSE)

library(doAzureParallel)
setVerbose(TRUE)
setAutoDeleteJob(FALSE)

setCredentials(file.path(getwd(), "credentials.json"))
cluster <- makeCluster(file.path(getwd(), "cluster.json"), fullName=TRUE)
registerDoAzureParallel(cluster)
getDoParWorkers()

# my_results <- foreach(t = 1:3, .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found
# my_results <- foreach(t = 1:3, .options.azure = list(enableCloudCombine = FALSE)) %dopar% { # object 'results' not found
# my_results <- foreach(t = 1:3, .combine = 'rbind', .options.azure = list(enableCloudCombine = FALSE)) %dopar% { # object 'results' not found
my_results <- foreach(t = 1:3, .combine = 'rbind', .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found
  my_results_df <- data.frame("x" = runif(2), "trial" = replicate(2, t))
  my_results_list <- runif(3)
  return(my_results_df)
}

Output from sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doAzureParallel_0.7.2 iterators_1.0.9       foreach_1.4.5        
 [4] RevoUtilsMath_10.0.1  RevoUtils_10.0.7      RevoMods_11.0.0      
 [7] MicrosoftML_9.3.0     mrsdeploy_1.1.3       RevoScaleR_9.3.0     
[10] lattice_0.20-35       rpart_4.1-11         

loaded via a namespace (and not attached):
 [1] codetools_0.2-15       CompatibilityAPI_1.1.0 digest_0.6.13         
 [4] rAzureBatch_0.6.2      mime_0.5               bitops_1.0-6          
 [7] grid_3.4.3             R6_2.2.2               jsonlite_1.5          
[10] httr_1.3.1             curl_3.1               rjson_0.2.15          
[13] tools_3.4.3            RCurl_1.95-4.9         yaml_2.1.16           
[16] compiler_3.4.3         mrupdate_1.0.1       

Output from error

==============================================================================
Id: job20181128173916
chunkSize: 1
enableCloudCombine: FALSE
errorHandling: stop
wait: TRUE
autoDeleteJob: TRUE
==============================================================================
Submitting tasks (3/3)
Waiting for tasks to complete. . .
| Progress: 100.00% (3/3) | Running: 0 | Queued: 0 | Completed: 3 | Failed: 0 |
Tasks have completed. 
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : 
  object 'results' not found
Called from: e$fun(obj, substitute(ex), parent.frame(), e$data)
@Pullarg

Pullarg commented Nov 29, 2018

Have a look at issue 284. I have just been running into this myself, and it seems that using setAutoDeleteJob(FALSE) together with .options.azure = list(enableCloudCombine = FALSE) will solve your issue. The link has more details, but basically you merge the results yourself by reading from blob storage directly.

@ericchansen
Author

@Pullarg I tried my_results <- foreach(t = 1:3, .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found, which you can see in my original post. That should behave the same way as using setAutoDeleteJob(FALSE). That being said, I tested several variations with setAutoDeleteJob(FALSE) anyway (code below). All resulted in the same error (also shown below).

Error message

==============================================================================
Id: job20181129155730
chunkSize: 1
enableCloudCombine: FALSE
errorHandling: stop
wait: TRUE
autoDeleteJob: FALSE
==============================================================================
Submitting tasks (3/3)
Waiting for tasks to complete. . .
| Progress: 100.00% (3/3) | Running: 0 | Queued: 0 | Completed: 3 | Failed: 0 |
Tasks have completed. 
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : 
  object 'results' not found
Called from: e$fun(obj, substitute(ex), parent.frame(), e$data)

Sample code

library(doAzureParallel)
setVerbose(TRUE)
setAutoDeleteJob(FALSE)

setCredentials(file.path(getwd(), "credentials.json"))
cluster <- makeCluster(file.path(getwd(), "cluster.json"), fullName=TRUE)
registerDoAzureParallel(cluster)
getDoParWorkers()

# my_results <- foreach(t = 1:3, .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found
# my_results <- foreach(t = 1:3, .options.azure = list(enableCloudCombine = FALSE)) %dopar% { # object 'results' not found
# my_results <- foreach(t = 1:3, .combine = 'rbind', .options.azure = list(enableCloudCombine = FALSE)) %dopar% { # object 'results' not found
my_results <- foreach(t = 1:3, .combine = 'rbind', .options.azure = list(enableCloudCombine = FALSE, autoDeleteJob = FALSE)) %dopar% { # object 'results' not found

  my_results_df <- data.frame("x" = runif(2), "trial" = replicate(2, t))
  my_results_list <- runif(3)
  return(my_results_df)
}

@brnleehng
Collaborator

Hi @ericchansen

If you remove the enableCloudCombine flag, you will get your results. The object 'results' not found error occurs because no file is found on Azure Storage containing the merged result (the RDS file holding all the tasks' results, since enableCloudCombine is disabled). I will add better error handling for this case.

Below: This example works

my_results <- foreach(t = 1:3, .combine = 'rbind') %dopar% {
  my_results_df <- data.frame("x" = runif(2), "trial" = replicate(2, t))
  my_results_list <- runif(3)
  return(my_results_df)
}

my_results

@ericchansen
Author

@brnleehng Yep, that's what I've been doing.

I suppose I don't understand the use case for enableCloudCombine = FALSE. How should we be using this option?

The documentation doesn't have any clear examples beyond what's mentioned here. Looking at that example, I'd expect it to trigger this error as well.

@Pullarg

Pullarg commented Nov 30, 2018

@brnleehng I need to return a list rather than bind rows. Is there any way to skip the merge step entirely? I am getting the same failure. I would like to perform the merge on the head node (non-cloud side) by reading from the storage account directly.

Update:
Dumb question; I just needed to turn the result back into a list. After getting the rbind result, I can convert from a data.frame to a list using split(rbind.df, seq(nrow(rbind.df))).

@brnleehng
Collaborator

Hi @ericchansen

The use case for enableCloudCombine = FALSE is to avoid merging all your results onto one VM while the other VMs sit idle (unless you are using autoscale). There are also cases where your tasks produce so many large files that the merge task runs out of memory, causing your job to fail.

Hi @Pullarg,
You should use the getJobResult function to download all the results locally; it will merge them into a list for you.

> getJobResult("job20181205211937")
Getting job results...
enableCloudCombine is set to FALSE, we will merge job result locally
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4
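If each task returns a data frame, the list that getJobResult hands back can then be combined locally. A minimal base-R sketch of that local merge (the simulated list shape is illustrative):

```r
# getJobResult returns one list element per task; simulate that shape here.
results <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))

# Row-bind the per-task data frames into one, as .combine = 'rbind' would.
merged <- do.call(rbind, results)
```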

Thanks,
Brian

@angusrtaylor
Contributor

I'm getting the same error even with enableCloudCombine = FALSE. In my code, I am not returning any results from the %dopar% block. Instead, I am just writing my result dataframe to disk. My code runs correctly but the error still appears. Is there a way to avoid this error when the code intentionally does not return a result?

@brnleehng
Collaborator

Can you add NULL at the end of the %dopar% block?

I'm looking into fixing the enableCloudCombine = FALSE path.
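The NULL suggestion can be simulated locally in base R: each "task" writes its output to disk and ends with an explicit NULL, so nothing is expected back from the combine step (paths and names here are illustrative):

```r
out_dir <- tempdir()

# Each "task" saves its result and returns NULL explicitly, so the
# combine step never looks for a merged 'results' object.
results <- lapply(1:3, function(t) {
  result_df <- data.frame(x = runif(2), trial = rep(t, 2))
  saveRDS(result_df, file.path(out_dir, paste0("result_", t, ".rds")))
  NULL
})
```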
