Parallelization is slow in Python using rpy2 #1052

geofly1985 · 2023-07-25T18:56:35Z

Describe the issue or bug

I've been using the "doParallel" R library in my R code. The code takes much less time to execute in R than when I run the same code through rpy2.

To Reproduce

import time
import rpy2.robjects as robjects
A = time.perf_counter()
robjects.r('''
        getPrimeNumbers <- function(n) {  
               n <- as.integer(n)
               if(n > 1e6) stop("n too large")
               primes <- rep(TRUE, n)
               primes[1] <- FALSE
               last.prime <- 2L
               for(i in last.prime:floor(sqrt(n)))
               {
                  primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
                  last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
               }
               which(primes)
            }
        
        library(doParallel)  
        cl <- detectCores() 
        registerDoParallel(cl)  
        result <- foreach(i=10:10000) %dopar% getPrimeNumbers(i)
        ''')
print("R code time is:", time.perf_counter()-A)

Question

Has anybody else run into the same issue?

lgautier · 2023-07-29T13:42:06Z

I don't have doParallel. This seems surprising. Check that the code you run in R is strictly identical (including any environment variables that influence how parallel clusters are created in R), and that the load on the machine unrelated to this test is identical.
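
One way to reduce the effect of unrelated machine load on a comparison like this is to time several runs and look at the median and minimum rather than a single measurement. The following is a minimal sketch, not something taken from this thread: `time_repeated` and `placeholder_workload` are hypothetical names, and the placeholder would be swapped for the code under test (for example the `robjects.r(...)` call).

```python
import statistics
import time


def time_repeated(fn, repeats=5):
    """Run fn() `repeats` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def placeholder_workload():
    # Hypothetical stand-in for the code being measured;
    # replace with the actual call, e.g. the robjects.r(...) invocation.
    sum(i * i for i in range(100_000))


durations = time_repeated(placeholder_workload, repeats=5)
print("median:", statistics.median(durations), "min:", min(durations))
```

A large spread between the minimum and the median would suggest that other processes are interfering with the measurement.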

geofly1985 · 2023-07-31T15:12:17Z

It is exactly the same R code; I just copied the R script into robjects.r(''' ''').
rpy2 is using the same R version too.

To tease out whether this is due to "doParallel", I ran another test with a plain for loop and no parallelization:

R script, which reports 355.575 s:
proc.time() -> A

getPrimeNumbers <- function(n) {
n <- as.integer(n)
if(n > 1e6) stop("n too large")
primes <- rep(TRUE, n)
primes[1] <- FALSE
last.prime <- 2L
for(i in last.prime:floor(sqrt(n)))
{
primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
}
which(primes)
}

res<- c()
for(i in 10:20000) {res = c(res, getPrimeNumbers(i))}

proc.time()-A

Python script using rpy2, which reports 394.3394511249935 s:
import time
import rpy2.robjects as robjects
r = robjects.r
A = time.perf_counter()
r('''
getPrimeNumbers <- function(n) {
n <- as.integer(n)
if(n > 1e6) stop("n too large")
primes <- rep(TRUE, n)
primes[1] <- FALSE
last.prime <- 2L
for(i in last.prime:floor(sqrt(n)))
{
primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
}
which(primes)
}

    res<- c()
    for(i in 10:20000) {res = c(res, getPrimeNumbers(i))}
    ''')

print("R code time is:", time.perf_counter()-A)
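
For scale, the gap between the two timings reported above can be expressed as a relative overhead. This is only arithmetic on the two numbers quoted in this comment; each appears to come from a single run, so some of the difference may be measurement noise.

```python
# Timings quoted above, in seconds.
r_seconds = 355.575
rpy2_seconds = 394.3394511249935

overhead_seconds = rpy2_seconds - r_seconds
overhead_percent = 100 * overhead_seconds / r_seconds

print(f"rpy2 run is {overhead_seconds:.1f} s slower ({overhead_percent:.1f}%)")
# prints: rpy2 run is 38.8 s slower (10.9%)
```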

lgautier · 2023-08-05T20:58:52Z

I simplified the example to use parallel. It appears equally slow to run in R and through rpy2. Are you certain that when you run it in R the parameter 10:10000 does not become 10:1000?

import time
import rpy2.robjects as robjects
A = time.perf_counter()
robjects.r('''
        getPrimeNumbers <- function(n) {  
               n <- as.integer(n)
               if(n > 1e6) stop("n too large")
               primes <- rep(TRUE, n)
               primes[1] <- FALSE
               last.prime <- 2L
               for(i in last.prime:floor(sqrt(n)))
               {
                  primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
                  last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
               }
               which(primes)
            }
        
        library(parallel)
        cl <- makeCluster(getOption("cl.cores", 2)) 
        result <- clusterApply(cl = cl, fun = getPrimeNumbers, 10:10000)
        ''')
print("R code time is:", time.perf_counter()-A)
