The parallelization is slow in Python using rpy2 #1052

Open
geofly1985 opened this issue Jul 25, 2023 · 3 comments

Comments

@geofly1985

geofly1985 commented Jul 25, 2023

Describe the issue or bug

I've been using the "doParallel" R library in my R code. The code takes much less time to execute in R directly than when I run the same code through rpy2.

To Reproduce

import time
import rpy2.robjects as robjects
A = time.perf_counter()
robjects.r('''
        getPrimeNumbers <- function(n) {  
               n <- as.integer(n)
               if(n > 1e6) stop("n too large")
               primes <- rep(TRUE, n)
               primes[1] <- FALSE
               last.prime <- 2L
               for(i in last.prime:floor(sqrt(n)))
               {
                  primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
                  last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
               }
               which(primes)
            }
        
        library(doParallel)  
        cl <- detectCores() 
        registerDoParallel(cl)  
        result <- foreach(i=10:10000) %dopar% getPrimeNumbers(i)
        ''')
print("R code time is:", time.perf_counter()-A)

Question
Has anybody else run into the same issue?

@lgautier
Member

I don't have doParallel. This seems surprising. Check that the code you run in R is strictly identical (including any environment variables that influence how parallel clusters are created in R) and that the load on the machine unrelated to this test is identical.
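
One way to act on this advice is to dump the environment variables that commonly affect R and its parallel back ends on both sides and compare them. The sketch below covers the Python side only. The variable list is an assumption chosen for illustration, not an exhaustive or rpy2-defined set, and `snapshot_env` is a hypothetical helper name. In the R session, `Sys.getenv()` gives the counterpart.

```python
import os

# Variables that often influence R start-up or thread/worker counts.
# This list is an illustrative assumption, not something rpy2 defines.
WATCHED = (
    "R_HOME", "R_LIBS", "R_LIBS_USER",
    "MC_CORES", "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS",
)

def snapshot_env(names=WATCHED, environ=None):
    """Return {name: value or None} for each watched variable."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in names}

if __name__ == "__main__":
    # Print the values seen by the Python process that will host rpy2.
    for name, value in snapshot_env().items():
        print(f"{name}={value!r}")
```

Running this in the same shell used to launch the R script and the Python script makes any difference between the two environments visible before timing anything.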

@geofly1985
Author

It is exactly the same R code; I just copied the R script into robjects.r(''' ''').
rpy2 is using the same R version too.

To tease out whether this is due to "doParallel", I ran another test with a plain for loop and no parallelization:

R script, which takes 355.575 s:
proc.time() -> A

getPrimeNumbers <- function(n) {
    n <- as.integer(n)
    if(n > 1e6) stop("n too large")
    primes <- rep(TRUE, n)
    primes[1] <- FALSE
    last.prime <- 2L
    for(i in last.prime:floor(sqrt(n)))
    {
        primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
        last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
    }
    which(primes)
}

res<- c()
for(i in 10:20000) {res = c(res, getPrimeNumbers(i))}

proc.time()-A

Python script using rpy2, which takes 394.3394511249935 s:
import time
import rpy2.robjects as robjects
r = robjects.r
A = time.perf_counter()
r('''
    getPrimeNumbers <- function(n) {
        n <- as.integer(n)
        if(n > 1e6) stop("n too large")
        primes <- rep(TRUE, n)
        primes[1] <- FALSE
        last.prime <- 2L
        for(i in last.prime:floor(sqrt(n)))
        {
            primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
            last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
        }
        which(primes)
    }

    res<- c()
    for(i in 10:20000) {res = c(res, getPrimeNumbers(i))}
    ''')

print("R code time is:", time.perf_counter()-A)
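
For scale, the two timings reported in this comment differ by roughly 39 s, or about 11% of the pure-R run. A quick check of that arithmetic, using only the two numbers quoted here and assuming the 355.575 s figure is elapsed wall-clock time from a single run:

```python
r_seconds = 355.575               # time reported for the R script
rpy2_seconds = 394.3394511249935  # time reported for the rpy2 run

extra = rpy2_seconds - r_seconds   # absolute difference in seconds
relative = extra / r_seconds       # difference as a fraction of the R run

print(f"extra time: {extra:.1f} s ({relative:.1%} of the R run)")
```

Since each figure comes from one run, this says nothing about run-to-run variation on the machine.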

@lgautier
Member

lgautier commented Aug 5, 2023

I simplified the example to use parallel. It appears equally slow to run in R and through rpy2. Are you certain that when you run it in R the parameter 10:10000 does not become 10:1000?

import time
import rpy2.robjects as robjects
A = time.perf_counter()
robjects.r('''
        getPrimeNumbers <- function(n) {  
               n <- as.integer(n)
               if(n > 1e6) stop("n too large")
               primes <- rep(TRUE, n)
               primes[1] <- FALSE
               last.prime <- 2L
               for(i in last.prime:floor(sqrt(n)))
               {
                  primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
                  last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
               }
               which(primes)
            }
        
        library(parallel)
        cl <- makeCluster(getOption("cl.cores", 2)) 
        result <- clusterApply(cl = cl, fun = getPrimeNumbers, 10:10000)
        ''')
print("R code time is:", time.perf_counter()-A)
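
Because a single wall-clock measurement is sensitive to machine load, a comparison like this is easier to trust when each side is run several times. A minimal sketch of such a harness in plain Python follows. `time_runs` is a hypothetical helper, and the workload shown is a stand-in so the sketch runs without R installed. To time the rpy2 side, one would pass something like `lambda: robjects.r(code)` instead, with `code` holding the R source above.

```python
import statistics
import time

def time_runs(func, repeats=3):
    """Call func() `repeats` times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Stand-in workload; replace with the rpy2 call to measure the real code.
    timings = time_runs(lambda: sum(range(100_000)), repeats=5)
    print(f"min={min(timings):.4f}s median={statistics.median(timings):.4f}s")
```

Note that the R snippet above creates a new cluster on every call, so when repeating it, ending the R code with `stopCluster(cl)` would avoid leaving worker processes behind between runs.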
