Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with JoblibParallelization #537

Open
cheginit opened this issue Jan 1, 2024 · 2 comments
Open

Issue with JoblibParallelization #537

cheginit opened this issue Jan 1, 2024 · 2 comments
Assignees

Comments

@cheginit
Copy link

cheginit commented Jan 1, 2024

The current implementation of JoblibParallelization that uses an already instantiated Parallel can lead to some issues such as this one. A better approach would be to take as input all Parallel args and instantiate it in the __call__ method. I tested this approach, and it works without any issue. I can open a PR if interested.

@blankjul
Copy link
Collaborator

blankjul commented Jan 10, 2024

Thanks for your feedback! Can you provide a small example here in this issue how the interface would look like?
A PR works too of course.

@blankjul blankjul self-assigned this Jan 10, 2024
@cheginit
Copy link
Author

Sure. This is what I've been using:

class JoblibParallelization:
    def __init__(
        self,
        n_jobs: int = -1,
        backend: Literal["loky", "threading", "multiprocessing"] = "loky",
        return_as: Literal["list", "generator"] = "list",
        verbose: int = 0,
        timeout: float | None = None,
        pre_dispatch: str | int = "2 * n_jobs",
        batch_size: int | Literal["auto"] = "auto",
        temp_folder: str | Path | None = None,
        max_nbytes: int | str | None = "1M",
        mmap_mode: Literal["r+", "r", "w+", "c"] | None = "r",
        prefer: Literal["processes", "threads"] | None = None,
        require: Literal["sharedmem"] | None = None,
        *args: Any,
        **kwargs: Any,
    ) -> None:
        self.n_jobs = n_jobs
        self.backend = backend
        self.return_as = return_as
        self.verbose = verbose
        self.timeout = timeout
        self.pre_dispatch = pre_dispatch
        self.batch_size = batch_size
        self.temp_folder = temp_folder
        self.max_nbytes = max_nbytes
        self.mmap_mode = mmap_mode
        self.prefer = prefer
        self.require = require
        super().__init__()

    def __call__(
        self,
        f: Callable[..., Any],
        X: Iterable[Any],
    ) -> list[Any] | Generator[Any, Any, None]:
        with joblib.Parallel(
            n_jobs=self.n_jobs,
            backend=self.backend,
            return_as=self.return_as,
            verbose=self.verbose,
            timeout=self.timeout,
            pre_dispatch=self.pre_dispatch,
            batch_size=self.batch_size,
            temp_folder=self.temp_folder,
            max_nbytes=self.max_nbytes,
            mmap_mode=self.mmap_mode,
            prefer=self.prefer,
            require=self.require,
        ) as parallel:
            return parallel(joblib.delayed(f)(x) for x in X)

It can be easily instantiated without any arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants