doParallel doesn't work with Azure Compute cluster #413

Open
Omi1906 opened this issue Mar 10, 2021 · 0 comments

Describe the bug
We want to use the R package doParallel to make use of most of the cores on a node of an Azure compute cluster. However, when we submit the job to the compute cluster, the experiment fails with the following message:

Failed to create bus connection: No such file or directory
Error in serialize(data, node$con) : error writing to connection
Calls: train ... postNode -> sendData -> sendData.SOCKnode -> serialize
Execution halted

To Reproduce
Steps to reproduce the behavior:

  1. Identify the cores available on the compute cluster node.
  2. Register half of the available cores for parallel processing.
  3. Run xgboost training in parallel.

The code is attached for additional insight.
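For readers without the attachments, the three steps above roughly correspond to the following R sketch. It is not the attached TrainingScript.txt: it assumes the foreach and doParallel packages are installed, and `train_one()` is a hypothetical placeholder for the xgboost training call. `parallel::makeCluster()` creates a PSOCK (socket) cluster by default, which is consistent with the `sendData.SOCKnode` frame in the traceback above.

```r
# Minimal sketch of the reproduction steps (assumptions: foreach and
# doParallel are installed; train_one() is a HYPOTHETICAL stand-in for
# the xgboost training call in the attached script).
library(parallel)
library(foreach)
library(doParallel)

# Step 1: identify the cores available on the node.
n_cores <- parallel::detectCores()

# Step 2: register half of them (at least one) as a PSOCK cluster.
n_workers <- max(1L, n_cores %/% 2L)
cl <- parallel::makeCluster(n_workers)
doParallel::registerDoParallel(cl)

# Hypothetical placeholder for a single xgboost training run.
train_one <- function(i) {
  sum(seq_len(i))
}

# Step 3: run the training tasks in parallel; each task's data is
# serialized to a worker over its socket connection.
results <- foreach(i = 1:8, .combine = c) %dopar% train_one(i)

# Release the worker processes.
parallel::stopCluster(cl)
```

If the worker processes or their socket connections fail, the `%dopar%` step is where an `error writing to connection` from `serialize()` would surface.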

Expected behavior
The doParallel package should execute the xgboost training in parallel, and the results should be obtained much faster than when training on a single core.

Additional context
Based on other answers found on the internet, the problem appears to be related to the system bus socket, but I am not sure how that is configured on a compute cluster.

TrainingScript.txt
Estimator.txt
