This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Git PAT token not used when installing packages? #359

Open
p-smirnov opened this issue Jun 24, 2019 · 11 comments

Comments

@p-smirnov

I am experiencing the known issue with autoscale and github package installation, where the error message is:

Error: HTTP error 403.
  API rate limit exceeded for 52.*******. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

  Rate limit remaining: 0/60
  Rate limit reset at: 2019-06-24 22:49:36 UTC

  To increase your GitHub API rate limit
  - Use `usethis::browse_github_pat()` to create a Personal Access Token.
  - Use `usethis::edit_r_environ()` and add the token as `GITHUB_PAT`.
Execution halted

However, I have set the githubAuthenticationToken in the credentials.json file. Is the environment variable not yet set when the GitHub install runs for packages specified in the cluster.json file?

Possibly relevant: I am using a custom docker image (but I want to install the packages from git as I am iterating on package implementation).

I am not sure how to make a reproducible example, but it occurs when scaling up from 1 to ~400 nodes. Here is my cluster.json in case it helps to reproduce:

{
  "name": "psmirnov",
  "vmSize": "Standard_D2_v3",
  "maxTasksPerNode": 4,
  "poolSize": {
    "dedicatedNodes": {
      "min": 1,
      "max": 1
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 5000
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "bhklab/pharmacogx:v3",
  "rPackages": {
    "cran": ["MASS", "tictoc", "mvtnorm", "abind", "polynom", "memoise", "purrr", "matrixStats"],
    "github": ["bhklab/mCI", "bhklab/fastCI"],
    "bioconductor": []
  },
  "commandLine": [],
  "subnetId": ""
}
@minister3000

I experience a similar behavior when not using a docker image. It appears that the github 'Personal Access Token' (PAT) is completely ignored even though it is set up correctly in the credentials file. Therefore I am not able to scale the project up without running into the 'API rate limit exceeded' issue described by p-smirnov above.
I confirmed my suspicion that the PAT entry in the credentials file is ignored by setting my github repository to 'private', after which the repo can no longer be installed on the Azure nodes even though the personal access token should allow precisely this. Any help on this issue is appreciated...
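
A quick way to check whether a token is visible to a given R process is to inspect the environment variable directly. The snippet below is a minimal sketch: `pat_status` is a hypothetical helper (not part of doAzureParallel), and it assumes the token is expected under the name `GITHUB_PAT`, as the error message suggests. It reports only the token's length, never its value.

```r
# Hypothetical helper: report whether a GitHub token is visible to this
# R process, without printing the token itself.
pat_status <- function(var = "GITHUB_PAT") {
  token <- Sys.getenv(var, unset = "")
  if (nzchar(token)) {
    sprintf("%s is set (%d characters)", var, nchar(token))
  } else {
    sprintf("%s is not set", var)
  }
}

print(pat_status())
```

Running this inside a task on a node, rather than on the local machine, would show whether the variable actually reaches the R session that performs the install.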

@brnleehng
Collaborator

@p-smirnov @minister3000 I'm taking a look at this

@brnleehng
Collaborator

Since we migrated to Docker containers, it looks like the PAT environment variable is no longer passed to the container. Because we use the R installation inside the container image, the environment variable has to exist inside the container.

https://github.com/Azure/doAzureParallel/blob/master/R/utility-commands.R#L100-L138
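
To illustrate the mechanism described above: a variable set on the host is not visible inside a container unless it is forwarded explicitly, for example with Docker's `-e NAME` flag. The following is only a sketch of that idea, not the code in utility-commands.R; `build_docker_run` is a hypothetical helper, and the image and command are placeholders.

```r
# Hypothetical helper: build a `docker run` command line that forwards
# selected environment variables from the host into the container.
build_docker_run <- function(image, command, forward_env = "GITHUB_PAT") {
  # `-e NAME` with no value tells Docker to copy NAME from the host environment.
  env_flags <- paste(sprintf("-e %s", forward_env), collapse = " ")
  paste("docker run --rm", env_flags, image, command)
}

# Placeholder image and command, chosen for illustration only.
cmd <- build_docker_run(
  "rocker/tidyverse:latest",
  "Rscript -e 'cat(nzchar(Sys.getenv(\"GITHUB_PAT\")))'"
)
cat(cmd, "\n")
```

Without the `-e GITHUB_PAT` flag, the R process in the container would see an empty `GITHUB_PAT` even when the host has it set.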

@minister3000

Thanks for looking into this. I should have been more specific: I am not using a custom docker image but 'rocker/tidyverse:latest'. If I read your answer correctly, the PAT variable is not passed to this container either? Is there another way to set the required environment variable, maybe through the cluster.json file?

@brnleehng
Collaborator

Yes, that is correct. The PAT variable is not being passed to that container either. I will add a fix that adds the PAT variable to the current environment variables.

I will discuss with the others the possibility of supporting environment variables in the cluster file.

@minister3000

Thank you for confirming the issue and working on it. I assume private GitHub repositories cannot be installed until this is fixed, and that the maximum number of nodes is limited to about 40 when using public repositories. (GitHub allows 60 unauthenticated requests per hour, and I reach the limit with 40 nodes for whatever reason.) Is there an estimated timeline to get the fix in place?
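
The rate-limit arithmetic can be sketched roughly as follows. The 60 requests per hour figure comes from the error message; 5000 per hour is GitHub's documented limit for token-authenticated requests. The number of API requests each node makes is not known from this thread, so `requests_per_node` below is purely an assumption for illustration, and `max_nodes` is a hypothetical helper.

```r
# Rough capacity estimate: how many nodes can install from GitHub within one
# rate-limit window, given an assumed number of API requests per node.
max_nodes <- function(limit_per_hour, requests_per_node) {
  stopifnot(requests_per_node > 0)
  floor(limit_per_hour / requests_per_node)
}

unauthenticated_limit <- 60    # from the error message ("0/60")
authenticated_limit   <- 5000  # GitHub's limit for authenticated requests
requests_per_node     <- 2     # ASSUMPTION: illustrative value only

cat("Unauthenticated:", max_nodes(unauthenticated_limit, requests_per_node), "nodes\n")
cat("Authenticated:  ", max_nodes(authenticated_limit, requests_per_node), "nodes\n")
```

The unauthenticated limit is counted per IP address, so the real ceiling also depends on how many nodes share an outbound address.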

@brnleehng
Collaborator

I have a working fix branch that you can use. My plan is to merge it on Monday to do further testing.

devtools::install_github("Azure/doAzureParallel", ref="fix/github-pat-token")

@Solfood

Solfood commented Aug 2, 2019

I am seeing another issue with this. Fetching from a private repository works, but the package build returns a node failure error.

─ building ‘demoRcpp_1.0.tar.gz’

g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c rcpp_hello_world.cpp -o rcpp_hello_world.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c script.cpp -o script.o
g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o demoRcpp.so RcppExports.o rcpp_hello_world.o script.o -L/usr/local/lib/R/lib -lR
Error getting parent environment: there is no package called ‘BiocInstaller’

@minister3000

I can confirm that the fix you provided is working and that the PAT is being passed to, and accepted by, GitHub. I no longer hit GitHub's threshold of 60 unauthenticated requests, and I am able to fetch from private repositories and to install and run packages that rely on Rcpp. Thank you very much for providing a solution to this problem.

@p-smirnov
Author

@brnleehng Thank you very much for the fix!

@englianhu

Refer to https://gist.github.com/Z3tt/3dab3535007acf108391649766409421#gistcomment-3746021; simple and awesome!
