GHES Runners at Enterprise Level support #1303

Open
axel3rd opened this issue Oct 19, 2021 · 27 comments

@axel3rd
Contributor

axel3rd commented Oct 19, 2021

Feature

GHES now provides runners at the enterprise level.

Having a pool usable by organizations (if enabled by the org owner in settings) would be helpful: project teams would not have to manage runners and scalability themselves.

If it can be supported (as a complement to the enable_organization_runners parameter), it would be nice 😁.


Investigations

Currently, if enable_organization_runners is used, the scale-up lambda sends this configuration:

{
    "environment": "github-runners-poc",
    "runnerServiceConfig": "--url https://github.company.com/some-org --token AAA[...] --labels ubuntu --runnergroup Default",
    "runnerOwner": "some-org",
    "runnerType": "Org"
}

When a new runner is added manually at enterprise level (https://github.company.com/enterprises/[enterprise-name]/settings/actions/runners), the configuration parameters are:

./config.sh --url https://github.company.com/enterprises/my-company-name --token BBB[...]

Even if the userdata_template parameter is used with a fully custom user-data.sh script, where $CONFIG is not used and the previous line is ~hardcoded:

CONFIG=$(aws ssm get-parameters --names ${environment}-$INSTANCE_ID --with-decryption --region $REGION | jq -r ".Parameters | .[0] | .Value")

./config.sh --unattended --name $INSTANCE_ID --work "_work" $CONFIG

... it doesn't work because the token expires over time; after ~1 hour of usage, registration fails with:

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication

Http response code: Unauthorized from 'POST https://github.company.com/api/v3/actions/runner-registration'
{"message":"Token expired.","documentation_url":"https://docs.github.com/enterprise/3.2/rest"}
Response status code does not indicate success: 401 (Unauthorized).
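
The 401 above is consistent with registration tokens being short-lived: the registration-token endpoint returns an `expires_at` timestamp alongside the token. A minimal sketch, assuming a previously fetched token is cached together with that timestamp; the helper name `isTokenUsable`, the `RegistrationToken` shape, and the five-minute safety margin are hypothetical and not part of the module:

```typescript
// Cached copy of a registration-token response (hypothetical shape).
interface RegistrationToken {
  token: string;
  expiresAt: string; // ISO 8601 timestamp, copied from the API's `expires_at`
}

// Returns true when the token stays valid for more than `marginMs` milliseconds.
function isTokenUsable(
  cached: RegistrationToken,
  now: Date,
  marginMs: number = 5 * 60 * 1000,
): boolean {
  const expiry = Date.parse(cached.expiresAt);
  if (isNaN(expiry)) {
    // Unparsable timestamp: treat the token as expired and fetch a new one.
    return false;
  }
  return expiry - now.getTime() > marginMs;
}
```

A caller would request a fresh token whenever this returns false, rather than reusing one baked into a long-lived configuration.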

Problems to solve

1° Parameter name:

A new enable_enterprise_runners parameter, with the company name as its value (=> string type), could be added.

It would produce a config like:

{
    "environment": "github-runners-poc",
    "runnerServiceConfig": "--url https://github.company.com/enterprises/my-company-name --token AAA[...] --labels ubuntu --runnergroup Default",
    "runnerOwner": "my-company-name",
    "runnerType": "Enterprise"
}
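
A rough sketch of how such a config could be assembled for both runner types. The field names (`environment`, `runnerServiceConfig`, `runnerOwner`, `runnerType`) come from the JSON in this issue; the function `buildRunnerConfig` and its input type are hypothetical and are not the module's actual code:

```typescript
type RunnerType = "Org" | "Enterprise";

interface RunnerConfig {
  environment: string;
  runnerServiceConfig: string;
  runnerOwner: string;
  runnerType: RunnerType;
}

// Hypothetical input shape, for illustration only.
interface RunnerConfigInput {
  environment: string;
  ghesUrl: string; // e.g. https://github.company.com, without trailing slash
  owner: string; // organization login or enterprise slug
  runnerType: RunnerType;
  token: string; // short-lived registration token
  labels: string[];
  runnerGroup: string;
}

function buildRunnerConfig(input: RunnerConfigInput): RunnerConfig {
  // Enterprise runners register against /enterprises/<slug>,
  // organization runners against /<org>.
  const url =
    input.runnerType === "Enterprise"
      ? `${input.ghesUrl}/enterprises/${input.owner}`
      : `${input.ghesUrl}/${input.owner}`;
  const runnerServiceConfig = [
    `--url ${url}`,
    `--token ${input.token}`,
    `--labels ${input.labels.join(",")}`,
    `--runnergroup ${input.runnerGroup}`,
  ].join(" ");
  return {
    environment: input.environment,
    runnerServiceConfig,
    runnerOwner: input.owner,
    runnerType: input.runnerType,
  };
}
```

Only the URL differs between the two types here; how the token is obtained is the open question in the next point.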

2° Retrieve token usable at enterprise level:

Retrieving a token usable for a runner at enterprise level (the one given when "New Runner" is clicked in Enterprise Settings) is perhaps not obvious and should be investigated.

@skyzyx
Contributor

skyzyx commented Oct 20, 2021

I spent about 12 hours working through this same deployment, and ran into the exact same issue yesterday, trying to parse the differences between the current documentation and the GHES 3.2 docs.

Also to note: GHES 3.2 doesn't support the workflow_job event (had to fall back to check_run), nor the --ephemeral switch (although that's supposed to be coming in v3.3).
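
To illustrate the fallback, here is a small sketch that picks the scaling event from the GHES version. It assumes workflow_job is available from 3.3 onward (as later comments in this thread suggest) and that anything older falls back to check_run; the function names are hypothetical:

```typescript
type ScaleEvent = "workflow_job" | "check_run";

// Parses "3.2", "3.3.1", ... into [major, minor]; returns null when malformed.
function parseMajorMinor(version: string): [number, number] | null {
  const match = /^(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    return null;
  }
  return [parseInt(match[1], 10), parseInt(match[2], 10)];
}

// Assumption: workflow_job exists on GHES >= 3.3, check_run is the fallback.
function scaleEventFor(ghesVersion: string): ScaleEvent {
  const parsed = parseMajorMinor(ghesVersion);
  if (parsed === null) {
    throw new Error(`Unrecognized GHES version: ${ghesVersion}`);
  }
  const [major, minor] = parsed;
  const hasWorkflowJob = major > 3 || (major === 3 && minor >= 3);
  return hasWorkflowJob ? "workflow_job" : "check_run";
}
```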

@axel3rd
Contributor Author

axel3rd commented Oct 21, 2021

I will try to get info/help from the GitHub team (version requirements, way to retrieve the token, ...).

@npalm
Member

npalm commented Oct 26, 2021

@jonico did you try to run the runners on enterprise level?

@jonico
Contributor

jonico commented Nov 1, 2021

@npalm: I have tested other controllers with enterprise level scope but not your particular one (I have left GitHub and no longer have access to an enterprise account).

@npalm
Member

npalm commented Nov 1, 2021

> @npalm: I have tested other controllers with enterprise level scope but not your particular one (I have left GitHub and no longer have access to an enterprise account).

thx for the update

@axel3rd
Contributor Author

axel3rd commented Nov 2, 2021

> I have tested other controllers with enterprise level scope

@jonico: On the off chance, do you remember which ones? (To dig into how the token is created with the manage_runners:enterprise scope.) Many thanks.

@axel3rd
Contributor Author

axel3rd commented Nov 3, 2021

NB1: When this issue is solved, the official GitHub documentation (Autoscaling with self-hosted runners > Recommended autoscaling solutions) should be updated 😁.

> @jonico: On the off chance, do you remember which ones?

NB2: actions-runner-controller/actions-runner-controller is probably a good sample 🤔

@jonico
Contributor

jonico commented Nov 4, 2021

> NB2: actions-runner-controller/actions-runner-controller is probably a good sample 🤔

This is the one where I tested the support.

@npalm
Member

npalm commented Nov 9, 2021

@axel3rd did you get it all working, should we keep the issue open?

@axel3rd
Contributor Author

axel3rd commented Nov 9, 2021

> @axel3rd did you get it all working, should we keep the issue open?

@npalm no, it does not work. The issue should be kept open IMO.

It is hard for me to find time at the moment to investigate how to:

  • generate a token with manage_runners:enterprise scope
  • find a convenient way to implement it from lambda

@npalm
Member

npalm commented Nov 9, 2021

Are you using a service user? The module is set up to use a GitHub App, so there is no need to generate user tokens with scopes.

@axel3rd
Contributor Author

axel3rd commented Nov 10, 2021

@npalm: I will give it a try when I have time, with a service account or a "personal" admin token, to register a runner or generate a correct token for that.
Even if the GitHub App is no longer used for generating the token at enterprise level, the module's mechanism for storing tokens/credentials (SSM, ...) would be nice to adapt.

@axel3rd
Contributor Author

axel3rd commented Dec 7, 2021

After finding some time and an exchange with GitHub support...

Autoscaling with self-hosted runners is only supported on GHES v3.3, so it is preferable to wait for this release. (I suppose it is due to workflow_job:queued event support, which is better than the check_run event, the only one available in GHES v3.2.)

But both events are supported by this scalability solution => it can work 😁.

PoC which works fine with GHES 3.2:

  • Read and understand "Create a registration token for an enterprise"
  • Generate an admin PAT with the admin:enterprise permission (the only one available in GHES v3.2; perhaps admin:enterprise/manage_runners:enterprise would be better when available in GHES ... v3.3?)
  • In user-data.sh, customize/hard-code the actions-runner ./config.sh call with:
echo "TMP - Get a runner registration token..."
ENTERPRISE=my-company
ADMIN_PAT=ghp_TheCorrectAdminPAT
TOKEN_REGISTRATION=$(curl -fs -X POST -u token:$ADMIN_PAT -H "Accept: application/vnd.github.v3+json" ${ghes_url}/api/v3/enterprises/$ENTERPRISE/actions/runners/registration-token | jq -r ".token")

echo "Configuring GitHub Action Runner..."
sudo -iu $USER_NAME bash -c "cd actions-runner && ./config.sh --unattended --name $INSTANCE_ID --labels ubuntu,ubuntu-latest,ubuntu-20.04 --runnergroup Default --url ${ghes_url}/enterprises/$ENTERPRISE --token $TOKEN_REGISTRATION"

Note: This is just a test; a sustainable implementation should have new parameters (proposal):

  • enable_enterprise_runners: Enable enterprise runners (in the same way as enable_organization_runners)
  • enterprise_admin_pat: Admin PAT for runner token registration
  • enterprise_id: Enterprise name, if it cannot be retrieved automatically (to be checked)

With that, the original runner configuration (./config.sh --unattended --name "$instance_id" --work "_work" $${config}) can be reused, with config provided in SSM.
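
If the token were requested from a lambda instead of user-data.sh, the request from the curl call above could be described like this. This is only a sketch: `buildEnterpriseTokenRequest` and `HttpRequestSpec` are hypothetical names, and it sends the PAT in an `Authorization: token ...` header rather than with curl's `-u token:$ADMIN_PAT` basic auth, which is an assumption on my side:

```typescript
// Plain description of an HTTP request; sending it is left to the caller.
interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
}

function buildEnterpriseTokenRequest(
  ghesUrl: string,
  enterprise: string,
  adminPat: string,
): HttpRequestSpec {
  // Drop trailing slashes so the path is not doubled up.
  const base = ghesUrl.replace(/\/+$/, "");
  const slug = encodeURIComponent(enterprise);
  return {
    method: "POST",
    url: `${base}/api/v3/enterprises/${slug}/actions/runners/registration-token`,
    headers: {
      Accept: "application/vnd.github.v3+json",
      // Assumption: header-based auth instead of curl's basic auth.
      Authorization: `token ${adminPat}`,
    },
  };
}
```

The `token` field of the JSON response would then go into the `--token` part of the config stored in SSM, so the PAT itself never reaches the instance.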

@axel3rd
Contributor Author

axel3rd commented Jan 4, 2022

FYI: With GHES v3.3, a Global Webhook can be configured on Workflow jobs events.

Due to the enterprise_admin_pat requirement (to create a token for runner authentication), the "GitHub App" probably becomes useless "at enterprise level".

@npalm
Member

npalm commented Jan 4, 2022

So for running on enterprise level the only option is using PAT?

@axel3rd
Contributor Author

axel3rd commented Jan 4, 2022

> So for running on enterprise level the only option is using PAT?

From my understanding, yes, because "Create a registration token for an enterprise" requires an access token with the manage_runners:enterprise scope, which is not available in GitHub App permissions.

@npalm
Member

npalm commented Jan 4, 2022

I see. We created the module with the vision of avoiding physical user accounts, but in this case there indeed seems to be no other option. The lambdas inside the runners module all authenticate via an app installation. For supporting enterprise level runners a change is required in those lambdas.

For scaling, the module relies on triggers. A trigger can be sent via a webhook (org / repo) or via an app (via installation). I assume that you can configure on enterprise level a webhook as well?

@axel3rd
Contributor Author

axel3rd commented Jan 4, 2022

> I assume that you can configure on enterprise level a webhook as well?

Yes, but a Global Webhook supports only a secret as its security mechanism, not a private key.

> For supporting enterprise level runners a change is required in those lambdas.

Yes, and perhaps in the webhook lambda too, depending on the previous change...

The lambda environment variables will have some "pseudo" duplicates 😢, even if we can "replace" in the Terraform configuration:

  github_app = {
    key_base64     = "base64string"
    id             = "1"
    webhook_secret = "webhook_secret"
  }

With~:

  enterprise = {
    admin_pat      = "ghe_xxxx_with_'manage_runners:enterprise'_scope"
    id             = "my-company"
    webhook_secret = "global_webhook_secret"
  }

(the ~enable_enterprise_runners parameter could perhaps be useless and implicit if some enterprise parameters are filled).
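
As a sketch of that implicit switch: the lambdas could pick the authentication mode from whichever block is filled in. The types mirror the two Terraform blocks above in camelCase; `resolveAuthMode` and the `AuthMode` union are hypothetical names and nothing like this exists in the module today:

```typescript
interface GithubAppConfig {
  keyBase64: string;
  id: string;
  webhookSecret: string;
}

interface EnterpriseConfig {
  adminPat: string;
  id: string;
  webhookSecret: string;
}

type AuthMode =
  | { kind: "enterprise-pat"; enterprise: EnterpriseConfig }
  | { kind: "github-app"; app: GithubAppConfig };

// Enterprise settings win when they are filled in; otherwise fall back to the app.
function resolveAuthMode(
  app?: GithubAppConfig,
  enterprise?: EnterpriseConfig,
): AuthMode {
  if (
    enterprise !== undefined &&
    enterprise.adminPat !== "" &&
    enterprise.id !== ""
  ) {
    return { kind: "enterprise-pat", enterprise };
  }
  if (app !== undefined) {
    return { kind: "github-app", app };
  }
  throw new Error("Either github_app or enterprise settings must be provided");
}
```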

So far I haven't tried to completely remove the GitHub App in my tests ... these suppositions are "under investigation" ^^ 😁.

@axel3rd
Contributor Author

axel3rd commented Jan 14, 2022

Note: On GHES, the webhook payload (Global or from the GitHub App) contains the enterprise id:

{
  "action": "queued",
  "workflow_job": {...},
  "repository": {...},
  "organization": {...},
  "enterprise": {
    "id": 1,
    "slug": "company",
    "name": "COMPANY",
    ...
  },
  ...
}

=> It could perhaps be stored in SQS and reused if enable_enterprise_runners=true (or if some "enterprise parameter(s)" that enable enterprise-level support are defined).

if (body.action === 'queued') {
  await sendActionRequest({
    id: body.workflow_job.id,
    repositoryName: body.repository.name,
    repositoryOwner: body.repository.owner.login,
    eventType: githubEvent,
    installationId: installationId,
  });
}
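
A sketch of what carrying the enterprise along could look like. The payload and message fields are the ones shown above; the `enterpriseSlug` field and the `toActionRequest` helper are hypothetical additions for illustration, not existing module code:

```typescript
// Only the payload fields used here are typed.
interface WebhookBody {
  action: string;
  workflow_job: { id: number };
  repository: { name: string; owner: { login: string } };
  enterprise?: { id: number; slug: string; name: string };
}

interface ActionRequestMessage {
  id: number;
  repositoryName: string;
  repositoryOwner: string;
  eventType: string;
  installationId: number;
  enterpriseSlug?: string; // hypothetical extra field for enterprise runners
}

// Builds the queue message for a queued job, or null for any other action.
function toActionRequest(
  body: WebhookBody,
  eventType: string,
  installationId: number,
): ActionRequestMessage | null {
  if (body.action !== "queued") {
    return null;
  }
  const message: ActionRequestMessage = {
    id: body.workflow_job.id,
    repositoryName: body.repository.name,
    repositoryOwner: body.repository.owner.login,
    eventType,
    installationId,
  };
  if (body.enterprise !== undefined) {
    message.enterpriseSlug = body.enterprise.slug;
  }
  return message;
}
```

The scale-up side could then use `enterpriseSlug` for the runner URL and for the registration-token call.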

@ScottGuymer
Member

Would the enterprise id also be present for orgs that are part of an enterprise on GH SaaS?

I presume so as I think you can create enterprise level runners on GH SaaS too.

@axel3rd
Contributor Author

axel3rd commented Jan 14, 2022

> Would the enterprise id also be present for orgs that are part of an enterprise on GH SaaS?
> I presume so as I think you can create enterprise level runners on GH SaaS too.

The global webhook documentation for enterprise-cloud and enterprise-server@3.3 is ~the same, so I guess yes 👍, but I cannot verify it myself.

@axel3rd
Contributor Author

axel3rd commented Apr 29, 2022

NB: Runner v2.282.0 introduced the --pat parameter, which helps with configuring ephemeral runners (actions/runner#660):

 --pat                  GitHub personal access token used for checking network connectivity when executing `./run.sh --check`
 --ephemeral            Configure the runner to only take one job and then let the service un-configure the runner after the job finishes (default false)

This parameter can be used to authenticate the runner at enterprise level directly with an admin PAT during the configuration step:

./config.sh --unattended --name test --labels ubuntu-latest --runnergroup Default --url https://github.company.com/enterprises/my-company --pat ghp_XXXXXXXXXXXXXXXXXXXXXXXXx

Perhaps simpler than registering a temporary admin token, but in this case the PAT is in clear text in user-data.sh
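
To make the two options concrete, a sketch that builds the config.sh arguments for either credential. It assumes, as reported above, that config.sh accepts --pat in place of --token; the `RunnerCredential` type and `buildConfigArgs` are hypothetical names:

```typescript
// Either a short-lived registration token or a long-lived admin PAT.
type RunnerCredential =
  | { kind: "registration-token"; value: string }
  | { kind: "pat"; value: string };

// Returns the argument list to pass to ./config.sh.
function buildConfigArgs(
  name: string,
  url: string,
  labels: string[],
  runnerGroup: string,
  credential: RunnerCredential,
): string[] {
  // Assumption: --pat can replace --token during configuration.
  const authArgs =
    credential.kind === "pat"
      ? ["--pat", credential.value]
      : ["--token", credential.value];
  return [
    "--unattended",
    "--name", name,
    "--labels", labels.join(","),
    "--runnergroup", runnerGroup,
    "--url", url,
    ...authArgs,
  ];
}
```

Keeping the credential as a tagged value makes it easy to see which call sites would put a long-lived PAT on the instance.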

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label May 30, 2022
@axel3rd
Contributor Author

axel3rd commented May 30, 2022

This issue should stay open IMO (a PR will be provided once #1256 is rebased).

@ScottGuymer ScottGuymer removed the Stale label May 30, 2022
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

@mariusfilipowski
Contributor

Some very cool ideas for GHES. We are looking at whether we can adapt this.
As far as I can see, all APIs needed by the lambdas should be available at the enterprise level.
With Global Webhooks we can notify the lambdas. I've checked the events and they are the same, only without the installation id part.
And with an enterprise PAT, we can get a temporary registration token.

The only problem is that this is a PAT ;-) and tied to a person.
But I see no other option right now. Perhaps fine-grained PATs in GHES 3.10, but they are still PATs.

@npalm
Member

npalm commented Nov 2, 2023

It would be great to get more support for GHES from the community. We are not on GHES, so we cannot even test the module for this. Any help is welcome, including testing. Feel free to DM me on Slack.
