Ephemeral autoscaling does not work, idle runners always at maximum #2449

Open
arthur-telia opened this issue Sep 22, 2022 · 8 comments
Labels
help wanted Extra attention is needed stale:exempt

Comments

arthur-telia commented Sep 22, 2022

We are trying to set up autoscaling using the ephemeral runners flag. It seems that the lambda function never scales runners down and always keeps the number of runners at the runners_maximum_count value that we have set.

For example, if we set runners_maximum_count to 20, then exactly 20 runners are idle all the time, regardless of whether we have any pending jobs.

The version is v1.8.1, using the prebuilt AWS AMI ubuntu-jammy-22.04-amd64-server.

Related configuration below:

  # Uncomment to enable ephemeral runners
  delay_webhook_event      = 0
  enable_ephemeral_runners = true
  enabled_userdata         = false
  minimum_running_time_in_minutes = 10
  runners_maximum_count = 20
  scale_down_schedule_expression = "cron(* * * * ? *)"
  enable_job_queued_check = true
  idle_config = [{
    cron      = "* * 9-17 * * 1-5"
    timeZone  = "Europe/Amsterdam"
    idleCount = 3
  }]

Could you please advise whether this is a bug or a misconfiguration on our side?
Let me know if more information is required.

Member

npalm commented Sep 22, 2022

idle_config is not meant for ephemeral runners, only for non-ephemeral ones. The idle config is used by the scale-down lambda to decide whether to kill a runner. To keep some warm runners in combination with ephemeral runners, you can use the pool.

Author

arthur-telia commented Sep 23, 2022

@npalm I've tried the pool block as well, but the number of runners always stays at runners_maximum_count plus the pool_config size.

  pool_runner_owner = "company"                  # Org to which the runners are added
  pool_config = [{
    size                = 5                    # size of the pool
    schedule_expression = "cron(* * * * ? *)"   # cron expression to trigger the adjustment of the pool
  }]

How do I scale the runners down when there are no pending jobs in the queue?

Member

npalm commented Sep 23, 2022

Each time the pool function is triggered by its schedule, it creates runners. It first looks up the number of idle runners and then tops up the pool, in your case to 5. The scale-down lambda can shut down your pool runners, based on a minimum running time.
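
The top-up behaviour described in this comment can be sketched as follows. This is a simplified Python model for illustration only, not the module's actual lambda code; the function name `runners_to_create` and its parameters are hypothetical.

```python
def runners_to_create(pool_size: int, idle_runners: int) -> int:
    """Return how many runners one scheduled pool run would launch.

    Simplified model: the pool only tops up to `pool_size` and never
    removes runners; removal is left to the scale-down lambda.
    """
    return max(pool_size - idle_runners, 0)


# With a pool size of 5 and 2 idle runners, 3 more would be launched.
print(runners_to_create(pool_size=5, idle_runners=2))  # 3
```

In this model the pool never shrinks the fleet on its own, so any scale-down has to come from the separate scale-down schedule.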

@erkexzcx

> Each time the pool function is triggered by its schedule, it creates runners. It first looks up the number of idle runners and then tops up the pool, in your case to 5. The scale-down lambda can shut down your pool runners, based on a minimum running time.

Could you please rephrase your comment?

Please take a look at the config that we are using:

  # Uncomment to enable ephemeral runners
  delay_webhook_event      = 0
  enable_ephemeral_runners = true
  enabled_userdata         = false
  minimum_running_time_in_minutes = 5
  runners_maximum_count = 3
  enable_job_queued_check = true

  # Uncomment idle config to have idle runners from 9 to 5 in time zone Amsterdam
  #idle_config = [{
  #  cron      = "* * 9-17 * * 1-5"
  #  timeZone  = "Europe/Amsterdam"
  #  idleCount = 3
  #}]

  pool_runner_owner = "<redacted>"                  # Org to which the runners are added
  pool_config = [{
    size                = 3                    # size of the pool
    schedule_expression = "cron(* * * * ? *)"   # cron expression to trigger the adjustment of the pool
  }]
  scale_down_schedule_expression = "cron(*/5 * * * ? *)"

Result:
[image]

Can you clarify what causes 6 runners to be spawned when we set runners_maximum_count = 3? There are no jobs in the queue. We don't see any reason why we should have 6 runners when we expect 0 (or at most 3).

Can you take a look at the config above and confirm whether it is valid?

We understand that the enable_ephemeral feature is in beta, but it does not seem to work properly in our case, and we are not sure what might cause this behavior.

Contributor

M1kep commented Oct 3, 2022

I've found that the pool lambda does not take the minimum start time into account. When it runs every minute and a runner has not registered with GitHub in time, it will "top up" the pool again. We've found that this happens 2-3 times, so the initial pool deployment launches 2x-3x the pool size while ignoring runners_maximum_count.

We were able to fix this by updating the pool lambda to use the minimum boot time when determining whether the pool should be topped up. It has had limited testing, but I can submit a PR for review.

I've attached a screenshot from the other day. (Note that one of the arrows points to the wrong line, but the count and the number of invocations are still annotated correctly.)
[screenshot: CleanShot 2022-10-03 at 13 27 17]
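
One way to read the fix described in this comment is to count instances that are still inside their boot window toward the pool. The sketch below is an assumption about the approach, not the code from any PR; `top_up_with_boot_grace`, `launch_times`, and `boot_time` are hypothetical names.

```python
from datetime import datetime, timedelta


def top_up_with_boot_grace(pool_size, idle_runners, launch_times, now, boot_time):
    """Top up the pool, counting instances that are still booting.

    launch_times: launch timestamps of instances that have not yet
    registered as runners (hypothetical input for this sketch).
    Instances older than `boot_time` are not counted, so a runner
    that never registers does not block the pool forever.
    """
    booting = sum(1 for t in launch_times if now - t < boot_time)
    return max(pool_size - idle_runners - booting, 0)


now = datetime(2022, 10, 3, 13, 27)
launched = [now - timedelta(minutes=1)] * 3  # 3 instances started 1 minute ago
print(top_up_with_boot_grace(3, 0, launched, now, timedelta(minutes=5)))  # 0
```

With the three booting instances counted, the second invocation launches nothing instead of another full pool.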

Member

npalm commented Oct 4, 2022

@M1kep It would be great if you have time to submit a PR.

I thought the issue was that the pool lambda does not see instances that have been created but are not ready yet. So, assuming you run the pool lambda every minute and creating an instance takes 2 minutes, you can easily end up with 2 or 3 times the size of the pool.
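
The arithmetic in this scenario can be checked with a small time-step simulation. This is a toy Python model under stated assumptions (no jobs consume runners, nothing is scaled down, and an instance only becomes visible as an idle runner once it has finished booting); `simulate_pool` and its parameters are hypothetical.

```python
def simulate_pool(pool_size, boot_minutes, interval_minutes, duration_minutes):
    """Return the total number of instances launched by a naive top-up."""
    launch_times = []
    for now in range(0, duration_minutes + 1, interval_minutes):
        # Only instances that finished booting are visible as idle runners.
        idle = sum(1 for t in launch_times if now - t >= boot_minutes)
        missing = max(pool_size - idle, 0)
        launch_times.extend([now] * missing)
    return len(launch_times)


# Pool lambda every minute, 2-minute boot: twice the pool size.
print(simulate_pool(pool_size=3, boot_minutes=2, interval_minutes=1, duration_minutes=10))  # 6
# A 3-minute boot gives three times the pool size.
print(simulate_pool(pool_size=3, boot_minutes=3, interval_minutes=1, duration_minutes=10))  # 9
```

Under these assumptions, a pool size of 3 with a 2-minute boot yields 6 instances, which may be consistent with the 6 runners reported earlier in this thread.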

Contributor

piscue commented Oct 7, 2022

@M1kep @npalm I faced the same issue. I have a monitor that tracks the number of EC2 instances spawned, and runners_maximum_count was never honored. I checked the code and was about to open an issue when I found this one.

It would be nice if it could sum the running instances and the pending ones, so that it respects the maximum you want to run and keeps costs under control.
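
The suggestion above could look roughly like the following. This is a minimal Python sketch that assumes the limit is meant to apply to pool runners as well; `capped_top_up` and its parameters are hypothetical and do not come from the module.

```python
def capped_top_up(pool_size, idle, pending, running_total, maximum_count):
    """Top up the pool without exceeding an overall instance limit.

    idle: registered runners waiting for jobs
    pending: instances launched but not yet registered
    running_total: all registered runner instances (busy + idle)
    """
    wanted = max(pool_size - idle - pending, 0)
    headroom = max(maximum_count - running_total - pending, 0)
    return min(wanted, headroom)


# Three instances are still pending, so nothing more is launched.
print(capped_top_up(pool_size=3, idle=0, pending=3, running_total=0, maximum_count=3))  # 0
# The pool wants 4 more, but only 2 fit under the limit of 20.
print(capped_top_up(pool_size=5, idle=1, pending=0, running_total=18, maximum_count=20))  # 2
```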

@guicaulada

Was this fixed by #2801?
