feat!: autoscaler with scaling schedules #139

d-costa · 2024-01-03T18:11:46Z

what

Add the ability to use an autoscaler to scale the MIG down to zero outside the defined schedules.
At most one instance runs at any given time.
Since autoscalers can only be attached to non-stateful MIGs, this commit also moves the creation of the home folder disk (atlantis-disk-0) out of the MIG, effectively making it a stateless MIG. Nonetheless, destroying the group alone will not destroy the disk.
Add resources for the disk and the autoscaler, and a usage example. Update the README.
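For readers new to scaling schedules, here is a minimal sketch of the kind of autoscaler described above. It is written directly against the google provider's google_compute_autoscaler resource rather than this module's own variables. The resource names, zone, time zone and the 8AM to 7PM weekday window are illustrative assumptions and do not come from the PR. min_replicas = 0 expresses "scale to zero" and max_replicas = 1 expresses "at most one instance".

```hcl
# Hypothetical sketch: keep one Atlantis instance up from 08:00 to 19:00 on weekdays
# and let the group fall back to zero instances outside that window.
resource "google_compute_autoscaler" "atlantis" {
  name   = "atlantis"                                         # hypothetical name
  zone   = "europe-west1-b"                                   # hypothetical; must be the MIG's zone
  target = google_compute_instance_group_manager.atlantis.id  # hypothetical MIG resource address

  autoscaling_policy {
    min_replicas = 0 # allow scaling down to zero outside the schedule
    max_replicas = 1 # never run more than one instance at a time

    scaling_schedules {
      name                  = "weekday-working-hours" # hypothetical schedule name
      schedule              = "0 8 * * MON-FRI"        # cron: start at 08:00, Monday to Friday
      time_zone             = "Europe/Lisbon"          # hypothetical IANA time zone
      duration_sec          = 39600                    # 11 hours, i.e. until 19:00
      min_required_replicas = 1                        # keep one instance up while the schedule is active
    }
  }
}
```

Outside the schedule, nothing requires a replica, so the group can fall back to min_replicas. The scale-down delay mentioned in the notes comes from the autoscaler itself, not from this configuration.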

BREAKING CHANGE: the 50GB stateful disk is no longer created by the MIG, so the MIG is no longer stateful. Additionally, running terraform destroy destroys the disk.
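Because the disk is now a standalone resource, a stateless instance template can still attach it by name. The sketch below is hypothetical. It assumes a zonal disk named atlantis-disk-0 and an instance template resource called atlantis, and neither address is taken from the PR. It only works because the group never runs more than one instance, since a persistent disk in read-write mode can be attached to a single VM at a time. The lifecycle block is an optional guard: with prevent_destroy = true, Terraform rejects any plan, including terraform destroy, that would delete the disk.

```hcl
# Hypothetical sketch: the home folder disk lives outside the MIG.
resource "google_compute_disk" "atlantis_home" {
  name = "atlantis-disk-0"
  zone = "europe-west1-b" # hypothetical; must match the MIG's zone
  size = 50               # GB, same size as the disk the MIG used to create

  lifecycle {
    # Optional: Terraform errors out on any plan that would delete this disk
    prevent_destroy = true
  }
}

resource "google_compute_instance_template" "atlantis" {
  # ... machine_type, boot disk, network_interface, metadata, etc. omitted

  disk {
    source      = google_compute_disk.atlantis_home.name # attach the existing disk by name, not self_link
    boot        = false
    auto_delete = false # keep the disk when the autoscaler deletes the instance
  }
}
```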

why

Teams often use Atlantis only on certain weekdays and during certain hours of the day (e.g. 8AM to 7PM). This feature allows the MIG to scale down to zero outside those periods. Scaling down over weekends alone reduces instance costs by ~28%.
Cost is usually the main concern when choosing between Atlantis and GitHub Actions.
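The ~28% figure is the share of the week that falls on weekends. Applying the same arithmetic to the example 8AM to 7PM weekday window gives a larger reduction. These numbers count instance-hours only. The standalone disk is billed whether or not an instance is running, so the saving on the total bill is somewhat lower.

```latex
\text{weekend shutdown: } \frac{2\ \text{days}}{7\ \text{days}} \approx 0.286
  \;\Rightarrow\; \approx 28\%\ \text{fewer instance-hours}

\text{weekdays, 8AM to 7PM only: } \frac{5 \times 11\ \text{h}}{7 \times 24\ \text{h}} = \frac{55}{168} \approx 0.327
  \;\Rightarrow\; \approx 67\%\ \text{fewer instance-hours}
```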

Notes:

How responsive is the scaling with respect to the schedule?
- ~2-3 minutes to scale up, ~10 minutes to scale down after the window ends.
Will the MIG scale down even if an apply is executing?
- Yes. The interrupted apply leaves the plan stale, and running atlantis plan and atlantis apply again fixes the drift.
What happens if the instance is destroyed after a plan is calculated?
- When the instance is brought back up, the disk is reattached and you can run atlantis apply as usual.

Let me know if you find this useful, and whether it fits your vision for the module! 😄

Sidenote: We also tried to implement on-demand scale-up (starting an instance outside the schedules) using Cloud Monitoring metrics from the load balancer. This is technically possible, but we could not get it to work. The group does scale from 0 to 1 when requests arrive, but it never scales back down, because in the absence of requests the metric keeps its last value. For reference, we tried the following:

```hcl
resource "google_compute_autoscaler" "default" {
  # ...
  autoscaling_policy {
    dynamic "metric" {
      for_each = var.autoscaling.scale_up_on_demand ? [
        # Selectors can only be joined with the AND operator, and each selector must be a direct equality (=) comparison without functions.
        # Metric types must be unique within the scaling configuration.
        {
          # Keep instance up when used
          name   = "loadbalancing.googleapis.com/https/request_bytes_count"
          filter = "metric.labels.response_code_class = \"200\" AND resource.type = \"https_lb_rule\" AND resource.labels.project_id = \"${var.project}\" AND resource.labels.forwarding_rule_name = \"${var.name}\""
          target = 1
        },
        {
          # Scale up when needed (requests answered with 503 while no instance is up)
          name   = "loadbalancing.googleapis.com/https/request_count"
          filter = "metric.labels.response_code = \"503\" AND resource.type = \"https_lb_rule\" AND resource.labels.project_id = \"${var.project}\" AND resource.labels.forwarding_rule_name = \"${var.name}\""
          target = 0.001
        }
      ] : []
      content {
        name   = metric.value.name
        target = metric.value.target
        type   = "DELTA_PER_SECOND"
        filter = metric.value.filter
      }
    }
  }
}
```

references

Scaling schedules: https://cloud.google.com/compute/docs/autoscaler/scaling-schedules#schedule_configuration_options
google_compute_autoscaler resource: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_autoscaler#nested_scaling_schedules

bschaatsbergen · 2024-01-03T23:41:06Z

Thanks for opening this extensive PR @d-costa, it looks really good at first sight. Allow me to review it as soon as possible, together with the other PR (which I've already reviewed but still need to merge). End of year was really hectic for me, unfortunately!

d-costa · 2024-01-03T23:50:18Z

No problem at all! And happy new year 😄

bschaatsbergen · 2024-01-05T21:43:17Z

Happy new year and best wishes @d-costa 😃
Could you please resolve the conflicts, @d-costa? I've just merged the Shared VPC PR from you (#137).

dgteixeira · 2024-03-26T14:36:57Z

hey @bschaatsbergen , how are you?
Any idea on when you might be able to review this and #131 ?

We are currently using a local version of this but we would love to point all the way to your upstream :)

bschaatsbergen · 2024-03-28T21:48:39Z

> hey @bschaatsbergen , how are you? Any idea on when you might be able to review this and #131 ?
>
> We are currently using a local version of this but we would love to point all the way to your upstream :)

Hi @dgteixeira, thanks for getting in touch. Now that the repository has been transferred to the runatlantis organization, and you, @d-costa, @DanielRieske, and @cblkwell are maintainers of this repository, I encourage you to collaborate on addressing these PRs together :)

cblkwell · 2024-04-01T13:49:34Z

This one looks good to me if we can resolve the conflicts. Once that is done, I don't think I'll have a problem signing off.

cblkwell · 2024-04-02T13:39:22Z

Hrm. What's up with the CI? :/

Add the ability to use an autoscaler to scale down to zero outside the defined schedules. Only non-stateful MIGs can be used with autoscalers, so this commit also removes the responsibility of creating the home folder disk (atlantis-disk-0) from the MIG, effectively making it a stateless MIG. Nonetheless, destroying the group will not destroy the disk. Add resources for the disk and the autoscaler, and a usage example. Update the README.

BREAKING CHANGE: the 50GB stateful disk is no longer created by the MIG, which makes the MIG no longer stateful. Additionally, if terraform destroy is executed, the disk is destroyed.

d-costa force-pushed the autoscaler branch from a370949 to 9233cca January 5, 2024 21:54

d-costa force-pushed the autoscaler branch from 9233cca to 73bd364 February 14, 2024 15:52

dgteixeira mentioned this pull request Mar 26, 2024 in feat: add option to override image entrypoint and command #131 (open)

d-costa force-pushed the autoscaler branch from 73bd364 to 0313b5e April 1, 2024 15:29

d-costa requested a review from a team as a code owner April 1, 2024 15:29

github-actions bot added the documentation label (Improvements or additions to documentation) Apr 1, 2024

d-costa closed this Apr 6, 2024

d-costa reopened this Apr 6, 2024

d-costa self-assigned this Apr 7, 2024

d-costa force-pushed the autoscaler branch from 0313b5e to c03e6c8 April 9, 2024 08:49

d-costa force-pushed the autoscaler branch from c03e6c8 to 69bf12c April 9, 2024 08:55
