Cluster roll failure when 2 or more VNGs are updated at once #447

Open
dmitrykruglov opened this issue Jul 14, 2023 · 2 comments
Labels
kind/bug Categorizes an issue or PR as related to a bug.

Comments

@dmitrykruglov

Description

Hello,

We have 2 VNGs (spotinst_ocean_aws_launch_spec) that have the should_roll feature enabled, in order to automate the cluster/VNG roll when the configuration changes.
When both VNGs are updated in a single terraform apply (for example, an AMI ID change), Terraform fails with the error "Can't have 2 Rolls at the same time. Please stop the previous one". This is one of the reasons we have stopped using VNGs for now and use only the default VNG.

Terraform Version

1.3.9

Affected Resource(s)

spotinst_ocean_aws_launch_spec

Terraform Configuration Files

module "ocean-aws-k8s-vng_stateless" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateless-group" # Name of VNG in Ocean
  ocean_id = local.ocean_id

  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateless" }]
  spot_percentage = 100 # Change the spot %

  should_roll = true
}

## Create additional Ocean Virtual Node Group (launchspec) ##
module "ocean-aws-k8s-vng_stateful" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateful-group" # Name of VNG in Ocean
  ocean_id = local.ocean_id

  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateful" }]
  taints          = [{ key = "type", value = "stateful", effect = "NoSchedule" }]
  spot_percentage = 0
  #instance_types = ["g4dn.xlarge","g4dn.2xlarge"] # Limit VNG to specific instance types

  should_roll = true
}

Debug Output

deployment/191/default/spotio": exit status 1
Dynamic environment variables added:
_PASS

module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******1]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******2]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifications complete after 1s [id=ols-*******2]
╷
│ Error: onRoll() -> Roll failed for cluster [ols-*******1], error: POST https://api.spotinst.io/ocean/aws/k8s/cluster/ols-*******1/roll?accountId=act-******: 400 (request: "32217267-9bdb-463a-ad6b-fc1440a6018a") CLUSTER_ROLL_ALREADY_IN_PROGRESS: Can't have 2 Rolls at the same time. Please stop the previous one.
│ 
│ 
│   with module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup,
│   on .terraform/modules/ocean-aws-k8s-vng_stateless/main.tf line 2, in resource "spotinst_ocean_aws_launch_spec" "nodegroup":
│    2: resource "spotinst_ocean_aws_launch_spec" "nodegroup" {
│ 

Expected Behavior

Terraform shouldn't crash with an error.
Cluster roll either needs to complete just once, applying changes to both VNGs, or VNGs need to roll independently at the same time.

Actual Behavior

Terraform crashes with the error "Can't have 2 Rolls at the same time" and fails to roll/apply changes to one of the VNGs.

Steps to Reproduce

  • Create 2 VNGs using the spotinst/ocean-aws-k8s-vng/spotinst module with should_roll = true.
  • Update image_id to a different image in both VNGs.
  • Run terraform apply.
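
The reproduction steps above could be sketched as the following configuration. It is only a minimal sketch: the variable name vng_image_id is a hypothetical addition (not part of the original report), used so that a single value change updates both VNGs in the same apply. The module inputs and local.ocean_id are taken from the configuration in this report.

```hcl
# Hypothetical variable: one AMI ID shared by both VNGs, so changing it
# modifies both launch specs in the same terraform apply.
variable "vng_image_id" {
  type    = string
  default = "ami-07bccaac087171156"
}

module "ocean-aws-k8s-vng_stateless" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateless-group"
  ocean_id = local.ocean_id # defined elsewhere, as in the report
  image_id = var.vng_image_id

  should_roll = true
}

module "ocean-aws-k8s-vng_stateful" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateful-group"
  ocean_id = local.ocean_id
  image_id = var.vng_image_id

  should_roll = true
}
```

Applying once, then changing vng_image_id to a different AMI and applying again, should cause both launch specs to be modified together, which is the situation in which the error was reported.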
dmitrykruglov added the kind/bug label Jul 14, 2023
@ilijad1

ilijad1 commented Sep 1, 2023

@dmitrykruglov I hit the same issue when trying to upgrade multiple VNGs at once, and I believe it needs to be either fixed or clearly documented in the provider's Terraform docs.

If you want to roll more than one VNG at the same time, you should do that at the Ocean cluster level (example below):

resource "spotinst_ocean_aws" "ocean_cluster" {
  count                = ..........
  name                 = ..........
  controller_id        = ..........
  region               = ..........
  image_id             = ..........
  iam_instance_profile = ..........
  desired_capacity = ..........
  min_size         = ..........
  max_size         = ..........
  security_groups = []
  subnet_ids           = ..........
  key_name             = ..........
  
  update_policy {
    should_roll      = true
    conditioned_roll = true # or false
    auto_apply_tags  = true

    roll_config {
      batch_size_percentage        = 33
      launch_spec_ids              = ["ols-a0b****1", "ols-a0b****1"]
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }
   
  autoscaler {}
}

I managed to test this and it works perfectly fine for a list of VNGs.

The ocean_cluster documentation has the details for the configuration: https://registry.terraform.io/providers/spotinst/spotinst/latest/docs/resources/ocean_aws#update-policy

chandra1-n removed their assignment Apr 15, 2024
@sharadkesarwani
Contributor

Hi @dmitrykruglov,
The error you encountered while updating 2 VNGs is intended behavior.
To roll 2 or more VNGs, configure "update_policy" in the cluster config and pass the list of VNG IDs in launch_spec_ids, as shown in the snippet below.

update_policy {
  should_roll = true

  roll_config {
    batch_size_percentage        = 33
    launch_spec_ids              = ["ols-a0b1", "ols-a0b1"]
    batch_min_healthy_percentage = 20
    respect_pdb                  = true
  }
}
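
Putting the suggestions in this thread together, the cluster-level roll might be wired up roughly as follows. This is a sketch under assumptions, not a confirmed recommendation from the maintainers: the variable roll_launch_spec_ids is hypothetical, and the required cluster arguments are omitted, so the fragment is not complete on its own. The IDs are passed in as a variable because referencing the launch spec resources directly from the cluster resource would likely create a dependency cycle, since each launch spec already references the cluster ID.

```hcl
# Hypothetical variable: IDs (ols-...) of the VNGs to include in the
# cluster-level roll. Not part of the provider or the module.
variable "roll_launch_spec_ids" {
  type = list(string)
}

resource "spotinst_ocean_aws" "ocean_cluster" {
  # ... required cluster arguments omitted, as in the example earlier
  # in this thread ...

  update_policy {
    should_roll = true

    roll_config {
      batch_size_percentage        = 33
      launch_spec_ids              = var.roll_launch_spec_ids
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }
}
```

With this approach, should_roll would presumably be left disabled on the individual VNGs, so that only the cluster resource starts a roll.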
