Spot instances block_duration_minutes is not supported in AWS anymore? #596

Open
fzipi360 opened this issue Jan 27, 2023 · 4 comments
@fzipi360

👻 Brief Description

Per this AWS documentation, block_duration_minutes is no longer available (I don't know whether this applies only to new accounts).

One course of action that doesn't alter too much is to do some date math and apply the same time delta from block_duration_minutes to the valid_until option. For example, compute valid_until = Time.now.advance(minutes: block_duration_minutes) and pass that to the corresponding object.
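
A rough sketch of that date math in plain Ruby. Time#advance comes from ActiveSupport, so this version adds seconds to a Time instead. The helper name spot_options_with_expiry and the layout of the returned hash are hypothetical and not taken from kitchen-ec2, and whether valid_until really behaves like the removed parameter is an assumption that hasn't been tested here.

```ruby
# Hypothetical helper, not part of kitchen-ec2: builds spot options that
# carry an expiry time instead of the rejected block_duration_minutes.
def spot_options_with_expiry(config, now: Time.now.utc)
  options = {}
  options[:max_price] = config[:spot_price].to_s if config[:spot_price]

  minutes = config[:block_duration_minutes]
  # Same time delta, expressed as an absolute timestamp (Time + seconds).
  options[:valid_until] = now + (minutes * 60) if minutes

  options
end

config = { spot_price: 0.035, block_duration_minutes: 60 }
p spot_options_with_expiry(config)
```

The resulting hash deliberately has no block_duration_minutes key, so the value from kitchen.yml would never reach the API call.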

Version

❯ ruby --version
ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-darwin22]
❯ bundle install
Using concurrent-ruby 1.1.9
Using i18n 1.8.10
Using minitest 5.14.4
Using tzinfo 2.0.4
Using zeitwerk 2.4.2
Using activesupport 6.1.4.1
Using public_suffix 3.1.1
Using addressable 2.5.2
Using ast 2.4.2
Using aws-eventstream 1.2.0
Using aws-partitions 1.701.0
Using aws-sigv4 1.5.2
Using jmespath 1.6.2
Using aws-sdk-core 3.170.0
Using aws-sdk-alexaforbusiness 1.50.0
Using aws-sdk-amplify 1.32.0
Using aws-sdk-apigateway 1.67.0
Using aws-sdk-apigatewayv2 1.36.0
Using aws-sdk-applicationautoscaling 1.51.0
Using aws-sdk-athena 1.41.0
Using aws-sdk-autoscaling 1.63.0
Using aws-sdk-batch 1.47.0
Using aws-sdk-budgets 1.41.0
Using aws-sdk-cloudformation 1.58.0
Using aws-sdk-cloudfront 1.56.0
Using aws-sdk-cloudhsm 1.33.0
Using aws-sdk-cloudhsmv2 1.36.0
Using aws-sdk-cloudtrail 1.38.0
Using aws-sdk-cloudwatch 1.55.0
Using aws-sdk-cloudwatchevents 1.46.0
Using aws-sdk-cloudwatchlogs 1.45.0
Using aws-sdk-codecommit 1.45.0
Using aws-sdk-codedeploy 1.43.0
Using aws-sdk-codepipeline 1.47.0
Using aws-sdk-cognitoidentity 1.31.0
Using aws-sdk-cognitoidentityprovider 1.53.0
Using aws-sdk-configservice 1.66.0
Using aws-sdk-costandusagereportservice 1.34.0
Using aws-sdk-databasemigrationservice 1.53.0
Using aws-sdk-dynamodb 1.63.0
Using aws-sdk-ec2 1.361.0
Using aws-sdk-ecr 1.47.0
Using aws-sdk-ecrpublic 1.6.0
Using aws-sdk-ecs 1.85.0
Using aws-sdk-efs 1.45.0
Using aws-sdk-eks 1.63.0
Using aws-sdk-elasticache 1.62.0
Using aws-sdk-elasticbeanstalk 1.45.0
Using aws-sdk-elasticloadbalancing 1.34.0
Using aws-sdk-elasticloadbalancingv2 1.68.0
Using aws-sdk-elasticsearchservice 1.56.0
Using aws-sdk-eventbridge 1.24.0
Using aws-sdk-firehose 1.42.0
Using aws-sdk-glue 1.88.0
Using aws-sdk-guardduty 1.48.0
Using aws-sdk-iam 1.61.0
Using aws-sdk-kafka 1.41.0
Using aws-sdk-kinesis 1.35.0
Using aws-sdk-kms 1.49.0
Using aws-sdk-lambda 1.69.0
Using aws-sdk-mq 1.40.0
Using aws-sdk-networkfirewall 1.8.0
Using aws-sdk-networkmanager 1.14.0
Using aws-sdk-organizations 1.59.0
Using aws-sdk-ram 1.26.0
Using aws-sdk-rds 1.127.0
Using aws-sdk-redshift 1.69.0
Using aws-sdk-route53 1.55.0
Using aws-sdk-route53domains 1.33.0
Using aws-sdk-route53resolver 1.30.0
Using aws-sdk-s3 1.103.0
Using aws-sdk-secretsmanager 1.46.0
Using aws-sdk-securityhub 1.53.0
Using aws-sdk-servicecatalog 1.60.0
Using aws-sdk-ses 1.41.0
Using aws-sdk-shield 1.41.0
Using aws-sdk-signer 1.32.0
Using aws-sigv2 1.1.0
Using aws-sdk-simpledb 1.29.0
Using aws-sdk-sms 1.32.0
Using aws-sdk-sns 1.45.0
Using aws-sdk-sqs 1.44.0
Using aws-sdk-ssm 1.119.0
Using aws-sdk-states 1.39.0
Using aws-sdk-transfer 1.34.0
Using multipart-post 2.1.1
Using faraday 0.17.4
Using unf_ext 0.0.8
Using unf 0.1.4
Using domain_name 0.5.20190701
Using http-cookie 1.0.4
Using faraday-cookie_jar 0.0.7
Using timeliness 0.3.10
Using ms_rest 0.7.6
Using ms_rest_azure 0.12.0
Using azure_graph_rbac 0.17.2
Using azure_mgmt_key_vault 0.17.7
Using azure_mgmt_resources 0.18.2
Using azure_mgmt_security 0.19.0
Using azure_mgmt_storage 0.23.0
Using bcrypt_pbkdf 1.0.0
Using bundler 2.1.4
Using fuzzyurl 0.9.0
Using tomlrb 1.3.0
Using mixlib-config 2.2.18
Using mixlib-shellout 2.4.4
Using chef-config 13.7.16
Using libyajl2 2.1.0
Using ffi-yajl 2.4.0
Using hashie 3.6.0
Using mixlib-log 1.7.1
Using rack 2.2.3
Using uuidtools 2.1.5
Using chef-zero 13.1.0
Using diff-lcs 1.4.4
Using erubis 2.7.0
Using highline 1.7.10
Using iniparse 1.5.0
Using iso8601 0.9.1
Using mixlib-archive 0.4.20
Using mixlib-authentication 1.4.2
Using mixlib-cli 1.7.0
Using net-ssh 4.2.0
Using net-sftp 2.1.2
Using net-ssh-gateway 2.0.0
Using net-ssh-multi 1.2.1
Using ffi 1.15.5
Using ipaddress 0.8.3
Using plist 3.6.0
Using systemu 2.6.5
Using wmi-lite 1.0.5
Using ohai 13.12.6
Using proxifier 1.0.3
Using rspec-support 3.10.2
Using rspec-core 3.10.1
Using rspec-expectations 3.10.1
Using rspec-mocks 3.10.2
Using builder 3.2.4
Using rspec_junit_formatter 0.2.3
Using multi_json 1.15.0
Using rspec 3.10.0
Using rspec-its 1.3.0
Using net-scp 2.0.0
Using net-telnet 0.1.1
Using sfl 2.3
Using specinfra 2.82.25
Using serverspec 2.41.8
Using syslog-logger 1.6.8
Using chef 13.7.16
Using cleanroom 1.0.0
Using minitar 0.9
Using sawyer 0.8.2
Using octokit 4.21.0
Using retryable 3.0.5
Using molinillo 0.8.0
Using semverse 3.0.0
Using solve 4.0.4
Using thor 0.20.3
Using berkshelf 7.0.8
Using chef-telemetry 1.1.1
Using coderay 1.1.3
Using parallel 1.21.0
Using parser 3.0.2.0
Using rainbow 3.0.0
Using regexp_parser 2.1.1
Using rexml 3.2.5
Using rubocop-ast 1.12.0
Using ruby-progressbar 1.11.0
Using unicode-display_width 2.4.2
Using rubocop 1.22.0
Using cookstyle 7.25.6
Using declarative 0.0.20
Using excon 0.87.0
Using docker-api 2.2.0
Using erubi 1.12.0
Using faraday_middleware 0.14.0
Using jwt 2.3.0
Using memoist 0.16.2
Using os 1.1.1
Using signet 0.15.0
Using googleauth 0.14.0
Using httpclient 2.8.3
Using mini_mime 1.1.2
Using trailblazer-option 0.1.1
Using uber 0.1.0
Using representable 3.1.1
Using retriable 3.1.2
Using google-api-client 0.52.0
Using gssapi 1.3.1
Using gyoku 1.4.0
Using htmlentities 4.3.4
Using inifile 3.0.0
Using json-schema 2.8.1
Using tty-color 0.6.0
Using pastel 0.8.0
Using strings-ansi 0.2.0
Using unicode_utils 1.4.0
Using strings 0.2.1
Using tty-cursor 0.7.1
Using tty-box 0.7.0
Using tty-screen 0.8.1
Using wisper 2.0.1
Using tty-reader 0.9.0
Using tty-prompt 0.23.1
Using license-acceptance 1.0.19
Using method_source 0.9.2
Using parslet 1.8.2
Using pry 0.12.2
Using rubyzip 1.3.0
Using sslshake 1.3.1
Using sync 0.5.0
Using tins 1.29.1
Using term-ansicolor 1.7.1
Using json 2.5.1
Using train-core 3.8.1
Using little-plugger 1.1.4
Using logging 2.3.1
Using nori 2.6.0
Using rubyntlm 0.6.3
Using winrm 2.3.6
Using winrm-fs 1.3.3
Using winrm-elevated 1.2.3
Using train-winrm 0.2.12
Using train 3.8.1
Using train-aws 0.2.20
Using train-habitat 0.2.22
Using tty-table 0.12.0
Using inspec 4.18.51
Using mixlib-versioning 1.2.12
Using mixlib-install 3.12.24
Using test-kitchen 1.25.0
Using kitchen-docker_cli 0.19.0
Using lockfile 2.1.3
Using kitchen-dokken 2.14.0
Using kitchen-ec2 3.15.0
Using kitchen-inspec 1.2.0
Using kitchen-syncgz 1.0.0
Using kitchen-vagrant 1.6.0

Environment

MacOS Ventura.

Scenario

env KITCHEN_LOCAL_YML=../kitchen.yml kitchen test default-ec2-ubuntu-1404

Steps to Reproduce

Using this example config:

---
ec2:
  region: us-east-1
  associate_public_ip: true
  # kitchen values
  vpc_id: <my_vpc>
  security_group_ids: ["sg-xxxxxxxx"]
  # kitchen-public-us-east-1a
  subnet_id: "subnet-yyyyyy"
  interface: dns
  # ec2 instance config
  instance_type: t2.micro
  spot_price: 0.035
  spot_wait: 60
  block_duration_minutes: 60
chef:
  require_chef_omnibus: false
  name: chef
  version: 12.5.1
  log_level: auto

AWS Account was created recently just for this.

Expected Result

kitchen test to finish properly.

Actual Result

❯ env KITCHEN_LOCAL_YML=../kitchen.yml kitchen test default-ec2-ubuntu-1404
-----> Starting Kitchen (v1.25.0)
$$$$$$ Deprecated configuration detected:
require_chef_omnibus
chef_omnibus_url
Run 'kitchen doctor' for details.

-----> Cleaning up any prior instances of <default-ec2-ubuntu-1404>
-----> Destroying <default-ec2-ubuntu-1404>...
       Finished destroying <default-ec2-ubuntu-1404> (0m0.00s).
-----> Testing <default-ec2-ubuntu-1404>
-----> Creating <default-ec2-ubuntu-1404>...
       Detected platform: ubuntu version 14.04 on x86_64. Instance Type: t2.micro. Default username: ubuntu (default).
       If you are not using an account that qualifies under the AWS
free-tier, you may be charged to run these suites. The charge
should be minimal, but neither Test Kitchen nor its maintainers
are responsible for your incurred costs.

       Created automatic key pair kitchen-defaultec2ubuntu1404-username-C02FL11FML85-2023-01-27T18:05:37Z-n8ivs95x
       Waited 0/60s for spot request to become fulfilled.
       Removing automatic key pair kitchen-defaultec2ubuntu1404-username-C02FL11FML85-2023-01-27T18:05:37Z-n8ivs95x
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Failed to complete #create action: [Could not create a spot instance:
BlockDurationMinutes is not a valid parameter. in the specified region us-east-1. Please check this AMI is available in this region.] on default-ec2-ubuntu-1404
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration


@fzipi360
Author

fzipi360 commented Feb 6, 2023

@tas50 Don't know who can take a look at this one....

@RulerOf
Contributor

RulerOf commented Jul 10, 2023

@fzipi360 you're correct in that AWS broke this feature.

I ended up writing a lambda function to clean up test-kitchen instances that are >48 hours old, but it's not really designed properly. IMO it's kitchen-ec2's responsibility to own the lifecycle of the systems it creates.

To that end, I was thinking of doing something like this (draw.io link):
[Image: kitchen-ec2-cleanup diagram (draw.io)]

This will significantly expand the scope of the APIs that kitchen-ec2 has to have permission over, but I'm of the opinion that these requirements are reasonable.

Any thoughts?

@fzipi360
Author

Hey @RulerOf! We did more or less the same: a reaper process that runs periodically and terminates instances based on their kitchen tags.
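
For anyone wanting a starting point, the selection step of such a reaper could look roughly like this. It is only a sketch: the Instance struct stands in for whatever the AWS SDK returns, the created-by / test-kitchen tag pair is an assumption about how the instances are tagged, and the 48 hour cutoff is borrowed from the earlier comment. No AWS calls are made here.

```ruby
# Stand-in for an EC2 instance description; illustration only.
Instance = Struct.new(:id, :launch_time, :tags, keyword_init: true)

# Returns the instances that carry the kitchen tag and were launched
# more than max_age_hours ago. Tag key and value are assumptions.
def expired_kitchen_instances(instances, max_age_hours:, now: Time.now.utc,
                              tag_key: "created-by", tag_value: "test-kitchen")
  cutoff = now - (max_age_hours * 3600)
  instances.select do |instance|
    instance.tags[tag_key] == tag_value && instance.launch_time < cutoff
  end
end

now = Time.now.utc
fleet = [
  Instance.new(id: "i-old", launch_time: now - (72 * 3600), tags: { "created-by" => "test-kitchen" }),
  Instance.new(id: "i-new", launch_time: now - 3600, tags: { "created-by" => "test-kitchen" })
]
puts expired_kitchen_instances(fleet, max_age_hours: 48, now: now).map(&:id).inspect
```

Terminating the returned instances is left out on purpose, since that part depends on the permissions the reaper runs with.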

But I think we also need to clean it up to accept that block_duration_minutes might not be working for the account and react accordingly when creating new hosts. I mean, cleaning up is "easy", but failing to create the test instance breaks the setup for everyone with new accounts in AWS.

@RulerOf
Contributor

RulerOf commented Jul 10, 2023

> But I think we also need to clean it up to accept that block_duration_minutes might not be working for the account and react accordingly when creating new hosts.

With the underlying functionality being entirely different, I'm not sure I would want to implement this new approach as a failover for accounts that don't support block_duration_minutes. I would rather force the user to declare a new key in their driver config, e.g. max_instance_lifetime or terminate_after_minutes. For accounts that don't support block_duration_minutes, we can rescue the exception from this specific API call and re-raise it with a message that points users to the new driver parameter(s).
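
A minimal sketch of that rescue and re-raise idea. SpotConfigError and with_block_duration_hint are made-up names, and matching on the message text from the log earlier in this issue is an assumption; a real change would rescue whatever error class the SDK raises for this call.

```ruby
# Hypothetical error class for illustration; not part of kitchen-ec2.
class SpotConfigError < StandardError; end

# Runs the given block and translates the "not a valid parameter" failure
# into an error that names the replacement settings. Other errors pass through.
def with_block_duration_hint
  yield
rescue StandardError => e
  raise unless e.message.include?("BlockDurationMinutes is not a valid parameter")

  raise SpotConfigError,
        "block_duration_minutes is not accepted by AWS for this account; " \
        "remove it and use the new driver parameter(s) instead (#{e.message})"
end

begin
  with_block_duration_hint { raise "BlockDurationMinutes is not a valid parameter." }
rescue SpotConfigError => e
  puts e.message
end
```

Wrapping only the spot request call keeps the hint from masking unrelated failures.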

Thinking about this some more, I'd propose two mutually exclusive parameters:

  • terminate_after_creation_minutes: Use EventBridge to terminate the instance this many minutes after the instance's creation.
  • terminate_after_idle_minutes: Use EventBridge to terminate the instance this many minutes after the last time test-kitchen was run. Every time you run test-kitchen, if the instance is still alive, the EventBridge timer is updated to extend the instance's lifetime.

I suggest both because the former is really easy to understand, but the latter is honestly how I would prefer it to work.
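
To make the difference between the two concrete, here is a sketch of how the termination time could be derived from either one. The parameter names are the ones proposed above; the termination_time helper, its arguments, and the choice to raise ArgumentError when both are set are assumptions for illustration. Scheduling the actual termination (EventBridge or otherwise) is not shown.

```ruby
# Hypothetical helper: picks the termination timestamp for an instance.
# created_at  - when the instance was created
# last_run_at - when test-kitchen last ran against it
def termination_time(config, created_at:, last_run_at:)
  after_creation = config[:terminate_after_creation_minutes]
  after_idle = config[:terminate_after_idle_minutes]

  if after_creation && after_idle
    raise ArgumentError,
          "set only one of terminate_after_creation_minutes or terminate_after_idle_minutes"
  end

  return created_at + (after_creation * 60) if after_creation
  return last_run_at + (after_idle * 60) if after_idle

  nil # neither parameter set: no automatic termination
end

created = Time.now.utc
last_run = created + 3600
puts termination_time({ terminate_after_idle_minutes: 45 }, created_at: created, last_run_at: last_run)
```

With the idle variant, each kitchen run moves last_run_at forward, so the computed time keeps sliding as long as the instance is in use.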
