Bug: SAM deploy tries to delete and recreate AWS::Serverless::API domain when switching away from Fn:If to hardcoded value #3007

Closed
Landon-Alo opened this issue Mar 8, 2023 · 19 comments

Comments

@Landon-Alo

Landon-Alo commented Mar 8, 2023

Description:

I have a Lambda function with an API Gateway that is deployed to two environments. In each environment I want to specify a different domain name. To do so I used conditional statements within the AWS::Serverless::Api resource type:

ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Domain:
        DomainName: !If [inDev, test.dev.hello.world, test.hello.world]
        CertificateArn: !If [inDev, 123, 321]
        EndpointConfiguration: EDGE
        Route53:
          HostedZoneId: !If [inDev, abcxyz, xyzabc]

This worked fine but then we were asked to set up the template.yaml to handle a third environment. To do this I decided to stop using the conditional and instead use parameters that are passed in via the parameter_overrides option in samconfig.toml. This means that the above resource block now looks like:

ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Domain:
        DomainName: !Ref DomainName
        CertificateArn: !Ref EdgeCertificateArn
        EndpointConfiguration: EDGE
        Route53:
          HostedZoneId: !Ref Route53HostedZoneId

Note that the domain name is unchanged for the two environments that already existed. I then try to deploy this to our dev environment via sam deploy --config-env dev --config-file ./samconfig.toml --tags createdby=awssam team=abc --resolve-image-repos --resolve-s3 --no-confirm-changeset --no-fail-on-empty-changeset. Again, nothing is changed other than how I'm getting the data into the template.

What I expect to happen is that there will be no changes because I'm deploying using the dev config-env which already existed and for which I changed no values. I only moved values out of the conditional and into the parameter_overrides.

What actually happens is the changeset reports the following:

CloudFormation stack changeset
-----------------------------------------------------------------------------------------------------------------------------------------
Operation                          LogicalResourceId                  ResourceType                       Replacement                      
-----------------------------------------------------------------------------------------------------------------------------------------
+ Add                              ApiGatewayApiDeployment567d98957   AWS::ApiGateway::Deployment        N/A                              
                                   0                                                                                                      
+ Add                              ApiGatewayDomainName5a4c9e240d     AWS::ApiGateway::DomainName        N/A                              
* Modify                           ApiGatewayApiBasePathMapping       AWS::ApiGateway::BasePathMapping   True                             
* Modify                           ApiGatewayApiprodStage             AWS::ApiGateway::Stage             False                            
* Modify                           RecordSetGroup0d3ed29639           AWS::Route53::RecordSetGroup       False                            
- Delete                           ApiGatewayApiDeploymentff19363ec   AWS::ApiGateway::Deployment        N/A                              
                                   c                                                                                                      
- Delete                           ApiGatewayDomainName4148406711     AWS::ApiGateway::DomainName        N/A                              
-----------------------------------------------------------------------------------------------------------------------------------------

This is problematic for a couple of reasons.

  1. If this plan were to work it would involve downtime for our service since the domain would need to be deleted and recreated.
  2. The plan doesn't actually work because SAM will first try to create the custom domain, only to error out because it already exists.

I have also tried this by modifying the AWS::Serverless::Api resource just like so:

ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Domain:
        DomainName: test.dev.hello.world
        CertificateArn: !If [inDev, 123, 321]
        EndpointConfiguration: EDGE
        Route53:
          HostedZoneId: !If [inDev, abcxyz, xyzabc]

Where above I simply hardcode the DomainName (again, this is after having already deployed with the conditional setup prior). Even this setup will trigger the changeset above where it wants to delete the existing custom domain and create a new one.

Steps to reproduce:

I went ahead and replicated this behavior using the hello world SAM app with modification. What you'll need to do is initialize the hello world app and replace the template.yaml and samconfig.toml with the below code. Obviously, you'll need to update the Domain properties to actual values for your test case.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  test-sam

  Sample SAM Template for test-sam

Globals:
  Function:
    Timeout: 3
    MemorySize: 128
    Tracing: Active
  Api:
    TracingEnabled: true

Parameters:
  Environment:
    Type: String
    Description: Name of environment
    AllowedValues:
      - dev
      - prod

Conditions:
  inDev:
    !Equals [!Ref Environment, dev]

Resources:
  #############################################################################
  # API Gateway with Custom Domain Name
  #############################################################################
  ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Domain:
        DomainName: !If [inDev, test.dev.hello.world, test.hello.world]
        CertificateArn: !If [inDev, 123, 321]
        EndpointConfiguration: EDGE
        Route53:
          HostedZoneId: !If [inDev, abcxyz, xyzabc]
  
  #############################################################################
  # Lambda for Service
  #############################################################################
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello_world/
      Handler: app.lambda_handler
      Runtime: python3.9
      Architectures:
      - x86_64
      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: GET
            RestApiId: !Ref ApiGatewayApi

And here is the samconfig.toml. Note that you don't need the parameter overrides to replicate the bug.

version = 0.1
[dev]
[dev.deploy]
[dev.deploy.parameters]
stack_name = "sam-test"
s3_prefix = "sam-test"
region = "us-east-1"
confirm_changeset = true
capabilities = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"]
parameter_overrides = [
    "Environment=dev",
    "DomainName=test.dev.hello.world",
    "EdgeCertificateArn=123",
    "Route53ZoneId=abcxyz",
]

[prod]
[prod.deploy]
[prod.deploy.parameters]
stack_name = "sam-test"
s3_prefix = "sam-test"
region = "us-east-1"
confirm_changeset = true
capabilities = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"]
parameter_overrides = [
    "Environment=prod",
    "DomainName=test.hello.world",
    "EdgeCertificateArn=321",
    "Route53ZoneId=xyabc",
]

After updating the samconfig.toml and template.yaml you'll need to do the following steps:

  1. Deploy the application to the dev environment.
  2. Change DomainName in AWS::Serverless::Api to be test.dev.hello.world (the same name it was deployed with before)
  3. Build and try to deploy again

Observed result:

Initiating deployment
=====================

2023-03-08 14:09:49,686 | Collected default values for parameters: {}
2023-03-08 14:09:49,700 | Sam customer defined id is more priority than other IDs. Customer defined id for resource ApiGatewayApi is ApiGatewayApi
2023-03-08 14:09:49,700 | Sam customer defined id is more priority than other IDs. Customer defined id for resource HelloWorldFunction is HelloWorldFunction
2023-03-08 14:09:49,700 | 0 stacks found in the template
2023-03-08 14:09:49,700 | Collected default values for parameters: {}
2023-03-08 14:09:49,712 | Sam customer defined id is more priority than other IDs. Customer defined id for resource ApiGatewayApi is ApiGatewayApi
2023-03-08 14:09:49,712 | Sam customer defined id is more priority than other IDs. Customer defined id for resource HelloWorldFunction is HelloWorldFunction
2023-03-08 14:09:49,712 | 2 resources found in the stack 
        Uploading to sam-test/538be5ddbb1414e39664b8ea7dc96ed1.template  1609 / 1609  (100.00%)


Waiting for changeset to be created..

CloudFormation stack changeset
-----------------------------------------------------------------------------------------------------------------------------------------
Operation                          LogicalResourceId                  ResourceType                       Replacement                      
-----------------------------------------------------------------------------------------------------------------------------------------
+ Add                              ApiGatewayApiDeployment567d98957   AWS::ApiGateway::Deployment        N/A                              
                                   0                                                                                                      
+ Add                              ApiGatewayDomainName5a4c9e240d     AWS::ApiGateway::DomainName        N/A                              
* Modify                           ApiGatewayApiBasePathMapping       AWS::ApiGateway::BasePathMapping   True                             
* Modify                           ApiGatewayApiprodStage             AWS::ApiGateway::Stage             False                            
* Modify                           RecordSetGroup0d3ed29639           AWS::Route53::RecordSetGroup       False                            
- Delete                           ApiGatewayApiDeploymentff19363ec   AWS::ApiGateway::Deployment        N/A                              
                                   c                                                                                                      
- Delete                           ApiGatewayDomainName4148406711     AWS::ApiGateway::DomainName        N/A                              
-----------------------------------------------------------------------------------------------------------------------------------------


Changeset created successfully. arn:aws:cloudformation:us-east-1:xxx:changeSet/samcli-deploy1678313390/cb3b2f9e-e8d0-469b-810d-d6c5c5731237


2023-03-08 14:10:03 - Waiting for stack create/update to complete

CloudFormation events from stack operations (refresh every 0.5 seconds)
-----------------------------------------------------------------------------------------------------------------------------------------
ResourceStatus                     ResourceType                       LogicalResourceId                  ResourceStatusReason             
-----------------------------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS                 AWS::ApiGateway::DomainName        ApiGatewayDomainName5a4c9e240d     -                                
CREATE_IN_PROGRESS                 AWS::ApiGateway::Deployment        ApiGatewayApiDeployment567d98957   -                                
                                                                      0                                                                   
CREATE_FAILED                      AWS::ApiGateway::DomainName        ApiGatewayDomainName5a4c9e240d     test.dev.xxx.xxx already    
                                                                                                         exists in stack                  
                                                                                                         arn:aws:cloudformation:us-       
                                                                                                         east-1:xxx:stack/sam-te 
                                                                                                         st/16f30000-bdf3-11ed-977a-12beb 
                                                                                                         d4450e9                          
CREATE_FAILED                      AWS::ApiGateway::Deployment        ApiGatewayApiDeployment567d98957   Resource creation cancelled      
                                                                      0                                                                   
UPDATE_ROLLBACK_IN_PROGRESS        AWS::CloudFormation::Stack         sam-test                           The following resource(s) failed 
                                                                                                         to create:                       
                                                                                                         [ApiGatewayDomainName5a4c9e240d, 
                                                                                                         ApiGatewayApiDeployment567d98957 
                                                                                                         0].                              
UPDATE_ROLLBACK_COMPLETE_CLEANUP   AWS::CloudFormation::Stack         sam-test                           -                                
_IN_PROGRESS                                                                                                                              
DELETE_COMPLETE                    AWS::ApiGateway::DomainName        ApiGatewayDomainName5a4c9e240d     -                                
DELETE_COMPLETE                    AWS::ApiGateway::Deployment        ApiGatewayApiDeployment567d98957   -                                
                                                                      0                                                                   
UPDATE_ROLLBACK_COMPLETE           AWS::CloudFormation::Stack         sam-test                           -                                
-----------------------------------------------------------------------------------------------------------------------------------------

2023-03-08 14:13:07,379 | Execute stack waiter exception
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/aws-sam-cli/1.76.0/libexec/lib/python3.8/site-packages/samcli/lib/deploy/deployer.py", line 502, in wait_for_execute
    waiter.wait(StackName=stack_name, WaiterConfig=waiter_config)
  File "/opt/homebrew/Cellar/aws-sam-cli/1.76.0/libexec/lib/python3.8/site-packages/botocore/waiter.py", line 55, in wait
    Waiter.wait(self, **kwargs)
  File "/opt/homebrew/Cellar/aws-sam-cli/1.76.0/libexec/lib/python3.8/site-packages/botocore/waiter.py", line 375, in wait
    raise WaiterError(
botocore.exceptions.WaiterError: Waiter StackUpdateComplete failed: Waiter encountered a terminal failure state: For expression "Stacks[].StackStatus" we matched expected path: "UPDATE_ROLLBACK_COMPLETE" at least once
2023-03-08 14:13:07,384 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics
2023-03-08 14:13:07,475 | Sending Telemetry: {'metrics': [{'commandRun': {'requestId': '9543398a-8c4d-4d4f-befc-1c2ad451d024', 'installationId': 'dda226e3-9b79-4e59-84a4-1c7253bce103', 'sessionId': '6666cf63-41ac-47e3-9766-19f1b6d116df', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.8.16', 'samcliVersion': '1.76.0', 'awsProfileProvided': True, 'debugFlagProvided': True, 'region': 'us-east-1', 'commandName': 'sam deploy', 'metricSpecificAttributes': {'projectType': 'CFN', 'gitOrigin': None, 'projectName': 'c705de491dcb53c849e84aa5634de3748cc3f96f7126d125eef2aa054399d24d', 'initialCommit': None}, 'duration': 200522, 'exitReason': 'DeployFailedError', 'exitCode': 1}}]}
2023-03-08 14:13:08,016 | Telemetry response: 200
Error: Failed to create/update the stack: sam-test, Waiter StackUpdateComplete failed: Waiter encountered a terminal failure state: For expression "Stacks[].StackStatus" we matched expected path: "UPDATE_ROLLBACK_COMPLETE" at least once

Expected result:

I expected that there would be no changes on the changeset because I am not changing values, only the way the values are passed into the template (hardcoded vs using a conditional statement)

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)

{
  "version": "1.76.0",
  "system": {
    "python": "3.8.16",
    "os": "macOS-13.2-arm64-arm-64bit"
  },
  "additional_dependencies": {
    "docker_engine": "20.10.23",
    "aws_cdk": "Not available",
    "terraform": "Not available"
  }
}

Thank you! Happy to answer any clarifying questions.

@Landon-Alo Landon-Alo added the stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. label Mar 8, 2023
@mndeveci mndeveci transferred this issue from aws/aws-sam-cli Mar 9, 2023
@mndeveci
Contributor

mndeveci commented Mar 9, 2023

Thanks for reporting this issue @Landon-Alo, transferring to the SAMT repo.
Just for reference, I think this is coming from here: https://github.com/aws/serverless-application-model/blob/develop/samtranslator/model/apigateway.py#L137

Sets up the resource such that it will trigger a re-deployment when Swagger changes or always_deploy is true or the openapi version changes or a domain resource changes.

@Landon-Alo
Author

@mndeveci ah apologies. Thank you for transferring.

@hoffa
Contributor

hoffa commented Mar 9, 2023

Do you encounter the issue if you add AWS::LanguageExtensions to the Transforms? It resolves intrinsic functions before the SAM transform receives the template. Haven't yet looked deep, but at a glance this could be related to #2533.

@Landon-Alo
Author

@hoffa, thanks for the response. I added the AWS::LanguageExtensions transform as such

Transform: 
  - AWS::LanguageExtensions
  - AWS::Serverless-2016-10-31

Then I ran sam build and deploy with the AWS::Serverless::Api in its current state (with the !If statements):

ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Domain:
        DomainName: !If [inDev, test.dev.hello.world, test.hello.world]
        CertificateArn: !If [inDev, 123, 321]
        EndpointConfiguration: EDGE
        Route53:
          HostedZoneId: !If [inDev, abcxyz, xyzabc]

Without the AWS::LanguageExtensions transform this is a no-change deployment. Adding the AWS::LanguageExtensions transform, however, returns the following changeset:

CloudFormation stack changeset
---------------------------------------------------------------------------------------------------------------------------------
Operation                        LogicalResourceId                ResourceType                     Replacement                    
---------------------------------------------------------------------------------------------------------------------------------
+ Add                            ApiGatewayApiDeploymentb605a4f   AWS::ApiGateway::Deployment      N/A                            
                                 b64                                                                                              
+ Add                            ApiGatewayDomainName5a4c9e240d   AWS::ApiGateway::DomainName      N/A                            
+ Add                            RecordSetGroup06ffb40f10         AWS::Route53::RecordSetGroup     N/A                            
* Modify                         ApiGatewayApiBasePathMapping     AWS::ApiGateway::BasePathMappi   True                           
                                                                  ng                                                              
* Modify                         ApiGatewayApiprodStage           AWS::ApiGateway::Stage           False                          
- Delete                         ApiGatewayApiDeploymentff19363   AWS::ApiGateway::Deployment      N/A                            
                                 ecc                                                                                              
- Delete                         ApiGatewayDomainName4148406711   AWS::ApiGateway::DomainName      N/A                            
- Delete                         RecordSetGroup0d3ed29639         AWS::Route53::RecordSetGroup     N/A                            
---------------------------------------------------------------------------------------------------------------------------------

I then changed the DomainName part of the AWS::Serverless::Api to be hardcoded to test.dev.hello.world and (while keeping the AWS::LanguageExtensions transform) tried sam build and deploy. This returned a similar changeset to the above.

CloudFormation stack changeset
---------------------------------------------------------------------------------------------------------------------------------
Operation                        LogicalResourceId                ResourceType                     Replacement                    
---------------------------------------------------------------------------------------------------------------------------------
+ Add                            ApiGatewayApiDeploymentb605a4f   AWS::ApiGateway::Deployment      N/A                            
                                 b64                                                                                              
+ Add                            ApiGatewayDomainName5a4c9e240d   AWS::ApiGateway::DomainName      N/A                            
+ Add                            RecordSetGroup06ffb40f10         AWS::Route53::RecordSetGroup     N/A                            
* Modify                         ApiGatewayApiBasePathMapping     AWS::ApiGateway::BasePathMappi   True                           
                                                                  ng                                                              
* Modify                         ApiGatewayApiprodStage           AWS::ApiGateway::Stage           False                          
- Delete                         ApiGatewayApiDeploymentff19363   AWS::ApiGateway::Deployment      N/A                            
                                 ecc                                                                                              
- Delete                         ApiGatewayDomainName4148406711   AWS::ApiGateway::DomainName      N/A                            
- Delete                         RecordSetGroup0d3ed29639         AWS::Route53::RecordSetGroup     N/A                            
---------------------------------------------------------------------------------------------------------------------------------

I then deleted the stack and rebuilt it, but this time with the AWS::LanguageExtensions transform and with the !If statements in AWS::Serverless::Api. After this deployed, I changed the DomainName property from an !If statement to the hardcoded value test.dev.hello.world. In this case the behavior was as expected and I was able to switch away from !If without issue.

Unfortunately I would like to avoid deleting a production stack and imposing downtime on critical services. Any thoughts for how I could do a migration with an existing stack without deleting it?

Thanks.

@aahung
Contributor

aahung commented Mar 10, 2023

From the code, the logical ID of AWS::ApiGateway::DomainName is derived from a hash of the whole value of "DomainName". Because the value is an intrinsic, SAM hashes the intrinsic form rather than the resolved value. So if the intrinsic form changes, the hash changes even though the resolved value is the same, which causes the "replace" to happen.

api_domain_name = "{}{}".format("ApiGatewayDomainName", LogicalIdGenerator("", domain_name).gen())

There is no workaround available for now, and we cannot change how the logical ID is generated in SAM without breaking backward compatibility.
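
To illustrate the effect described above, here is a minimal sketch of a content-hashed logical ID. It is a simplified stand-in, not SAM's actual LogicalIdGenerator: the helper name domain_logical_id, the use of SHA-1, and the 10-character suffix are assumptions made for illustration only.

```python
import hashlib
import json


def domain_logical_id(domain_name) -> str:
    """Hypothetical stand-in for a content-hashed logical ID (not SAM's real code)."""
    # Plain strings are hashed as-is. Intrinsics (dicts) are hashed in their
    # unresolved, serialized form, because the resolved value is not known
    # when the template is transformed.
    if isinstance(domain_name, str):
        data = domain_name
    else:
        data = json.dumps(domain_name, separators=(",", ":"), sort_keys=True)
    suffix = hashlib.sha1(data.encode("utf-8")).hexdigest()[:10]
    return f"ApiGatewayDomainName{suffix}"


hardcoded = domain_logical_id("test.dev.hello.world")
conditional = domain_logical_id(
    {"Fn::If": ["inDev", "test.dev.hello.world", "test.hello.world"]}
)
referenced = domain_logical_id({"Ref": "DomainName"})

# In dev all three forms resolve to the same domain, yet each form hashes
# to a different logical ID.
print(hardcoded, conditional, referenced)
```

Since CloudFormation identifies resources by logical ID, a different suffix looks like a brand-new resource plus a deleted old one, which matches the Add/Delete pair in the changeset.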

@aahung aahung added type/bug and removed stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels Mar 10, 2023
@aahung
Contributor

aahung commented Mar 10, 2023

CFN doesn't support changing a logical ID without replacing the resource (creating a new one and deleting the old one). However, in theory this should work:

Template A: the original SAM template that was deployed successfully
Template B: the new SAM template including the intrinsic change

  1. In the CFN console, get the transformed CFN template and set DeletionPolicy to Retain on the resources whose logical IDs you want to change (-> Template A1)
  2. Update the CFN stack with A1
  3. Remove the resources from A1 (-> A2)
  4. Update the CFN stack with A2.
  5. CFN now no longer tracks those resources.
  6. Use SAM to translate template B and get B1.
  7. Use CFN's import feature to import the resources into the stack.
  8. Update the CFN stack with B.
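
Steps 1 and 3 above are edits to the processed template, which can be sketched on a template loaded as a Python dict. This is a minimal sketch, not a tested migration tool: the helper names with_retain, without_resources, and referencing_resources are hypothetical, and the sample template is a cut-down illustration rather than real transform output.

```python
import copy


def with_retain(template: dict, logical_ids: set) -> dict:
    """A -> A1: copy the template and set DeletionPolicy: Retain on the given resources."""
    result = copy.deepcopy(template)
    for logical_id in logical_ids:
        result["Resources"][logical_id]["DeletionPolicy"] = "Retain"
    return result


def without_resources(template: dict, logical_ids: set) -> dict:
    """A1 -> A2: copy the template and drop the given resources."""
    result = copy.deepcopy(template)
    for logical_id in logical_ids:
        del result["Resources"][logical_id]
    return result


def referencing_resources(template: dict, logical_ids: set) -> list:
    """List resources that still Ref / Fn::GetAtt one of the given logical IDs."""

    def mentions(node) -> bool:
        if isinstance(node, dict):
            ref = node.get("Ref")
            if isinstance(ref, str) and ref in logical_ids:
                return True
            target = node.get("Fn::GetAtt")
            if isinstance(target, list) and target and target[0] in logical_ids:
                return True
            if isinstance(target, str) and target.split(".")[0] in logical_ids:
                return True
            return any(mentions(value) for value in node.values())
        if isinstance(node, list):
            return any(mentions(value) for value in node)
        return False

    return sorted(
        name
        for name, resource in template["Resources"].items()
        if name not in logical_ids and mentions(resource)
    )


# Cut-down illustration of a processed template (assumed shape).
template_a = {
    "Resources": {
        "ApiGatewayDomainName4148406711": {
            "Type": "AWS::ApiGateway::DomainName",
            "Properties": {"DomainName": "test.dev.hello.world"},
        },
        "ApiGatewayApiBasePathMapping": {
            "Type": "AWS::ApiGateway::BasePathMapping",
            "Properties": {"DomainName": {"Ref": "ApiGatewayDomainName4148406711"}},
        },
    }
}

targets = {"ApiGatewayDomainName4148406711"}
template_a1 = with_retain(template_a, targets)
template_a2 = without_resources(template_a1, targets)
print(referencing_resources(template_a1, targets))
```

The last helper matters because a resource that is removed while others still reference it leaves an invalid template; it only checks Ref and Fn::GetAtt, so Fn::Sub strings and DependsOn would need a separate look.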

@Landon-Alo
Author

Landon-Alo commented Mar 10, 2023

Thanks @aahung I'll work on figuring out the flow on my test sam app. I have a few clarifying questions. So if I understand correctly:

  1. In the console, get the processed template for A, give any resources I want to retain a DeletionPolicy=Retain, and update the CFN stack (A1)
  2. Working with the processed template JSON (locally), remove the resources I want to change from A1 (-> A2). Is this how I should be making the modifications?
  3. Update the CFN stack with A2 which will cause the stack to lose those resources

After this is where I start to get a little lost.

  4. Next, I'm going to use SAM to translate template B and get B1.

How can I get the translated template without doing a full stack deployment? I think this is where I'm getting hung up, and it is creating confusion for the later steps.

  5. Use CFN import feature to import the resources into the stack.

Is template B an entirely separate stack? Import seems to only be allowed when a stack has been fully initialized or when a stack is initially created.

Should I be creating a new stack, minus the ApiGateway and LambdaFunctions (since they can't be built as they already exist) and then importing the resources into it?

Or should I be modifying the processed JSON to not use the !If statements and use !Ref functions instead, and then creating a new stack and importing the resources at initialization?

Or should I get a processed template that uses AWS::LanguageExtensions (B1), edit the template for the existing stack via the Designer, and import the resources back in that way?

Thank you.

@hoffa
Contributor

hoffa commented Mar 10, 2023

How can I get the translated template without doing a full stack deployment?

There are a few ways.

Get transformed template of deployed stack

If your stack is deployed, you can get the transformed template from the CloudFormation console (Template tab, enable View processed template).

Or using the AWS CLI, assuming your stack is named <my-stack>:

aws cloudformation get-template --query TemplateBody --change-set-name "$(aws cloudformation describe-stacks --query 'Stacks[0].ChangeSetId' --output text --stack-name <my-stack>)"
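
The same lookup can be sketched in Python with boto3. This is a minimal sketch assuming configured AWS credentials and region; the function name get_processed_template and the optional cfn argument (which lets a stub client be passed in) are hypothetical.

```python
def get_processed_template(stack_name: str, cfn=None):
    """Fetch the post-transform template of a deployed stack."""
    if cfn is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        cfn = boto3.client("cloudformation")
    # TemplateStage="Processed" asks for the template after all transforms ran.
    response = cfn.get_template(StackName=stack_name, TemplateStage="Processed")
    return response["TemplateBody"]
```

Calling get_processed_template("<my-stack>") should return the same processed template that the console shows.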

Transform template locally

If you want to transform a template locally, you can use the script included in our repository:

git clone https://github.com/aws/serverless-application-model.git
cd serverless-application-model
python3 -m venv .venv
source .venv/bin/activate
make init

Then:

bin/sam-translate.py --template-file template.yaml

Note however that transforming using that script won't always work, as it assumes the input template is in the same format as what AWS::Serverless-2016-10-31 receives (e.g. after sam package, when all local paths have been replaced with proper URIs to resources in AWS).

Transform template without full deployment

If you want a more faithful transformation, but without actually creating the resources in the template, you can create a change set (not execute it) and get the transformed template.

If it's too tedious to do it through the console, you could whip up a script such as the following (untested, for inspiration only, not production-ready):

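import json
import sys
import uuid

import boto3


def transform(template: str) -> str:
    """Return the transformed template by creating, but never executing, a change set."""
    cfn = boto3.client("cloudformation")
    # Throwaway name, used for both the placeholder stack and its change set.
    name = f"transform-{uuid.uuid4()}"
    change_set = cfn.create_change_set(
        TemplateBody=template,
        StackName=name,
        ChangeSetName=name,
        ChangeSetType="CREATE",
        Capabilities=[
            "CAPABILITY_IAM",
            "CAPABILITY_AUTO_EXPAND",
        ],
    )
    change_set_id = change_set["Id"]
    # Transforms are applied while the change set is being created.
    waiter = cfn.get_waiter("change_set_create_complete")
    waiter.wait(
        ChangeSetName=change_set_id,
        WaiterConfig={
            "Delay": 1,
        },
    )
    transformed = cfn.get_template(ChangeSetName=change_set_id)
    # Clean up the placeholder stack that the CREATE change set left behind.
    cfn.delete_stack(StackName=name)
    return json.dumps(transformed["TemplateBody"])


def main():
    # Read the SAM template from stdin and print the transformed template.
    print(transform(sys.stdin.read()))


if __name__ == "__main__":
    main()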

And then transform with:

python transform.py < sam-template.yaml > cfn-template.json

@aahung
Contributor

aahung commented Mar 10, 2023

Next, I'm going to use SAM to translate template B and get B1.

This one might not be needed. I think you can proceed with the import without transforming it first (to be confirmed in your test).

Is template B an entirely separate stack? Import seems to only be allowed when a stack has been fully initialized or when a stack is initially created.

It is your original stack, as at that point, your original stack doesn't have the resources you want to rename.

Or should I get a processed template that uses AWS::LanguageExtensions (B1), edit the template for the existing stack via the Designer, and import the resources back in that way?

Using AWS::LanguageExtensions might be a good choice, as it avoids this issue in the future. As noted above, you might not need to process template B manually. You can just import with template B and let CFN handle the processing. Let me know if it works.

@Landon-Alo
Author

Cool, thanks guys. I'll chip away at it and let you know if I have follow-up questions.

@aahung aahung self-assigned this Mar 10, 2023
@Landon-Alo
Author

So I think I worked out the flow but hit a hiccup with not being able to import AWS::Route53::RecordSetGroup and AWS::Lambda::Permission resources.

The process

  1. Add DeletionPolicy=Retain to resources I want to change
  2. Remove said resources from stack
  3. Locally modify the template.yaml to use AWS::LanguageExtensions
  4. Run sam build
  5. Run sam package --resolve-s3 --output-template-file packaged.yaml
  6. In the GUI, create a changeset using packaged.yaml, save the resulting processed template, locally, as b.json
  7. In the GUI, click "import resources into stack", use b.json as the template

At this point I hit an error where it says The following resource types are not supported for resource import: AWS::Route53::RecordSetGroup,AWS::Lambda::Permission and links out to this documentation: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resource-import-supported-resources.html?icmpid=docs_cfn_console

Unfortunately I need to remove the RecordSetGroup from the stack as I use a !If statement in it, which is the root of the issue here.

Does this flow seem accurate? Any thoughts for how to get around the issue of not being able to import RecordSetGroups? That seems like a bit of a wall.

Thank you.

@aahung
Contributor

aahung commented Mar 13, 2023

If some resources do not support import, can they be excluded from those with DeletionPolicy set to "Retain" and not removed from A1 -> A2?

@Landon-Alo
Author

@aahung this is a fair point and I actually think I can get away with doing that here.

I was deleting resources from the stack by commenting them out of the untransformed template.yaml and then running build and deploy. This is a bit like taking a hammer to the problem. I'll revisit doing this work to the template directly.

What is the best way to modify the transformed template directly? Should I download it, remove the necessary resources, then reimport it via the GUI?

Thank you.

@aahung
Contributor

aahung commented Mar 14, 2023

What is the best way to modify the transformed template directly?

Using the CFN CreateChangeSet API is the method whose output is closest to the normal SAM transform. Doing it locally with bin/sam-translate.py is another option.

@Landon-Alo
Author

I went ahead and revisited this approach working with the transformed templates directly and trying to find a way to do the migration without having to touch RecordSetGroup. Unfortunately, I do not believe it is possible.

The issue at hand is that AWS::Serverless::Api generates a large number of resources automatically. Among them are a RecordSetGroupXXXXXXX and an ApiGatewayDomainNameXXXXX, and these two resources are the ones affected by the bug. This means that I have no way of independently fixing just ApiGatewayDomainNameXXXXX, because doing so would also affect RecordSetGroupXXXXXX since both are generated from the same resource abstraction, AWS::Serverless::Api.

For example here is the RecordSetGroupXXXXX in the "A" version aka the version I need to fix:

"RecordSetGroup0d3ed29639": {
        "Type": "AWS::Route53::RecordSetGroup",
        "DeletionPolicy": "Retain",
        "Properties": {
          "HostedZoneId": {
            "Fn::If": [
              "inDev",
              "ABCXYZ",
              "XYZABC"
            ]
          },
          "RecordSets": [
            {
              "Name": {
                "Fn::If": [
                  "inDev",
                  "test.dev.hello.world",
                  "test.hello.world"
                ]
              },
              "Type": "A",
              "AliasTarget": {
                "HostedZoneId": "zyx",
                "DNSName": {
                  "Fn::GetAtt": [
                    "ApiGatewayDomainName4148406711",
                    "DistributionDomainName"
                  ]
                }
              }
            }
          ]
        }
      }

Take note of the "RecordSets[Name]" key. Here is the RecordSetGroup resource after using the AWS::LanguageExtensions fix:

"RecordSetGroup0d3ed29639": {
        "Type": "AWS::Route53::RecordSetGroup",
        "DeletionPolicy": "Retain",
        "Properties": {
          "HostedZoneId": {
            "Fn::If": [
              "inDev",
              "ABCXYZ",
              "XYZABC"
            ]
          },
          "RecordSets": [
            {
              "Name": "test.dev.hello.world",
              "Type": "A",
              "AliasTarget": {
                "HostedZoneId": "zyx",
                "DNSName": {
                  "Fn::GetAtt": [
                    "ApiGatewayDomainName5a4c9e240d",
                    "DistributionDomainName"
                  ]
                }
              }
            }
          ]
        }
      },

You can see that, apart from the ApiGatewayDomainName logical ID it references, the only change is RecordSets[Name], which is set by the DomainName property in AWS::Serverless::Api. Looping back to the point, the problematic piece is that ApiGatewayDomainNameXXXX also uses the DomainName property in AWS::Serverless::Api.

Thus, I think I'm in a bit of a conundrum.

Situation 1: Use AWS::LanguageExtensions

This will change RecordSetGroupXXXXX, specifically RecordSets[Name], thus when I go back to import the changed ApiGatewayDomainNameXXXX it will fail and want to delete and rebuild the record set.

Situation 2: Use !Ref to limit changes

I don't think this would work either, because I need to update the DomainName property of AWS::Serverless::Api to use a !Ref instead of the !If. Doing so, however, would cause changes to RecordSetGroupXXXXX, which will break the import.

Conclusion
I also skipped over the fact that RecordSetGroupXXXX references ApiGatewayDomainNameXXXXX so that would be a problem to solve as well.

What do you think? I'm not sure if there is a good way around this. My immediate reaction is that I would need to refactor AWS::Serverless::Api into its respective parts and then do the migration, but I'm not sure if that is possible. Hopefully I didn't miss anything obvious and this was useful. Thank you.

@aahung
Contributor

aahung commented Mar 14, 2023

I didn't realize RecordSetGroup's logical ID also needs renaming. If import doesn't support RecordSetGroup, this won't work for you...

For the current SAM-T, sorry, I don't see a nice workaround. Using language extension will definitely change the logical ID and will require you to delete the domain name first (causing downtime).

An ugly workaround

I want to step back and understand why you want to change ApiGatewayApi.DomainName to use Ref instead of If. Are you planning to support more or different values of DomainName, or just refactoring? To avoid downtime, a not-so-nice solution would be to duplicate the whole ApiGatewayApi and use Condition:.

So you will have something like:

Conditions:
  UseOldApiGatewayApi: ... (true if DomainName equals your current production domain)
  UseApiGatewayApi: ... (the negation of UseOldApiGatewayApi)

Resources:
  ...
  ApiGatewayApi:
    Condition: UseOldApiGatewayApi
    Properties:
      Domain:
        DomainName: !If [inDev, test.dev.hello.world, test.hello.world]
      # everything else stays the same

  ApiGatewayApiNew:
    Condition: UseApiGatewayApi
    Properties:
      Domain:
        DomainName: !Ref DomainName
      # everything else stays the same

It is not pretty, it will prevent you from adding language extensions in the future, and it may have other drawbacks.
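Spelled out a little further, the duplicated-resource idea could look like the sketch below. Several things here are assumptions rather than anything confirmed in this thread: the Environment parameter standing in for however inDev is really defined, the way the two conditions are computed, and the certificate values, which are the placeholders from the original report.

```yaml
# Sketch only: two copies of the API, each guarded by a condition.
Transform: AWS::Serverless-2016-10-31

Parameters:
  DomainName:
    Type: String
  EdgeCertificateArn:
    Type: String
  Environment:                     # hypothetical; stands in for the real inDev input
    Type: String
    Default: dev

Conditions:
  inDev: !Equals [!Ref Environment, dev]
  # True only for the two domains the existing stacks already serve.
  UseOldApiGatewayApi: !Or
    - !Equals [!Ref DomainName, test.dev.hello.world]
    - !Equals [!Ref DomainName, test.hello.world]
  UseApiGatewayApi: !Not [!Condition UseOldApiGatewayApi]

Resources:
  # Unchanged definition, so the generated logical IDs stay the same.
  ApiGatewayApi:
    Type: AWS::Serverless::Api
    Condition: UseOldApiGatewayApi
    Properties:
      StageName: prod
      Domain:
        DomainName: !If [inDev, test.dev.hello.world, test.hello.world]
        CertificateArn: !If [inDev, 123, 321]
        EndpointConfiguration: EDGE

  # Parameter-driven copy, created only for new environments.
  ApiGatewayApiNew:
    Type: AWS::Serverless::Api
    Condition: UseApiGatewayApi
    Properties:
      StageName: prod
      Domain:
        DomainName: !Ref DomainName
        CertificateArn: !Ref EdgeCertificateArn
        EndpointConfiguration: EDGE
```

Any Lambda events that point at the API through RestApiId would also need to select the right copy per environment, which this sketch leaves out.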

Another theory

Is it possible for you to not use the Domain property? You can continue to use AWS::Serverless::Api but use native CFN resources (like AWS::Route53::RecordSetGroup, AWS::ApiGateway::DomainName, AWS::ApiGateway::BasePathMapping) for the domain-name-related pieces. Because you are not using SAM-T to generate those resources, you could keep the logical IDs intact while doing whatever you want with the values.
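A rough sketch of that layout follows, assuming the parameter names from the top of this issue and reusing the two generated logical IDs quoted earlier in the thread. The base path mapping's logical ID is a placeholder (copy the real one from your transformed template), and the `ApiGatewayApi.Stage` reference relies on SAM's generated-resource references as I understand them. None of this has been tested against the stack in question.

```yaml
# Sketch only: SAM API without a Domain block, plus hand-written domain resources.
Transform: AWS::Serverless-2016-10-31

Parameters:
  DomainName:
    Type: String
  EdgeCertificateArn:
    Type: String
  Route53HostedZoneId:
    Type: String

Resources:
  ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod              # no Domain block, so SAM generates no domain resources

  # Logical ID copied from the transformed template quoted earlier in this thread.
  ApiGatewayDomainName4148406711:
    Type: AWS::ApiGateway::DomainName
    Properties:
      DomainName: !Ref DomainName
      CertificateArn: !Ref EdgeCertificateArn
      EndpointConfiguration:
        Types: [EDGE]

  # Placeholder logical ID; use the one from your own transformed template.
  ApiBasePathMapping:
    Type: AWS::ApiGateway::BasePathMapping
    Properties:
      DomainName: !Ref ApiGatewayDomainName4148406711
      RestApiId: !Ref ApiGatewayApi
      Stage: !Ref ApiGatewayApi.Stage   # assumed SAM reference to the generated stage

  # Logical ID copied from the transformed template quoted earlier in this thread.
  RecordSetGroup0d3ed29639:
    Type: AWS::Route53::RecordSetGroup
    Properties:
      HostedZoneId: !Ref Route53HostedZoneId
      RecordSets:
        - Name: !Ref DomainName
          Type: A
          AliasTarget:
            HostedZoneId: !GetAtt ApiGatewayDomainName4148406711.DistributionHostedZoneId
            DNSName: !GetAtt ApiGatewayDomainName4148406711.DistributionDomainName
```

Because the logical IDs are written by hand, changing how the values are supplied (If, Ref, or a literal) should not by itself rename the resources.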

Wait for SAM-T to have a new property to provide logical ID suffix

I cannot guarantee this unfortunately but will look into whether a new property is possible.

@Landon-Alo
Author

Thanks @aahung, those are a few good ideas to go off of. I need to make it more dynamic because the service now needs to deploy to three different environments rather than two. Unfortunately, the way I had set it up locked me into a two-environment setup.

I'm currently testing an approach where I spin up a parallel version of the API that bypasses the Domain object on AWS::Serverless::Api by using a CloudFront distribution and a weighted record. Then I'm hoping I can migrate traffic to the CloudFront distribution version.

I'll also play around with using a condition as you described; I didn't know you could do that! It may be ugly, but it seems like it could be a good band-aid.

If I have time, I'll also play around with using the native CFN resources and see if I can do a migration flow that way.

I'll poke around with these solutions and follow up. I feel like we're close to something. Thanks again.

@Landon-Alo
Author

Landon-Alo commented Mar 14, 2023

I came up with a flow that worked on my test application. Going to try it later this week on the main service.

The process will be as follows:

  1. Build a new API Gateway API, attach it to the Lambda function, and build a CloudFront distribution that points to the new API.
  2. Deploy ^ through Stg -> Dev -> Prod
  3. Manually convert all existing records to weighted records
  4. Build a record set group that creates a weighted DNS record pointing to the CloudFront distribution
  5. Deploy ^ through Stg -> Dev -> Prod
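As a sketch of step 4, a weighted alias record for the CloudFront distribution might be declared as below. The logical ID, the SetIdentifier, the starting weight, and the NewDistributionDomainName parameter are all hypothetical; in a real template the DNS name would more likely come from a `!GetAtt` on the distribution resource.

```yaml
# Sketch only: one weighted alias record pointing at a CloudFront distribution.
Parameters:
  DomainName:
    Type: String
  Route53HostedZoneId:
    Type: String
  NewDistributionDomainName:       # hypothetical, e.g. the distribution's cloudfront.net name
    Type: String

Resources:
  NewApiWeightedRecord:            # hypothetical logical ID
    Type: AWS::Route53::RecordSetGroup
    Properties:
      HostedZoneId: !Ref Route53HostedZoneId
      RecordSets:
        - Name: !Ref DomainName
          Type: A
          SetIdentifier: new-api-cloudfront   # distinguishes records sharing a name and type
          Weight: 0                           # start with no traffic, raise to migrate
          AliasTarget:
            HostedZoneId: Z2FDTNDATAQYW2      # fixed zone ID used for CloudFront alias targets
            DNSName: !Ref NewDistributionDomainName
```

The existing record converted in step 3 would need its own SetIdentifier and Weight so that the two records form one weighted set.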

At this point there should now be two DNS records for this service in each environment. Manually migrate traffic from the old DNS record to the new DNS record. Once the migration has completed in all environments do the following:

  1. Delete the old API from the SAM template
  2. Delete the Lambda events pointing to the old API from the SAM template
  3. Deploy ^ through Stg -> Dev -> Prod

This is similar in spirit to deconstructing the AWS::Serverless::Api into its constituent parts. However, I avoid setting up a custom domain name and instead use CloudFront, which allows me to run the API in parallel with itself, do a migration, and then close the "old" one.

I'm going to give it a shot later this week and close this issue if I'm successful.

@Landon-Alo
Author

Hello! Closing this out. The CloudFront approach didn't work either, because we can't attach the alternate domain name to the distribution as long as the custom domain name for the API is up. Thus, we would have to take downtime while we delete the custom domain name and update the CloudFront distribution. At that point it's better to just take the downtime and rebuild the AWS::Serverless::Api resource.

We could likely have used the conditional resource approach that was proposed in #3007 (comment), but we decided to manage the custom domain name, base path mapping, and Route 53 record in Terraform instead.

Thus, our flow was:

  1. Add DeletionPolicy: Retain to AWS::Serverless::Api
  2. Run through CI/CD
  3. Delete the Domain block from AWS::Serverless::Api to keep only the API in the stack
  4. Run through CI/CD
  5. Import the custom domain name, base path mapping, and Route 53 record into Terraform
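For illustration, the API resource after step 3 would look something like this. It is a sketch based on the template at the top of this issue, not the actual production template.

```yaml
# Sketch only: the API after the Domain block has been deleted (step 3).
Transform: AWS::Serverless-2016-10-31

Resources:
  ApiGatewayApi:
    Type: AWS::Serverless::Api
    # Added in step 1 and deployed in step 2, before the Domain block is removed,
    # so the generated domain resources are retained rather than deleted.
    DeletionPolicy: Retain
    Properties:
      StageName: prod
      # Domain block removed here; the custom domain name, base path mapping,
      # and Route 53 record are managed outside this stack from now on.
```

The ordering matters: the retain policy has to be deployed on its own first, otherwise removing the Domain block would delete the live custom domain.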

And that's it. The developers using the repository will be none the wiser about the change. It largely impacts the infrastructure team, so we've added documentation on how the custom domain name and associated elements are managed.

Thanks again for your help. I'm going to close this now.
