Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(aws): use AWS profiles using .credentials file #3973

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

roehrijn
Copy link

@roehrijn roehrijn commented Oct 5, 2023

Description

This PR introduces a new command line flag --aws-profile which can be used multiple times and allows to make use of AWS profiles in .credentials files (see also Configuration and credential file settings).
Furthermore the PR allows to use multiple of these profiles at once in a way that it queries for available Route53 zones using credentials of each profile and later distinguishes between the profiles in order to make changes with proper credentials to the right Route53 zone.

We at Mercedes-Benz are running a fork with those changes now for several months in production.

Checklist

  • Unit tests updated
  • End user documentation updated

Jan Roehrich jan.roehrich@mercedes-benz.com, Mercedes-Benz Tech Innovation GmbH, legal info/Impressum

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

Welcome @roehrijn!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @roehrijn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johngmyers for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 5, 2023
@tobiasgiese
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 5, 2023
@johngmyers
Copy link
Contributor

What is the value of having this complexity versus running an external-dns deployment per profile?

@johngmyers
Copy link
Contributor

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2023
@roehrijn
Copy link
Author

roehrijn commented Oct 13, 2023

Hi @johngmyers,
we run almost 1000 Kubernetes clusters which are concurrently accessing our Route53 zones through external-dns. To cope with Route53 throttling we need to distribute our zones over several AWS accounts which leads us to 6 profiles, currently. During some migration situations there might even be 9 profiles.
Thus, the answer to your question about value has two dimensions. First, we have the complexity of running and managing up to 9 external-dns instances on up to 1000 clusters. On the other hand it is also about resource consumption. In the setup we use we need to request at least 60MB of memory per instance in order to work properly, in some situations a bit more. 60MB * 6 instances * 1000 clusters = a lot of additional EC2 costs.

@mloiseleur
Copy link
Contributor

AWS profile is a feature of aws config file used in many aws compatible tools.
Do you think you can rebase and finish this PR ?

@roehrijn
Copy link
Author

Hi @mloiseleur, yes I'm going to rebase soon. Just had some other important topics on the table over the last view weeks.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2023
@roehrijn
Copy link
Author

Hi @mloiseleur, I rebased the PR. Do you see a chance to get this aspect merged? If yes, I would also take the effort to adapt the documentation accordingly.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2023
@mloiseleur
Copy link
Contributor

@roehrijn As soon as you finish and rebase this PR, I don't see why not ?

Note: It surely could help too to remove the WIP in the title.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 29, 2023
@roehrijn roehrijn changed the title [WIP] Allow usage of (multiple) profiles using .credentials file in AWS provider Allow usage of (multiple) profiles using .credentials file in AWS provider Dec 29, 2023
main.go Outdated
if cfg.Provider == "aws" || cfg.Provider == "aws-sd" || cfg.Registry == "dynamodb" {
awsSession, err = aws.NewSession(
if len(cfg.AWSProfiles) == 0 || (len(cfg.AWSProfiles) == 1 && cfg.AWSProfiles[0] == "") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a longer code change without test.
Maybe better to also have in our provider/aws package a function that creates the session instead of putting it in to main(). main() is not good testable.

Copy link
Author

@roehrijn roehrijn Jan 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @szuecs, I extracted two functions, one to create sessions per profile and one to create a default session. I hope this is what you've intended. Anyway I stumbled over two aspects:

  1. I think the code of creating sessions over creating AWS clients to create an AWS provider can be further streamlined to remove complexity from main. I think this can all be integrated in aws.NewAWSProvider(). I'm eager to work on that but I guess its better to do that in a dedicated PR. WDYT?
  2. I didn't add tests for the newly created functions. IMHO they're hard to unit test due to their strong dependencies to AWS SDK packages and behavior. However, I was wondering if you ever considered making use of LocalStack for testing the AWS provider? Same here, I would be eager to take care about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know AWS SDK packages have a test part for mocking so you could test that at least you can instantiate a mock session, not sure if it is really helpful but a thing we could do.
I have no strong opinion on having aws.NewAWSProvider(aws.Options{..}) or something like that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find any suitable mocking option for AWS sessions in AWS SDK v1, what is still in use here. However, it turned out that in most cases session creation facilities in AWS SDK don't talk to any remote endpoint. So I was able to test the profile loading behavior from a credential file as well as the default behavior from before this PR. What is unfortunately not possible to test in this way is the role assumption options. They require connectivity to AWS' STS service.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2024
@roehrijn roehrijn force-pushed the roehrijn/aws-profiles branch 2 times, most recently from d9206cf to 5f4a13c Compare January 28, 2024 15:04
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 5, 2024
@roehrijn roehrijn force-pushed the roehrijn/aws-profiles branch 2 times, most recently from 80af5cc to 11e098f Compare March 6, 2024 09:09
@roehrijn
Copy link
Author

roehrijn commented Mar 6, 2024

@mloiseleur @szuecs, please let me know if there is something left to do in order to get this merged. I would be very happy to avoid further rebases if possible, because it is a lot of work to validate function and valid documentation after each rebase. I'm eager to deal with missing aspects, however it would kindly ask to get to know them all together in order to avoid extra work.

@mloiseleur
Copy link
Contributor

I understand. I think we are at the end.
/lgtm
/assign @szuecs for final review

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 12, 2024
@mloiseleur
Copy link
Contributor

/retitle feat(aws): use AWS profiles using .credentials file

@k8s-ci-robot k8s-ci-robot changed the title Allow usage of (multiple) profiles using .credentials file in AWS provider feat(aws): use AWS profiles using .credentials file Mar 12, 2024
if errors.As(err, &awsErr) {
switch awsErr.Code() {
case "AccessDenied":
log.Warnf("Skipping AWS profile %q due to missing permission: %v", profile, awsErr.Message())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is only a warning and not fatal?
This does not seem to be a temporary issue, but requires attention so we should stop the process.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see answer on next comment. I think both have same scope, aren't they?

log.Warnf("Skipping AWS profile %q due to missing permission: %v", profile, awsErr.Message())
continue
case "InvalidClientTokenId", "ExpiredToken", "SignatureDoesNotMatch":
log.Warnf("Skipping AWS profile %q due to credential issues: %v", profile, awsErr.Message())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is only a warning and not fatal?
This does not seem to be a temporary issue, but requires attention so we should stop the process.

At least I expect this to be an error instead of warning.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @szuecs, thanks for the review.

In our use-case where we handle many AWS profiles with one external-dns instance, it occasionally happens that one profile is not ready due to eventual consistency topics during resource provisioning. However, the other profiles are working well in those situations. That's why we don't want the process to be killed. Especially because we heavily depend on caching in order not to face the Route53 throttling thresholds. Thats why I implemented it as a warning.

However, I understand other views on that. Changing this two cases to log an error instead of a warning would be completely fine for me. Additionally I see two more possibilities to handle that:

  1. In order to keep the former semantics I could log the warnings (or errors) and add fatal behavior in case none of the given profiles is working.
  2. I can introduce a behavioral switch as command line argument in order to let the user decide between fatal and graceful behavior (don't like that option much TBH).

WDYT?

log.Warnf("Skipping AWS profile %q due to credential issues: %v", profile, awsErr.Message())
continue
default:
// noting to do here. Falling through to general error handling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you will wrap and ignore rest of the errors by SoftError, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say ignoring them, but yes, they're handled by NewSoftError(...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants