
CREATE_FAILED AWS::EFS::MountTarget Subnet0MountTarget mount target already exists in this AZ #208

Closed
nicerobot opened this issue Jan 6, 2017 · 22 comments
Labels: lifecycle/rotten, waiting for community feedback

Comments

@nicerobot

Following the docs and customizing cluster.yaml, I reach this section:

# Create MountTargets for a pre-existing Elastic File System (Amazon EFS). Enter the resource id, eg "fs-47a2c22e"
# This is a NFS share that will be available across the entire cluster through a hostPath volume on the "/efs" mountpoint
#
# You can create a new EFS volume using the CLI:
# $ aws efs create-file-system --creation-token $(uuidgen)
# elasticFileSystemId: fs-47a2c22e

If I specify my existing elasticFileSystemId, which is already in use and allocated to the subnets I'm specifying for the AWS-native auto-scaling:

subnets:
  - availabilityZone: us-east-1b
    instanceCIDR: "10.0.190.0/24"
  - availabilityZone: us-east-1d
    instanceCIDR: "10.0.191.0/24"
elasticFileSystemId: fs-XXXXXXXX

It appears that kube-aws is attempting to mount the subnets to my existing EFS, which causes it to fail:

Creating AWS resources. This should take around 5 minutes.
Error: Error creating cluster: Stack creation failed: CREATE_FAILED : The following resource(s) failed to create: [Subnet1MountTarget, IAMInstanceProfileWorker, Subnet0MountTarget, ExternalDNS, IAMInstanceProfileController, IAMInstanceProfileEtcd].

Printing the most recent failed stack events:
CREATE_FAILED AWS::CloudFormation::Stack spatial-kube The following resource(s) failed to create: [Subnet1MountTarget, IAMInstanceProfileWorker, Subnet0MountTarget, ExternalDNS, IAMInstanceProfileController, IAMInstanceProfileEtcd].
CREATE_FAILED AWS::EFS::MountTarget Subnet1MountTarget mount target already exists in this AZ
CREATE_FAILED AWS::EFS::MountTarget Subnet0MountTarget mount target already exists in this AZ

I've removed the following from stack-template.json:

{{if $.ElasticFileSystemID}}
,
"{{$subnetLogicalName}}MountTarget": {
  "Properties" : {
    "FileSystemId": "{{$.ElasticFileSystemID}}",
    "SubnetId": { "Ref": "{{$subnetLogicalName}}" },
    "SecurityGroups": [ { "Ref": "SecurityGroupMountTarget" } ]
  },
  "Type" : "AWS::EFS::MountTarget"
}
{{end}}

But maybe there should be another flag that controls whether the mount targets get created.

@mumoshu (Contributor) commented Jan 6, 2017

@nicerobot Thanks for re-creating this!

Revisiting my comment on the former issue coreos/coreos-kubernetes#802 (comment): without mount targets, elasticFileSystemId does almost nothing useful. So my question is: why do you need to specify elasticFileSystemId in the first place?

@nicerobot (Author)

@mumoshu I have provided an EFS Id of an existing resource that is already mounted on the subnets in which I'm placing k8s. What I want is simply to be able to specify the EFS Id and have a flag to indicate that it doesn't need to mount the subnets (alternatively, somehow check before creation if EFS is mounted to the needed subnets).

@mumoshu (Contributor) commented Jan 10, 2017

@nicerobot Ah, so if you had the flag to cancel creating mount targets, the EFS would still be mounted successfully via efs.service in cloud-config-worker and cloud-config-controller, right?

@nicerobot (Author)

@mumoshu Hmmm ... 🤷‍♂️ kube-aws is still pretty new to me. I was just looking for the quickest way to get a k8s cluster integrated into our existing infrastructure, and when I tried to assign elasticFileSystemId to our existing volume id, I got that error during up. When I looked at what it was trying to do, it seemed like it was trying to mount already-mounted subnets (unfortunately I no longer have the logs). I've since commented out the EFS id so that I can build a cluster, and am no longer clear why it failed: given that it properly created the subnets, they should be mountable. Anyway, I'm going to destroy it and try again while we're still not reliant on the services there. I'll get back to you.

@mumoshu (Contributor) commented Jan 10, 2017

@nicerobot Thanks for replying!

Please forgive me if I got too deep in my previous comment. I'm open to questions/discussions to resolve your issue anyway.

AFAIK:

  • A subnet is not mounted; an EFS is.
  • A subnet is instead associated with mount target(s) before the EFS is mounted to worker/controller nodes. In other words, you associate a subnet with a mount target to make the EFS mountable to the nodes in that subnet.
  • You can't have two or more mount targets for the same EFS in an AZ, hence the error (CREATE_FAILED AWS::EFS::MountTarget Subnet0MountTarget mount target already exists in this AZ), which seems to indicate that you already have mount targets for the EFS in us-east-1b and us-east-1d.

So:

  • Do you have a mount target for the EFS fs-47a2c22e?
  • Could you locate the existing mount targets for the EFS in us-east-1b and us-east-1d respectively (e.g. with the AWS CLI, as sketched below)?
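
For reference, a minimal AWS CLI sketch for locating those mount targets (an illustration only; the file system id is the one from the question above and the subnet id is a placeholder):

# List the mount targets for the EFS; each entry includes its SubnetId
aws efs describe-mount-targets --file-system-id fs-47a2c22e
# Look up the AZ of each returned subnet (subnet id below is a placeholder)
aws ec2 describe-subnets --subnet-ids subnet-XXXXXXXX --query "Subnets[].AvailabilityZone"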

If so, could you try removing the {{if $.ElasticFileSystemID}}~{{end}} block and running kube-aws up, assigning the subnets created by kube-aws to the existing mount targets, and then running sudo systemctl restart efs.service on every worker/controller node?

If that works, I guess the flag you've suggested adding to kube-aws makes sense to me.

@nicerobot (Author) commented Jan 10, 2017

@mumoshu So I just uncommented elasticFileSystemId and tried up again, and it failed the same way as before.

It makes sense now. I already have mount targets in all the AZs for the EFS volume, which is why I needed the flag to skip the mount. I was confusing myself with subnet references instead of thinking in terms of AZs.

Anyway, I can try removing the mount target from the template again. When I did that before, the cluster wouldn't start because of the efs.service failure, which I didn't know how to resolve. But since I already have mount targets in all the AZs for the EFS volume, I don't think I can add the new subnets manually, right?

@mumoshu (Contributor) commented Jan 10, 2017

@nicerobot Thanks for the info again 👍

AFAIK a kube-aws cluster won't fail to start because of errors in efs.service. Could you confirm that the resulting cluster does work without efs.service running?

Also, could you manually update the mount targets from e.g. the AWS console to include the subnets created via kube-aws after kube-aws up finishes, and then restart efs.service on each node with sudo systemctl restart efs.service? In theory your cluster could then mount the EFS.
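
If it helps, a rough CLI sketch of adding a mount target for one of the kube-aws-created subnets (ids are placeholders, and it assumes the subnet's AZ does not already have a mount target for this file system):

# Add a mount target for a kube-aws-created subnet (placeholder ids);
# this only succeeds if the subnet's AZ has no mount target for the EFS yet
aws efs create-mount-target \
  --file-system-id fs-XXXXXXXX \
  --subnet-id subnet-XXXXXXXX \
  --security-groups sg-XXXXXXXX

# Then, on each worker/controller node:
sudo systemctl restart efs.service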

@nicerobot (Author)

@mumoshu 👍 Will do. It's starting now. Also, the efs.service issue might have only been coincidental with other issues I had. It was one of the first few times I up'd the cluster, so I was still kinda lost, looking 👀 for what I'd done wrong 😊. Give me about 20-30 min... Thanks

@nicerobot (Author)

@mumoshu Ok, indeed, the cluster seems fine. The efs.service error appears on the controller node, which is why I thought the cluster would be unstable.

Failed Units: 1
  efs.service

So, to fix that error, I'd have to add the kube-aws subnets to the EFS mount targets, but we have that volume mounted on other hosts across AZs on different subnets, and EFS only allows one mount target (one subnet) per AZ. So it seems like we can't even use this volume with k8s?

@mumoshu (Contributor) commented Jan 12, 2017

@nicerobot Thanks for the info!
I'd been misunderstanding how a mount target works.
Then, I guess you need to deploy at least the part of your worker nodes that you'd like to mount the EFS to into the existing subnet that is assigned to the mount target in question.
#227 is in progress for that.

@nicerobot (Author)

@mumoshu Thanks. No worries. I can make do without EFS for the time being. And we might be transitioning to Tectonic once its installer supports targeting an existing VPC, where I'm hopeful this'll already be available ;-)

@mumoshu (Contributor) commented Feb 1, 2017

@nicerobot Hi, deployment to existing subnets is experimentally supported as of v0.9.4-rc.1, released today.
If you have a chance, would you mind trying it? 😃
You can specify existing subnets like:

subnets:
  - name: SubnetWithEFS
    availabilityZone: ...
    id: subnet-...
worker:
  subnets:
  - name: SubnetWithEFS

Please make sure that you've properly associated the existing subnet with the EFS, as we've discussed above.

@mumoshu added this to the v0.9.4-rc.1 milestone on Feb 1, 2017
@nicerobot (Author) commented Feb 5, 2017

@mumoshu Nice. Thanks. I've tried it but something seems to be failing on the controller.

Container Linux by CoreOS stable (1235.9.0)
Failed Units: 1
  coreos-cloudinit-505743134.service

Though I'm not so sure it's due to the new config. I've reverted to the prior cluster.yml and the cluster still doesn't start.

@nicerobot (Author)

@mumoshu So I reverted to kube-aws v0.9.2-rc.5 and my original cluster.yml still works, so it seems like kube-aws v0.9.4-rc.1 has introduced something breaking. With v0.9.4-rc.1, my original config causes the failure below, while updating it to use the new subnets feature causes the failure above.

Container Linux by CoreOS stable (1235.9.0)
Update Strategy: No Reboots
Failed Units: 1
  docker.service

@mumoshu (Contributor) commented Feb 6, 2017

Hi @nicerobot thanks for the feedback!
Two things:

  • I guess you've tried to run kube-aws update for upgrading?
    • If so, would you mind running kube-aws destroy and then kube-aws up to completely recreate your cluster? kube-aws update isn't intended for upgrading kube-aws.
  • Could you summarize the changes applied to your cluster.yaml? I'm not yet sure about your configuration, but anyway, a failing docker.service and/or coreos-cloudinit-505743134.service tends to be caused by missing internet connectivity among nodes, originating from e.g. reused subnets without a proper route table configuration (one quick check is sketched below).
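
One quick way to check that last point with the AWS CLI (a sketch only; the subnet id is a placeholder for whichever subnet you reused):

# Show the route table associated with a reused subnet and its routes;
# look for a 0.0.0.0/0 route pointing at an internet gateway (igw-...).
# An empty result means the subnet implicitly uses the VPC's main route table.
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-XXXXXXXX"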

@nicerobot (Author)

@mumoshu Yep. I run update before every up. I've destroyed and recreated the cluster four times, twice trying different settings for 0.9.4's new features and twice using different techniques to revert the configuration (undo vs git checkout) but still using 0.9.4.

Original:

subnets:
  - availabilityZone: us-east-1b
    instanceCIDR: "10.0.190.0/24"
  - availabilityZone: us-east-1d
    instanceCIDR: "10.0.191.0/24"

vs new subnets feature:

worker:
  subnets:
  - name: SubnetWithEFSb
  - name: SubnetWithEFSd

subnets:
  - name: SubnetWithEFSb
    availabilityZone: us-east-1b
    id: subnet-7cfce557
  - name: SubnetWithEFSd
    availabilityZone: us-east-1d
    id: subnet-69cee330

elasticFileSystemId: fs-fc04feb5

@mumoshu (Contributor) commented Feb 7, 2017

@nicerobot Thanks again for the feedback!

Excuse me if I've missed writing documentation for that but:

  • AFAIK the existing subnets subnet-7cfce557 and subnet-69cee330, and the route table(s) assigned to them, must be tagged like KubernetesCluster=<YOUR KUBE-AWS CLUSTER NAME> for Kubernetes to work (see the sketch after this list).
    Tagging must be done (at least for now) on your side, because kube-aws tries its best NOT to modify existing AWS resources (i.e. resources managed by you rather than kube-aws). https://github.com/kubernetes/community/blob/master/contributors/design-proposals/aws_under_the_hood.md
  • Also, you must have an internet gateway properly configured for your VPC, and the subnets subnet-7cfce557 and subnet-69cee330 must have valid route(s) to the internet gateway to enable internet connectivity, or your nodes won't come up.
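
A minimal sketch of the tagging step with the AWS CLI (the route table id and the cluster name below are placeholders you'd substitute with your own):

# Tag the reused subnets and their route table so Kubernetes recognizes them
# (rtb-XXXXXXXX and my-cluster are placeholders)
aws ec2 create-tags \
  --resources subnet-7cfce557 subnet-69cee330 rtb-XXXXXXXX \
  --tags Key=KubernetesCluster,Value=my-cluster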

@nicerobot (Author)

@mumoshu Thanks. No worries. Good information. I'll make those updates and try again this weekend (I hope).

@mumoshu removed this from the v0.9.4-rc.1 milestone on Mar 1, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 21, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
