EKS v0.19.5 Creating cluster in Docker fails at some point #8123

Open
abregar opened this issue May 9, 2024 · 2 comments

abregar commented May 9, 2024

I am filing this as a problem report. I tried to initialize a dev cluster on macOS (Sonoma 14.4.1) with Docker Desktop v4.30.0, as the documentation suggests, using a higher verbosity level:

eksctl anywhere create cluster -f $CLUSTER_NAME.yaml -v 9

This uses the latest release, as shown in:

Initializing long running container     {"name": "eksa_1715243480381280000", "image": "public.ecr.aws/eks-anywhere/cli-tools:v0.19.5-eks-a-65"}

Initialization goes well: the containers for control-plane, lb, etcd, ... are created successfully. But the creation process then stops at this point:

2024-05-09T10:52:50.466+0200    V1      cleaning up temporary namespace  for diagnostic collectors      {"namespace": "eksa-diagnostics"}
2024-05-09T10:52:50.466+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-05-09T10:52:50.466+0200    V6      Executing command       {"cmd": "/usr/local/bin/docker exec -i eksa_1715244428714146000 kubectl delete namespace eksa-diagnostics --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig"}
2024-05-09T10:52:55.641+0200    V5      Retry execution successful      {"retries": 1, "duration": "5.175007875s"}
2024-05-09T10:52:55.642+0200    V4      Task finished   {"task_name": "collect-cluster-diagnostics", "duration": "17.227805209s"}
2024-05-09T10:52:55.642+0200    V4      ----------------------------------
2024-05-09T10:52:55.642+0200    V4      Saving checkpoint       {"file": "mgmt-checkpoint.yaml"}
2024-05-09T10:52:55.643+0200    V4      Tasks completed {"duration": "5m38.393764542s"}
2024-05-09T10:52:55.643+0200    V3      Cleaning up long running container      {"name": "eksa_1715244428714146000"}
2024-05-09T10:52:55.643+0200    V6      Executing command       {"cmd": "/usr/local/bin/docker rm -f -v eksa_1715244428714146000"}
Error: creating namespace eksa-system: The connection to the server localhost:8080 was refused - did you specify the right host or port?

To me, it looks like the temporary container is removed (docker rm) too early, and the script then does not handle the missing kubeconfig.

So, my questions: is this considered a bug, is there a quick workaround, and is it possible to continue the cluster creation procedure from the failing point?
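
For anyone trying to narrow this down: the "localhost:8080 was refused" message is what kubectl typically prints when it ends up with no cluster configuration at all, so one thing worth checking is whether the kubeconfig file named in the log exists and is non-empty on the host. A minimal sketch, assuming it is run from the directory where the cluster was created; `check_kubeconfig_file` is a hypothetical helper name, not part of eksctl:

```shell
# Hypothetical helper: report whether a kubeconfig file exists and is non-empty.
check_kubeconfig_file() {
  kubeconfig_path="$1"
  if [ -s "$kubeconfig_path" ]; then
    echo "found: $kubeconfig_path"
  else
    echo "missing or empty: $kubeconfig_path"
    return 1
  fi
}

# Path taken from the log output above; "|| true" keeps the script going
# even when the file is absent.
check_kubeconfig_file "mgmt/mgmt-eks-a-cluster.kubeconfig" || true
```

If the file is present, running `kubectl get namespaces --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig` from the host (assuming kubectl is installed there) would show whether the API server is reachable at all.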

Member

sp1999 commented May 14, 2024

Hey @abregar, did you have the KUBECONFIG env variable set when creating the cluster? If yes, can you unset it and try recreating the cluster again?
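
A minimal sketch of that check in a POSIX shell, assuming the cluster is created from the same terminal session; `describe_kubeconfig` is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: report whether KUBECONFIG is set in this shell.
describe_kubeconfig() {
  if [ -n "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is set to: ${KUBECONFIG}"
  else
    echo "KUBECONFIG is not set"
  fi
}

describe_kubeconfig   # show the current state
unset KUBECONFIG      # clear it for this shell session only
describe_kubeconfig   # confirm it is gone before re-running the create command
```

Note that `unset` only affects the current shell; a value exported from a shell profile would come back in a new terminal.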

Author

abregar commented May 23, 2024

No, KUBECONFIG was not set. I also tried the new release, EKS v0.19.6, and it still fails for me at the same point.
Are there any other hints on what I should check, or what I could modify in a script?
