AWX Operator Fails to Install AWX Containers/Instance #284

Closed
bandwiches opened this issue May 5, 2021 · 16 comments · Fixed by #330

Comments

@bandwiches

ISSUE TYPE

AWX Operator fails to perform installation.

SUMMARY

Had an instance of 17.0.1 running, don't care if the data persists either.

Performed data migration following Data Migration instructions.

Performed install of AWX Operator following INSTALL.md

  • minikube v1.18.1 was installed during this time following links from INSTALL.md
ENVIRONMENT
  • AWX version: 19.1.0
  • Operator version: 0.9.0
  • Kubernetes version: 1.20.2
  • AWX install method: operator
STEPS TO REPRODUCE

Follow INSTALL.md

EXPECTED RESULTS

Expected to see pods/AWX instance

ACTUAL RESULTS

minikube kubectl apply -- -f myawx.yml

After 30 minutes only the operator pod is running; tailing its logs shows a looping error.

ADDITIONAL INFORMATION
xxx@yyy:~$ minikube kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
awx-operator-5595d6fc57-hdj9d   1/1     Running   0          29m
xxx@yyy:~$ minikube version
minikube version: v1.18.1
AWX-OPERATOR LOGS
{
  "level": "error",
  "ts": 1620223136.6924627,
  "logger": "logging_event_handler",
  "msg": "",
  "name": "custom.name.awx", 
  "namespace": "default",
  "gvk": "awx.ansible.com/v1beta1,Kind=AWX",
  "event_type": "runner_on_failed",
  "job": "2601737961087659062",
  "EventData.Task": "Create Database if no database is specified",
  "EventData.TaskArgs": "",
  "EventData.FailedTaskPath": "/opt/ansible/roles/installer/tasks/database_configuration.yml:68",
  "error": "[playbook task failed]",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/events.loggingEventHandler.Handle\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/events/log_events.go:87"
}
@bandwiches
Author

bandwiches commented May 5, 2021

Blew the entire thing away and restarted fresh. Service pods are stuck 0/4 pending. It's been an additional 45 minutes now.

This is the exact task that continuously fails over and over again with no real output/log.

TASK [installer : Apply deployment resources] **********************************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:34

Output

{
    "level":"error",
    "ts":1620229749.932002,
    "logger":"controller-runtime.controller",
    "msg":"Reconciler error",
    "controller":"awx-controller",
    "request":"default/awx",
    "error":"event runner on failed",
    "stacktrace": 
        "github.com/go-logr/zapr.(*zapLogger).Error
            pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
        sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
            pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
        sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
            pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
        sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
            pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
        k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
        k8s.io/apimachinery/pkg/util/wait.BackoffUntil
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
        k8s.io/apimachinery/pkg/util/wait.JitterUntil
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
        k8s.io/apimachinery/pkg/util/wait.Until
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"
}

Logs
awx-web: stale after the image downloads, with the following two lines

[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

redis: stale after image downloads
awx-task: nothing
awx-ee: nothing

@tchellomello
Contributor

@bandwiches we will need more information to understand what is going on.
Please send us the following:

kubectl get awx -o yaml awx

kubectl describe deployment awx

kubectl describe statefulset awx-postgres

kubectl get pods 

kubectl get events

Thanks!
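
A minimal sketch for gathering the outputs requested above into files that can be attached to the issue. The function name collect_awx_debug, the output directory, and the file names are hypothetical; the KUBECTL override is an assumption added so the same helper can be pointed at "minikube kubectl --" instead of plain kubectl.

```shell
# Sketch only: collect_awx_debug and the file names below are hypothetical helpers,
# not part of AWX or the operator.
# Set KUBECTL="minikube kubectl --" when kubectl is only available through minikube.
collect_awx_debug() (
  out_dir="${1:-awx-debug}"
  mkdir -p "$out_dir" || exit 1

  # $kubectl is left unquoted on purpose so a multi-word value splits into words.
  kubectl="${KUBECTL:-kubectl}"

  # stdout and stderr both go to the file, so error messages are captured too.
  $kubectl get awx -o yaml awx               > "$out_dir/get_awx.txt" 2>&1
  $kubectl describe deployment awx           > "$out_dir/describe_deployment_awx.txt" 2>&1
  $kubectl describe statefulset awx-postgres > "$out_dir/describe_statefulset.txt" 2>&1
  $kubectl get pods                          > "$out_dir/get_pods.txt" 2>&1
  $kubectl get events                        > "$out_dir/get_events.txt" 2>&1
  exit 0
)
```

Running collect_awx_debug ./awx-debug should leave five text files in that directory, one per command.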

@exodusprime1337

exodusprime1337 commented May 6, 2021

I'm seeing the same thing in minikube. Here is the output from mine. I saw the same error in Kubernetes on CentOS 7 as well. Fresh install with all the latest binaries.

describe_awx.txt
describe_stateful.txt
events.txt
get_awx.txt
pods.txt

@bandwiches
Author

bandwiches commented May 6, 2021

For the sake of clarity, I feel I should state that I'm using minikube since it is recommended by the AWX install guide.

get_awx.txt
describe_deployment_awx.txt
describe_statefulset.txt
get_pods.txt
get_events.txt

(Edit) I see a CPU warning (insufficient CPU) for the AWX pod. I have to say, this is a dedicated VM w/2 CPU and 2GB RAM. This VM has had no issues running AWX v15 and v17. New install method introduced in v19 all of a sudden complains about resources? Understandable that this could change from version to version, but it would be nice to know minimal system requirements now that it's an issue.

@exodusprime1337

Here is the snippet of the error I'm seeing, which I believe is exactly like @bandwiches's error.

{"level":"error","ts":1620310325.2259731,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"awx-controller","request":"default/awx","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"}
{"level":"info","ts":1620310327.3738635,"logger":"logging_event_handler","msg":"[playbook task]","name":"awx","namespace":"default","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"261049867304784443","EventData.Name":"installer : Patching labels to AWX kind"}

@bandwiches
Author

@exodusprime1337

Spot on.

@abcqwertz

abcqwertz commented May 7, 2021

I have the exact same error, but on a bare-metal kubernetes cluster:
AWX version: 19.1.0
Operator version: 0.9.0
Kubernetes version: v1.21.0 with containerd 1.4.4
AWX install method: operator

@tchellomello
Contributor

tchellomello commented May 11, 2021

@bandwiches in your case, it looks like the issue is related to CPU (as you mentioned):

NAME                            READY   STATUS    RESTARTS   AGE
awx-5b58db49c-9gslf             0/4     Pending   0          7m3s
awx-operator-5595d6fc57-92txg   1/1     Running   0          10m
awx-postgres-0                  1/1     Running   0          7m14s

LAST SEEN   TYPE      REASON                    OBJECT                                          MESSAGE
87s         Warning   FailedScheduling          pod/awx-5b58db49c-9gslf                         0/1 nodes are available: 1 Insufficient cpu.
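
As a rough sketch, scheduling failures like the one above can be picked out of a longer events listing with a small text filter. failed_scheduling is a hypothetical helper that only reads the default `kubectl get events` table from stdin; it assumes the third whitespace-separated column is REASON, as in the output shown here.

```shell
# Sketch only: failed_scheduling is a hypothetical helper, not a kubectl feature.
# It reads the default `kubectl get events` table on stdin and prints
# "<object>: <message>" for every row whose REASON column is FailedScheduling.
failed_scheduling() {
  awk '$3 == "FailedScheduling" {
    obj = $4
    $1 = $2 = $3 = $4 = ""   # blank the first four columns, keeping the message
    sub(/^ +/, "")           # drop the leading spaces left behind
    print obj ": " $0
  }'
}
```

Usage would be something like: kubectl get events | failed_scheduling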

Looking at your deployment, we can see it's using the default resource requests:

   awx-web:
    Image:      quay.io/ansible/awx:19.1.0
    Port:       8052/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     1
      memory:  2Gi

....

   awx-task:
    Image:      quay.io/ansible/awx:19.1.0
    Port:       <none>
    Host Port:  <none>
    Args:
      /usr/bin/launch_awx_task.sh
    Requests:
      cpu:     500m
      memory:  1Gi

....

Please note the suggested values (memory and cpu) are still the same (see https://github.com/ansible/awx-operator/pull/93/files) and you can override them to fit your needs. That should do the job for you. Please let us know.
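
A minimal sketch of what such an override might look like, written out as a shell helper that generates the custom resource file. Assumptions: the tower_web_resource_requirements and tower_task_resource_requirements field names are taken from the operator README of this era and may differ in other operator releases, so check them against your version; the request values are illustrative only and may be too low for a usable AWX; write_awx_spec is a hypothetical helper name.

```shell
# Sketch only: write_awx_spec is a hypothetical helper.
# ASSUMPTION: the tower_*_resource_requirements keys match your operator version.
# ASSUMPTION: the request values are examples, not recommended minimums.
write_awx_spec() (
  out_file="${1:-myawx.yml}"
  cat > "$out_file" <<'EOF'
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  tower_web_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
  tower_task_resource_requirements:
    requests:
      cpu: 250m
      memory: 512Mi
EOF
)
```

After generating the file with write_awx_spec myawx.yml, it can be applied the same way as in the original report (kubectl apply -f myawx.yml).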

@tchellomello
Contributor

tchellomello commented May 11, 2021

Same thing here @exodusprime1337, except that in your case the scheduler reports insufficient memory:

LAST SEEN   TYPE      REASON                    OBJECT                                          MESSAGE
2s          Warning   FailedScheduling          pod/awx-5b58db49c-bfwnt                         0/1 nodes are available: 1 Insufficient memory.
21m         Normal    SuccessfulCreate          replicaset/awx-5b58db49c                        Created pod: awx-5b58db49c-bfwnt

   awx-web:
    Image:      quay.io/ansible/awx:19.1.0
    Port:       8052/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     1
      memory:  2Gi

    Requests:
      cpu:     500m
      memory:  1Gi

If you run kubectl get nodes <NODE_NAME> -o yaml, you should see the allocatable CPU and memory for your node:

  allocatable:
    cpu: 7800m
    ephemeral-storage: "222240964241"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 31547268Ki
    pods: "250"
  capacity:
    cpu: "8"
    ephemeral-storage: 235495Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32173956Ki
    pods: "250"


> kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%     
p70      763m         9%     12685Mi         41%         
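
To make the arithmetic concrete, here is a rough sketch that adds up the two requests shown in this thread (awx-web: 1 CPU / 2Gi, awx-task: 500m / 1Gi) and converts the example node's allocatable values to the same units. The helper names to_millicores and to_mib are hypothetical, and the helpers only handle integer quantities with the suffixes that appear above. The other containers in the pod are not shown in the thread, so the real total is at least this much.

```shell
# Sketch only: to_millicores and to_mib are hypothetical helpers.

# Convert a CPU quantity such as "1" or "500m" to millicores (integers only).
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

# Convert a memory quantity with a Ki, Mi, or Gi suffix to whole MiB (rounded down).
to_mib() {
  case "$1" in
    *Ki) echo "$(( ${1%Ki} / 1024 ))" ;;
    *Mi) echo "${1%Mi}" ;;
    *Gi) echo "$(( ${1%Gi} * 1024 ))" ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Requests of awx-web plus awx-task, as shown in the deployment description.
cpu_requested=$(( $(to_millicores 1) + $(to_millicores 500m) ))
mem_requested=$(( $(to_mib 2Gi) + $(to_mib 1Gi) ))

echo "requested by awx-web + awx-task: ${cpu_requested}m CPU, ${mem_requested}Mi memory"
echo "allocatable on the example node: $(to_millicores 7800m)m CPU, $(to_mib 31547268Ki)Mi memory"
```

The two containers alone ask for 1500m CPU and 3072Mi of memory, which a VM with 2 CPU and 2GB RAM cannot schedule once memory is taken into account.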

@bandwiches
Author

bandwiches commented May 11, 2021

@tchellomello thanks for the update there. I'm following you, but I have a serious concern about the AWX install tutorial, since it gives a bare minimum config and that leads to this result. Perhaps there should be more cross-communication between the two packages to ensure that the minimal config is actually the bare minimum? These settings are never mentioned in the install doc.

Edit -
Per your link, I noticed both of these.
awx_v1beta1_molecule.yml (cpu: 500m, memory: 128M // cpu: 500m, memory: 128M)
installer\defaults\main.yml (cpu: 1000m, memory: 1Gi // cpu: 500m, memory: 2Gi)

One issue is that the AWX INSTALL.md doesn't mention minimal requirements at all, which makes the transition from v17 to v19 even harder, since what worked before may no longer work as "default". While I understand requirements may change, it would also be nice to know when the minimal requirements/defaults have changed.

@tchellomello
Contributor

tchellomello commented May 11, 2021

@bandwiches I hear you, I agree that the documentation has lots of room to improve, and please if you see any place that could use some enhancement, do not hesitate to submit a PR.

Regarding https://github.com/ansible/awx-operator/blob/devel/deploy/crds/awx_v1beta1_molecule.yaml, that file is used in the molecule tests here -> https://github.com/ansible/awx-operator/blob/devel/molecule/test-local/converge.yml#L31 so it is a totally different scenario and does not necessarily need to be consistent, since for this test we don't need to allocate that amount of memory and cpu.

@bandwiches
Author

I would love to, except I think the awx repo is outpacing awx-operator and making the inconsistencies impossible to fix.

In regards to your response about system settings - understood and that's fair, no qualms about that.

I was running into another issue once I was able to resolve the resources issue and I feel it's actually still appropriate here. The awx-service was not externally reachable by default (regardless of Ingress or NodePort). The issue was actually related to IPTABLES not adding a rule to allow the destination port for the service.

minikube service awx-service --url returns the IP:PORT, but that PORT is never allowed through iptables. Adding a rule to the DOCKER chain on the dport jumping to ACCEPT fixed this.
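
A minimal sketch of the kind of rule described above. The exact rule used is not shown in the thread, so the use of -I (insert at the top of the chain) and the TCP match are assumptions, and the helper names port_from_url and print_accept_rule are hypothetical. The helper only prints the command so it can be reviewed before being run as root.

```shell
# Sketch only: port_from_url and print_accept_rule are hypothetical helpers.

# Extract the port from a URL like the one printed by
# `minikube service awx-service --url` (http://<ip>:<port>).
port_from_url() (
  port="${1##*:}"      # keep everything after the last colon
  echo "${port%%/*}"   # drop any trailing path
)

# Print (do not run) an iptables rule that accepts TCP traffic to that port
# in the DOCKER chain. ASSUMPTION: -I and -p tcp match what your setup needs.
print_accept_rule() {
  echo "iptables -I DOCKER -p tcp --dport $(port_from_url "$1") -j ACCEPT"
}
```

For example, print_accept_rule "$(minikube service awx-service --url)" would print the rule for whatever port minikube assigned.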

Second issue - minikube service IP. I don't see anywhere that this is configurable, however I'll admit that I may be overlooking it given how many different repos I've had to visit today. This actually presents 2 issues: (1) now we're required to route to the host first for the underlying subnet access and (2) there's no consideration for organizational overlap if that subnet is already in use. I believe the default underlying network is 192.168.49.0/24, which is huge for a bridge/transit network and increases the risk of overlap.

@fsdrw08

fsdrw08 commented Jul 13, 2021

Hi bandwiches
Many thanks for your hint. I had the same issue deploying Ansible AWX on a k3s cluster in a VM, and had no idea what was happening or how to troubleshoot it. Following your post, I increased the memory and CPU cores of my AWX VM host, and the problem was fixed.

@dgoldsmith

I am having an issue installing AWX under K3S on both CentOS 8.5 and Rocky 8.5. I am trying to follow the steps from the below blog post:

https://computingforgeeks.com/install-and-configure-ansible-awx-on-centos/

I have repeated the build multiple times, stepping up my VM CPU/RAM resources. Initially tried 2c/4g, then 4c/8g, then 8c/16g. The VM has a single 50GB disk with automatic partitioning from the OS installer. The Minimal software selection was used.

I have disabled both SELinux and firewalld.

K3s installs and I can deploy the AWX Operator. When I try installing AWX with the command "kubectl apply -f awx-instance-deployment.yml -n awx", it appears to successfully deploy the awx-postgres container but never starts to deploy the 4-container AWX pod.

The output of the following commands is included here:

kubectl get awx -o yaml awx
kubectl describe deployment awx
kubectl describe statefulset awx-postgres
kubectl get pods
kubectl get events

debug.get-awx.txt
debug.describe-deployment.txt
debug.describe-statefulset.txt
debug.get-pods.txt
debug.get-events.txt

Thanks,
David Goldsmith

@dgoldsmith

And then I searched the current issues here and found this:

#824

Seems to be my issue -- changing NodePort to ClusterIP allowed it to bring up the AWX pods.

[root@d-1-cfg-awx-c8 awx-operator]# kubectl get pods -n awx
NAME                                               READY   STATUS    RESTARTS   AGE
awx-operator-controller-manager-5ddf49cc4f-j94tb   2/2     Running   0          37m
awx-postgres-0                                     1/1     Running   0          30m
awx-5d9669598d-hd8tk                               4/4     Running   0          3m57s

Now I need to work on the ClusterIP access as it does not seem to be mapping to a high port for access:

[root@d-1-cfg-awx-c8 awx-operator]# kubectl get service -n awx
NAME                                              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
awx-operator-controller-manager-metrics-service   ClusterIP   10.43.87.77   <none>        8443/TCP   35m
awx-postgres                                      ClusterIP   None          <none>        5432/TCP   27m
awx-service                                       ClusterIP   10.43.54.59   <none>        80/TCP     110s
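
A ClusterIP service is only reachable from inside the cluster, so no node port is allocated for it. One common way to reach it for testing is kubectl port-forward; the sketch below assumes the awx-service name, port 80, and awx namespace shown above. The helper name forward_awx and the default local port 8080 are hypothetical choices.

```shell
# Sketch only: forward_awx is a hypothetical helper; 8080 is an arbitrary local port.
# Set KUBECTL to override the kubectl command (left unquoted so it can be multi-word).
forward_awx() (
  local_port="${1:-8080}"
  ${KUBECTL:-kubectl} port-forward service/awx-service "${local_port}:80" -n awx
)
```

While forward_awx is running, AWX should answer on http://localhost:8080 on the machine where the command was started. For longer-term access, an Ingress or a different service type is probably a better fit.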

@jermigonis

Don't mean to necrobump this, but I just installed it following https://computingforgeeks.com/how-to-install-ansible-awx-on-ubuntu-linux and the issue I had with Redis not launching was that the CPU type needed to be set to host in Proxmox, not KVM or QEMU.
