AWX Operator Fails to Install AWX Containers/Instance #284

bandwiches · 2021-05-05T14:20:12Z

ISSUE TYPE

AWX Operator fails to perform installation.

SUMMARY

I had an instance of AWX 17.0.1 running, and I don't care whether its data persists.

I performed the data migration following the Data Migration instructions.

I then installed the AWX Operator following INSTALL.md.

minikube v1.18.1 was installed along the way, following the links in INSTALL.md.

ENVIRONMENT

AWX version: 19.1.0
Operator version: 0.9.0
Kubernetes version: 1.20.2
AWX install method: operator

STEPS TO REPRODUCE

Follow INSTALL.md

EXPECTED RESULTS

Expected to see the AWX pods/instance running.

ACTUAL RESULTS

minikube kubectl apply -- -f myawx.yml

After 30 minutes, only the operator pod is running, and tailing its logs shows an error that repeats in a loop.

ADDITIONAL INFORMATION

xxx@yyy:~$ minikube kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
awx-operator-5595d6fc57-hdj9d   1/1     Running   0          29m

xxx@yyy:~$ minikube version
minikube version: v1.18.1

AWX-OPERATOR LOGS

{
  "level": "error",
  "ts": 1620223136.6924627,
  "logger": "logging_event_handler",
  "msg": "",
  "name": "custom.name.awx", 
  "namespace": "default",
  "gvk": "awx.ansible.com/v1beta1,Kind=AWX",
  "event_type": "runner_on_failed",
  "job": "2601737961087659062",
  "EventData.Task": "Create Database if no database is specified",
  "EventData.TaskArgs": "",
  "EventData.FailedTaskPath": "/opt/ansible/roles/installer/tasks/database_configuration.yml:68",
  "error": "[playbook task failed]",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/events.loggingEventHandler.Handle\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/events/log_events.go:87"
}
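Because the operator logs one JSON object per line, the failing Ansible task can be pulled out of a long, looping log instead of being read by eye. A minimal Python sketch, assuming the log has been saved to a file (for example from kubectl logs on the operator pod) and that failures carry the event_type, EventData.Task and EventData.FailedTaskPath fields seen in the entry above; the function name failed_tasks is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def failed_tasks(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (task name, task path) for every runner_on_failed event.

    Assumes one JSON object per line, as in the awx-operator log above;
    lines that are not JSON (e.g. plain Ansible output) are skipped.
    """
    failures = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text Ansible output, not a JSON event
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("event_type") == "runner_on_failed":
            failures.append(
                (event.get("EventData.Task", "?"),
                 event.get("EventData.FailedTaskPath", "?"))
            )
    return failures
```

For example, failed_tasks(open("operator.log")) on a saved log returns one pair per failed run, so a task that fails on every reconcile loop shows up repeatedly with the same path.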


bandwiches · 2021-05-05T15:49:04Z

I blew the entire thing away and started fresh. The service pod is stuck at 0/4, Pending, and it has been another 45 minutes now.

This is the task that fails over and over again with no real output in the log:

TASK [installer : Apply deployment resources] **********************************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:34

Output

{
    "level":"error",
    "ts":1620229749.932002,
    "logger":"controller-runtime.controller",
    "msg":"Reconciler error",
    "controller":"awx-controller",
    "request":"default/awx",
    "error":"event runner on failed",
    "stacktrace": 
        "github.com/go-logr/zapr.(*zapLogger).Error
            pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
        sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
            pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
        sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
            pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
        sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
            pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
        k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
        k8s.io/apimachinery/pkg/util/wait.BackoffUntil
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
        k8s.io/apimachinery/pkg/util/wait.JitterUntil
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
        k8s.io/apimachinery/pkg/util/wait.Until
            pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"
}

Logs
awx-web: stalls after the image pull, with the following two lines:

[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

redis: stalls after the image downloads
awx-task: nothing
awx-ee: nothing

tchellomello · 2021-05-05T22:08:03Z

@bandwiches we will need more information to understand what is going on.
Please send us the following:

kubectl get awx -o yaml awx

kubectl describe deployment awx

kubectl describe statefulset awx-postgres

kubectl get pods 

kubectl get events

Thanks!

exodusprime1337 · 2021-05-06T03:27:37Z

I'm seeing the same thing in minikube; here is my output. I saw the same error in Kubernetes on CentOS 7 as well. This is a fresh install with all the latest binaries.

describe_awx.txt
describe_stateful.txt
events.txt
get_awx.txt
pods.txt

bandwiches · 2021-05-06T14:09:32Z


For the sake of clarity, I feel I should state that I'm using minikube since it is recommended by the AWX install guide.

get_awx.txt
describe_deployment_awx.txt
describe_statefulset.txt
get_pods.txt
get_events.txt

(Edit) I see an insufficient-CPU warning for the AWX pod. For context, this is a dedicated VM with 2 CPUs and 2 GB of RAM, which ran AWX v15 and v17 without issues. Now the new install method introduced in v19 suddenly complains about resources? It's understandable that this could change from version to version, but it would be nice to know the minimum system requirements now that it's an issue.

exodusprime1337 · 2021-05-06T14:14:29Z

Here is a snippet of the error I'm seeing, which I believe is exactly the same as @bandwiches' error:

{"level":"error","ts":1620310325.2259731,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"awx-controller","request":"default/awx","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"}
{"level":"info","ts":1620310327.3738635,"logger":"logging_event_handler","msg":"[playbook task]","name":"awx","namespace":"default","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"261049867304784443","EventData.Name":"installer : Patching labels to AWX kind"}
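The "Reconciler error" entry only says "event runner on failed"; the task behind it has to be inferred from the playbook_on_task_start entries, and the most recent task started before the error is usually the one that failed. A minimal Python sketch of that pairing, assuming the JSON-per-line format shown above; the function name tasks_before_errors is hypothetical.

```python
import json
from typing import Iterable, List, Optional


def tasks_before_errors(lines: Iterable[str]) -> List[Optional[str]]:
    """For each "Reconciler error" entry, return the name of the most
    recently started playbook task (None if no task had started yet).

    Assumes task starts are logged with event_type "playbook_on_task_start"
    and the task name in "EventData.Name", as in the lines above.
    """
    last_task: Optional[str] = None
    result: List[Optional[str]] = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain Ansible output
        if not isinstance(event, dict):
            continue
        if event.get("event_type") == "playbook_on_task_start":
            last_task = event.get("EventData.Name")
        elif event.get("msg") == "Reconciler error":
            result.append(last_task)
    return result
```

As a side effect of parsing with json.loads, the "\n\t" escapes inside the "stacktrace" field become real line breaks and tabs, so printing event["stacktrace"] gives the multi-line form of the trace.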

bandwiches · 2021-05-06T14:18:47Z

@exodusprime1337

Spot on.

abcqwertz · 2021-05-07T08:56:06Z

I have the exact same error, but on a bare-metal Kubernetes cluster:
AWX version: 19.1.0
Operator version: 0.9.0
Kubernetes version: v1.21.0 with containerd 1.4.4
AWX install method: operator

tchellomello · 2021-05-11T04:11:27Z

In your case, it looks like the issue is related to CPU, as you mentioned:

NAME                            READY   STATUS    RESTARTS   AGE
awx-5b58db49c-9gslf             0/4     Pending   0          7m3s
awx-operator-5595d6fc57-92txg   1/1     Running   0          10m
awx-postgres-0                  1/1     Running   0          7m14s

LAST SEEN   TYPE      REASON                    OBJECT                                          MESSAGE
87s         Warning   FailedScheduling          pod/awx-5b58db49c-9gslf                         0/1 nodes are available: 1 Insufficient cpu.

Looking at your deployment, we can see it's using the default resource requests:

   awx-web:
    Image:      quay.io/ansible/awx:19.1.0
    Port:       8052/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     1
      memory:  2Gi

....

   awx-task:
    Image:      quay.io/ansible/awx:19.1.0
    Port:       <none>
    Host Port:  <none>
    Args:
      /usr/bin/launch_awx_task.sh
    Requests:
      cpu:     500m
      memory:  1Gi

....

Please note that the suggested values (memory and CPU) are still the same (see https://github.com/ansible/awx-operator/pull/93/files), and you can override them to fit your needs. That should do the job for you. Please let us know.
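The scheduler only places a pod if the sum of its containers' requests fits on a node, so the numbers above can be added up by hand. A small Python sketch, assuming only the awx-web and awx-task requests shown (the containers elided by "...." may request more); parse_cpu and parse_memory are hypothetical helpers that handle only the common quantity suffixes, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Binary and decimal suffixes used in Kubernetes memory quantities
# (subset; exponent forms such as "1e3" are not handled in this sketch).
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "1" to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "2Gi" or "128M" to bytes."""
    # Check two-letter suffixes before one-letter ones ("Mi" before "M").
    for suffix in sorted(MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * MEMORY_SUFFIXES[suffix])
    return int(quantity)


# Requests copied from the describe output above.
requests = {
    "awx-web": {"cpu": "1", "memory": "2Gi"},
    "awx-task": {"cpu": "500m", "memory": "1Gi"},
}

total_cpu = sum(parse_cpu(r["cpu"]) for r in requests.values())
total_mem = sum(parse_memory(r["memory"]) for r in requests.values())

print(f"CPU requested:    {total_cpu} cores")           # 1.5 cores
print(f"Memory requested: {total_mem / 2**30} GiB")     # 3.0 GiB
```

These two containers alone request 1.5 cores and 3 GiB, which is more memory than the 2 GB of RAM quoted earlier for the dedicated VM; the node's allocatable amount is also somewhat lower than the VM's raw capacity.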

tchellomello · 2021-05-11T04:17:41Z

Same thing here, @exodusprime1337; in your case the scheduler reports insufficient memory:

LAST SEEN   TYPE      REASON                    OBJECT                                          MESSAGE
2s          Warning   FailedScheduling          pod/awx-5b58db49c-bfwnt                         0/1 nodes are available: 1 Insufficient memory.
21m         Normal    SuccessfulCreate          replicaset/awx-5b58db49c                        Created pod: awx-5b58db49c-bfwnt

   awx-web:
    Image:      quay.io/ansible/awx:19.1.0
    Port:       8052/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     1
      memory:  2Gi

    Requests:
      cpu:     500m
      memory:  1Gi

If you run kubectl get nodes <NODE_NAME> -o yaml, you will see the allocatable and total (capacity) CPU and memory of your node, for example:

  allocatable:
    cpu: 7800m
    ephemeral-storage: "222240964241"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 31547268Ki
    pods: "250"
  capacity:
    cpu: "8"
    ephemeral-storage: 235495Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32173956Ki
    pods: "250"


> kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
p70      763m         9%     12685Mi         41%
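kubectl top reports actual usage, but the scheduler compares a new pod's requests against the node's allocatable minus the requests of pods already bound to it, so a node can show low usage and still report "Insufficient memory". A simplified Python sketch of that check, considering only CPU and memory requests; the function insufficient and the "already requested" figures in the example are hypothetical and not taken from this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    cpu_millicores: int
    memory_bytes: int


def insufficient(allocatable: Resources, already_requested: Resources,
                 pod_requests: Resources) -> List[str]:
    """Return the resources that are insufficient for the pod (empty = fits).

    Simplified resource-fit check: only CPU and memory *requests* are
    compared with allocatable; actual usage (kubectl top) plays no part.
    """
    missing = []
    if (already_requested.cpu_millicores + pod_requests.cpu_millicores
            > allocatable.cpu_millicores):
        missing.append("cpu")
    if (already_requested.memory_bytes + pod_requests.memory_bytes
            > allocatable.memory_bytes):
        missing.append("memory")
    return missing


GiB = 2**30
# Allocatable values from the node YAML above: 7800m CPU, 31547268Ki memory.
node = Resources(cpu_millicores=7800, memory_bytes=31547268 * 1024)
# awx-web + awx-task requests: 1 + 0.5 CPU, 2Gi + 1Gi memory.
awx_pod = Resources(cpu_millicores=1500, memory_bytes=3 * GiB)

# Hypothetical: requests of pods already scheduled on the node.
busy = Resources(cpu_millicores=2000, memory_bytes=29 * GiB)
print(insufficient(node, busy, awx_pod))  # ['memory']
```

In this made-up example the node is far from busy in terms of CPU, yet the 3 GiB request no longer fits in what remains of its roughly 30 GiB allocatable memory.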

bandwiches · 2021-05-11T14:33:42Z

@tchellomello, thanks for the update. I follow you, but I have a serious concern about the AWX install tutorial: it gives a bare-minimum config, and that leads to this result. Perhaps there should be more cross-communication between the two projects to ensure that the minimal config actually works? These settings are never mentioned in the install doc.

Edit:
Per your link, I noticed both of these:
awx_v1beta1_molecule.yaml (cpu: 500m, memory: 128M // cpu: 500m, memory: 128M)
installer/defaults/main.yml (cpu: 1000m, memory: 1Gi // cpu: 500m, memory: 2Gi)

One issue is that the AWX INSTALL.md doesn't mention minimum requirements at all, which makes the transition from v17 to v19 even harder, since what worked before may no longer work with the defaults. While I understand requirements may change, it would be nice to be told that the minimum requirements and defaults have changed.

tchellomello · 2021-05-11T18:55:58Z

@bandwiches I hear you, I agree that the documentation has lots of room to improve, and please if you see any place that could use some enhancement, do not hesitate to submit a PR.

Regarding https://github.com/ansible/awx-operator/blob/devel/deploy/crds/awx_v1beta1_molecule.yaml, that file is used by the molecule tests here: https://github.com/ansible/awx-operator/blob/devel/molecule/test-local/converge.yml#L31. That is a totally different scenario and need not be consistent with the defaults, since for these tests we don't need to allocate that amount of memory and CPU.

bandwiches · 2021-05-11T19:10:32Z


I would love to submit a PR, except I think the awx repo is outpacing awx-operator, which makes the inconsistencies impossible to fix.

Regarding your response about the system settings: understood, and that's fair; no qualms about that.

Once I resolved the resource issue, I ran into another one that I feel is still appropriate here. The awx-service was not externally reachable by default (regardless of Ingress or NodePort). The cause was that iptables had no rule allowing the service's destination port.

minikube service awx-service --url returns IP:PORT, but that port is never allowed through iptables. Adding a rule to the DOCKER chain that matches the destination port (--dport) and jumps to ACCEPT fixed this.
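The workaround above can be scripted. A minimal Python sketch that turns the URL printed by minikube service awx-service --url into the argument list for such an iptables rule. It only builds the command (running it as root, e.g. via subprocess.run, is left to you), and the function name, the example URL, the use of -I to insert at the top of the chain, and matching on the destination address with -d are assumptions, not the exact rule used in this thread.

```python
import shlex
from urllib.parse import urlsplit


def docker_accept_rule(service_url: str) -> list:
    """Build an iptables command that accepts TCP traffic to the service.

    Assumes service_url looks like the output of
    `minikube service awx-service --url`, e.g. "http://192.168.49.2:30080".
    """
    parts = urlsplit(service_url.strip())
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected http://IP:PORT, got {service_url!r}")
    return [
        "iptables", "-I", "DOCKER",      # insert at the top of the DOCKER chain
        "-p", "tcp",
        "-d", parts.hostname,            # destination: the minikube node IP
        "--dport", str(parts.port),      # destination: the service's node port
        "-j", "ACCEPT",
    ]


# Hypothetical URL; use whatever `minikube service awx-service --url` prints.
rule = docker_accept_rule("http://192.168.49.2:30080")
print(shlex.join(rule))
# iptables -I DOCKER -p tcp -d 192.168.49.2 --dport 30080 -j ACCEPT
```

Building an argument list rather than a single shell string avoids quoting problems if the list is later passed to subprocess.run.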

Second issue: the minikube service IP. I don't see anywhere that it is configurable, though I admit I may be overlooking it, given how many different repos I've had to visit today. This actually presents two problems: (1) we now have to route to the host first to reach the underlying subnet, and (2) there is no consideration for overlap with an organization's networks if that subnet is already in use. I believe the default underlying network is 192.168.49.0/24, which is huge for a bridge/transit network and increases the risk of overlap.
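Whether the default network collides with an existing one can be checked before installing. A small Python sketch using the standard ipaddress module, assuming 192.168.49.0/24 is the network in question (as recalled above); the list of organizational networks is hypothetical and stands in for your own routing table.

```python
import ipaddress
from typing import Iterable, List

# Default minikube network as recalled in the comment above.
MINIKUBE_NET = ipaddress.ip_network("192.168.49.0/24")


def overlapping(candidate, in_use: Iterable[str]) -> List[str]:
    """Return the in-use networks that share any address with candidate."""
    return [net for net in in_use
            if candidate.overlaps(ipaddress.ip_network(net))]


# Hypothetical networks already routed inside an organization.
org_networks = ["10.0.0.0/8", "192.168.0.0/16", "172.16.10.0/24"]

print(MINIKUBE_NET.num_addresses)                 # 256
print(overlapping(MINIKUBE_NET, org_networks))    # ['192.168.0.0/16']
```

A /24 holds 256 addresses, and any larger organizational block that contains it (such as 192.168.0.0/16 here) counts as an overlap.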

fsdrw08 · 2021-07-13T09:46:37Z

Hi bandwiches,
Many thanks for your hint. I had the same issue deploying Ansible AWX on a k3s cluster in a VM and had no idea what was happening or how to troubleshoot it. Following your post, I increased the memory and CPU cores of my AWX VM host, and that fixed the problem.

dgoldsmith · 2022-03-11T03:32:46Z

I am having an issue installing AWX under K3s on both CentOS 8.5 and Rocky 8.5. I am trying to follow the steps from the blog post below:

https://computingforgeeks.com/install-and-configure-ansible-awx-on-centos/

I have repeated the build multiple times, stepping up the VM's CPU/RAM each time: first 2c/4g, then 4c/8g, then 8c/16g. The VM has a single 50 GB disk with automatic partitioning from the OS installer, and the Minimal software selection was used.

I have disabled both SELinux and firewalld.

K3s installs, and I can deploy the AWX Operator. When I try installing AWX with the command "kubectl apply -f awx-instance-deployment.yml -n awx", it appears to deploy the awx-postgres container successfully but never starts deploying the four-container AWX pod.

The output of the following commands is included here:

kubectl get awx -o yaml awx
kubectl describe deployment awx
kubectl describe statefulset awx-postgres
kubectl get pods
kubectl get events

debug.get-awx.txt
debug.describe-deployment.txt
debug.describe-statefulset.txt
debug.get-pods.txt
debug.get-events.txt

Thanks,
David Goldsmith

dgoldsmith · 2022-03-11T03:42:23Z

And then I searched the current issues here and found this:

#824

That seems to be my issue: changing the service type from NodePort to ClusterIP allowed the AWX pods to come up.

[root@d-1-cfg-awx-c8 awx-operator]# kubectl get pods -n awx
NAME                                               READY   STATUS    RESTARTS   AGE
awx-operator-controller-manager-5ddf49cc4f-j94tb   2/2     Running   0          37m
awx-postgres-0                                     1/1     Running   0          30m
awx-5d9669598d-hd8tk                               4/4     Running   0          3m57s

Now I need to work on access to the ClusterIP service, as it does not seem to be mapped to a high port:

[root@d-1-cfg-awx-c8 awx-operator]# kubectl get service -n awx
NAME                                              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
awx-operator-controller-manager-metrics-service   ClusterIP   10.43.87.77   <none>        8443/TCP   35m
awx-postgres                                      ClusterIP   None          <none>        5432/TCP   27m
awx-service                                       ClusterIP   10.43.54.59   <none>        80/TCP     110s
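The PORT(S) column explains the missing high port: a NodePort service is listed as port:nodePort/protocol (for example 80:30080/TCP), whereas a ClusterIP service shows only port/protocol and gets no port on the node. A small Python sketch that parses that column; the function name parse_ports and the 80:30080/TCP sample are illustrative assumptions, not values from this thread.

```python
from typing import List, NamedTuple, Optional


class ServicePort(NamedTuple):
    port: int
    node_port: Optional[int]  # None when no node port is allocated
    protocol: str


def parse_ports(column: str) -> List[ServicePort]:
    """Parse the PORT(S) column of `kubectl get service` output.

    Entries are comma-separated and look like "80/TCP" (no node port)
    or "80:30080/TCP" (service port 80 exposed on node port 30080).
    """
    result = []
    for entry in column.split(","):
        ports, protocol = entry.strip().split("/")
        if ":" in ports:
            port, node_port = ports.split(":")
            result.append(ServicePort(int(port), int(node_port), protocol))
        else:
            result.append(ServicePort(int(ports), None, protocol))
    return result


print(parse_ports("80/TCP"))        # awx-service above: no node port
print(parse_ports("80:30080/TCP"))  # what a NodePort service would show
```

Without a node port, one way to reach a ClusterIP service from the host for testing is kubectl port-forward, e.g. kubectl port-forward service/awx-service 8080:80 -n awx, which forwards local port 8080 to the service's port 80.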

jermigonis · 2023-07-20T21:05:38Z

I don't mean to necrobump this, but I just installed AWX following https://computingforgeeks.com/how-to-install-ansible-awx-on-ubuntu-linux, and my issue with Redis not launching was that the CPU type needed to be set to host in Proxmox, not KVM or QEMU.

tchellomello added the state:needs_info label May 5, 2021

tchellomello added the worked_for_me label May 11, 2021

bandwiches mentioned this issue May 11, 2021: Stop using alpha software and improve upgrades ansible/awx#10166 (Closed)

tchellomello mentioned this issue May 25, 2021: Introducing service type definition and reworking Ingress rules #330 (Merged)

shanemcd closed this as completed in #330 Jun 1, 2021
