
Issue creating persistent storage for /var/lib/awx/projects and postgresql - using awx-operator 0.8.0 #260

Closed
PugTheBlack opened this issue Apr 30, 2021 · 11 comments · Fixed by #303
Labels
type:bug Something isn't working

Comments


PugTheBlack commented Apr 30, 2021

I am currently testing out Kubernetes and have set up a 4-node MicroK8s cluster on Ubuntu 20.04.2 LTS with Ceph (using the Rook operator, release-1.6 branch).

Four virtual machines (4 vCPU, 16 GB RAM, 60 GB OS disk + 200 GB raw disk for Ceph).

~# snap list
Name      Version   Rev    Tracking       Publisher   Notes
core18    20210309  1997   latest/stable  canonical✓  base
lxd       4.0.5     19647  4.0/stable/…   canonical✓  -
microk8s  v1.21.0   2128   latest/stable  canonical✓  classic
snapd     2.49.2    11588  latest/stable  canonical✓  snapd
~# microk8s status 
microk8s is running
high-availability: yes
  datastore master nodes: 192.168.50.42:19001 192.168.50.43:19001 192.168.50.44:19001
  datastore standby nodes: 192.168.50.41:19001
addons:
  enabled:
    dns                  # CoreDNS
    ha-cluster           # Configure high availability on the current node
    helm3                # Helm 3 - Kubernetes package manager
    ingress              # Ingress controller for external access
    metallb              # Loadbalancer for your Kubernetes cluster
    rbac                 # Role-Based Access Control for authorisation
  disabled:
    ambassador           # Ambassador API Gateway and Ingress
    cilium               # SDN, fast with full network policy
    dashboard            # The Kubernetes dashboard
    fluentd              # Elasticsearch-Fluentd-Kibana logging and monitoring
    gpu                  # Automatic enablement of Nvidia CUDA
    helm                 # Helm 2 - the package manager for Kubernetes
    host-access          # Allow Pods connecting to Host services smoothly
    istio                # Core Istio service mesh services
    jaeger               # Kubernetes Jaeger operator with its simple config
    keda                 # Kubernetes-based Event Driven Autoscaling
    knative              # The Knative framework on Kubernetes.
    kubeflow             # Kubeflow for easy ML deployments
    linkerd              # Linkerd is a service mesh for Kubernetes and other frameworks
    metrics-server       # K8s Metrics Server for API access to service metrics
    multus               # Multus CNI enables attaching multiple network interfaces to pods
    openebs              # OpenEBS is the open-source storage solution for Kubernetes
    openfaas             # openfaas serverless framework
    portainer            # Portainer UI for your Kubernetes cluster
    prometheus           # Prometheus operator for monitoring and logging
    registry             # Private image registry exposed on localhost:32000
    storage              # Storage class; allocates storage from host directory
    traefik              # traefik Ingress controller for external access

~# microk8s kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE   VERSION                    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
mwg-csm-n02   Ready    <none>   20h   v1.21.0-3+121713cef81e03   192.168.50.42   <none>        Ubuntu 20.04.2 LTS   5.4.0-72-generic   containerd://1.4.4
mwg-csm-n04   Ready    <none>   20h   v1.21.0-3+121713cef81e03   192.168.50.44   <none>        Ubuntu 20.04.2 LTS   5.4.0-72-generic   containerd://1.4.4
mwg-csm-n01   Ready    <none>   20h   v1.21.0-3+121713cef81e03   192.168.50.41   <none>        Ubuntu 20.04.2 LTS   5.4.0-72-generic   containerd://1.4.4
mwg-csm-n03   Ready    <none>   20h   v1.21.0-3+121713cef81e03   192.168.50.43   <none>        Ubuntu 20.04.2 LTS   5.4.0-72-generic   containerd://1.4.4
~# microk8s kubectl get services -A -o wide
NAMESPACE     NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE   SELECTOR
default       kubernetes                 ClusterIP      10.152.183.1     <none>          443/TCP                      20h   <none>
kube-system   kube-dns                   ClusterIP      10.152.183.10    <none>          53/UDP,53/TCP,9153/TCP       20h   k8s-app=kube-dns
rook-ceph     csi-rbdplugin-metrics      ClusterIP      10.152.183.240   <none>          8080/TCP,8081/TCP            20h   contains=csi-rbdplugin-metrics
rook-ceph     csi-cephfsplugin-metrics   ClusterIP      10.152.183.141   <none>          8080/TCP,8081/TCP            20h   contains=csi-cephfsplugin-metrics
rook-ceph     rook-ceph-mon-a            ClusterIP      10.152.183.207   <none>          6789/TCP,3300/TCP            20h   app=rook-ceph-mon,ceph_daemon_id=a,mon=a,mon_cluster=rook-ceph,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mon-b            ClusterIP      10.152.183.235   <none>          6789/TCP,3300/TCP            20h   app=rook-ceph-mon,ceph_daemon_id=b,mon=b,mon_cluster=rook-ceph,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mon-c            ClusterIP      10.152.183.197   <none>          6789/TCP,3300/TCP            20h   app=rook-ceph-mon,ceph_daemon_id=c,mon=c,mon_cluster=rook-ceph,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mgr              ClusterIP      10.152.183.119   <none>          9283/TCP                     20h   app=rook-ceph-mgr,ceph_daemon_id=a,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mgr-dashboard    ClusterIP      10.152.183.165   <none>          8443/TCP                     20h   app=rook-ceph-mgr,ceph_daemon_id=a,rook_cluster=rook-ceph
ingress       ingress                    LoadBalancer   10.152.183.229   192.168.50.99   80:31981/TCP,443:31086/TCP   16h   name=nginx-ingress-microk8s
default       awx-operator-metrics       ClusterIP      10.152.183.159   <none>          8383/TCP,8686/TCP            9h    name=awx-operator
default       awx-postgres               ClusterIP      None             <none>          5432/TCP                     9h    app.kubernetes.io/component=database,app.kubernetes.io/managed-by=awx-operator,app.kubernetes.io/name=awx-postgres
~# microk8s kubectl get pods -A -o wide
NAMESPACE        NAME                                                    READY   STATUS              RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
kube-system      calico-node-bkd9r                                       1/1     Running             1          20h   192.168.50.41   mwg-csm-n01   <none>           <none>
kube-system      calico-kube-controllers-f7868dd95-lr5zj                 1/1     Running             0          20h   10.1.127.1      mwg-csm-n01   <none>           <none>
rook-ceph        rook-ceph-operator-95f44b96c-jdskq                      1/1     Running             0          19h   10.1.238.193    mwg-csm-n03   <none>           <none>
rook-ceph        csi-cephfsplugin-btxvs                                  3/3     Running             0          19h   10.1.238.195    mwg-csm-n03   <none>           <none>
rook-ceph        csi-rbdplugin-9scw5                                     3/3     Running             0          19h   10.1.238.196    mwg-csm-n03   <none>           <none>
kube-system      calico-node-6dsdf                                       1/1     Running             0          20h   192.168.50.42   mwg-csm-n02   <none>           <none>
rook-ceph        csi-cephfsplugin-v592p                                  3/3     Running             0          19h   10.1.127.3      mwg-csm-n01   <none>           <none>
rook-ceph        csi-rbdplugin-r6hzm                                     3/3     Running             0          19h   10.1.127.4      mwg-csm-n01   <none>           <none>
rook-ceph        csi-cephfsplugin-7p8b7                                  3/3     Running             0          19h   10.1.63.66      mwg-csm-n02   <none>           <none>
rook-ceph        csi-rbdplugin-provisioner-7bcb95bc5d-dcf8s              6/6     Running             0          19h   10.1.127.2      mwg-csm-n01   <none>           <none>
rook-ceph        csi-rbdplugin-9r4fx                                     3/3     Running             0          19h   10.1.63.68      mwg-csm-n02   <none>           <none>
rook-ceph        csi-cephfsplugin-provisioner-58d557d5-694fc             6/6     Running             0          19h   10.1.63.67      mwg-csm-n02   <none>           <none>
rook-ceph        csi-rbdplugin-p5tzb                                     3/3     Running             1          19h   10.1.237.131    mwg-csm-n04   <none>           <none>
rook-ceph        csi-cephfsplugin-czjbx                                  3/3     Running             0          19h   10.1.237.132    mwg-csm-n04   <none>           <none>
rook-ceph        csi-rbdplugin-provisioner-7bcb95bc5d-74k42              6/6     Running             4          19h   10.1.238.197    mwg-csm-n03   <none>           <none>
kube-system      calico-node-9znr4                                       1/1     Running             0          20h   192.168.50.44   mwg-csm-n04   <none>           <none>
kube-system      calico-node-tpkzz                                       1/1     Running             0          20h   192.168.50.43   mwg-csm-n03   <none>           <none>
rook-ceph        rook-ceph-mon-a-765df6dd4b-hs2f8                        1/1     Running             0          19h   10.1.63.71      mwg-csm-n02   <none>           <none>
kube-system      coredns-7f9c69c78c-cg9lm                                1/1     Running             0          20h   10.1.237.129    mwg-csm-n04   <none>           <none>
rook-ceph        csi-cephfsplugin-provisioner-58d557d5-6kgtq             6/6     Running             0          19h   10.1.237.130    mwg-csm-n04   <none>           <none>
rook-ceph        rook-ceph-mon-b-66d6484f6c-25wb9                        1/1     Running             0          19h   10.1.127.6      mwg-csm-n01   <none>           <none>
rook-ceph        rook-ceph-mon-c-86db886cb7-v7gz6                        1/1     Running             0          19h   10.1.238.198    mwg-csm-n03   <none>           <none>
rook-ceph        rook-ceph-mgr-a-76bff945b9-ngjxv                        1/1     Running             0          19h   10.1.237.133    mwg-csm-n04   <none>           <none>
rook-ceph        rook-ceph-osd-0-8bf5b8df8-pt467                         1/1     Running             0          19h   10.1.63.73      mwg-csm-n02   <none>           <none>
rook-ceph        rook-ceph-osd-1-d4fcc896-5jxl6                          1/1     Running             0          19h   10.1.127.8      mwg-csm-n01   <none>           <none>
rook-ceph        rook-ceph-osd-2-6b8dd7f784-pmvng                        1/1     Running             0          19h   10.1.238.201    mwg-csm-n03   <none>           <none>
rook-ceph        rook-ceph-crashcollector-mwg-csm-n01-5f64744b86-x2xgr   1/1     Running             0          19h   10.1.127.9      mwg-csm-n01   <none>           <none>
rook-ceph        rook-ceph-crashcollector-mwg-csm-n04-7f594db9c-lr7s8    1/1     Running             0          19h   10.1.237.137    mwg-csm-n04   <none>           <none>
rook-ceph        rook-ceph-osd-3-6f8897cb66-fwcpz                        1/1     Running             0          19h   10.1.237.136    mwg-csm-n04   <none>           <none>
rook-ceph        rook-ceph-tools-57787758df-ldhvd                        1/1     Running             0          19h   10.1.63.77      mwg-csm-n02   <none>           <none>
rook-ceph        rook-ceph-mds-myfs-a-96bb847f6-g7xp4                    1/1     Running             0          19h   10.1.63.78      mwg-csm-n02   <none>           <none>
rook-ceph        rook-ceph-crashcollector-mwg-csm-n02-669cbd97c-ktr67    1/1     Running             0          19h   10.1.63.79      mwg-csm-n02   <none>           <none>
rook-ceph        rook-ceph-mds-myfs-b-5cff486654-btj67                   1/1     Running             0          19h   10.1.238.204    mwg-csm-n03   <none>           <none>
rook-ceph        rook-ceph-crashcollector-mwg-csm-n03-795c4676d6-5pm8b   1/1     Running             0          19h   10.1.238.203    mwg-csm-n03   <none>           <none>
ingress          nginx-ingress-microk8s-controller-dkn42                 1/1     Running             0          17h   10.1.127.11     mwg-csm-n01   <none>           <none>
ingress          nginx-ingress-microk8s-controller-7sssx                 1/1     Running             0          17h   10.1.238.205    mwg-csm-n03   <none>           <none>
ingress          nginx-ingress-microk8s-controller-dh4hq                 1/1     Running             0          17h   10.1.237.140    mwg-csm-n04   <none>           <none>
ingress          nginx-ingress-microk8s-controller-7v5v6                 1/1     Running             0          17h   10.1.63.80      mwg-csm-n02   <none>           <none>
rook-ceph        rook-ceph-osd-prepare-mwg-csm-n01-f76x2                 0/1     Completed           0          16h   10.1.127.20     mwg-csm-n01   <none>           <none>
rook-ceph        rook-ceph-osd-prepare-mwg-csm-n03-zb55j                 0/1     Completed           0          16h   10.1.238.209    mwg-csm-n03   <none>           <none>
rook-ceph        rook-ceph-osd-prepare-mwg-csm-n04-dj6sn                 0/1     Completed           0          16h   10.1.237.144    mwg-csm-n04   <none>           <none>
rook-ceph        rook-ceph-osd-prepare-mwg-csm-n02-7ll8z                 0/1     Completed           0          16h   10.1.63.84      mwg-csm-n02   <none>           <none>
metallb-system   controller-559b68bfd8-wx6t5                             1/1     Running             0          15h   10.1.127.22     mwg-csm-n01   <none>           <none>
metallb-system   speaker-kq5fz                                           1/1     Running             0          15h   192.168.50.41   mwg-csm-n01   <none>           <none>
metallb-system   speaker-2sdd2                                           1/1     Running             0          15h   192.168.50.44   mwg-csm-n04   <none>           <none>
metallb-system   speaker-6rmrp                                           1/1     Running             0          15h   192.168.50.43   mwg-csm-n03   <none>           <none>
metallb-system   speaker-8dqxh                                           1/1     Running             0          15h   192.168.50.42   mwg-csm-n02   <none>           <none>
default          awx-operator-fcc84f5df-w6jfd                            1/1     Running             0          9h    10.1.237.147    mwg-csm-n04   <none>           <none>
default          awx-86899bfb7b-pjjfs                                    0/4     ContainerCreating   0          9h    <none>          mwg-csm-n03   <none>           <none>

I installed the awx-operator without modifications and created a "my-awx.yaml" file to deploy a fairly basic install, using rook-ceph-block storage for the PVCs.

my-awx.yaml:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  tower_ingress_type: LoadBalancer
  tower_loadbalancer_protocol: http
  tower_postgres_resource_requirements:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
  tower_postgres_storage_requirements:
    requests:
      storage: 8Gi
    limits:
      storage: 50Gi
  tower_postgres_storage_class: rook-ceph-block
  tower_web_resource_requirements:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 2000m
      memory: 4Gi
  tower_task_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1000m
      memory: 2Gi
  tower_projects_persistence: true
  tower_projects_storage_class: rook-ceph-block
  tower_projects_storage_access_mode: ReadWriteOnce
  tower_projects_storage_size: 20Gi

As you can see from the pod list above, awx-86899bfb7b-pjjfs is stuck in ContainerCreating.

~# microk8s kubectl describe pod awx-86899bfb7b-pjjfs
Name:           awx-86899bfb7b-pjjfs
Namespace:      default
Priority:       0
Node:           mwg-csm-n03/192.168.50.43
Start Time:     Thu, 29 Apr 2021 19:24:33 +0000
Labels:         app.kubernetes.io/component=awx
                app.kubernetes.io/managed-by=awx-operator
                app.kubernetes.io/name=awx
                app.kubernetes.io/part-of=awx
                app.kubernetes.io/version=19.0.0
                pod-template-hash=86899bfb7b
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/awx-86899bfb7b
Containers:
  redis:
    Container ID:  
    Image:         redis:latest
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      redis-server
      /etc/redis.conf
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from awx-redis-data (rw)
      /etc/redis.conf from awx-redis-config (ro,path="redis.conf")
      /var/run/redis from awx-redis-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blb96 (ro)
  awx-web:
    Container ID:   
    Image:          quay.io/ansible/awx:19.0.0
    Image ID:       
    Port:           8052/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  2Gi
    Environment:
      MY_POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /etc/nginx/nginx.conf from awx-nginx-conf (ro,path="nginx.conf")
      /etc/tower/SECRET_KEY from awx-secret-key (ro,path="SECRET_KEY")
      /etc/tower/conf.d/ from awx-application-credentials (ro)
      /etc/tower/settings.py from awx-settings (ro,path="settings.py")
      /var/lib/awx/rsyslog from rsyslog-dir (rw)
      /var/run/awx-rsyslog from rsyslog-socket (rw)
      /var/run/redis from awx-redis-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blb96 (ro)
      /var/run/supervisor from supervisor-socket (rw)
  awx-task:
    Container ID:  
    Image:         quay.io/ansible/awx:19.0.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      /usr/bin/launch_awx_task.sh
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     500m
      memory:  1Gi
    Environment:
      SUPERVISOR_WEB_CONFIG_PATH:  /etc/supervisord.conf
      AWX_SKIP_MIGRATIONS:         1
      MY_POD_UID:                   (v1:metadata.uid)
      MY_POD_IP:                    (v1:status.podIP)
      MY_POD_NAMESPACE:            default (v1:metadata.namespace)
    Mounts:
      /etc/tower/SECRET_KEY from awx-secret-key (ro,path="SECRET_KEY")
      /etc/tower/conf.d/ from awx-application-credentials (ro)
      /etc/tower/settings.py from awx-settings (ro,path="settings.py")
      /var/lib/awx/projects from awx-projects (rw)
      /var/lib/awx/rsyslog from rsyslog-dir (rw)
      /var/run/awx-rsyslog from rsyslog-socket (rw)
      /var/run/receptor from receptor-socket (rw)
      /var/run/redis from awx-redis-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blb96 (ro)
      /var/run/supervisor from supervisor-socket (rw)
  awx-ee:
    Container ID:  
    Image:         quay.io/ansible/awx-ee:0.1.1
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      receptor
      --config
      /etc/receptor.conf
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/receptor.conf from awx-receptor-config (ro,path="receptor.conf")
      /var/lib/awx/projects from awx-projects (rw)
      /var/run/receptor from receptor-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blb96 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  awx-application-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  awx-app-credentials
    Optional:    false
  awx-secret-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  awx-secret-key
    Optional:    false
  awx-settings:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-nginx-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-redis-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-redis-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  awx-redis-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  supervisor-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  rsyslog-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  receptor-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  rsyslog-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  awx-receptor-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-projects:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  awx-projects-claim
    ReadOnly:   false
  kube-api-access-blb96:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  7m42s (x244 over 9h)  kubelet  MountVolume.MountDevice failed for volume "pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a" : rpc error: code = InvalidArgument desc = staging path /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a/globalmount does not exist on node
  Warning  FailedMount  114s (x278 over 9h)   kubelet  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[awx-projects], unattached volumes=[supervisor-socket awx-redis-data kube-api-access-blb96 awx-nginx-conf awx-application-credentials awx-redis-socket rsyslog-dir awx-projects receptor-socket rsyslog-socket awx-settings awx-secret-key awx-redis-config awx-receptor-config]: timed out waiting for the condition

So basically it is saying that the staging path for the PVC does not exist on the node. What does that mean, exactly?
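Note the error names the CSI staging directory, not the PVC itself. A quick check on the node where the pod is scheduled (mwg-csm-n03 here) would show whether the driver and kubelet disagree about the kubelet data path; these commands are a sketch of what I ran:

```shell
# MicroK8s keeps kubelet data under the snap path, so check both candidate
# locations for the CSI staging directory referenced in the event message.
ls -ld /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/pv/
ls -ld /var/lib/kubelet/plugins/kubernetes.io/csi/pv/ 2>/dev/null

# The cephcsi nodeplugin logs usually reveal which kubelet path it was
# configured with
microk8s kubectl -n rook-ceph logs ds/csi-rbdplugin -c csi-rbdplugin --tail=50
```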

~# microk8s kubectl get pvc -A -o wide
NAMESPACE   NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE   VOLUMEMODE
default     awx-projects-claim   Bound    pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a   20Gi       RWO            rook-ceph-block   9h    Filesystem
~# microk8s kubectl -n rook-ceph exec -it $(microk8s kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
[root@rook-ceph-tools-57787758df-ldhvd /]# ceph status
  cluster:
    id:     3869ad1c-3402-41b7-b561-1d8fc29487e6
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum a,b,c (age 20h)
    mgr: a(active, since 20h)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
    osd: 4 osds: 4 up (since 20h), 4 in (since 20h)
 
  data:
    pools:   4 pools, 97 pgs
    objects: 28 objects, 2.3 KiB
    usage:   4.0 GiB used, 796 GiB / 800 GiB avail
    pgs:     97 active+clean
 
  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr

~# microk8s kubectl -n rook-ceph logs csi-rbdplugin-provisioner-7bcb95bc5d-74k42 -c csi-provisioner
......
I0429 19:24:29.806037       1 controller.go:1317] provision "default/awx-projects-claim" class "rook-ceph-block": started
I0429 19:24:29.807382       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"awx-projects-claim", UID:"ae007b9b-fe8d-42fe-9975-24ad1a79c67a", APIVersion:"v1", ResourceVersion:"140453", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/awx-projects-claim"
I0429 19:24:32.212610       1 controller.go:1420] provision "default/awx-projects-claim" class "rook-ceph-block": volume "pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a" provisioned
I0429 19:24:32.212787       1 controller.go:1437] provision "default/awx-projects-claim" class "rook-ceph-block": succeeded
E0429 19:24:32.270790       1 controller.go:1443] couldn't create key for object pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a: object has no meta: object does not implement the Object interfaces
I0429 19:24:32.271017       1 controller.go:1317] provision "default/awx-projects-claim" class "rook-ceph-block": started
I0429 19:24:32.271018       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"awx-projects-claim", UID:"ae007b9b-fe8d-42fe-9975-24ad1a79c67a", APIVersion:"v1", ResourceVersion:"140453", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a
I0429 19:24:32.271174       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"awx-projects-claim", UID:"ae007b9b-fe8d-42fe-9975-24ad1a79c67a", APIVersion:"v1", ResourceVersion:"140453", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/awx-projects-claim"
I0429 19:24:32.369198       1 controller.go:1420] provision "default/awx-projects-claim" class "rook-ceph-block": volume "pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a" provisioned
I0429 19:24:32.369283       1 controller.go:1437] provision "default/awx-projects-claim" class "rook-ceph-block": succeeded
E0429 19:24:32.392952       1 controller.go:1443] couldn't create key for object pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a: object has no meta: object does not implement the Object interfaces
I0429 19:24:32.393630       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"awx-projects-claim", UID:"ae007b9b-fe8d-42fe-9975-24ad1a79c67a", APIVersion:"v1", ResourceVersion:"140453", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a

~# microk8s ctr image ls | awk '{print $1}' | grep -v sha256
REF
docker.io/calico/cni:v3.13.2
docker.io/calico/kube-controllers:v3.13.2
docker.io/calico/node:v3.13.2
docker.io/calico/pod2daemon-flexvol:v3.13.2
docker.io/ceph/ceph:v15.2.11
docker.io/metallb/controller:v0.9.3
docker.io/metallb/speaker:v0.9.3
docker.io/rook/ceph:v1.6.1
k8s.gcr.io/ingress-nginx/controller:v0.44.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/sig-storage/csi-attacher:v3.0.2
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
k8s.gcr.io/sig-storage/csi-provisioner:v2.0.4
k8s.gcr.io/sig-storage/csi-resizer:v1.0.1
k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.2
quay.io/cephcsi/cephcsi:v3.3.1

Like I said, I am completely green at this, so there might be all manner of things wrong with my setup. It would be awesome if you had some pointers on where to look first, though.

-Marius

@tchellomello
Contributor

@PugTheBlack it looks like a problem with the storage provisioner rather than with the awx-operator itself.

Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  7m42s (x244 over 9h)  kubelet  MountVolume.MountDevice failed for volume "pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a" : rpc error: code = InvalidArgument desc = staging path /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-ae007b9b-fe8d-42fe-9975-24ad1a79c67a/globalmount does not exist on node

I would investigate the rook-ceph issue; the message above should point you in the right direction for fixing it. I also run rook-ceph here, and in my tests it worked fine.

@PugTheBlack
Author

Yeah, it might be a rook-ceph problem more than an awx-operator one - the reason I posted it here was mostly that the PVC is listed as bound. I will check it out with the rook-ceph folks and see if they can help :)

@PugTheBlack
Author

@tchellomello like you said, the problem was with the rook-ceph implementation. For some reason I had to configure the rook operator with the full snap path /var/snap/microk8s/common/var/lib/kubelet instead of /var/lib/kubelet; after that the awx-operator ran its course... so now I have a different set of problems :)
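For anyone else on MicroK8s hitting the same thing: the kubelet root lives under the snap path, and Rook's CSI driver has to be told where it is. A minimal sketch of the relevant setting, assuming the standard `rook-ceph-operator-config` ConfigMap from the Rook examples:

```yaml
# Sketch only - point Rook's CSI driver at the MicroK8s kubelet directory
# (set in the rook-ceph-operator-config ConfigMap, or as the equivalent
# env var on the rook-ceph-operator Deployment) before deploying the
# CephCluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  ROOK_CSI_KUBELET_DIR_PATH: "/var/snap/microk8s/common/var/lib/kubelet"
```

Without this, the CSI plugin stages volumes under the default /var/lib/kubelet and the kubelet looks for the globalmount path under the snap directory, which matches the FailedMount error above.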

# microk8s kubectl describe svc awx-postgres        
Name:              awx-postgres
Namespace:         default
Labels:            app.kubernetes.io/component=database
                   app.kubernetes.io/managed-by=awx-operator
                   app.kubernetes.io/name=awx-postgres
                   app.kubernetes.io/part-of=awx
Annotations:       <none>
Selector:          app.kubernetes.io/component=database,app.kubernetes.io/managed-by=awx-operator,app.kubernetes.io/name=awx-postgres
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              <unset>  5432/TCP
TargetPort:        5432/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
# microk8s kubectl get svc -A -o wide                         
NAMESPACE     NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE     SELECTOR
default       kubernetes                  ClusterIP      10.152.183.1     <none>           443/TCP                      4d23h   <none>
kube-system   kube-dns                    ClusterIP      10.152.183.10    <none>           53/UDP,53/TCP,9153/TCP       4d23h   k8s-app=kube-dns
rook-ceph     rook-ceph-mon-a             ClusterIP      10.152.183.207   <none>           6789/TCP,3300/TCP            4d22h   app=rook-ceph-mon,ceph_daemon_id=a,mon=a,mon_cluster=rook-ceph,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mon-b             ClusterIP      10.152.183.235   <none>           6789/TCP,3300/TCP            4d22h   app=rook-ceph-mon,ceph_daemon_id=b,mon=b,mon_cluster=rook-ceph,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mon-c             ClusterIP      10.152.183.197   <none>           6789/TCP,3300/TCP            4d22h   app=rook-ceph-mon,ceph_daemon_id=c,mon=c,mon_cluster=rook-ceph,rook_cluster=rook-ceph
rook-ceph     rook-ceph-mgr               ClusterIP      10.152.183.119   <none>           9283/TCP                     4d22h   app=rook-ceph-mgr,ceph_daemon_id=a,rook_cluster=rook-ceph
ingress       ingress                     LoadBalancer   10.152.183.229   192.168.50.99    80:31981/TCP,443:31086/TCP   4d19h   name=nginx-ingress-microk8s
default       awx-operator-metrics        ClusterIP      10.152.183.159   <none>           8383/TCP,8686/TCP            4d12h   name=awx-operator
kube-system   metrics-server              ClusterIP      10.152.183.177   <none>           443/TCP                      3d21h   k8s-app=metrics-server
kube-system   dashboard-metrics-scraper   ClusterIP      10.152.183.117   <none>           8000/TCP                     3d21h   k8s-app=dashboard-metrics-scraper
kube-system   kubernetes-dashboard        LoadBalancer   10.152.183.192   192.168.50.100   443:31066/TCP                3d21h   k8s-app=kubernetes-dashboard
rook-ceph     rook-ceph-mgr-dashboard     ClusterIP      10.152.183.165   <none>           8443/TCP                     4d22h   app=rook-ceph-mgr,ceph_daemon_id=a,rook_cluster=rook-ceph
rook-ceph     csi-rbdplugin-metrics       ClusterIP      10.152.183.34    <none>           8080/TCP,8081/TCP            67m     contains=csi-rbdplugin-metrics
rook-ceph     csi-cephfsplugin-metrics    ClusterIP      10.152.183.30    <none>           8080/TCP,8081/TCP            67m     contains=csi-cephfsplugin-metrics
default       awx-postgres                ClusterIP      None             <none>           5432/TCP                     60m     app.kubernetes.io/component=database,app.kubernetes.io/managed-by=awx-operator,app.kubernetes.io/name=awx-postgres
# microk8s kubectl describe pod awx-86899bfb7b-ccg76
Name:         awx-86899bfb7b-ccg76
Namespace:    default
Priority:     0
Node:         mwg-csm-n03/192.168.50.43
Start Time:   Tue, 04 May 2021 06:42:02 +0000
Labels:       app.kubernetes.io/component=awx
              app.kubernetes.io/managed-by=awx-operator
              app.kubernetes.io/name=awx
              app.kubernetes.io/part-of=awx
              app.kubernetes.io/version=19.0.0
              pod-template-hash=86899bfb7b
Annotations:  cni.projectcalico.org/podIP: 10.1.238.230/32
              cni.projectcalico.org/podIPs: 10.1.238.230/32
Status:       Running
IP:           10.1.238.230
IPs:
  IP:           10.1.238.230
Controlled By:  ReplicaSet/awx-86899bfb7b
Containers:
  redis:
    Container ID:  containerd://aca4d5dbe1670c14371abe31c57ce7525e1216a2c0f6e7182f6ff1953909fd95
    Image:         redis:latest
    Image ID:      docker.io/library/redis@sha256:eff56acc5fc7b909183da93236ba09d3b8cb7d6db31d5b25e9a46dac9b5e699b
    Port:          <none>
    Host Port:     <none>
    Args:
      redis-server
      /etc/redis.conf
    State:          Running
      Started:      Tue, 04 May 2021 06:42:14 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from awx-redis-data (rw)
      /etc/redis.conf from awx-redis-config (ro,path="redis.conf")
      /var/run/redis from awx-redis-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gktwp (ro)
  awx-web:
    Container ID:   containerd://d19efb9e0af7b1f2ef346bc9b355c493d4175de9a7678bbf9339fa87560d6ca7
    Image:          quay.io/ansible/awx:19.0.0
    Image ID:       quay.io/ansible/awx@sha256:db165b894507fb520d3f53ac68eb2b49f2a5fd2cc63c7ac7aaa7bd904970b1b2
    Port:           8052/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 04 May 2021 06:42:15 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  2Gi
    Environment:
      MY_POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /etc/nginx/nginx.conf from awx-nginx-conf (ro,path="nginx.conf")
      /etc/tower/SECRET_KEY from awx-secret-key (ro,path="SECRET_KEY")
      /etc/tower/conf.d/ from awx-application-credentials (ro)
      /etc/tower/settings.py from awx-settings (ro,path="settings.py")
      /var/lib/awx/rsyslog from rsyslog-dir (rw)
      /var/run/awx-rsyslog from rsyslog-socket (rw)
      /var/run/redis from awx-redis-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gktwp (ro)
      /var/run/supervisor from supervisor-socket (rw)
  awx-task:
    Container ID:  containerd://c79f56ec95c46ea947f3aa954a71f0a97a532397025b3983c1f8f9f55331940b
    Image:         quay.io/ansible/awx:19.0.0
    Image ID:      quay.io/ansible/awx@sha256:db165b894507fb520d3f53ac68eb2b49f2a5fd2cc63c7ac7aaa7bd904970b1b2
    Port:          <none>
    Host Port:     <none>
    Args:
      /usr/bin/launch_awx_task.sh
    State:          Running
      Started:      Tue, 04 May 2021 06:42:15 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     500m
      memory:  1Gi
    Environment:
      SUPERVISOR_WEB_CONFIG_PATH:  /etc/supervisord.conf
      AWX_SKIP_MIGRATIONS:         1
      MY_POD_UID:                   (v1:metadata.uid)
      MY_POD_IP:                    (v1:status.podIP)
      MY_POD_NAMESPACE:            default (v1:metadata.namespace)
    Mounts:
      /etc/tower/SECRET_KEY from awx-secret-key (ro,path="SECRET_KEY")
      /etc/tower/conf.d/ from awx-application-credentials (ro)
      /etc/tower/settings.py from awx-settings (ro,path="settings.py")
      /var/lib/awx/projects from awx-projects (rw)
      /var/lib/awx/rsyslog from rsyslog-dir (rw)
      /var/run/awx-rsyslog from rsyslog-socket (rw)
      /var/run/receptor from receptor-socket (rw)
      /var/run/redis from awx-redis-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gktwp (ro)
      /var/run/supervisor from supervisor-socket (rw)
  awx-ee:
    Container ID:  containerd://06a3f2d9b7bc55c992c95d09af7c148798cd0bf1ac0eaa54cdf76ae0ecd099cf
    Image:         quay.io/ansible/awx-ee:0.1.1
    Image ID:      quay.io/ansible/awx-ee@sha256:76ecd20bd375c1cc4c1fd2fc9f43e7a321136905b45f3c0fdd0304de59467b93
    Port:          <none>
    Host Port:     <none>
    Args:
      receptor
      --config
      /etc/receptor.conf
    State:          Running
      Started:      Tue, 04 May 2021 06:42:16 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/receptor.conf from awx-receptor-config (ro,path="receptor.conf")
      /var/lib/awx/projects from awx-projects (rw)
      /var/run/receptor from receptor-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gktwp (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  awx-application-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  awx-app-credentials
    Optional:    false
  awx-secret-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  awx-secret-key
    Optional:    false
  awx-settings:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-nginx-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-redis-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-redis-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  awx-redis-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  supervisor-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  rsyslog-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  receptor-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  rsyslog-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  awx-receptor-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  awx-projects:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  awx-projects-claim
    ReadOnly:   false
  kube-api-access-gktwp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
# microk8s kubectl logs -f awx-86899bfb7b-ccg76 -c awx-web


2021-05-04 07:49:40,911 ERROR    [-] awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not translate host name "awx-postgres" to address: Name or service not known


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/conf/settings.py", line 81, in _ctit_db_wrapper
    yield
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/conf/settings.py", line 416, in __getattr__
    value = self._get_local(name)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/conf/settings.py", line 360, in _get_local
    setting = Setting.objects.filter(key=name, user__isnull=True).order_by('pk').first()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/query.py", line 653, in first
    for obj in (self if self.ordered else self.order_by('pk'))[:1]:
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/sql/compiler.py", line 1140, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 233, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not translate host name "awx-postgres" to address: Name or service not known

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
^C    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection

So basically - the containers can't find awx-postgres

Also - I can't really see any PVC for postgres...

# microk8s kubectl get pvc -A -o wide                     
NAMESPACE   NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE   VOLUMEMODE
default     awx-projects-claim   Bound    pvc-5e7a4861-7432-4c64-abf2-05b5480ac3d6   20Gi       RWO            rook-ceph-block   72m   Filesystem
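Worth noting for debugging: `awx-postgres` is a headless Service (ClusterIP: None, and `Endpoints: <none>` in the describe output above), so the DNS name only resolves once a matching pod exists. A couple of quick checks (command sketches, assuming the `default` namespace; the dnsutils pod name and image are illustrative):

```shell
# Does the headless service have any backing endpoints yet?
microk8s kubectl get endpoints awx-postgres

# Is the StatefulSet behind it present at all?
microk8s kubectl get statefulset awx-postgres

# Optional: resolve the service name from inside the cluster using a
# throwaway debug pod (deleted on exit thanks to --rm).
microk8s kubectl run -it --rm dnsutils \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 -- \
  nslookup awx-postgres.default.svc.cluster.local
```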

@tchellomello
Contributor

In the logs provided, you did not show whether the PostgreSQL StatefulSet is running. So what happens if you do:

kubectl get pods -o wide  awx-postgres-0

Then, if you take the IP address of that particular pod, are you able to hit that IP from your awx-task container?

Furthermore, what happens if you deploy the latest 0.9.0 operator version and try to create a new awx instance with a different name?
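One way to test raw reachability from the awx-task container, sketched with a hypothetical pod IP (substitute whatever `kubectl get pods -o wide awx-postgres-0` actually reports):

```shell
# Bash /dev/tcp probe - needs no extra tooling inside the container.
PG_IP=10.1.238.236   # hypothetical; use the real pod IP
microk8s kubectl exec deploy/awx -c awx-task -- \
  bash -c "timeout 3 bash -c '</dev/tcp/${PG_IP}/5432' && echo open || echo closed"
```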

@PugTheBlack
Author

# microk8s kubectl get pods -o wide awx-postgres-0
Error from server (NotFound): pods "awx-postgres-0" not found

So no, it's not running.

I will try to deploy a new awx instance with a different name using the 0.9.0 operator later today :)

@PugTheBlack
Author

But I did a "wipe" of the old awx deployment:

# microk8s kubectl delete -f my-awx.yaml 
# microk8s kubectl delete -f awx-operator.yaml

Then I cloned the new awx-operator from GitHub - and now it all seems to be working much better :)

# microk8s kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP             NODE          NOMINATED NODE   READINESS GATES
awx-operator-5d9d764bcd-hdfxk   1/1     Running   0          8m56s   10.1.238.235   mwg-csm-n03   <none>           <none>
awx-postgres-0                  1/1     Running   0          6m30s   10.1.238.236   mwg-csm-n03   <none>           <none>
# microk8s kubectl get pvc -o wide
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE     VOLUMEMODE
postgres-awx-postgres-0   Bound    pvc-7b7cdd59-3f39-4f23-a76c-f01e3fe4fba0   8Gi        RWO            rook-ceph-block   7m43s   Filesystem
awx-projects-claim        Bound    pvc-5afc16bb-5089-4b70-92fd-778f3c7d2b60   20Gi       RWO            rook-ceph-block   7m18s   Filesystem

So hopefully I will have a working deployment sometime today :)

@PugTheBlack
Author

# microk8s kubectl describe pods awx-postgres-0        
Name:         awx-postgres-0
Namespace:    default
Priority:     0
Node:         mwg-csm-n03/192.168.50.43
Start Time:   Thu, 06 May 2021 05:51:48 +0000
Labels:       app.kubernetes.io/component=database
              app.kubernetes.io/managed-by=awx-operator
              app.kubernetes.io/name=awx-postgres
              app.kubernetes.io/part-of=awx
              controller-revision-hash=awx-postgres-c64cb47d9
              statefulset.kubernetes.io/pod-name=awx-postgres-0
Annotations:  cni.projectcalico.org/podIP: 10.1.238.236/32
              cni.projectcalico.org/podIPs: 10.1.238.236/32
Status:       Running
IP:           10.1.238.236
IPs:
  IP:           10.1.238.236
Controlled By:  StatefulSet/awx-postgres
Containers:
  postgres:
    Container ID:   containerd://fbde7aa71f53344fe30b214de85b40b16088773db41398249bacc7554ca096f8
    Image:          postgres:12
    Image ID:       docker.io/library/postgres@sha256:06277995d7028f4455e56f21864146ee2425c83308dc96283c7a96e6881cc826
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 06 May 2021 05:52:29 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:     500m
      memory:  2Gi
    Environment:
      POSTGRESQL_DATABASE:        <set to the key 'database' in secret 'awx-postgres-configuration'>  Optional: false
      POSTGRESQL_USER:            <set to the key 'username' in secret 'awx-postgres-configuration'>  Optional: false
      POSTGRESQL_PASSWORD:        <set to the key 'password' in secret 'awx-postgres-configuration'>  Optional: false
      POSTGRES_DB:                <set to the key 'database' in secret 'awx-postgres-configuration'>  Optional: false
      POSTGRES_USER:              <set to the key 'username' in secret 'awx-postgres-configuration'>  Optional: false
      POSTGRES_PASSWORD:          <set to the key 'password' in secret 'awx-postgres-configuration'>  Optional: false
      PGDATA:                     /var/lib/postgresql/data/pgdata
      POSTGRES_INITDB_ARGS:       --auth-host=scram-sha-256
      POSTGRES_HOST_AUTH_METHOD:  scram-sha-256
    Mounts:
      /var/lib/postgresql/data from postgres (rw,path="data")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z4bwr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  postgres:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  postgres-awx-postgres-0
    ReadOnly:   false
  kube-api-access-z4bwr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From                     Message
  ----     ------                  ----  ----                     -------
  Warning  FailedScheduling        13m   default-scheduler        0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               13m   default-scheduler        Successfully assigned default/awx-postgres-0 to mwg-csm-n03
  Normal   SuccessfulAttachVolume  13m   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-7b7cdd59-3f39-4f23-a76c-f01e3fe4fba0"
  Normal   Pulling                 12m   kubelet                  Pulling image "postgres:12"
  Normal   Pulled                  12m   kubelet                  Successfully pulled image "postgres:12" in 17.444457891s
  Normal   Created                 12m   kubelet                  Created container postgres
  Normal   Started                 12m   kubelet                  Started container postgres

@PugTheBlack
Author

# microk8s kubectl describe pods awx-operator-5d9d764bcd-hdfxk
Name:         awx-operator-5d9d764bcd-hdfxk
Namespace:    default
Priority:     0
Node:         mwg-csm-n03/192.168.50.43
Start Time:   Thu, 06 May 2021 05:49:21 +0000
Labels:       name=awx-operator
              pod-template-hash=5d9d764bcd
Annotations:  cni.projectcalico.org/podIP: 10.1.238.235/32
              cni.projectcalico.org/podIPs: 10.1.238.235/32
Status:       Running
IP:           10.1.238.235
IPs:
  IP:           10.1.238.235
Controlled By:  ReplicaSet/awx-operator-5d9d764bcd
Containers:
  awx-operator:
    Container ID:   containerd://932ed30d4d9f776c2f3fb75d8274bfa78f658e75e42979ae5d3a12bad1812b9b
    Image:          quay.io/ansible/awx-operator:0.9.0
    Image ID:       quay.io/ansible/awx-operator@sha256:00bb025d3607f072cb1967f72817e8c31deb485b91503b4fb3b1ca62ff857103
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 06 May 2021 05:50:01 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:6789/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:    
      POD_NAME:           awx-operator-5d9d764bcd-hdfxk (v1:metadata.name)
      OPERATOR_NAME:      awx-operator
      ANSIBLE_GATHERING:  explicit
    Mounts:
      /tmp/ansible-operator/runner from runner (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-krlz9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  runner:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-krlz9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  17m   default-scheduler  Successfully assigned default/awx-operator-5d9d764bcd-hdfxk to mwg-csm-n03
  Normal  Pulling    17m   kubelet            Pulling image "quay.io/ansible/awx-operator:0.9.0"
  Normal  Pulled     17m   kubelet            Successfully pulled image "quay.io/ansible/awx-operator:0.9.0" in 30.290555877s
  Normal  Created    17m   kubelet            Created container awx-operator
  Normal  Started    17m   kubelet            Started container awx-operator

@PugTheBlack
Author

There still seems to be something fishy here, though... the regular "awx-" pod is not created, and the awx-operator logs show:

{"level":"error","ts":1620281677.3637104,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"awx-controller","request":"default/awx","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"}

--------------------------- Ansible Task StdOut -------------------------------

 TASK [Apply Resources] ******************************** 
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'tower_loadbalancer_annotations' is undefined\n\nThe error appears to be in '/opt/ansible/roles/installer/tasks/resources_configuration.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Apply Resources\n  ^ here\n"}

@tchellomello
Contributor

I was able to reproduce the issue by creating the following spec:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-lb-annotations
  namespace: default
spec:
  kind: AWX
  tower_admin_user: admin
  tower_ingress_type: LoadBalancer <===== added this

Here is the error:

[awx-operator-84694f9865-kvbc2] ok: [localhost] => (item=tower_persistent) => {"ansible_loop_var": "item", "changed": false, "item": "tower_persistent", "result": {"results": []}} 
[awx-operator-84694f9865-kvbc2] fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'tower_loadbalancer_annotations' is undefined\n\nThe error appears to be in '/opt/ansible/roles/installer/tasks/resources_configuration.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Apply Resources\n  ^ here\n"} 
 awx-operator-84694f9865-kvbc2] 
 awx-operator-84694f9865-kvbc2] PLAY RECAP *********************************************************************
 awx-operator-84694f9865-kvbc2] localhost                  : ok=29   changed=0    unreachable=0    failed=1    skipped=26   rescued=0    ignored=0   

It's a bug that must be fixed.
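For reference, the failure mode is just Ansible evaluating an undefined variable in the role. A guard along these lines would avoid it (a sketch only, not the actual patch merged in the linked PR; the surrounding `metadata` stanza is an assumption):

```yaml
# Sketch: wrap the lookup in Jinja2's default filter so the AWX spec
# may omit tower_loadbalancer_annotations entirely.
metadata:
  annotations: "{{ tower_loadbalancer_annotations | default('') }}"
```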

@PugTheBlack
Author

Changed the "my-awx.yaml" file to:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  tower_ingress_type: LoadBalancer
  tower_loadbalancer_annotations: ''    <===== Added this
  tower_loadbalancer_protocol: http
  tower_postgres_resource_requirements:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
  tower_postgres_storage_requirements:
    requests:
      storage: 8Gi
    limits:
      storage: 50Gi
  tower_postgres_storage_class: rook-ceph-block
  tower_web_resource_requirements:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 2000m
      memory: 4Gi
  tower_task_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1000m
      memory: 2Gi
  tower_projects_persistence: true
  tower_projects_storage_class: rook-ceph-block
  tower_projects_storage_access_mode: ReadWriteOnce
  tower_projects_storage_size: 20Gi

And then the deployment went through as I expected :)

# microk8s kubectl get pods -o wide   
NAME                            READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
awx-operator-5d9d764bcd-hdfxk   1/1     Running   0          5d    10.1.238.235   mwg-csm-n03   <none>           <none>
awx-postgres-0                  1/1     Running   0          77m   10.1.238.207   mwg-csm-n03   <none>           <none>
awx-6f94b56cdf-jddq8            4/4     Running   0          76m   10.1.238.208   mwg-csm-n03   <none>           <none>
# microk8s kubectl get svc -o wide   
NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)             AGE   SELECTOR
kubernetes             ClusterIP      10.152.183.1     <none>           443/TCP             11d   <none>
awx-operator-metrics   ClusterIP      10.152.183.55    <none>           8383/TCP,8686/TCP   5d    name=awx-operator
awx-postgres           ClusterIP      None             <none>           5432/TCP            73m   app.kubernetes.io/component=database,app.kubernetes.io/managed-by=awx-operator,app.kubernetes.io/name=awx-postgres
awx-service            LoadBalancer   10.152.183.224   192.168.50.101   80:30350/TCP        73m   app.kubernetes.io/component=awx,app.kubernetes.io/managed-by=awx-operator,app.kubernetes.io/name=awx

# microk8s kubectl get pvc -o wide   
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE   VOLUMEMODE
postgres-awx-postgres-0   Bound    pvc-4f4e9caa-961e-42ea-926d-daa3007f12bd   8Gi        RWO            rook-ceph-block   75m   Filesystem
awx-projects-claim        Bound    pvc-cb3634da-362f-4953-af0d-6b8f6be0e195   20Gi       RWO            rook-ceph-block   74m   Filesystem

I agree that it's a bug that needs fixing, but the workaround seems to be working fine, so I'm happy :)
