Wrong number of AWX instances (unmanaged PostgreSQL, replicas>1) #462

Closed
tklsnk opened this issue Jul 9, 2021 · 13 comments
Labels
component:operator type:bug Something isn't working

Comments

@tklsnk

tklsnk commented Jul 9, 2021

ISSUE TYPE
  • Bug Report
SUMMARY

After creating an AWX deployment with an external (unmanaged) PostgreSQL database, api/v2/instances shows a count lower than the configured number of replicas.

ENVIRONMENT
  • AWX version: 19.1.0
  • Operator version: 0.12.0
  • Kubernetes version: 1.17.9
  • AWX install method: operator
STEPS TO REPRODUCE

kubectl apply -f awx-deploy.yml

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  replicas: 2
  image_version: 19.1.0
  admin_user: admin
  admin_password_secret: awx-admin-password
  ingress_type: ingress
  ingress_annotations: |
   kubernetes.io/ingress.class: nginx
  hostname: awx-demo.example.com
  ingress_tls_secret: awx-ingress-tls
  web_resource_requirements:
     requests:
       cpu: 400m
       memory: 2Gi
     limits:
       cpu: 1000m
       memory: 4Gi
  task_resource_requirements:
     requests:
       cpu: 250m
       memory: 1Gi
     limits:
       cpu: 500m
       memory: 2Gi
  ee_resource_requirements:
     requests:
       cpu: 250m
       memory: 1Gi
     limits:
       cpu: 500m
       memory: 2Gi

---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: XXXX
  port: "XXXX"
  database: XXX
  username: XXX
  password: XXX
  type: unmanaged
type: Opaque
EXPECTED RESULTS

AWX HA configuration with 2 instances

ACTUAL RESULTS

api/v2/ping/

{
    "ha": false,
    "version": "19.1.0",
    "active_node": "awx-5776c59677-h9mrj",
    "install_uuid": "ba8b8bc6-1010-4e09-b5b2-08cc06901800",
    "instances": [
        {
            "node": "awx-5776c59677-h9mrj",
            "uuid": "5b18352d-24e7-47ce-a18d-e0e4cbd994d5",
            "heartbeat": "2021-07-09T09:09:26.165742Z",
            "capacity": 0,
            "version": "19.1.0"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        }
    ]
}
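
The ping response above can be compared with the requested replica count in a few lines. The following is a minimal sketch, not part of AWX or the operator: `check_ha` and its messages are hypothetical names, and the payload is a trimmed copy of the response shown above.

```python
import json


def check_ha(ping: dict, expected_replicas: int) -> list:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    registered = len(ping.get("instances", []))
    if registered != expected_replicas:
        problems.append(
            f"{registered} instance(s) registered, expected {expected_replicas}"
        )
    if expected_replicas > 1 and not ping.get("ha", False):
        problems.append("ha is false although more than one replica was requested")
    return problems


# Trimmed copy of the api/v2/ping/ response shown above.
ping = json.loads("""
{
    "ha": false,
    "version": "19.1.0",
    "active_node": "awx-5776c59677-h9mrj",
    "instances": [
        {"node": "awx-5776c59677-h9mrj", "capacity": 0, "version": "19.1.0"}
    ]
}
""")

for problem in check_ha(ping, expected_replicas=2):
    print(problem)
```

With `replicas: 2` in the spec, both checks fail for this payload, which matches the behaviour reported here.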

api/v2/instances/

{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "id": 1,
            "type": "instance",
            "url": "/api/v2/instances/1/",
            "related": {
                "jobs": "/api/v2/instances/1/jobs/",
                "instance_groups": "/api/v2/instances/1/instance_groups/"
            },
            "uuid": "5b18352d-24e7-47ce-a18d-e0e4cbd994d5",
            "hostname": "awx-5776c59677-h9mrj",
            "created": "2021-07-09T09:08:31.072893Z",
            "modified": "2021-07-09T09:09:26.165742Z",
            "capacity_adjustment": "1.00",
            "version": "19.1.0",
            "capacity": 0,
            "consumed_capacity": 0,
            "percent_capacity_remaining": 0.0,
            "jobs_running": 0,
            "jobs_total": 0,
            "cpu": 0,
            "memory": 0,
            "cpu_capacity": 0,
            "mem_capacity": 0,
            "enabled": true,
            "managed_by_policy": true
        }
    ]
}

kubectl get pods -n awx
NAME READY STATUS RESTARTS AGE
awx-5776c59677-74964 4/4 Running 0 14m
awx-5776c59677-h9mrj 4/4 Running 0 14m
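
To see which pod never registered, the pod list above can be compared with the `hostname` values from api/v2/instances/. This is a rough sketch under two assumptions: the instance hostname equals the pod name (as it does in the output in this report), and the namespace contains only AWX pods. `pod_names` and `unregistered_pods` are hypothetical helper names.

```python
def pod_names(kubectl_output: str) -> list:
    """Extract pod names from plain `kubectl get pods` output, skipping the header."""
    names = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if fields and fields[0] != "NAME":
            names.append(fields[0])
    return names


def unregistered_pods(kubectl_output: str, instances: dict) -> list:
    """Return pods with no matching hostname in an api/v2/instances/ response."""
    registered = {item["hostname"] for item in instances["results"]}
    return [name for name in pod_names(kubectl_output) if name not in registered]


# Data copied (trimmed) from the output in this report.
pods = """
NAME READY STATUS RESTARTS AGE
awx-5776c59677-74964 4/4 Running 0 14m
awx-5776c59677-h9mrj 4/4 Running 0 14m
"""
instances = {"count": 1, "results": [{"id": 1, "hostname": "awx-5776c59677-h9mrj"}]}

print(unregistered_pods(pods, instances))  # ['awx-5776c59677-74964']
```

The pod it reports is the same one whose awx-task log is shown below.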

kubectl exec pod/awx-5776c59677-74964 -n awx -c awx-web -it -- /bin/bash
bash-4.4$ awx-manage check_db
Database Version: PostgreSQL 12.7 (Ubuntu 12.7-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

kubectl logs -n awx pod/awx-5776c59677-74964 -c awx-task

...
File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/managers.py", line 107, in me
raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2021-07-09 09:17:27,304 INFO exited: callback-receiver (exit status 1; not expected)
...

ADDITIONAL INFORMATION

With managed PostgreSQL no such issue was noticed.

The problem is resolved after running:

kubectl rollout restart -n awx deployment/awx

kubectl get pods -n awx
NAME READY STATUS RESTARTS AGE
awx-686dd7df69-52kgh 4/4 Running 0 4m26s
awx-686dd7df69-v8w2g 4/4 Running 0 4m23s

api/v2/ping/

{
    "ha": true,
    "version": "19.1.0",
    "active_node": "awx-686dd7df69-52kgh",
    "install_uuid": "ba8b8bc6-1010-4e09-b5b2-08cc06901800",
    "instances": [
        {
            "node": "awx-686dd7df69-52kgh",
            "uuid": "ea773db2-7007-47a8-9987-16ddc79d6ec3",
            "heartbeat": "2021-07-09T09:33:46.787935Z",
            "capacity": 0,
            "version": "19.1.0"
        },
        {
            "node": "awx-686dd7df69-v8w2g",
            "uuid": "acef28b0-3977-4dbe-8c10-e9c4f11adab8",
            "heartbeat": "2021-07-09T09:33:52.378214Z",
            "capacity": 0,
            "version": "19.1.0"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        }
    ]
}

api/v2/instances/

{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {
            "id": 3,
            "type": "instance",
            "url": "/api/v2/instances/3/",
            "related": {
                "jobs": "/api/v2/instances/3/jobs/",
                "instance_groups": "/api/v2/instances/3/instance_groups/"
            },
            "uuid": "ea773db2-7007-47a8-9987-16ddc79d6ec3",
            "hostname": "awx-686dd7df69-52kgh",
            "created": "2021-07-09T09:33:46.194006Z",
            "modified": "2021-07-09T09:33:46.787935Z",
            "capacity_adjustment": "1.00",
            "version": "19.1.0",
            "capacity": 0,
            "consumed_capacity": 0,
            "percent_capacity_remaining": 0.0,
            "jobs_running": 0,
            "jobs_total": 0,
            "cpu": 0,
            "memory": 0,
            "cpu_capacity": 0,
            "mem_capacity": 0,
            "enabled": true,
            "managed_by_policy": true
        },
        {
            "id": 4,
            "type": "instance",
            "url": "/api/v2/instances/4/",
            "related": {
                "jobs": "/api/v2/instances/4/jobs/",
                "instance_groups": "/api/v2/instances/4/instance_groups/"
            },
            "uuid": "acef28b0-3977-4dbe-8c10-e9c4f11adab8",
            "hostname": "awx-686dd7df69-v8w2g",
            "created": "2021-07-09T09:33:51.780698Z",
            "modified": "2021-07-09T09:33:52.378214Z",
            "capacity_adjustment": "1.00",
            "version": "19.1.0",
            "capacity": 0,
            "consumed_capacity": 0,
            "percent_capacity_remaining": 0.0,
            "jobs_running": 0,
            "jobs_total": 0,
            "cpu": 0,
            "memory": 0,
            "cpu_capacity": 0,
            "mem_capacity": 0,
            "enabled": true,
            "managed_by_policy": true
        }
    ]
}

Later, when scaling to 3 replicas, the same problem occurs, and it is also resolved by running:

kubectl rollout restart -n awx deployment/awx

AWX-OPERATOR LOGS
@tklsnk
Author

tklsnk commented Jul 9, 2021

logs.tar.gz

operator and containers log files

@shanemcd
Member

shanemcd commented Jul 9, 2021

@tchellomello @rooftopcellist If either of you have some time next week, could you help look into this?

@tklsnk
Author

tklsnk commented Jul 9, 2021

If needed, I can reproduce the issue and provide you with a kubeconfig and an external IP for the API for a few hours.

@tchellomello
Contributor

Worked for me with an external DB on my end but I'll dig a little bit more.

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-7pdrp
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.023s

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-toca-657778f5cb-7pdrp",
    "install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
    "instances": [
        {
            "node": "awx-toca-657778f5cb-7pdrp",
            "uuid": "3a34f8fe-8336-4910-8c47-7193694a9536",
            "heartbeat": "2021-07-10T19:35:26.186881Z",
            "capacity": 296,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-lm776",
            "uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
            "heartbeat": "2021-07-10T19:35:52.645367Z",
            "capacity": 293,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        },
        {
            "name": "controlplane",
            "capacity": 589,
            "instances": [
                "awx-toca-657778f5cb-7pdrp",
                "awx-toca-657778f5cb-lm776"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

kubectl get pods -w | grep awx-toca
awx-toca-657778f5cb-7pdrp                                  4/4     Running     0          2d14h
awx-toca-657778f5cb-lm776                                  0/4     Pending     0          0s
awx-toca-657778f5cb-lm776                                  0/4     Pending     0          0s
awx-toca-657778f5cb-lm776                                  0/4     Init:0/1    0          0s
awx-toca-657778f5cb-lm776                                  0/4     PodInitializing   0          1s
awx-toca-657778f5cb-lm776                                  4/4     Running           0          23s


[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO     [-] awx.main.consumers client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group. 
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO     [-] awx.main.consumers client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group. 
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO     client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group. 
[awx-toca-657778f5cb-lm776 awx-toca-web] RESULT 2 

@tklsnk
Author

tklsnk commented Jul 12, 2021

The issue probably occurs when image_version=19.1.0.
With 19.2.2, the creation of 2 replicas was successful.

However, after setting replicas: 3 (scaling from 2 to 3) and running:

kubectl apply -f awx-deploy.yml

api/v2/ping/

{
    "ha": false,
    "version": "19.2.2",
    "active_node": "awx-848f64cdb4-29pcv",
    "install_uuid": "88b63b97-2942-49c5-bc5f-e5006a7b5456",
    "instances": [
        {
            "node": "awx-848f64cdb4-spt82",
            "uuid": "27494bf7-6fa2-489e-bdb3-82466edbd49c",
            "heartbeat": "2021-07-12T09:48:24.680690Z",
            "capacity": 79,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "controlplane",
            "capacity": 79,
            "instances": [
                "awx-848f64cdb4-spt82"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}
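
Note that in the response above `active_node` is a pod that does not appear under `instances` at all. A minimal sketch of that check follows; `active_node_registered` is a hypothetical helper, and the two payloads are trimmed copies of the ping responses in this comment, before and after the rollout restart.

```python
def active_node_registered(ping: dict) -> bool:
    """True when the node that answered the request also appears under `instances`."""
    nodes = {inst["node"] for inst in ping.get("instances", [])}
    return ping.get("active_node") in nodes


# Trimmed copies of the two api/v2/ping/ responses in this comment.
before_restart = {
    "active_node": "awx-848f64cdb4-29pcv",
    "instances": [{"node": "awx-848f64cdb4-spt82"}],
}
after_restart = {
    "active_node": "awx-657cd5b84-t5htk",
    "instances": [
        {"node": "awx-657cd5b84-g8kx2"},
        {"node": "awx-657cd5b84-rg9v4"},
        {"node": "awx-657cd5b84-t5htk"},
    ],
}

print(active_node_registered(before_restart))  # False
print(active_node_registered(after_restart))   # True
```
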
kubectl rollout restart -n awx deployment/awx

api/v2/ping/

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-657cd5b84-t5htk",
    "install_uuid": "88b63b97-2942-49c5-bc5f-e5006a7b5456",
    "instances": [
        {
            "node": "awx-657cd5b84-g8kx2",
            "uuid": "30e28fc4-8c88-4922-a7e1-0196fe790f2f",
            "heartbeat": "2021-07-12T10:00:33.162404Z",
            "capacity": 79,
            "version": "19.2.2"
        },
        {
            "node": "awx-657cd5b84-rg9v4",
            "uuid": "501a0ff7-9043-46f4-baae-4602de3107d2",
            "heartbeat": "2021-07-12T10:00:36.591979Z",
            "capacity": 79,
            "version": "19.2.2"
        },
        {
            "node": "awx-657cd5b84-t5htk",
            "uuid": "a3308acc-04e9-4da7-88cf-71048d666ffb",
            "heartbeat": "2021-07-12T10:00:38.958448Z",
            "capacity": 79,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "controlplane",
            "capacity": 237,
            "instances": [
                "awx-657cd5b84-g8kx2",
                "awx-657cd5b84-rg9v4",
                "awx-657cd5b84-t5htk"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

@tchellomello
Contributor

With replicas: 2, this was the output from the AWX API:

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-lm776
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.015s

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-toca-657778f5cb-lm776",
    "install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
    "instances": [
        {
            "node": "awx-toca-657778f5cb-4bzps",
            "uuid": "617ccf03-2231-44ef-b512-7b97d3207feb",
            "heartbeat": "2021-07-25T03:09:29.263282Z",
            "capacity": 293,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-lm776",
            "uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
            "heartbeat": "2021-07-25T03:09:49.130909Z",
            "capacity": 293,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        },
        {
            "name": "controlplane",
            "capacity": 586,
            "instances": [
                "awx-toca-657778f5cb-4bzps",
                "awx-toca-657778f5cb-lm776"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

Then I modified the AWX spec with kubectl edit awx awx-toca, set replicas: 3, and got 3 pods as expected:

kubectl get pods -w | grep awx
awx-operator-df789fd9c-rqn2k                               1/1     Running     0          32h
awx-toca-657778f5cb-4bzps                                  4/4     Running     0          32h
awx-toca-657778f5cb-lm776                                  4/4     Running     78         14d
awx-toca-657778f5cb-28fq9                                  0/4     Pending     0          0s
awx-toca-657778f5cb-28fq9                                  0/4     Pending     0          0s
awx-toca-657778f5cb-28fq9                                  0/4     Init:0/1    0          0s
awx-toca-657778f5cb-28fq9                                  0/4     PodInitializing   0          2s
awx-toca-657778f5cb-28fq9                                  4/4     Running           0          4s

Looking at the API, it worked as expected:

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-4bzps
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.014s

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-toca-657778f5cb-4bzps",
    "install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
    "instances": [
        {
            "node": "awx-toca-657778f5cb-28fq9",
            "uuid": "7801777c-93de-416f-841e-0eb9a1b721d2",
            "heartbeat": "2021-07-25T03:10:55.501238Z",
            "capacity": 296,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-4bzps",
            "uuid": "617ccf03-2231-44ef-b512-7b97d3207feb",
            "heartbeat": "2021-07-25T03:11:29.447748Z",
            "capacity": 293,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-lm776",
            "uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
            "heartbeat": "2021-07-25T03:10:49.231003Z",
            "capacity": 293,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        },
        {
            "name": "controlplane",
            "capacity": 882,
            "instances": [
                "awx-toca-657778f5cb-28fq9",
                "awx-toca-657778f5cb-4bzps",
                "awx-toca-657778f5cb-lm776"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}
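
In the responses in this thread, the `controlplane` group capacity equals the sum of its members' capacities (296 + 293 + 293 = 882 above). Whether AWX always computes it this way is an assumption on my part, but under that assumption a ping payload can be sanity-checked as sketched below. `group_capacity_mismatches` is a hypothetical helper, and the payload is a trimmed copy of the response above.

```python
def group_capacity_mismatches(ping: dict) -> dict:
    """Map group name -> (reported, summed) where the two capacities differ."""
    capacity_by_node = {inst["node"]: inst["capacity"] for inst in ping["instances"]}
    mismatches = {}
    for group in ping["instance_groups"]:
        summed = sum(capacity_by_node.get(node, 0) for node in group["instances"])
        if summed != group["capacity"]:
            mismatches[group["name"]] = (group["capacity"], summed)
    return mismatches


# Trimmed copy of the three-replica ping response above.
ping = {
    "instances": [
        {"node": "awx-toca-657778f5cb-28fq9", "capacity": 296},
        {"node": "awx-toca-657778f5cb-4bzps", "capacity": 293},
        {"node": "awx-toca-657778f5cb-lm776", "capacity": 293},
    ],
    "instance_groups": [
        {"name": "tower", "capacity": 0, "instances": []},
        {
            "name": "controlplane",
            "capacity": 882,
            "instances": [
                "awx-toca-657778f5cb-28fq9",
                "awx-toca-657778f5cb-4bzps",
                "awx-toca-657778f5cb-lm776",
            ],
        },
        {"name": "default", "capacity": 0, "instances": []},
    ],
}

print(group_capacity_mismatches(ping))  # {}
```
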

Please keep in mind that any kubectl scale --replicas command issued manually will be overridden by the operator. All changes must be performed directly in the AWX spec. @tklsnk, as I'm unable to reproduce it, could you confirm the steps you followed to scale it up?
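
For scripted changes, the replica count can be set on the AWX resource itself rather than on the Deployment. The sketch below only builds the argument list for a `kubectl patch ... --type merge` call. It is an alternative to the `kubectl edit` and `kubectl apply` steps used in this thread, not something taken from the operator docs, and `awx_replicas_patch_cmd` is a hypothetical helper name.

```python
import json
import shlex


def awx_replicas_patch_cmd(name: str, namespace: str, replicas: int) -> list:
    """Build a `kubectl patch` argv that sets spec.replicas on an AWX resource."""
    patch = json.dumps({"spec": {"replicas": replicas}})
    return [
        "kubectl", "patch", "awx", name,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]


cmd = awx_replicas_patch_cmd("awx", "awx", 3)
# Print a shell-quoted form for review; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Because the change goes through the AWX spec, the operator should reconcile the Deployment to the new count instead of reverting it.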

@tklsnk
Author

tklsnk commented Jul 25, 2021

After deploying with replicas=2, I edited awx-deploy.yml (set replicas=3) and ran kubectl apply -f awx-deploy.yml.

@tchellomello
Contributor

@tklsnk yes, that is what I did on my end as well; however, I cannot reproduce the issue.

@tklsnk
Author

tklsnk commented Jul 26, 2021

Ok I'll try with another k8s cluster.
Thank you.

@tchellomello
Contributor

> Ok I'll try with another k8s cluster.
> Thank you.

Any updates on this @tklsnk?

@tklsnk
Author

tklsnk commented Jul 31, 2021

I'm sorry, I haven't had a chance to try this yet. I hope to do so this week.

@tklsnk
Author

tklsnk commented Aug 2, 2021

It works as expected on the alternative k8s cluster. The problem is probably specific to the k8s implementation of one particular cloud provider.

tklsnk closed this as completed Aug 2, 2021
@tchellomello
Contributor

Thank you for the feedback @tklsnk
