Wrong number of AWX instances (unmanaged PostgreSQL, replicas>1) #462
Comments
operator and containers log files |
@tchellomello @rooftopcellist If either of you have some time next week, could you help look into this? |
If it's needed, I can reproduce it and provide you with a kubeconfig and the external IP for the API for a few hours. |
Worked for me with an external DB on my end but I'll dig a little bit more.
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-7pdrp
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.023s
{
"ha": true,
"version": "19.2.2",
"active_node": "awx-toca-657778f5cb-7pdrp",
"install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
"instances": [
{
"node": "awx-toca-657778f5cb-7pdrp",
"uuid": "3a34f8fe-8336-4910-8c47-7193694a9536",
"heartbeat": "2021-07-10T19:35:26.186881Z",
"capacity": 296,
"version": "19.2.2"
},
{
"node": "awx-toca-657778f5cb-lm776",
"uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
"heartbeat": "2021-07-10T19:35:52.645367Z",
"capacity": 293,
"version": "19.2.2"
}
],
"instance_groups": [
{
"name": "tower",
"capacity": 0,
"instances": []
},
{
"name": "controlplane",
"capacity": 589,
"instances": [
"awx-toca-657778f5cb-7pdrp",
"awx-toca-657778f5cb-lm776"
]
},
{
"name": "default",
"capacity": 0,
"instances": []
}
kubectl get pods -w | grep awx-toca  # 15:35:01
awx-toca-657778f5cb-7pdrp 4/4 Running 0 2d14h
awx-toca-657778f5cb-lm776 0/4 Pending 0 0s
awx-toca-657778f5cb-lm776 0/4 Pending 0 0s
awx-toca-657778f5cb-lm776 0/4 Init:0/1 0 0s
awx-toca-657778f5cb-lm776 0/4 PodInitializing 0 1s
awx-toca-657778f5cb-lm776 4/4 Running 0 23s
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO [-] awx.main.consumers client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group.
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO [-] awx.main.consumers client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group.
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group.
[awx-toca-657778f5cb-lm776 awx-toca-web] RESULT 2 |
The issue probably occurs when image_version=19.1.0. But if I then set replicas: 3 (scale from 2 to 3):
kubectl apply -f awx-deploy.yml
api/v2/ping/
{
"ha": false,
"version": "19.2.2",
"active_node": "awx-848f64cdb4-29pcv",
"install_uuid": "88b63b97-2942-49c5-bc5f-e5006a7b5456",
"instances": [
{
"node": "awx-848f64cdb4-spt82",
"uuid": "27494bf7-6fa2-489e-bdb3-82466edbd49c",
"heartbeat": "2021-07-12T09:48:24.680690Z",
"capacity": 79,
"version": "19.2.2"
}
],
"instance_groups": [
{
"name": "controlplane",
"capacity": 79,
"instances": [
"awx-848f64cdb4-spt82"
]
},
{
"name": "default",
"capacity": 0,
"instances": []
}
]
}
kubectl rollout restart -n awx deployment/awx
api/v2/ping/
{
"ha": true,
"version": "19.2.2",
"active_node": "awx-657cd5b84-t5htk",
"install_uuid": "88b63b97-2942-49c5-bc5f-e5006a7b5456",
"instances": [
{
"node": "awx-657cd5b84-g8kx2",
"uuid": "30e28fc4-8c88-4922-a7e1-0196fe790f2f",
"heartbeat": "2021-07-12T10:00:33.162404Z",
"capacity": 79,
"version": "19.2.2"
},
{
"node": "awx-657cd5b84-rg9v4",
"uuid": "501a0ff7-9043-46f4-baae-4602de3107d2",
"heartbeat": "2021-07-12T10:00:36.591979Z",
"capacity": 79,
"version": "19.2.2"
},
{
"node": "awx-657cd5b84-t5htk",
"uuid": "a3308acc-04e9-4da7-88cf-71048d666ffb",
"heartbeat": "2021-07-12T10:00:38.958448Z",
"capacity": 79,
"version": "19.2.2"
}
],
"instance_groups": [
{
"name": "controlplane",
"capacity": 237,
"instances": [
"awx-657cd5b84-g8kx2",
"awx-657cd5b84-rg9v4",
"awx-657cd5b84-t5htk"
]
},
{
"name": "default",
"capacity": 0,
"instances": []
}
]
} |
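One thing that stands out in the first response above is that `active_node` (`awx-848f64cdb4-29pcv`) does not appear in `instances` at all. A minimal sketch of how that mismatch could be detected, assuming the ping response has already been parsed into a dict and the pod names were collected separately (for example from `kubectl get pods`). The helper names `registered_nodes` and `unregistered_pods` are hypothetical, not part of AWX:

```python
from typing import Any, Dict, Iterable, List, Set


def registered_nodes(ping: Dict[str, Any]) -> Set[str]:
    """Node names that AWX lists under "instances" in api/v2/ping/."""
    return {inst["node"] for inst in ping.get("instances", [])}


def unregistered_pods(pod_names: Iterable[str], ping: Dict[str, Any]) -> List[str]:
    """Pods that are running but missing from the ping response."""
    return sorted(set(pod_names) - registered_nodes(ping))


# Trimmed copy of the first ping response in this comment.
ping = {
    "ha": False,
    "active_node": "awx-848f64cdb4-29pcv",
    "instances": [{"node": "awx-848f64cdb4-spt82", "capacity": 79}],
}

# Assumption: only the two pod names visible in the response are used here;
# in practice the list would come from the cluster.
pods = ["awx-848f64cdb4-29pcv", "awx-848f64cdb4-spt82"]

print(unregistered_pods(pods, ping))                   # ['awx-848f64cdb4-29pcv']
print(ping["active_node"] in registered_nodes(ping))   # False
```

A non-empty result would suggest that a pod is serving API requests without having registered itself as an instance, which matches the symptom described here.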
With HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-lm776
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.015s
{
"ha": true,
"version": "19.2.2",
"active_node": "awx-toca-657778f5cb-lm776",
"install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
"instances": [
{
"node": "awx-toca-657778f5cb-4bzps",
"uuid": "617ccf03-2231-44ef-b512-7b97d3207feb",
"heartbeat": "2021-07-25T03:09:29.263282Z",
"capacity": 293,
"version": "19.2.2"
},
{
"node": "awx-toca-657778f5cb-lm776",
"uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
"heartbeat": "2021-07-25T03:09:49.130909Z",
"capacity": 293,
"version": "19.2.2"
}
],
"instance_groups": [
{
"name": "tower",
"capacity": 0,
"instances": []
},
{
"name": "controlplane",
"capacity": 586,
"instances": [
"awx-toca-657778f5cb-4bzps",
"awx-toca-657778f5cb-lm776"
]
},
{
"name": "default",
"capacity": 0,
"instances": []
}
]
}
Then modified the AWX spec:
kubectl get pods -w | grep awx  # 23:10:10
awx-operator-df789fd9c-rqn2k 1/1 Running 0 32h
awx-toca-657778f5cb-4bzps 4/4 Running 0 32h
awx-toca-657778f5cb-lm776 4/4 Running 78 14d
awx-toca-657778f5cb-28fq9 0/4 Pending 0 0s
awx-toca-657778f5cb-28fq9 0/4 Pending 0 0s
awx-toca-657778f5cb-28fq9 0/4 Init:0/1 0 0s
awx-toca-657778f5cb-28fq9 0/4 PodInitializing 0 2s
awx-toca-657778f5cb-28fq9 4/4 Running 0 4s
Looking at the API, it worked as expected:
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-4bzps
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.014s
{
"ha": true,
"version": "19.2.2",
"active_node": "awx-toca-657778f5cb-4bzps",
"install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
"instances": [
{
"node": "awx-toca-657778f5cb-28fq9",
"uuid": "7801777c-93de-416f-841e-0eb9a1b721d2",
"heartbeat": "2021-07-25T03:10:55.501238Z",
"capacity": 296,
"version": "19.2.2"
},
{
"node": "awx-toca-657778f5cb-4bzps",
"uuid": "617ccf03-2231-44ef-b512-7b97d3207feb",
"heartbeat": "2021-07-25T03:11:29.447748Z",
"capacity": 293,
"version": "19.2.2"
},
{
"node": "awx-toca-657778f5cb-lm776",
"uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
"heartbeat": "2021-07-25T03:10:49.231003Z",
"capacity": 293,
"version": "19.2.2"
}
],
"instance_groups": [
{
"name": "tower",
"capacity": 0,
"instances": []
},
{
"name": "controlplane",
"capacity": 882,
"instances": [
"awx-toca-657778f5cb-28fq9",
"awx-toca-657778f5cb-4bzps",
"awx-toca-657778f5cb-lm776"
]
},
{
"name": "default",
"capacity": 0,
"instances": []
}
]
}
Please keep in mind that any |
After deploying with replicas=2, I edited awx-deploy.yml (set replicas=3) and ran kubectl apply -f awx-deploy.yml |
@tklsnk yes that was what I did on my end here, however I cannot reproduce the same issue. |
Ok I'll try with another k8s cluster. |
Any updates on this @tklsnk? |
I'm sorry, I haven't had a chance to try this yet. I hope to do it this week. |
Works as expected with the alternative k8s cluster. Probably a problem with the specific k8s implementation of a specific cloud provider. |
Thank you for the feedback @tklsnk |
ISSUE TYPE
SUMMARY
After creating an AWX deployment with an external (unmanaged) PostgreSQL, api/v2/instances shows fewer instances than the configured replicas.
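The symptom in the summary can be expressed as a simple count comparison. This is only a sketch, assuming the `api/v2/ping/` response has already been fetched and parsed into a dict. The function name `missing_instance_count` is hypothetical, and the sample data is a trimmed copy of a response posted in the comments of this issue:

```python
from typing import Any, Dict, List


def missing_instance_count(ping: Dict[str, Any], expected_replicas: int) -> int:
    """Return how many instances are missing compared to the replica count."""
    registered: List[Dict[str, Any]] = ping.get("instances", [])
    return max(expected_replicas - len(registered), 0)


# Trimmed response observed after scaling to 3 replicas: one instance listed.
ping = {
    "ha": False,
    "instances": [{"node": "awx-848f64cdb4-spt82", "capacity": 79}],
}

print(missing_instance_count(ping, expected_replicas=3))  # 2
```

A result greater than zero means fewer instances are registered than pods were requested, which is the condition reported here.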
ENVIRONMENT
STEPS TO REPRODUCE
kubectl apply -f awx-deploy.yml
EXPECTED RESULTS
AWX HA configuration with 2 instances
ACTUAL RESULTS
api/v2/ping/
api/v2/instances/
kubectl get pods -n awx
NAME READY STATUS RESTARTS AGE
awx-5776c59677-74964 4/4 Running 0 14m
awx-5776c59677-h9mrj 4/4 Running 0 14m
kubectl exec pod/awx-5776c59677-74964 -n awx -c awx-web -it -- /bin/bash
bash-4.4$ awx-manage check_db
Database Version: PostgreSQL 12.7 (Ubuntu 12.7-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
kubectl logs -n awx pod/awx-5776c59677-74964 -c awx-task
...
File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/managers.py", line 107, in me
raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2021-07-09 09:17:27,304 INFO exited: callback-receiver (exit status 1; not expected)
...
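To check several pods for the same failure, the task logs could be scanned for the error message shown above. A minimal sketch, assuming the log text was already captured (for example from `kubectl logs`). The helper name `has_registration_error` is hypothetical:

```python
# Error message taken verbatim from the traceback in this report.
MARKER = "No instance found with the current cluster host id"


def has_registration_error(log_text: str) -> bool:
    """True if any log line contains the instance-lookup error."""
    return any(MARKER in line for line in log_text.splitlines())


# Sample built from the log lines quoted in this report.
sample_log = "\n".join([
    'raise RuntimeError("No instance found with the current cluster host id")',
    "RuntimeError: No instance found with the current cluster host id",
    "2021-07-09 09:17:27,304 INFO exited: callback-receiver "
    "(exit status 1; not expected)",
])

print(has_registration_error(sample_log))  # True
```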
ADDITIONAL INFORMATION
With managed PostgreSQL no such issue was noticed.
The problem is resolved after running
kubectl rollout restart -n awx deployment/awx
kubectl get pods -n awx
NAME READY STATUS RESTARTS AGE
awx-686dd7df69-52kgh 4/4 Running 0 4m26s
awx-686dd7df69-v8w2g 4/4 Running 0 4m23s
api/v2/ping/
api/v2/instances/
Later, when scaling to 3 replicas, the same problem occurs, and it is also resolved by running
kubectl rollout restart -n awx deployment/awx
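To confirm that the restart actually brought all instances back, the ping endpoint could be polled until the count matches. This is a rough sketch, not an AWX tool: `wait_for_instances` is a hypothetical helper, and `fetch_ping` stands for any callable that returns the parsed `api/v2/ping/` response (how it is fetched is left open). The responses below are simulated rather than real output:

```python
import time
from typing import Any, Callable, Dict


def wait_for_instances(
    fetch_ping: Callable[[], Dict[str, Any]],
    expected: int,
    attempts: int = 10,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least `expected` instances are registered, or give up."""
    for attempt in range(attempts):
        ping = fetch_ping()
        if len(ping.get("instances", [])) >= expected:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll
    return False


# Simulated sequence: one instance at first, then all three after the restart.
responses = iter([
    {"instances": [{"node": "a"}]},
    {"instances": [{"node": "a"}, {"node": "b"}, {"node": "c"}]},
])

print(wait_for_instances(lambda: next(responses), expected=3, delay=0))  # True
```

The `sleep` parameter is injectable only so the loop can be exercised without waiting. With the defaults it pauses five seconds between polls.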
AWX-OPERATOR LOGS