This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

ig job stays in Completed state for a long time #1192

Open
f0rmiga opened this issue Oct 13, 2020 · 1 comment
Labels
bug Something isn't working unscheduled

Comments

@f0rmiga
Member

f0rmiga commented Oct 13, 2020

I experienced an ig job that got into the Completed state and stayed there for a few minutes. Eventually, the controllers picked up its output and moved the cluster state forward. While it was in this Completed state, the KubeCF cluster was broken, with a few pods deleted. E.g. the following is the pod list in an HA deployment. Notice the missing api, uaa, diego-cell and router replicas.

NAME                                     READY   STATUS      RESTARTS   AGE
api-0                                    15/15   Running     5          24m
auctioneer-0                             4/4     Running     1          28m
bosh-dns-755d6b884b-cwqgw                1/1     Running     0          13m
bosh-dns-755d6b884b-h92mh                1/1     Running     0          13m
cc-worker-0                              4/4     Running     2          27m
cf-apps-dns-564fc5cf4d-jzbcv             1/1     Running     0          14m
cf-apps-dns-564fc5cf4d-qnw46             1/1     Running     0          14m
credhub-0                                6/6     Running     0          27m
credhub-1                                6/6     Running     0          29m
database-0                               2/2     Running     0          13m
database-seeder-8f24862205dd7db3-46p5n   0/2     Completed   0          118m
diego-api-0                              6/6     Running     2          28m
diego-cell-0                             7/7     Running     2          22m
diego-cell-1                             7/7     Running     1          25m
doppler-0                                4/4     Running     0          27m
doppler-1                                4/4     Running     0          27m
doppler-2                                4/4     Running     0          28m
ig-a01395ca9859fa55-rv65v                0/22    Completed   0          13m
log-api-0                                7/7     Running     0          27m
log-cache-0                              8/8     Running     0          28m
nats-0                                   4/4     Running     0          28m
nats-1                                   4/4     Running     0          28m
router-0                                 5/5     Running     0          27m
routing-api-0                            4/4     Running     2          27m
scheduler-0                              10/10   Running     6          27m
tcp-router-0                             5/5     Running     0          28m
uaa-0                                    7/7     Running     0          25m
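For reference, lingering Completed pods like the ig job above can be spotted by filtering the STATUS column of the `kubectl get pods` output. A minimal sketch, run here against a captured excerpt of the listing above rather than a live cluster (against a real cluster you would pipe `kubectl get pods` into the same awk filter):

```shell
# Print the names of pods whose STATUS column reads "Completed".
# The heredoc stands in for live `kubectl get pods` output.
cat <<'EOF' | awk '$3 == "Completed" { print $1 }'
NAME                                     READY   STATUS      RESTARTS   AGE
api-0                                    15/15   Running     5          24m
database-seeder-8f24862205dd7db3-46p5n   0/2     Completed   0          118m
ig-a01395ca9859fa55-rv65v                0/22    Completed   0          13m
EOF
# prints:
# database-seeder-8f24862205dd7db3-46p5n
# ig-a01395ca9859fa55-rv65v
```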

The following is a dump of the cf-operator and quarks-job controllers:

cf_operator_logs.txt
quarks_job_logs.txt

@f0rmiga added the bug (Something isn't working) label Oct 13, 2020
@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175255805

The labels on this github issue will be updated when the story is started.
