CassandraRoleManager skipped default role setup: some nodes were not ready #397

Open
kong62 opened this issue Nov 28, 2020 · 2 comments

kong62 commented Nov 28, 2020

I keep getting an error when I redeploy, because the StatefulSet cannot create pods in parallel.
I need podManagementPolicy: Parallel, but this commit in the GitHub code disables it:

@@ -242,7 +241,7 @@ private V1beta2StatefulSet generateStatefulSet(DataCenterKey dataCenterKey, V1Co
                )
                .spec(new V1beta2StatefulSetSpec()
                        .serviceName("cassandra")
-                        .podManagementPolicy("Parallel")
+                        //.podManagementPolicy("Parallel")
                        .replicas(dataCenter.getSpec().getReplicas().intValue())
                        .selector(new V1LabelSelector().putMatchLabelsItem("cassandra-datacenter", dataCenterKey.name))
                        .template(new V1PodTemplateSpec()
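
For context, a minimal sketch of what the generated spec would look like with the Parallel policy re-enabled, reusing the client classes from the excerpt above (the import package and the wrapping method are assumptions, not the operator's actual code):

// Minimal sketch only. Class names and fluent setters are taken from the diff above;
// the import package (io.kubernetes.client.models) and this helper method are assumed.
import io.kubernetes.client.models.V1LabelSelector;
import io.kubernetes.client.models.V1PodTemplateSpec;
import io.kubernetes.client.models.V1beta2StatefulSetSpec;

class ParallelSpecSketch {

    // dcName, replicas and podTemplate stand in for values the operator derives from the DataCenter resource.
    static V1beta2StatefulSetSpec statefulSetSpec(String dcName, int replicas, V1PodTemplateSpec podTemplate) {
        return new V1beta2StatefulSetSpec()
                .serviceName("cassandra")
                // Parallel launches all pods at once instead of waiting for each
                // ordinal to become Ready (OrderedReady, the default policy).
                .podManagementPolicy("Parallel")
                .replicas(replicas)
                .selector(new V1LabelSelector().putMatchLabelsItem("cassandra-datacenter", dcName))
                .template(podTemplate);
    }
}
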
# kubectl  get pod                                  
NAME                                  READY   STATUS             RESTARTS   AGE
cassandra-cassandra-dc1-dc1-rack1-0   1/2     Running            0          6m12s
cassandra-operator-6f685694c5-l7m27   1/1     Running            0          4d7h

# kubectl  get pvc
NAME                                              STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS                             AGE
data-volume-cassandra-cassandra-dc1-dc1-rack1-0   Bound    disk-29d3cfdd-dc5a-457e-bae1-6b72dcc34c37   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h12m
data-volume-cassandra-cassandra-dc1-dc1-rack1-1   Bound    disk-9a8621f6-3f8b-428e-b69d-72cde007c7cf   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h6m
data-volume-cassandra-cassandra-dc1-dc1-rack1-2   Bound    disk-1971e0c4-fdf5-4adf-85fa-c1e9e53b7658   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h5m
data-volume-cassandra-cassandra-dc1-dc1-rack1-3   Bound    disk-5be7e523-a3cc-4b32-9149-6a3ab5e44ed2   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h3m
data-volume-cassandra-cassandra-dc1-dc1-rack1-4   Bound    disk-4a4d235b-871f-45ff-be57-c4ed7c9b4ad2   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h2m
data-volume-cassandra-cassandra-dc1-dc1-rack1-5   Bound    disk-b9c45b99-f169-413b-b8dc-65b97d205264   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h
data-volume-cassandra-cassandra-dc1-dc1-rack1-6   Bound    disk-c2bf3596-a986-4099-b746-316ddaf36c8f   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   72m
data-volume-cassandra-cassandra-dc1-dc1-rack1-7   Bound    disk-89fae7ec-9f5a-4b2f-9191-631f66ac71b8   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   57m


# kubectl  logs -f  cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra
INFO  [main] Server.java:159 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO  [main] CassandraDaemon.java:564 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO  [main] CassandraDaemon.java:650 Startup complete
WARN  [OptionalTasks:1] CassandraRoleManager.java:377 CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] CassandraRoleManager.java:416 Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] CassandraRoleManager.java:377 CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] CassandraRoleManager.java:416 Setup task failed with error, rescheduling
@smiklosovic (Collaborator)

Can you elaborate on why you need it to be in parallel?


kong62 commented Nov 30, 2020

After the cluster crashed in an avalanche (many nodes crashed at once), I deleted the StatefulSet and then re-created it. The pods now start one by one: pod-0 never becomes ready because it is looking for the other members, but the other pods will not be started until pod-0 is ready.

# kubectl  get pod                                  
NAME                                  READY   STATUS             RESTARTS   AGE
cassandra-cassandra-dc1-dc1-rack1-0   0/2     CrashLoopBackOff   5          4d7h
cassandra-cassandra-dc1-dc1-rack1-1   2/2     Running            7          4d7h
cassandra-cassandra-dc1-dc1-rack1-2   1/2     Running            4          4d7h
cassandra-cassandra-dc1-dc1-rack1-3   0/2     CrashLoopBackOff   11         4d7h
cassandra-cassandra-dc1-dc1-rack1-4   0/2     CrashLoopBackOff   4          4d7h
cassandra-cassandra-dc1-dc1-rack1-5   0/2     CrashLoopBackOff   8          4d7h


#  kubectl  delete -f example-dc.yaml
#  kubectl  apply -f example-dc.yaml
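
After re-applying, the policy the operator actually set on the recreated StatefulSet can be checked directly (the StatefulSet name here is inferred from the pod names above; an empty or OrderedReady result means ordered startup is still in effect):

# kubectl get statefulset cassandra-cassandra-dc1-dc1-rack1 -o jsonpath='{.spec.podManagementPolicy}'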
