Fixing a k8ssandracluster state after manually removing a cleanup k8ssandratask

Following this message on GitHub, I managed to stop an automatically launched cleanup process by simply doing the following (roughly as sketched below):

  • deleting the relevant k8ssandratask object
  • stopping the cleanup compaction manually via nodetool
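For reference, in my case those two steps looked roughly like this (the task name, namespace and container name are from my setup and will differ in yours; nodetool may also need auth flags if JMX authentication is enabled):

# delete the auto-created cleanup task
$ kubectl delete k8ssandratask cleanup-1724866034 -n ns1

# then stop the cleanup compaction itself on every node that is still running it
$ kubectl exec -n ns1 <cassandra-pod> -c cassandra -- nodetool stop CLEANUP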

I reached my goal (I won't cover it here), but the problem is that both my k8ssandracluster and cassandradatacenter objects now appear to be stuck in the “Updating” state:

k8ssandracluster k8s status
$ kubectl get k8ssandraclusters.k8ssandra.io dc1 -o yaml
...
status:
  ...
  datacenters:
    data:
      cassandra:
        cassandraOperatorProgress: Updating
        conditions:
        - ...
          status: "True"
          type: Healthy
        - ...
          status: "False"
          type: Stopped
        - ...
          status: "False"
          type: ReplacingNodes
        - ...
          status: "True"
          type: Updating
        - ...
          status: "False"
          type: RollingRestart
        - ...
          status: "False"
          type: Resuming
        - ...
          status: "False"
          type: ScalingDown
        - ...
          status: "True"
          type: Valid
        - ...
          status: "True"
          type: Initialized
        - ...
          status: "True"
          type: Ready
        - ...
          status: "True"
          type: ScalingUp
        ...
        trackedTasks:
        - name: cleanup-1724866034
          namespace: ns1
  error: None
cassandradatacenter k8s status
$ kubectl get cassandradatacenters.cassandra.datastax.com dc1 -o yaml
...
status:
  cassandraOperatorProgress: Updating
  conditions:
  - ...
    status: "True"
    type: Healthy
  - ...
    status: "False"
    type: Stopped
  - ...
    status: "False"
    type: ReplacingNodes
  - ...
    status: "True"
    type: Updating
  - ...
    status: "False"
    type: RollingRestart
  - ...
    status: "False"
    type: Resuming
  - ...
    status: "False"
    type: ScalingDown
  - ...
    status: "True"
    type: Valid
  - ...
    status: "True"
    type: Initialized
  - ...
    status: "True"
    type: Ready
  - ...
    status: "True"
    type: ScalingUp
  ...
  trackedTasks:
  - name: cleanup-1724866034
    namespace: ns1

As you can see, both objects above contain these lines:

status:
  cassandraOperatorProgress: Updating
  conditions:
  ...
  - ...
    status: "True"
    type: Updating
  ...
  - ...
    status: "True"
    type: ScalingUp
  ...
  trackedTasks:
  - name: cleanup-1724866034
    namespace: ns1

This is despite the fact that no new nodes are being added anymore, and the referenced k8ssandratask (cleanup-1724866034) no longer exists (I removed it manually via kubectl delete).
edit: Also, if I add new nodes to the cluster in question, auto-cleanup doesn’t launch anymore.
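(Side note: even with auto-cleanup not firing, a cleanup can still be triggered by hand with a K8ssandraTask. The sketch below is based on my reading of the control.k8ssandra.io/v1alpha1 API and uses illustrative names, so double-check it against the CRDs in your cluster before applying:)

$ kubectl apply -n ns1 -f - <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: manual-cleanup            # illustrative name
spec:
  cluster:
    name: <k8ssandracluster-name> # adjust to your K8ssandraCluster
  datacenters:
    - <dc-name>                   # adjust to your datacenter
  template:
    jobs:
      - name: manual-cleanup-job  # illustrative name
        command: cleanup
EOF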

The question: how can I restore the cluster status to normal?
The only idea I have for now is to edit the statuses of both the k8ssandracluster and the cassandradatacenter objects via kubectl (I'm not even sure whether that is technically possible) and remove the “trackedTasks” block (and pray it actually helps and doesn’t break anything else :slight_smile: ).
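Something along these lines, I suppose (a sketch only; it assumes a kubectl recent enough, 1.24 or newer, to support the --subresource flag):

# remove the trackedTasks block from both statuses in the editor
$ kubectl edit cassandradatacenters.cassandra.datastax.com dc1 --subresource=status
$ kubectl edit k8ssandraclusters.k8ssandra.io dc1 --subresource=status

# or non-interactively, dropping the whole list with a JSON patch
$ kubectl patch cassandradatacenters.cassandra.datastax.com dc1 --subresource=status \
    --type=json -p='[{"op": "remove", "path": "/status/trackedTasks"}]'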

Are there any other [more proper] ways to fix this?

I managed to discuss this issue on the k8ssandra Discord server.
Thanks @burmanm

Short summary (say hello to ChatGPT :slight_smile: ):
Michael suggested that deleting the “trackedTasks” entry would solve the issue and advised adding the annotation cassandra.datastax.com/no-cleanup: true to prevent automated cleanup tasks in the future. He also mentioned that reconciliation might not happen immediately, and that restarting the operator could speed things up.
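For the record, the annotation and the operator restart would look roughly like this (the cass-operator deployment name and namespace depend on how it was installed; the ones below are common defaults, so adjust them to your setup):

# opt out of auto-created cleanup tasks after future scale-ups
$ kubectl annotate cassandradatacenters.cassandra.datastax.com dc1 cassandra.datastax.com/no-cleanup=true

# editing the status does not trigger a reconcile, so restart cass-operator to make it re-read the objects
$ kubectl rollout restart deployment cass-operator -n cass-operator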

Full Discord discussion

someAlex — Today at 10:56 AM

In short:

  • I deleted a k8ssandratask object to stop an automatically launched cleanup process
  • Now both my k8ssandracluster and cassandradatacenter objects are in “Updating”/“ScalingUp” states, and both of them have a “trackedTasks” reference to the task I deleted.
  • If I add new nodes to the cluster, cleanup does not start automatically anymore

The question: how do I deal with it? The only idea I have is to delete the “trackedTasks” key from both k8ssandracluster/cassandradatacenter statuses.

Michael Burman — Today at 10:57 AM
Yeah, deleting trackedTask would do it
But for your “3” no automated cleanup task, you can also set an annotation to prevent that
cassandra.datastax.com/no-cleanup: true to your CassandraDatacenter

someAlex — Today at 11:03 AM

Yeah, deleting trackedTask would do it

Thanks for the reply. I assume I need to delete it from the cassandradatacenter object first, right?
edit: Or the order doesn’t matter?

But for your “3” no automated cleanup task, you can also set an annotation to prevent that
cassandra.datastax.com/no-cleanup: true to your CassandraDatacenter

Thanks, I knew about it from here: Stop automating cleanup operations by default when scaling · Issue #496 · k8ssandra/cass-operator · GitHub, but in my case I wanted to stop an already running cleanup process (it was supposed to run for weeks on that cluster, and we didn’t know about that annotation beforehand), so I wanted a quick and dirty solution :upside_down_face:

Michael Burman — Today at 11:04 AM
Well, CassandraDatacenter must turn to “Ready” before K8ssandraCluster will turn to “Ready”
But I guess in other ways it doesn’t matter

someAlex — Today at 11:05 AM
Got it, thanks a lot, I’ll try it and post an update just in case.

Michael Burman — Today at 11:06 AM
Note that changing status object will not trigger a reconcile
So it might take a while for the operator to refresh the status
Unless you manually make it notice it…
In which case just restarting the operator is probably the most simplest way for cass-operator
K8ssandraCluster will then get reconciled when CassandraDatacenter starts changing its status

someAlex — Today at 11:07 AM
Thanks for the info, I appreciate it.

P.S. And yes, it worked like a charm in my case.