Cassandra Cluster is not starting after backup restore

Hi,
I did a manual restore in a cluster created with k8ssandra, and now the cluster is not healthy and some pods are restarting from time to time.

I restored a Medusa backup taken in a different cluster, but I did it manually (I saw the operator was not automatically creating the backup objects, I guess because it needs to read from local storage (NFS)). I added initial_token values to the cassandra.yaml file and pasted the SSTables where they need to be.
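For reference, this is roughly how I gathered the tokens from each node of the source cluster (a rough sketch; the exact parsing depends on the nodetool output format of your version):

# On each source node: print the tokens it owns as one comma-separated line
nodetool info --tokens | grep '^Token' | awk '{print $3}' | paste -sd, -
# I then put each node's list into a ConfigMap entry keyed by the pod hostname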

All nodes are reading the data properly and they become healthy individually, but the full cluster never becomes healthy. My cluster has 5 nodes and I saw 3 nodes healthy, but eventually the operator starts killing some pods and adding/removing them from the seed-node Service.

My guess is that the cluster is taking too much time to start and the operator tries to fix it by restarting some pods, but that is not working either.

I’m using the Helm chart to configure everything and I didn’t see any key to configure this behavior of the operator. Can you help me here?

Thanks for your time!

Sadly, restoring another cluster’s Medusa backup into K8ssandra is not yet supported in an automated fashion; we’re still working on it. Here be dragons if you try to do this manually: there are a number of pitfalls, which I guess you’re experiencing.

One interesting thing to look at would be the logs from the server-system-logger container, both from nodes that seem healthy and from nodes that don’t. There should be some messages there that can give us a better idea of what’s going on.
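For instance, something along these lines (assuming pod names of the form <cluster>-<dc>-<rack>-sts-<n> and the namespace you deployed into):

# Tail the logger sidecar on a node that looks healthy and on one that doesn't
kubectl logs -n <namespace> <cluster>-<dc>-<rack>-sts-0 -c server-system-logger --tail=500
kubectl logs -n <namespace> <cluster>-<dc>-<rack>-sts-3 -c server-system-logger --tail=500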

A few questions though:
Which SSTables did you restore? All of them, including the system ones?
Did you use the same cluster name, datacenter name and rack names (and assignments) for both the source and target clusters?
Did you clean up the commit log directories when you restored the SSTables?
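For reference, clearing them would be something like this, run inside each Cassandra container while the node is down (assuming the default data paths under /var/lib/cassandra):

# Remove stale commit logs and saved caches left over from before the restore
rm -rf /var/lib/cassandra/commitlog/*
rm -rf /var/lib/cassandra/saved_caches/*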

Hi,
I restored it in a clean cluster.

The steps I followed are:

  1. Edit the cassandra container command to append initial_token to the cassandra.yaml file just before running the ‘/tini -g -- /docker-entrypoint.sh mgmtapi’ command.
  2. Apply the CassandraDatacenter object with the previous changes and wait until the cluster is up and healthy (this step creates the PVCs in the storage system and everything related).
  3. Download the Medusa backup (I added a new sidecar container to help with the task), ignoring system keyspaces (command: medusa download --ignore-system-keyspaces …).
  4. Restore the schema from the backup (command: cqlsh --file backup/schema.cql).
  5. Move all SSTables from the backup into the data folders with the new table UIDs (the table UIDs changed because I applied the old schema); see the sketch after this list. I didn’t restore system tables.
  6. Restart the cluster.
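This is roughly how I moved the SSTables in step 5 (a sketch with placeholder keyspace and paths; the real layout is whatever the medusa download produced):

# For each table, copy the SSTables from the old table-UID directory in the
# downloaded backup into the new table-UID directory created by the restored schema.
# "my_keyspace" and the paths below are placeholders.
BACKUP=/mnt/backups/restore/my_keyspace
DATA=/var/lib/cassandra/data/my_keyspace
for old_dir in "$BACKUP"/*; do
  table=$(basename "$old_dir" | sed 's/-[0-9a-f]\{32\}$//')   # table name without the old UID
  new_dir=$(ls -d "$DATA/$table"-* | head -n 1)               # directory with the new UID
  cp -v "$old_dir"/* "$new_dir"/
done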

At this point up to 3/5 nodes are healthy, until the operator starts to kill/restart pods.

The contents of /var/log/cassandra/stdout.log and stderr.log don’t show any relevant information. Everything looks fine.

I just deleted all nodes and started the full cluster again to check the logs.
The logs from the first node to start don’t show an error; it only restarted.

At that time I can see the Cassandra mgmtapi reporting an unhealthy node (and I guess it restarts the node).

And the cass-operator pod also throws errors because some nodes started to answer with HTTP 500.
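For what it’s worth, the probe endpoints can be checked directly from inside a pod like this (I’m assuming port 8080 and the probe paths the operator configures for the management API):

# Prints the HTTP status code of the management API probes for the local node
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/api/v0/probes/liveness
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/api/v0/probes/readiness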

Notes:
I disabled authentication and authorization to make it simple for now.
The cluster is completely offline. No app is trying to send queries to it.

Here is the CassandraDatacenter object I’m using, after my changes:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: datacenter1
  namespace: cassandra
spec:
  allowMultipleNodesPerWorker: true
  clusterName: lynx
  config:
    cassandra-yaml:
      authenticator: AllowAllAuthenticator
      authorizer: AllowAllAuthorizer
      credentials_update_interval_in_ms: 3600000
      credentials_validity_in_ms: 3600000
      num_tokens: 256
      permissions_update_interval_in_ms: 3600000
      permissions_validity_in_ms: 3600000
      role_manager: CassandraRoleManager
      roles_update_interval_in_ms: 3600000
      roles_validity_in_ms: 3600000
    jvm-options:
      additional-jvm-opts: null
  configBuilderImage: datastax/cass-config-builder:1.0.4
  disableSystemLoggerSidecar: true
  dockerImageRunsAsCassandra: true
  podTemplateSpec:
    metadata:
      creationTimestamp: null
    spec:
      containers:
        - name: cassandra
          command:
            - bash
            - -c
            - |
              echo initial_token: $(cat /cassandra-init-tokens/$(hostname)) >> /config/cassandra.yaml
              /tini -g -- /docker-entrypoint.sh mgmtapi
          volumeMounts:
            - mountPath: /etc/cassandra
              name: cassandra-config
            - mountPath: /etc/podinfo
              name: podinfo
            - mountPath: /cassandra-init-tokens/
              name: cassandra-init-tokens
        - name: medusa-cli
          image: drazul/cassandra-medusa:3.11.10
          command:
            - bash
            - -c
            - "trap : TERM INT; sleep infinity & wait"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/medusa
              name: k8ssandra-medusa
            - mountPath: /etc/cassandra
              name: cassandra-config
            - mountPath: /var/lib/cassandra
              name: server-data
            - mountPath: /mnt/backups
              name: medusa-backups
      initContainers:
        - args:
            - -c
            - cp -r /etc/cassandra/* /cassandra-base-config/
          command:
            - /bin/sh
          image: k8ssandra/cass-management-api:3.11.11-v0.1.28
          imagePullPolicy: IfNotPresent
          name: base-config-init
          volumeMounts:
            - mountPath: /cassandra-base-config/
              name: cassandra-config
        - name: server-config-init
      volumes:
        - name: cassandra-config
        - downwardAPI:
            items:
              - fieldRef:
                  fieldPath: metadata.labels
                path: labels
          name: podinfo
        - configMap:
            items:
              - key: medusa.ini
                path: medusa.ini
            name: k8ssandra-medusa-custom
          name: k8ssandra-medusa
        - configMap:
            name: lynx-datacenter1-initial-tokens
          name: cassandra-init-tokens
        - name: medusa-backups
          nfs:
            path: /medusa-backup
            readOnly: true
            server: "****"
  racks:
    - name: rack1
  resources:
    limits:
      cpu: "4"
      memory: 12Gi
    requests:
      cpu: "4"
      memory: 12Gi
  serverImage: k8ssandra/cass-management-api:3.11.11-v0.1.28
  serverType: cassandra
  serverVersion: 3.11.11
  size: 5
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 6Ti
      storageClassName: standard
  systemLoggerImage: k8ssandra/system-logger:9c4c3692

What happens if, instead of restarting the cluster in step 6, you call “nodetool refresh” on all Cassandra pods?
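A rough sketch of what I mean, looping over the pods (nodetool refresh takes a keyspace and a table, so you’d repeat it for each restored table; “my_keyspace” and “my_table” are placeholders, and I’m assuming the cassandra.datastax.com/cluster label that cass-operator puts on the pods):

# Hypothetical example: refresh one restored table on every Cassandra pod of the cluster
for pod in $(kubectl get pods -n cassandra -l cassandra.datastax.com/cluster=lynx -o name); do
  kubectl exec -n cassandra "$pod" -c cassandra -- nodetool refresh my_keyspace my_table
done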
I’d really need the logs from the pods that fail to restart in order to understand what’s happening.
The logs from the cass-operator pod would help as well if you could upload them to a gist or something similar.
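Something like this should capture them (adjust the namespace and deployment name to wherever the chart installed the operator):

# Dump the operator logs and the recent events around the restarted pods
kubectl logs -n cassandra deploy/cass-operator --tail=2000 > cass-operator.log
kubectl get events -n cassandra --sort-by=.lastTimestamp > cassandra-events.log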

Thanks

Hi,
I tried to do a nodetool refresh, as you suggested, and Cassandra started to crash. So I guess it could be a problem with Cassandra itself and not with the operator.

I still need to investigate further but it will take a while. I will post here again if I find an issue with the operator or a solution to my problem.

Thanks for your help!

Any updates on this? I’ve had a couple of unrecoverable crashes with the same pattern as this. To me, it seems like the liveness and readiness check timeouts are kicking in and killing the pods while Cassandra is still initializing. In our case, there’s an 800 GB table that can take up to 2 hours to initialize, and the pods are being restarted well before that.

2 hours seems like a lot of time for a startup. In any case, the liveness probes should only kick in after the pod has been marked as ready, which doesn’t seem to be the case here since Cassandra hadn’t started up yet.
Do you know what Cassandra is doing during those 2 hours? Could it be that it was OOM-killed due to high memory consumption?
Any interesting warnings in the Cassandra logs while it is starting up?
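A quick way to check both would be something like this (the pod name is an example, and I’m assuming the default system.log location):

# Reason the previous cassandra container instance terminated (e.g. OOMKilled)
kubectl get pod -n cassandra lynx-datacenter1-rack1-sts-0 \
  -o jsonpath='{.status.containerStatuses[?(@.name=="cassandra")].lastState.terminated.reason}{"\n"}'
# Warnings/errors during startup, e.g. a long commit log replay
kubectl exec -n cassandra lynx-datacenter1-rack1-sts-0 -c cassandra -- \
  sh -c 'grep -E "WARN|ERROR" /var/log/cassandra/system.log | tail -n 50'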