Help Needed: Cassandra Pod Fails to Start Due to Corrupted Schema Table

Hi everyone,

We’re currently facing an issue with one of our Cassandra pods in a K8ssandra cluster. One of the pods in the StatefulSet is unable to start due to what appears to be schema corruption. Here’s the error message we’re seeing in the logs:

```
No partition columns found for table ks_sv.test_test in system_schema.columns. This may be due to corruption or concurrent dropping and altering of a table. If this table is supposed to be dropped, restart cassandra with -Dcassandra.ignore_corrupted_schema_tables=true and run the following query to cleanup: 
"DELETE FROM system_schema.tables WHERE keyspace_name = 'ks_sv' AND table_name = 'test_test'; 
DELETE FROM system_schema.columns WHERE keyspace_name = 'ks_sv' AND table_name = 'test_test';" 
If the table is not supposed to be dropped, restore system_schema.columns sstables from backups.
```

We’re unsure how to proceed safely. Specifically, we’d like guidance on:

  • How to safely restart the affected pod with the -Dcassandra.ignore_corrupted_schema_tables=true flag in a K8ssandra-managed environment, and then run the cleanup queries suggested in the log message.
  • Whether we should instead restore the entire cluster from a backup.

Any advice or steps from others who’ve encountered similar issues would be greatly appreciated. We’re trying to avoid data loss and ensure cluster stability.

After investigating the issue, we resolved it by running a replacenode task through the K8ssandra Operator. This approach removes the corrupted node together with its PersistentVolume (PV), so a replacement node starts with an empty data directory and streams its data from the other replicas.

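Important: Only use this method if you’re confident that the remaining nodes hold a complete copy of the data, i.e. every keyspace stored on the affected node has a replication factor greater than 1 and the other replicas are up and healthy. The new node rebuilds only from those replicas, so any data that existed solely on the corrupted node will be lost.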
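Before launching the task, it is worth confirming that every other node reports Up/Normal (UN). Below is a minimal Python sketch, assuming you capture `nodetool status` from a healthy pod, e.g. `kubectl exec -n k8ssandra-operator k8ssandra-dc1-default-sts-1 -c cassandra -- nodetool status` (that pod name is hypothetical, and depending on your JMX authentication settings nodetool may need credentials). The helpers `parse_node_states` and `unhealthy_peers` are illustrative names. A clean status only shows that the peers are reachable; it does not prove every keyspace has a replication factor above 1, which should be checked separately, for example with `DESCRIBE KEYSPACE ks_sv` in cqlsh.

```python
import re
import sys
from typing import Dict, List

# nodetool status prints one line per node starting with a two-letter code:
# first letter U (up) or D (down); second letter N (normal), L (leaving),
# J (joining) or M (moving); followed by the node's address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_node_states(nodetool_output: str) -> Dict[str, str]:
    """Map each node address to its two-letter status/state code."""
    states: Dict[str, str] = {}
    for line in nodetool_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:  # header lines ("Datacenter:", "--  Address", ...) never match
            code, address = match.groups()
            states[address] = code
    return states


def unhealthy_peers(nodetool_output: str, node_to_replace: str) -> List[str]:
    """Return addresses, other than the node being replaced, that are not UN."""
    return sorted(
        address
        for address, code in parse_node_states(nodetool_output).items()
        if address != node_to_replace and code != "UN"
    )


# Optional CLI: reads nodetool output on stdin, takes the broken node's IP as argv[1].
if __name__ == "__main__" and len(sys.argv) == 2:
    bad = unhealthy_peers(sys.stdin.read(), node_to_replace=sys.argv[1])
    if bad:
        print("Not safe to replace yet; peers not Up/Normal:", ", ".join(bad))
        sys.exit(1)
    print("All other nodes are Up/Normal.")
```

Saved as a script (the name `check_peers.py` is hypothetical), it can be fed directly: `kubectl exec ... -- nodetool status | python check_peers.py <ip-of-node-to-replace>`. A non-zero exit means at least one peer is not UN. The broken node itself is excluded because it will normally show as DN while its pod is failing.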

Here’s the exact task we used:

```yaml
apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: replacenode-task
  namespace: k8ssandra-operator
spec:
  cluster:
    name: k8ssandra
  template:
    jobs:
      - name: replacenode-job
        command: replacenode
        args:
          pod_name: k8ssandra-dc1-default-sts-0
```
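Applying the manifest with `kubectl apply -f` is the usual route. If you prefer to script it, here is a rough Python sketch that builds the same K8ssandraTask for any pod and submits it with the official `kubernetes` client. The helper names `replacenode_task` and `submit` are hypothetical, and the sketch assumes the K8ssandraTask CRD's plural is `k8ssandratasks` and that your current kubeconfig context points at the right cluster.

```python
from typing import Any, Dict


def replacenode_task(pod_name: str, cluster: str = "k8ssandra",
                     namespace: str = "k8ssandra-operator",
                     task_name: str = "replacenode-task") -> Dict[str, Any]:
    """Build the same K8ssandraTask manifest as the YAML above, for any pod."""
    return {
        "apiVersion": "control.k8ssandra.io/v1alpha1",
        "kind": "K8ssandraTask",
        "metadata": {"name": task_name, "namespace": namespace},
        "spec": {
            "cluster": {"name": cluster},  # name of the K8ssandraCluster object
            "template": {
                "jobs": [
                    {
                        "name": "replacenode-job",
                        "command": "replacenode",
                        "args": {"pod_name": pod_name},
                    }
                ]
            },
        },
    }


def submit(task: Dict[str, Any]) -> Dict[str, Any]:
    """Create the task via the Kubernetes API (assumes `pip install kubernetes`)."""
    from kubernetes import client, config  # imported lazily so building needs no cluster

    config.load_kube_config()  # uses the current kubectl context
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group="control.k8ssandra.io",
        version="v1alpha1",
        namespace=task["metadata"]["namespace"],
        plural="k8ssandratasks",  # assumed CRD plural
        body=task,
    )
```

For example, `submit(replacenode_task("k8ssandra-dc1-default-sts-0"))` creates the task shown above. Object names must be unique per namespace, so delete or rename an earlier task before running it again.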

This task:

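  • Deletes the corrupted pod.
  • Deletes its PersistentVolumeClaim, which releases the associated PV (whether the PV itself is deleted depends on the StorageClass reclaim policy).
  • Starts a fresh pod that replaces the old node and streams its data from the rest of the cluster.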