Hi,
I did a manual restore in a cluster created with K8ssandra, and now the cluster is not healthy and some pods are restarting from time to time.
I restored a Medusa backup made in a different cluster, but I did it manually (because I saw the operator was not automatically creating the backup objects, I guess because it needs to read from local storage (NFS)). I added initial_token to the cassandra.yaml file and pasted the SSTables where they need to be.
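(For reference, one way to get each source node’s tokens is to read them from system.local; this is only a minimal sketch, the pod name is a placeholder and it assumes cqlsh access with authentication disabled:)

    # On each source node, grab its tokens from the local system table
    # (run once per node and keep the node -> tokens mapping):
    kubectl exec source-dc1-default-sts-0 -c cassandra -- \
      cqlsh -e "SELECT tokens FROM system.local;"

    # The returned set becomes that node's initial_token value (comma-separated)
    # on the matching target node.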
All nodes read the data properly and become healthy individually, but the full cluster never becomes healthy. My cluster has 5 nodes and I saw 3 nodes healthy, but eventually the operator starts to kill some pods and to add/remove them from the seed-node svc.
My guess is that the cluster is taking too long to start and the operator tries to fix it by restarting some pods, but that is not working either.
I’m using the Helm chart to configure everything and I didn’t see any key there to configure the operator. Can you help me here?
Sadly, restoring from another cluster’s Medusa backup into K8ssandra is not yet supported in an automated fashion; we’re still working on it. Here be dragons if you try to do this manually: there are a number of pitfalls, which I guess you’re experiencing.
One interesting thing to look at would be the logs from the server-system-logger container, on both the nodes that seem healthy and the ones that don’t. There should be some messages there that can give us a better idea of what’s going on.
A few questions though:
Which SSTables did you restore? All of them, including the system ones?
Did you use the same cluster name, datacenter name and rack names (and assignments) for both the source and target clusters?
Did you clean up the commit log directories when you restored the sstables?
1. Edit the cassandra container command so that it adds initial_token to the cassandra.yaml file just before running the '/tini -g -- /docker-entrypoint.sh mgmtapi' command (see the first sketch after this list).
2. Apply the CassandraDatacenter object with the previous changes and wait until the cluster is up and healthy (this step creates the PVCs with the storage system and everything related).
3. Download the Medusa backup (I added a new sidecar container to help with the task), ignoring system keyspaces (command: medusa download --ignore-system-keyspaces …).
4. Restore the schema from the backup (command: cqlsh --file backup/schema.cql).
5. Move all SSTables from the backup into the data folder under the new table UIDs (the table UIDs changed because I applied the old schema); see the second sketch after this list. I didn’t restore system tables.
6. Restart the cluster.
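For step 1, this is roughly the shape of the override. It is only a minimal sketch: the /config path, the token placeholder and the exact merge behavior of podTemplateSpec are assumptions from my setup, not something guaranteed by the operator.

    spec:
      podTemplateSpec:
        spec:
          containers:
            - name: cassandra
              command:
                - /bin/sh
                - -c
                - |
                  # append this node's pre-assigned tokens before the server starts
                  # (the /config path and the token value are placeholders)
                  echo "initial_token: <tokens-for-this-node>" >> /config/cassandra.yaml
                  exec /tini -g -- /docker-entrypoint.sh mgmtapi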
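And for step 5, roughly what the move looks like on each node. Again just a sketch: the backup layout, the data directory and the keyspace/table names are placeholders, and the new UID has to be looked up per table.

    # placeholders for my layout
    BACKUP_DIR='/backup/my_keyspace/my_table-<old-uid>'
    DATA_DIR=/var/lib/cassandra/data/my_keyspace

    # the directory created for this table by the restored schema (new UID)
    NEW_TABLE_DIR=$(ls -d "$DATA_DIR"/my_table-* | head -n 1)

    # move the restored SSTables into it
    mv "$BACKUP_DIR"/* "$NEW_TABLE_DIR"/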
At this point up to 3/5 nodes are healthy, until the operator starts to kill/restart pods.
The contents of /var/log/cassandra/stdout.log and stderr.log don’t show any relevant information. Everything looks fine.
I just deleted all nodes and started the full cluster again to check for the logs.
Logs from the first node to start don’t show an error; it only restarted:
Notes:
I disabled authentication and authorization to make it simple for now.
The cluster is completely offline. No apps are trying to send queries to it.
Here is the CassandraDatacenter object I’m using, after my changes:
What happens if, instead of restarting the cluster in step 6, you call “nodetool refresh” on all Cassandra pods?
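Something along these lines, repeated for each keyspace/table that received restored SSTables (the pod and container names and the keyspace/table here are just placeholders for your environment):

    kubectl exec my-cluster-dc1-default-sts-0 -c cassandra -- \
      nodetool refresh my_keyspace my_table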
I’d really need the logs from the pods that fail to restart in order to understand what’s happening.
The logs from the cass-operator pod would help as well if you could upload them to a gist or something similar.
Hi,
I tried to do a nodetool refresh, as you suggested, and Cassandra started to crash. So I guess it could be a problem with Cassandra itself and not with the operator.
I still need to investigate further but it will take a while. I will post here again if I find an issue with the operator or a solution to my problem.
Any updates on this? I’ve had a couple of unrecoverable crashes with the same pattern. To me, it seems like the liveness and readiness check timeouts are kicking in and killing the pods while Cassandra is still initializing. In our case, there’s an 800 GB table that can take up to 2 hours to initialize, and the pods are being restarted way before that.
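In case it helps, this is the kind of override we are considering to relax the probe timing through podTemplateSpec. It is only a sketch: it assumes the operator honors probe overrides declared there (which I have not verified), the probe handlers themselves are expected to come from the operator’s defaults, and the container name and timings are placeholders.

    spec:
      podTemplateSpec:
        spec:
          containers:
            - name: cassandra
              readinessProbe:
                # give Cassandra more time before and between checks
                initialDelaySeconds: 120
                periodSeconds: 30
                failureThreshold: 60
              livenessProbe:
                initialDelaySeconds: 120
                periodSeconds: 30
                failureThreshold: 60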
Two hours seems like a lot of time for startup. In any case, the liveness probe should only kick in after the pod has been marked as ready, which doesn’t seem to be the case here since Cassandra hasn’t started up yet.
Do you know what Cassandra is doing during those 2 hours? Could it have been OOM-killed due to high memory consumption?
Any interesting warnings in the Cassandra logs while it is starting up?