- Does k8ssandra provide any functions to replace / remove a specific cassandra node?
- K8ssandraTask runs a command specified by K8ssandraTask.spec.template.jobs.command. Is that an arbitrary string, e.g. “/usr/bin/ls”?
Thanks.
Hi @vs123,
Yes, you can replace a broken node using a ReplaceNode K8ssandraTask that looks like this:
apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: test-replacenode1
  namespace: <namespace>
spec:
  cluster:
    name: <cluster name>
    namespace: <namespace>
  datacenters:
  - dc1
  template:
    jobs:
    - args:
        pod_name: test-dc1-r1-sts-2
      command: replacenode
      name: ''
It will rebootstrap the node with a new PV, keeping token ownership and streaming data from the remaining replicas.
Jobs can’t run arbitrary commands, just a fixed set.
OK, I tried that. I don’t see any changes in the cluster.
k8ssandra-operator-cass-operator (1.16.0) logs show:
2024-03-12T21:44:47.828Z ERROR Reconciler error {"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"replace-node1-dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "replace-node1-dc1", "reconcileID": "55d72e4a-593f-4856-88a1-ad54312a8658", "error": "CassandraTask.control.k8ssandra.io \"replace-node1-dc1\" is invalid: [conditions[0].message: Required value, conditions[0].reason: Required value]"}
“kubectl describe K8ssandraTask -A” outputs for this task:
Status:
  Conditions:
    Last Transition Time:  2024-03-12T21:22:56Z
    Message:
    Reason:                Running
    Status:                False
    Type:                  Running
    Last Transition Time:  2024-03-12T21:22:56Z
    Message:
    Reason:                Failed
    Status:                False
    Type:                  Failed
  Datacenters:
    dc1:
Events:
  Type    Reason               Age  From                      Message
  ----    ------               ---- ----                      -------
  Normal  CreateCassandraTask  34m  k8ssandratask-controller  Created CassandraTask k8ssandra-operator.replace-node1-dc1
I tried running a similar command later, and now the cass-operator logs show:
2024-03-12T23:32:58.885Z DEBUG this job isn't allowed to run due to ConcurrencyPolicy restrictions {"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"replace-node4-dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "replace-node4-dc1", "reconcileID": "0f421de2-bdc7-4275-ae2f-888de97840ce", "datacenterName": "dc1", "clusterName": "test-cassandra", "activeTasks": 1}
You should check the cass-operator pod logs to see what’s the underlying error.
Please share with us the manifest you used for the K8ssandraTask and also the manifest of the corresponding generated CassandraTask.
It also seems you’ll have to delete your existing K8ssandraTasks before trying again.
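The three steps suggested above (read the cass-operator logs, dump the generated CassandraTask, delete the stuck K8ssandraTask) could be scripted roughly as follows. This is only a sketch, not official tooling: the function names are made up for illustration, and the namespace `k8ssandra-operator` and deployment name `k8ssandra-operator-cass-operator` are assumed from the names quoted in this thread and will depend on your Helm release. The `k8ssandra.io/task-name` label is the one that shows up on the generated CassandraTask in the `kubectl describe` output posted in this thread.

```shell
# Hypothetical helpers; function names, default namespace, and deployment name are assumptions.
NS="${NS:-k8ssandra-operator}"

# Print recent cass-operator logs (the deployment name depends on the Helm release name).
cass_operator_logs() {
  kubectl logs -n "$NS" "deployment/${1:-k8ssandra-operator-cass-operator}" --tail=200
}

# Dump the CassandraTask objects generated from the K8ssandraTask named "$1".
generated_cassandratasks() {
  kubectl get cassandratasks -n "$NS" -l "k8ssandra.io/task-name=$1" -o yaml
}

# Delete the K8ssandraTask named "$1" before submitting a new one.
delete_k8ssandratask() {
  kubectl delete k8ssandratask "$1" -n "$NS"
}
```

For example, `delete_k8ssandratask replace-node1` would remove the earlier task (that name is inferred from the `replace-node1-dc1` CassandraTask in the log above).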
I collected more information. Thanks for looking into this.
Using k8ssandra-operator-1.12.0.
replace.yml:
apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: replace-node7
  namespace: k8ssandra-operator
spec:
  cluster:
    name: test-cassandra
    namespace: k8ssandra-operator
  #datacenters:
  template:
    jobs:
    - args:
        pod_name: test-cassandra-dc1-us-west-2a-sts-0
        rack: us-west-2a
      command: replacenode
      name: ''
$ kubectl describe K8ssandraTask -A
Name:         replace-node7
Namespace:    k8ssandra-operator
Labels:       <none>
Annotations:  <none>
API Version:  control.k8ssandra.io/v1alpha1
Kind:         K8ssandraTask
Metadata:
  Creation Timestamp:  2024-04-08T23:41:41Z
  Finalizers:
    control.k8ssandra.io/finalizer
  Generation:  1
  Owner References:
    API Version:           k8ssandra.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  K8ssandraCluster
    Name:                  test-cassandra
    UID:                   fac9a86b-d312-4766-bfe6-147950266252
  Resource Version:        77439
  UID:                     cafee656-8f9d-477c-9665-93b351a0acb9
Spec:
  Cluster:
    Name:       test-cassandra
    Namespace:  k8ssandra-operator
  Template:
    Jobs:
      Args:
        pod_name:  test-cassandra-dc1-us-west-2a-sts-0
        Rack:      us-west-2a
      Command:     replacenode
      Name:
Status:
  Conditions:
    Last Transition Time:  2024-04-08T23:41:42Z
    Message:
    Reason:                Running
    Status:                False
    Type:                  Running
    Last Transition Time:  2024-04-08T23:41:42Z
    Message:
    Reason:                Failed
    Status:                False
    Type:                  Failed
  Datacenters:
    dc1:
Events:
  Type    Reason               Age   From                      Message
  ----    ------               ----  ----                      -------
  Normal  CreateCassandraTask  103s  k8ssandratask-controller  Created CassandraTask k8ssandra-operator.replace-node7-dc1
$ kubectl describe CassandraTask -A
Name:         replace-node7-dc1
Namespace:    k8ssandra-operator
Labels:       app.kubernetes.io/component=cassandra
              app.kubernetes.io/created-by=cass-operator
              app.kubernetes.io/instance=cassandra-test-cassandra
              app.kubernetes.io/managed-by=cass-operator
              app.kubernetes.io/name=cassandra
              app.kubernetes.io/part-of=k8ssandra
              app.kubernetes.io/version=4.1.3
              cassandra.datastax.com/cluster=test-cassandra
              cassandra.datastax.com/datacenter=dc1
              control.k8ssandra.io/status=active
              k8ssandra.io/task-name=replace-node7
              k8ssandra.io/task-namespace=k8ssandra-operator
Annotations:  <none>
API Version:  control.k8ssandra.io/v1alpha1
Kind:         CassandraTask
Metadata:
  Creation Timestamp:  2024-04-08T23:41:41Z
  Generation:          2
  Owner References:
    API Version:           cassandra.datastax.com/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CassandraDatacenter
    Name:                  dc1
    UID:                   bcec25f8-9652-4a33-86ec-d47c02730e7b
  Resource Version:        77437
  UID:                     06ed1aa7-a7ca-4495-849c-f41bb6a3f0bf
Spec:
  Concurrency Policy:  Forbid
  Datacenter:
    Name:       dc1
    Namespace:  k8ssandra-operator
  Jobs:
    Args:
      pod_name:  test-cassandra-dc1-us-west-2a-sts-0
      Rack:      us-west-2a
    Command:     replacenode
    Name:
  Restart Policy:              Never
  Ttl Seconds After Finished:  0
Events:                        <none>
k8ssandra-operator-cass-operator logs:
2024-04-08T23:41:42.030Z INFO KubeAPIWarningLogger unknown field "status.conditions[0].lastProbeTime"
2024-04-08T23:41:42.030Z ERROR Reconciler error {"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"replace-node7-dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "replace-node7-dc1", "reconcileID": "878cadc5-a80b-4f58-b306-25bbd6f9fd0e", "error": "CassandraTask.control.k8ssandra.io \"replace-node7-dc1\" is invalid: [conditions[0].message: Required value, conditions[0].reason: Required value]"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
Any comments? Thanks.
Hey, this is usually an indication of a failed CRD upgrade (you have an old CassandraTask CRD installed in the system) or an incompatible Kubernetes version.
Did you install k8ssandra-operator using Kustomize or Helm?
k8ssandra-operator is installed using helm, APP VERSION 1.12.0
kubectl version:
Server Version: v1.28.8-eks-adc7111
How do I check the version of CassandraTask CRD? I ran
kubectl get crd cassandratasks.control.k8ssandra.io -oyaml
It is recently created:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.12.0
  creationTimestamp: "2024-04-15T18:41:21Z"
  generation: 1
  name: cassandratasks.control.k8ssandra.io
  resourceVersion: "4267"
  uid: 84b320dc-a728-461c-a802-86b63eb49770
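The `creationTimestamp` only says when the CRD object was created, not which schema it carries. Since the error complains that `conditions[0].message` and `conditions[0].reason` are required, one way to see what the installed CRD actually expects is to print the `required` list of the status conditions schema and compare it with the CRD shipped with your cass-operator version. A sketch only: the function name is hypothetical, and the schema path assumes the usual CRD layout with `status.conditions` defined as an array of objects.

```shell
# Hypothetical helper: for each version in the installed CassandraTask CRD,
# print the version name and the fields marked as required on a status condition.
cassandratask_condition_requirements() {
  kubectl get crd cassandratasks.control.k8ssandra.io \
    -o jsonpath='{range .spec.versions[*]}{.name}{": "}{.schema.openAPIV3Schema.properties.status.properties.conditions.items.required}{"\n"}{end}'
}
```

If `message` and `reason` appear in that list while the running operator writes conditions without them, the CRD and the operator are probably from different releases.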
You were right, we had an image tag mismatch. Fixed that, and replacenode works now.
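For anyone hitting the same error, a mismatch like this can be spotted by listing the images each operator Deployment runs and comparing the tags with what the Helm chart version expects. A minimal sketch: the function name and the default namespace `k8ssandra-operator` are assumptions, and it only covers Deployments (the thread does not say which images were mismatched).

```shell
# Hypothetical helper: print "<deployment><TAB><image(s)>" for every Deployment
# in the given namespace (defaults to k8ssandra-operator).
list_deployment_images() {
  kubectl get deployments -n "${1:-k8ssandra-operator}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'
}
```

Calling `list_deployment_images` with no argument inspects the `k8ssandra-operator` namespace; pass another namespace as the first argument if the operators are installed elsewhere.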
Thanks.