Scenario
I have a K8ssandra cluster with a single DC named argo-we-dc1, which has 6 Cassandra nodes spread across 3 racks (r1, r2, r3). The cluster also runs 3 Stargate pods (1 per rack), and Medusa is deployed in each Cassandra pod.
Cluster pods (kubectl get pods -n infrastructure-k8ssandra -o wide):
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
argo-argo-we-dc1-r1-stargate-deployment-5fc988476-pctt2 1/1 Running 0 4h29m 10.129.35.138 aks-dbaspot-73504503-vmss0000sm <none> <none>
argo-argo-we-dc1-r1-sts-0 3/3 Running 0 4h35m 10.129.33.8 aks-dbaspot-73504503-vmss0000sf <none> <none>
argo-argo-we-dc1-r1-sts-1 3/3 Running 0 4h35m 10.129.33.241 aks-dbaspot-73504503-vmss0000sh <none> <none>
argo-argo-we-dc1-r2-stargate-deployment-fb757c568-6ndtm 1/1 Running 0 4h29m 10.129.34.129 aks-dbaspot-73504503-vmss0000sk <none> <none>
argo-argo-we-dc1-r2-sts-0 3/3 Running 0 4h35m 10.129.35.149 aks-dbaspot-73504503-vmss0000sm <none> <none>
argo-argo-we-dc1-r2-sts-1 3/3 Running 0 4h35m 10.129.34.24 aks-dbaspot-73504503-vmss0000sj <none> <none>
argo-argo-we-dc1-r3-stargate-deployment-69bfbb49d4-dgg4t 1/1 Running 0 4h29m 10.129.35.144 aks-dbaspot-73504503-vmss0000sm <none> <none>
argo-argo-we-dc1-r3-sts-0 3/3 Running 0 4h35m 10.129.32.179 aks-dbaspot-73504503-vmss0000si <none> <none>
argo-argo-we-dc1-r3-sts-1 3/3 Running 0 4h19m 10.129.34.134 aks-dbaspot-73504503-vmss0000sk <none> <none>
k8ssandra-operator-b7c97b465-dnj8k 1/1 Running 0 4h20m 10.129.34.28 aks-dbaspot-73504503-vmss0000sj <none> <none>
k8ssandra-operator-b7c97b465-f9nlp 1/1 Running 7 (8h ago) 20h 10.129.33.249 aks-dbaspot-73504503-vmss0000sh <none> <none>
k8ssandra-operator-b7c97b465-fzf5r 1/1 Running 5 (8h ago) 8h 10.129.32.171 aks-dbaspot-73504503-vmss0000si <none> <none>
k8ssandra-operator-cass-operator-b666974f6-2r7hk 1/1 Running 7 (8h ago) 8h 10.129.33.240 aks-dbaspot-73504503-vmss0000sh <none> <none>
k8ssandra-operator-cass-operator-b666974f6-7bhww 1/1 Running 0 7h52m 10.129.34.14 aks-dbaspot-73504503-vmss0000sj <none> <none>
k8ssandra-operator-cass-operator-b666974f6-fxqmp 1/1 Running 0 4h20m 10.129.34.128 aks-dbaspot-73504503-vmss0000sk <none> <none>
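For reference, a minimal sketch of the K8ssandraCluster topology described above (the cluster name is taken from the pod names; everything beyond the DC, rack, and Stargate sizes is an illustrative assumption, not our full manifest):

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: argo
  namespace: infrastructure-k8ssandra
spec:
  cassandra:
    serverVersion: "4.0.4"
    datacenters:
    - metadata:
        name: argo-we-dc1
      size: 6                  # 6 Cassandra nodes in total
      racks:
      - name: r1
      - name: r2
      - name: r3
      stargate:
        size: 3                # 1 Stargate pod per rack
  medusa:
    # Medusa settings as shown in the "Medusa configuration" section below
    storageProperties:
      storageProvider: azure_blobs
      storageSecretRef:
        name: medusa-bucket-key
      bucketName: k8ssandra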
AKS version: 1.25.5
K8ssandra-operator version: v1.5.2
Medusa version: 0.14.0-azure
Cassandra version: 4.0.4
Problem description: Backup/Restore problem with K8ssandra
Backups work, but restores only succeed when the DC has a single Cassandra node per rack. When a DC has more than one Cassandra node per rack, restores fail.
Medusa configuration:
medusa:
  containerResources:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  initContainerResources:
    requests:
      cpu: "1000m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  containerImage:
    registry: internal_registry
    repository: internal_repo
    name: medusa
    tag: 0.14.0-azure
    pullPolicy: IfNotPresent
    pullSecretRef:
      name: internal_secret
  storageProperties:
    storageProvider: azure_blobs
    storageSecretRef:
      name: medusa-bucket-key
    bucketName: k8ssandra
    secure: false
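The medusa-bucket-key secret referenced by storageSecretRef follows the usual Medusa pattern for azure_blobs: a JSON credentials blob under a credentials key. A hedged sketch (the storage account name is taken from the connection logs further below; the access key is a placeholder, and the exact key layout should be checked against the Medusa docs):

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
  namespace: infrastructure-k8ssandra
type: Opaque
stringData:
  # JSON credentials file consumed by Medusa's azure_blobs storage provider
  credentials: |
    {
      "storage_account": "dbarepository",
      "key": "<storage-account-access-key>"
    }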
Medusa backup configuration:
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaBackupJob
metadata:
  name: demo-backup
spec:
  cassandraDatacenter: argo-we-dc1
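In our setup the job is synced by Argo CD, but creating it manually is equivalent (the file name below is illustrative):

kubectl apply -f demo-backup.yaml -n infrastructure-k8ssandra
kubectl get medusabackupjob -n infrastructure-k8ssandra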
When we run the get command of the backup object after the backup is completed:
kubectl get medusabackupjob/demo-backup -n infrastructure-k8ssandra -o yaml
The output is the following:
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaBackupJob
metadata:
  creationTimestamp: "2023-06-21T08:54:22Z"
  generation: 1
  labels:
    argocd.argoproj.io/instance: infrastructure-k8ssandra-operator-argocd
  managedFields:
  - apiVersion: medusa.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:argocd.argoproj.io/instance: {}
      f:spec:
        .: {}
        f:backupType: {}
        f:cassandraDatacenter: {}
    manager: argocd-application-controller
    operation: Update
    time: "2023-06-21T08:54:22Z"
  - apiVersion: medusa.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"8e4f78c2-7890-4b34-8739-3f09116f9634"}: {}
    manager: manager
    operation: Update
    time: "2023-06-21T08:54:22Z"
  - apiVersion: medusa.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:finishTime: {}
        f:finished: {}
        f:startTime: {}
    manager: manager
    operation: Update
    subresource: status
    time: "2023-06-21T08:55:38Z"
  name: demo-backup
  namespace: infrastructure-k8ssandra
  ownerReferences:
  - apiVersion: cassandra.datastax.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CassandraDatacenter
    name: argo-we-dc1
    uid: 8e4f78c2-7890-4b34-8739-3f09116f9634
  resourceVersion: "33751280"
  uid: 213c5df9-ba8c-497b-a99c-6d83ebda6b57
spec:
  backupType: differential
  cassandraDatacenter: argo-we-dc1
status:
  finishTime: "2023-06-21T08:55:38Z"
  finished:
  - argo-argo-we-dc1-r3-sts-0
  - argo-argo-we-dc1-r1-sts-1
  - argo-argo-we-dc1-r3-sts-1
  - argo-argo-we-dc1-r2-sts-1
  - argo-argo-we-dc1-r1-sts-0
  - argo-argo-we-dc1-r2-sts-0
  startTime: "2023-06-21T08:54:22Z"
Final logs of the medusa container after the backup is completed:
INFO:root:Backup done
[2023-06-21 08:55:19,925] INFO: Backup done
INFO:root:- Started: 2023-06-21 08:54:22
- Started extracting data: 2023-06-21 08:54:24
- Finished: 2023-06-21 08:55:19
[2023-06-21 08:55:19,925] INFO: - Started: 2023-06-21 08:54:22
- Started extracting data: 2023-06-21 08:54:24
- Finished: 2023-06-21 08:55:19
INFO:root:- Real duration: 0:00:55.759084 (excludes time waiting for other nodes)
[2023-06-21 08:55:19,926] INFO: - Real duration: 0:00:55.759084 (excludes time waiting for other nodes)
INFO:root:- 432 files, 1.81 GB
[2023-06-21 08:55:19,926] INFO: - 432 files, 1.81 GB
INFO:root:- 432 files copied from host
[2023-06-21 08:55:19,926] INFO: - 432 files copied from host
DEBUG:root:Emitting metrics
[2023-06-21 08:55:19,926] DEBUG: Emitting metrics
DEBUG:root:Done emitting metrics
[2023-06-21 08:55:19,927] DEBUG: Done emitting metrics
INFO:root:Updated from existing status: 0 to new status: 1 for backup id: demo-backup
[2023-06-21 08:55:19,927] INFO: Updated from existing status: 0 to new status: 1 for backup id: demo-backup
DEBUG:root:Done with backup, returning backup result information
[2023-06-21 08:55:19,927] DEBUG: Done with backup, returning backup result information
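These logs come from the medusa sidecar container of one of the Cassandra pods, e.g.:

kubectl logs argo-argo-we-dc1-r1-sts-0 -n infrastructure-k8ssandra -c medusa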
When we run nodetool status:
kubectl exec -it argo-argo-we-dc1-r1-sts-0 -n infrastructure-k8ssandra -c cassandra -- nodetool -u $CASS_USERNAME -pw $CASS_PASSWORD status
The output is the following:
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.35.142 1.78 GiB 16 51.5% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
UN 10.129.33.161 1.67 GiB 16 51.5% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
UN 10.129.33.241 1.81 GiB 16 51.5% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.32.244 1.66 GiB 16 48.5% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
UN 10.129.34.6 1.63 GiB 16 48.5% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r2
UN 10.129.32.182 1.67 GiB 16 48.5% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r3
Medusa restore configuration:
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaRestoreJob
metadata:
  name: argo-restore
  namespace: infrastructure-k8ssandra
spec:
  cassandraDatacenter: argo-we-dc1
  backup: demo-backup
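The restore job is created the same way (file name illustrative); the operator then stops the datacenter, runs the medusa-restore init containers, and restarts the pods:

kubectl apply -f argo-restore.yaml -n infrastructure-k8ssandra
kubectl get medusarestorejob argo-restore -n infrastructure-k8ssandra -w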
When we run the get command of the restore object after the restore is completed:
kubectl get medusarestorejob/argo-restore -o yaml -n infrastructure-k8ssandra
The output is the following:
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaRestoreJob
metadata:
  creationTimestamp: "2023-06-21T10:00:11Z"
  generation: 1
  labels:
    argocd.argoproj.io/instance: infrastructure-k8ssandra-operator-argocd
  managedFields:
  - apiVersion: medusa.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:argocd.argoproj.io/instance: {}
      f:spec:
        .: {}
        f:backup: {}
        f:cassandraDatacenter: {}
    manager: argocd-application-controller
    operation: Update
    time: "2023-06-21T10:00:11Z"
  - apiVersion: medusa.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"8e4f78c2-7890-4b34-8739-3f09116f9634"}: {}
    manager: manager
    operation: Update
    time: "2023-06-21T10:00:11Z"
  - apiVersion: medusa.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:datacenterStopped: {}
        f:restoreKey: {}
        f:restorePrepared: {}
        f:startTime: {}
    manager: manager
    operation: Update
    subresource: status
    time: "2023-06-21T10:00:26Z"
  name: argo-restore
  namespace: infrastructure-k8ssandra
  ownerReferences:
  - apiVersion: cassandra.datastax.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CassandraDatacenter
    name: argo-we-dc1
    uid: 8e4f78c2-7890-4b34-8739-3f09116f9634
  resourceVersion: "33787691"
  uid: e8ce7b35-486f-4cc2-aaf1-ef4e1f584549
spec:
  backup: demo-backup
  cassandraDatacenter: argo-we-dc1
status:
  datacenterStopped: "2023-06-21T10:00:26Z"
  restoreKey: a65fae69-9c33-4588-95b6-182323f24e0e
  restorePrepared: true
  startTime: "2023-06-21T10:00:11Z"
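While the restore job is in this state, the CassandraDatacenter progress can also be checked (the jsonpath below assumes the standard cass-operator status field):

kubectl get cassandradatacenter argo-we-dc1 -n infrastructure-k8ssandra \
  -o jsonpath='{.status.cassandraOperatorProgress}{"\n"}'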
Medusa container logs:
MEDUSA_MODE = GRPC
sleeping for 0 sec
Starting Medusa gRPC service
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
INFO:root:Init service
[2023-06-21 10:03:23,995] INFO: Init service
DEBUG:root:Loading storage_provider: azure_blobs
[2023-06-21 10:03:23,995] DEBUG: Loading storage_provider: azure_blobs
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dbarepository.blob.core.windows.net:443
[2023-06-21 10:03:23,998] DEBUG: Starting new HTTPS connection (1): dbarepository.blob.core.windows.net:443
DEBUG:urllib3.connectionpool:https://dbarepository.blob.core.windows.net:443 "HEAD /k8ssandra?restype=container HTTP/1.1" 200 0
[2023-06-21 10:03:24,023] DEBUG: https://dbarepository.blob.core.windows.net:443 "HEAD /k8ssandra?restype=container HTTP/1.1" 200 0
INFO:root:Starting server. Listening on port 50051.
[2023-06-21 10:03:24,026] INFO: Starting server. Listening on port 50051.
Medusa-restore final logs in argo-argo-we-dc1-r1-sts-0 for example:
INFO:root:Finished restore of backup demo-backup
[2023-06-21 10:03:22,847] INFO: Finished restore of backup demo-backup
Mapping: {'in_place': True, 'host_map': {'argo-argo-we-dc1-r1-sts-1': {'source': ['argo-argo-we-dc1-r1-sts-1'], 'seed': False}, 'localhost': {'source': ['argo-argo-we-dc1-r1-sts-0'], 'seed': False}, 'argo-argo-we-dc1-r2-sts-1': {'source': ['argo-argo-we-dc1-r2-sts-1'], 'seed': False}, 'argo-argo-we-dc1-r2-sts-0': {'source': ['argo-argo-we-dc1-r2-sts-0'], 'seed': False}, 'argo-argo-we-dc1-r3-sts-1': {'source': ['argo-argo-we-dc1-r3-sts-1'], 'seed': False}, 'argo-argo-we-dc1-r3-sts-0': {'source': ['argo-argo-we-dc1-r3-sts-0'], 'seed': False}}}
Downloading backup demo-backup to /var/lib/cassandra
Medusa-restore final logs in argo-argo-we-dc1-r1-sts-1, which is in the same rack as the pod above:
INFO:root:Finished restore of backup demo-backup
[2023-06-21 10:03:22,958] INFO: Finished restore of backup demo-backup
Mapping: {'in_place': True, 'host_map': {'argo-argo-we-dc1-r1-sts-0': {'source': ['argo-argo-we-dc1-r1-sts-1'], 'seed': False}, 'localhost': {'source': ['argo-argo-we-dc1-r1-sts-0'], 'seed': False}, 'argo-argo-we-dc1-r2-sts-1': {'source': ['argo-argo-we-dc1-r2-sts-1'], 'seed': False}, 'argo-argo-we-dc1-r2-sts-0': {'source': ['argo-argo-we-dc1-r2-sts-0'], 'seed': False}, 'argo-argo-we-dc1-r3-sts-1': {'source': ['argo-argo-we-dc1-r3-sts-1'], 'seed': False}, 'argo-argo-we-dc1-r3-sts-0': {'source': ['argo-argo-we-dc1-r3-sts-0'], 'seed': False}}}
Downloading backup demo-backup to /var/lib/cassandra
When we run nodetool status on argo-argo-we-dc1-r1-sts-0:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.129.32.244 ? 16 0.0% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
DN 10.129.34.6 ? 16 0.0% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r1
DN 10.129.32.182 ? 16 0.0% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r1
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.33.149 1.67 GiB 16 100.0% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
UN 10.129.34.24 1.78 GiB 16 100.0% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
When we run nodetool status on argo-argo-we-dc1-r1-sts-1:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.129.32.244 ? 16 0.0% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
DN 10.129.34.6 ? 16 0.0% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r1
DN 10.129.32.182 ? 16 0.0% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r1
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.33.241 1.81 GiB 16 100.0% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.33.149 1.67 GiB 16 100.0% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
UN 10.129.34.24 1.78 GiB 16 100.0% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
When we run nodetool status on argo-argo-we-dc1-r2-sts-0:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.129.32.244 ? 16 0.0% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
DN 10.129.34.6 ? 16 0.0% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r1
DN 10.129.32.182 ? 16 0.0% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r1
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.33.241 1.81 GiB 16 100.0% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.33.149 1.67 GiB 16 100.0% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
When we run nodetool status on argo-argo-we-dc1-r2-sts-1:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.129.32.244 ? 16 0.0% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
DN 10.129.34.6 ? 16 0.0% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r1
DN 10.129.32.182 ? 16 0.0% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r1
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.33.241 1.81 GiB 16 100.0% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.33.149 1.67 GiB 16 100.0% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
UN 10.129.34.24 1.78 GiB 16 100.0% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
When we run nodetool status on argo-argo-we-dc1-r3-sts-0:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.129.32.244 ? 16 0.0% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
DN 10.129.34.6 ? 16 0.0% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r1
DN 10.129.32.182 ? 16 0.0% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r1
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.33.241 1.81 GiB 16 100.0% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.34.24 1.78 GiB 16 100.0% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
When we run nodetool status on argo-argo-we-dc1-r3-sts-1:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.129.32.244 ? 16 0.0% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
DN 10.129.34.6 ? 16 0.0% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r1
DN 10.129.32.182 ? 16 0.0% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r1
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.34.134 1.67 GiB 16 100.0% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
UN 10.129.33.241 1.81 GiB 16 100.0% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.34.24 1.78 GiB 16 100.0% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
Steps to reproduce:
- Create a K8ssandra cluster with 1 DC named argo-we-dc1 which has 6 nodes and 3 racks (r1, r2, r3). The cluster also has 3 Stargate pods (1 per rack). Medusa is deployed in each Cassandra pod.
- Add data to the cluster (a hedged cqlsh sketch follows this list).
- Create a Medusa backup job.
- Create a Medusa restore job.
- Check the result of nodetool status.
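For the "Add data" step, any keyspace with test data will do; a hedged sketch using cqlsh through kubectl exec (keyspace and table names are illustrative):

kubectl exec -it argo-argo-we-dc1-r1-sts-0 -n infrastructure-k8ssandra -c cassandra -- \
  cqlsh -u $CASS_USERNAME -p $CASS_PASSWORD -e "
    CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'NetworkTopologyStrategy', 'argo-we-dc1': 3};
    CREATE TABLE IF NOT EXISTS demo.users (id uuid PRIMARY KEY, name text);
    INSERT INTO demo.users (id, name) VALUES (uuid(), 'alice');"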
Expected result of nodetool status after the restore completes:
Datacenter: argo-we-dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.129.35.142 1.78 GiB 16 51.5% 39b3ca3e-8129-45b9-b5cf-6f9e43214e93 r2
UN 10.129.33.161 1.67 GiB 16 51.5% 71c4c25d-71b5-49fe-89f2-6b2ca2ba54d0 r3
UN 10.129.33.241 1.81 GiB 16 51.5% eec3e68c-ba4a-4531-b75c-e81cd66c67d8 r1
UN 10.129.32.244 1.66 GiB 16 48.5% c4b0a92b-651b-42b3-b606-58c48ac9f5a9 r1
UN 10.129.34.6 1.63 GiB 16 48.5% 46088aa0-099c-4e27-a0c2-bbfdfa383de7 r2
UN 10.129.32.182 1.67 GiB 16 48.5% c07b5905-6d71-40c5-a26a-ef02c12eb3cd r3
Important note:
When a K8ssandra cluster has DCs whose racks each contain only 1 Cassandra node, backup and restore work. Restore fails only when a rack contains more than 1 Cassandra node (see the fragment sketched below).
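For comparison, the layout that restores correctly for us keeps one Cassandra node per rack, i.e. a datacenters fragment of the K8ssandraCluster spec along these lines (hedged sketch, values illustrative):

spec:
  cassandra:
    datacenters:
    - metadata:
        name: argo-we-dc1
      size: 3                  # 3 nodes, one per rack
      racks:
      - name: r1
      - name: r2
      - name: r3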