K8ssandra Forum

Medusa-restore starts on every restart of Cassandra DC

Hi cassandra-medusa team,

I have observed the following behaviour.
After a successful restore, I want to simulate a restart of the Cassandra datacenter with this command:

k delete po cassandra-dc1-rack1-sts-0 -n cass-operator

Unfortunately, I see that this operation triggers a new restore every time. I can confirm that it is downloading files with this command:

k logs cassandra-dc1-rack1-sts-0 -n cass-operator -c medusa-restore -f

As a result, some tuples are missing. How can I prevent medusa-restore from running when Cassandra restarts?

@diego just acknowledging your question to let you know we’re looking into it and we’ll get back to you soon. Cheers! :beers:

Hey @diego

> After a successful restore, I want to simulate a restart of the Cassandra datacenter

Are you aware that the restore operation already involves a restart of the CassandraDatacenter?

> Unfortunately, I see that this operation triggers a new restore every time

The medusa-restore init container runs every time the pod starts, but it only performs a restore operation when certain conditions are satisfied. When you trigger a restore, medusa-operator adds a couple of environment variables to the medusa-restore init container:

  • BACKUP_NAME
  • RESTORE_KEY

Both of these env vars have to be set in order for the restore to be performed.

BACKUP_NAME specifies the name of the backup to restore.

RESTORE_KEY is a UUID generated by medusa-operator. It is written out to a file after the restore completes. A restore is performed if either the file does not exist or the value in the file differs from the env var.

See here in the entry point script for reference.
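
The decision described above can be sketched roughly as follows. This is only an illustration of the stated rules, not the actual entry point script: the function name `should_restore` and its arguments are made up for the example, and the marker file path is passed in rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real Medusa entry point script.
# Prints "restore" when a restore would run, "skip" otherwise.
# Arguments: BACKUP_NAME value, RESTORE_KEY value, path of the marker file.
should_restore() {
    backup_name="$1"
    restore_key="$2"
    marker_file="$3"

    # Both variables must be set for a restore to be considered at all.
    if [ -z "$backup_name" ] || [ -z "$restore_key" ]; then
        echo "skip"
        return 0
    fi

    # No marker file: no completed restore has been recorded yet.
    if [ ! -f "$marker_file" ]; then
        echo "restore"
        return 0
    fi

    # Marker exists: restore only if the recorded key differs from the env var.
    if [ "$(cat "$marker_file")" != "$restore_key" ]; then
        echo "restore"
    else
        echo "skip"
    fi
}
```

Under these rules, a pod that keeps both variables but never gets a marker file written would report "restore" on every start.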

Hi everybody,
I tried another way:

  1. Successfully restore backup A.
  2. After Cassandra restarted, I added more tuples to the tables.
  3. Reboot the node (it is a single-node cluster).
  4. On startup it automatically restores “backup A”, so the tuples added in step 2 are completely lost.
# k get po -n cass-operator -w
NAME                                         READY   STATUS             RESTARTS   AGE
cassandra-cass-operator-799c5d95fc-b4lxt     1/1     Running            1          13h
cassandra-crd-upgrader-job-k8ssandra-qw4vv   0/1     Completed          0          13h
cassandra-dc1-rack1-sts-0                    2/3     Running            3          31m
cassandra-medusa-operator-6c76b85787-2x4q4   1/1     Running            1          13h
cassandra-dc1-rack1-sts-0                    2/3     Terminating        3          35m
cassandra-dc1-rack1-sts-0                    2/3     Terminating        3          35m
cassandra-dc1-rack1-sts-0                    0/3     Terminating        3          35m
cassandra-dc1-rack1-sts-0                    0/3     Terminating        3          35m
cassandra-dc1-rack1-sts-0                    0/3     Terminating        3          35m
cassandra-dc1-rack1-sts-0                    0/3     Pending            0          0s
cassandra-dc1-rack1-sts-0                    0/3     Pending            0          0s
cassandra-dc1-rack1-sts-0                    0/3     Init:0/4           0          0s
cassandra-dc1-rack1-sts-0                    0/3     Init:0/4           0          1s
cassandra-dc1-rack1-sts-0                    0/3     Init:1/4           0          2s
cassandra-dc1-rack1-sts-0                    0/3     Init:1/4           0          3s
cassandra-dc1-rack1-sts-0                    0/3     Init:2/4           0          7s
cassandra-dc1-rack1-sts-0                    0/3     Init:3/4           0          8s
cassandra-dc1-rack1-sts-0                    0/3     Init:3/4           0          9s
cassandra-dc1-rack1-sts-0                    0/3     PodInitializing    0          23s
cassandra-dc1-rack1-sts-0                    1/3     Running            0          24s
cassandra-dc1-rack1-sts-0                    2/3     Running            0          31s
cassandra-dc1-rack1-sts-0                    2/3     Running            0          34s
cassandra-dc1-rack1-sts-0                    2/3     Running            0          40s
cassandra-dc1-rack1-sts-0                    3/3     Running            0          64s
cassandra-dc1-rack1-sts-0                    3/3     Running            0          64s

Below is the log of the medusa-restore container:

[2021-09-21 10:25:11,950] DEBUG: Cleaning (/tmp/medusa-restore-6e858c9d-ecb6-4c67-8473-793e9c47c2c2)
DEBUG:root:Remove folder /tmp/medusa-restore-6e858c9d-ecb6-4c67-8473-793e9c47c2c2 and content
[2021-09-21 10:25:11,950] DEBUG: Remove folder /tmp/medusa-restore-6e858c9d-ecb6-4c67-8473-793e9c47c2c2 and content
INFO:root:Finished restore of backup backup-202109201001
[2021-09-21 10:25:11,953] INFO: Finished restore of backup backup-202109201001
root@single70:~# velero backup get
NAME                  STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup-202109200943   Completed   0        0          2021-09-20 05:43:35 -0400 EDT   364d      default            <none>
backup-202109200957   Completed   0        0          2021-09-20 05:57:23 -0400 EDT   364d      default            <none>
backup-202109201001   Completed   0        0          2021-09-20 06:01:01 -0400 EDT   364d      default            <none>
backup-202109210100   Completed   0        0          2021-09-20 21:00:02 -0400 EDT   1y        default            <none>

Are you performing backups with Velero? K8ssandra doesn’t have any integration with Velero.

Hi John,
Velero is not involved in this kind of test.

I would like to test the following scenarios after a simple restore:

helm install restore1 k8ssandra/restore -n cass-operator --set name=restore1,backup.name=my-backup,cassandraDatacenter.name=dc1

Cassandra is restored, and ready for usage, so I perform different operations:

  1. Kill the cassandra process inside the container and let k8s restart it.
    OK: in this scenario, after a while, the cassandra container is restarted correctly.

  2. Force Kubernetes to restart a datacenter.
    In this case medusa-restore is invoked, but I think it shouldn’t be.

k delete po cassandra-dc1-rack1-sts-0 -n cass-operator

  3. Reboot the machine, so that k8s restarts everything.
    In this case too, medusa-restore is called during the init phase of the Cassandra datacenter.

Diego

I asked about Velero because the output you showed above is not from k8ssandra backups.

Killing the cassandra process won’t cause the container to restart. There is another process, the management-api, that is pid 1. You need to kill it to cause the container to restart. None of this is necessary, though, as medusa-operator will update the status of the CassandraRestore object to let you know when the CassandraDatacenter is ready for usage. When the CassandraDatacenter is ready, medusa-operator will set the finished property in the status of the CassandraRestore object.
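
One way to check that status from the command line is sketched below. It assumes `kubectl` is available (the `k` used in this thread looks like an alias for it) and that the CassandraRestore object lives in the namespace you pass in. The function name `restore_status` is made up for illustration. It prints the whole status block rather than a single field, since the exact field names are not shown in this thread.

```shell
# Hypothetical helper: print the status block of a CassandraRestore object.
# Arguments: restore name, namespace.
restore_status() {
    kubectl get cassandrarestore "$1" -n "$2" -o jsonpath='{.status}'
}

# Example call (requires access to the cluster):
#   restore_status restore1 cass-operator
```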

k delete po cassandra-dc1-rack1-sts-0 -n cass-operator will not cause a restart of the whole CassandraDatacenter, just that pod. As I mentioned previously, the CassandraDatacenter is already restarted as part of the restore process.

Lastly, the medusa-restore init container will always run when a pod starts. Are you saying that it is performing a restore operation a second time after you delete the pod?

Hi John,
Yes, you’re right: it looks like it actually restores a second time.
In this log I can see that it is downloading from the S3 bucket again:

k logs cassandra-dc1-rack1-sts-0 -n cass-operator -c medusa-restore -f

@diego can you provide the output of k get pod cassandra-dc1-rack1-sts-0 -n cass-operator -o yaml? Actually I just want to see the environment variables for the medusa-restore initContainer.
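
If the full YAML is too long to paste, a JSONPath filter can narrow the output to just those environment variables. The following is a minimal sketch, assuming `kubectl` is what the `k` alias points to; the function name `init_container_env` is hypothetical.

```shell
# Hypothetical helper: print only the env entries of one init container.
# Arguments: pod name, namespace, init container name.
init_container_env() {
    kubectl get pod "$1" -n "$2" \
        -o jsonpath="{.spec.initContainers[?(@.name==\"$3\")].env}"
}

# Example call (requires access to the cluster):
#   init_container_env cassandra-dc1-rack1-sts-0 cass-operator medusa-restore
```

The filter expression selects the init container by name, so the result does not depend on the order in which the init containers are listed.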

Can you also provide the output of k exec -it cassandra-dc1-rack1-sts-0 -n cass-operator -- cat /var/lib/cassandra/.last-restore?

Can I see the output of k get cassandrarestore restore1 -o yaml?

Lastly what version of k8ssandra are you running?

The version of k8ssandra is k8ssandra-1.2.0.

k exec -it cassandra-dc1-rack1-sts-0 -n cass-operator -- cat /var/lib/cassandra/.last-restore
Defaulting container name to cassandra.
cat: /var/lib/cassandra/.last-restore: No such file or directory

The output of this command is huge…

k get pod cassandra-dc1-rack1-sts-0 -n cass-operator -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 10.0.251.114/32
    cni.projectcalico.org/podIPs: 10.0.251.114/32
  creationTimestamp: "2021-09-21T10:24:49Z"
  generateName: cassandra-dc1-rack1-sts-
  labels:
    app.kubernetes.io/managed-by: cass-operator
    cassandra.datastax.com/cluster: cassandra
    cassandra.datastax.com/datacenter: dc1
    cassandra.datastax.com/node-state: Started
    cassandra.datastax.com/rack: rack1
    cassandra.datastax.com/seed-node: "true"
    controller-revision-hash: cassandra-dc1-rack1-sts-7c4bdd9c5f
    statefulset.kubernetes.io/pod-name: cassandra-dc1-rack1-sts-0
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:cassandra.datastax.com/cluster: {}
          f:cassandra.datastax.com/datacenter: {}
          f:cassandra.datastax.com/rack: {}
          f:controller-revision-hash: {}
          f:statefulset.kubernetes.io/pod-name: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"c22b30d5-3d95-4b7c-b230-bbeef5835a6d"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:affinity: {}
        f:containers:
          k:{"name":"cassandra"}:
            .: {}
            f:env:
              .: {}
              k:{"name":"DS_LICENSE"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"DSE_AUTO_CONF_OFF"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"DSE_MGMT_EXPLICIT_START"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"LOCAL_JMX"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"MGMT_API_EXPLICIT_START"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"USE_MGMT_API"}:
                .: {}
                f:name: {}
                f:value: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:lifecycle:
              .: {}
              f:preStop:
                .: {}
                f:exec:
                  .: {}
                  f:command: {}
            f:livenessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":7000,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":7001,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":7199,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":8080,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":9042,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":9103,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":9142,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":9160,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/config"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/encryption/"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/podinfo"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/var/lib/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/var/log/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
          k:{"name":"medusa"}:
            .: {}
            f:env:
              .: {}
              k:{"name":"CQL_PASSWORD"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"CQL_USERNAME"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"MEDUSA_MODE"}:
                .: {}
                f:name: {}
                f:value: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:exec:
                .: {}
                f:command: {}
              f:failureThreshold: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":50051,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:exec:
                .: {}
                f:command: {}
              f:failureThreshold: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/etc/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/medusa"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/medusa-secrets"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/var/lib/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
          k:{"name":"server-system-logger"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/var/log/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:hostname: {}
        f:initContainers:
          .: {}
          k:{"name":"base-config-init"}:
            .: {}
            f:args: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/cassandra-base-config/"}:
                .: {}
                f:mountPath: {}
                f:name: {}
          k:{"name":"jmx-credentials"}:
            .: {}
            f:args: {}
            f:env:
              .: {}
              k:{"name":"REAPER_JMX_PASSWORD"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"REAPER_JMX_USERNAME"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"SUPERUSER_JMX_PASSWORD"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"SUPERUSER_JMX_USERNAME"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/config"}:
                .: {}
                f:mountPath: {}
                f:name: {}
          k:{"name":"medusa-restore"}:
            .: {}
            f:env:
              .: {}
              k:{"name":"BACKUP_NAME"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"CQL_PASSWORD"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"CQL_USERNAME"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"MEDUSA_MODE"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"RESTORE_KEY"}:
                .: {}
                f:name: {}
                f:value: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/etc/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/medusa"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/medusa-secrets"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/podinfo"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/var/lib/cassandra"}:
                .: {}
                f:mountPath: {}
                f:name: {}
          k:{"name":"server-config-init"}:
            .: {}
            f:env:
              .: {}
              k:{"name":"CONFIG_FILE_DATA"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"DSE_VERSION"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"HOST_IP"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:fieldRef:
                    .: {}
                    f:apiVersion: {}
                    f:fieldPath: {}
              k:{"name":"POD_IP"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:fieldRef:
                    .: {}
                    f:apiVersion: {}
                    f:fieldPath: {}
              k:{"name":"PRODUCT_NAME"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"PRODUCT_VERSION"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"RACK_NAME"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"USE_HOST_IP_FOR_BROADCAST"}:
                .: {}
                f:name: {}
                f:value: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/config"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:fsGroup: {}
          f:runAsGroup: {}
          f:runAsUser: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"cassandra-config"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
          k:{"name":"cassandra-medusa"}:
            .: {}
            f:configMap:
              .: {}
              f:defaultMode: {}
              f:items: {}
              f:name: {}
            f:name: {}
          k:{"name":"encryption-cred-storage"}:
            .: {}
            f:name: {}
            f:secret:
              .: {}
              f:defaultMode: {}
              f:secretName: {}
          k:{"name":"medusa-bucket-key"}:
            .: {}
            f:name: {}
            f:secret:
              .: {}
              f:defaultMode: {}
              f:secretName: {}
          k:{"name":"podinfo"}:
            .: {}
            f:downwardAPI:
              .: {}
              f:defaultMode: {}
              f:items: {}
            f:name: {}
          k:{"name":"server-config"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
          k:{"name":"server-data"}:
            .: {}
            f:name: {}
            f:persistentVolumeClaim:
              .: {}
              f:claimName: {}
          k:{"name":"server-logs"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-09-21T10:24:49Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:cni.projectcalico.org/podIP: {}
          f:cni.projectcalico.org/podIPs: {}
    manager: calico
    operation: Update
    time: "2021-09-21T10:24:50Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:cassandra.datastax.com/node-state: {}
          f:cassandra.datastax.com/seed-node: {}
    manager: operator
    operation: Update
    time: "2021-09-21T10:25:29Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:initContainerStatuses: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.0.251.114"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-09-21T10:25:53Z"
  name: cassandra-dc1-rack1-sts-0
  namespace: cass-operator
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: cassandra-dc1-rack1-sts
    uid: c22b30d5-3d95-4b7c-b230-bbeef5835a6d
  resourceVersion: "121509"
  uid: e29af75a-6269-43d8-832b-382de9a7b1b2
spec:
  affinity: {}
  containers:
  - env:
    - name: LOCAL_JMX
      value: "no"
    - name: DS_LICENSE
      value: accept
    - name: DSE_AUTO_CONF_OFF
      value: all
    - name: USE_MGMT_API
      value: "true"
    - name: MGMT_API_EXPLICIT_START
      value: "true"
    - name: DSE_MGMT_EXPLICIT_START
      value: "true"
    image: k8ssandra/cass-management-api:3.11.10-v0.1.25
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - wget
          - --output-document
          - /dev/null
          - --no-check-certificate
          - --post-data=''
          - http://localhost:8080/api/v0/ops/node/drain
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v0/probes/liveness
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 1
    name: cassandra
    ports:
    - containerPort: 9042
      name: native
      protocol: TCP
    - containerPort: 9142
      name: tls-native
      protocol: TCP
    - containerPort: 7000
      name: internode
      protocol: TCP
    - containerPort: 7001
      name: tls-internode
      protocol: TCP
    - containerPort: 7199
      name: jmx
      protocol: TCP
    - containerPort: 8080
      name: mgmt-api-http
      protocol: TCP
    - containerPort: 9103
      name: prometheus
      protocol: TCP
    - containerPort: 9160
      name: thrift
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v0/probes/readiness
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        memory: 12Gi
      requests:
        cpu: "1"
        memory: 8Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cassandra
      name: cassandra-config
    - mountPath: /etc/podinfo
      name: podinfo
    - mountPath: /var/log/cassandra
      name: server-logs
    - mountPath: /var/lib/cassandra
      name: server-data
    - mountPath: /etc/encryption/
      name: encryption-cred-storage
    - mountPath: /config
      name: server-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  - env:
    - name: MEDUSA_MODE
      value: GRPC
    - name: CQL_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: cassandra-medusa
    - name: CQL_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: cassandra-medusa
    image: docker.io/k8ssandra/medusa:0.10.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: medusa
    ports:
    - containerPort: 50051
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - /bin/grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/medusa
      name: cassandra-medusa
    - mountPath: /etc/cassandra
      name: cassandra-config
    - mountPath: /var/lib/cassandra
      name: server-data
    - mountPath: /etc/medusa-secrets
      name: medusa-bucket-key
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  - image: k8ssandra/system-logger:9c4c3692
    imagePullPolicy: IfNotPresent
    name: server-system-logger
    resources:
      limits:
        cpu: 100m
        memory: 64M
      requests:
        cpu: 100m
        memory: 64M
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/log/cassandra
      name: server-logs
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: cassandra-dc1-rack1-sts-0
  initContainers:
  - args:
    - -c
    - cp -r /etc/cassandra/* /cassandra-base-config/
    command:
    - /bin/sh
    image: k8ssandra/cass-management-api:3.11.10-v0.1.25
    imagePullPolicy: IfNotPresent
    name: base-config-init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /cassandra-base-config/
      name: cassandra-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  - env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: USE_HOST_IP_FOR_BROADCAST
      value: "false"
    - name: RACK_NAME
      value: rack1
    - name: PRODUCT_VERSION
      value: 3.11.10
    - name: PRODUCT_NAME
      value: cassandra
    - name: DSE_VERSION
      value: 3.11.10
    - name: CONFIG_FILE_DATA
      value: '{"cassandra-yaml":{"authenticator":"PasswordAuthenticator","authorizer":"CassandraAuthorizer","credentials_update_interval_in_ms":3600000,"credentials_validity_in_ms":3600000,"num_tokens":256,"permissions_update_interval_in_ms":3600000,"permissions_validity_in_ms":3600000,"role_manager":"CassandraRoleManager","roles_update_interval_in_ms":3600000,"roles_validity_in_ms":3600000},"cluster-info":{"name":"cassandra","seeds":"cassandra-seed-service"},"datacenter-info":{"graph-enabled":0,"name":"dc1","solr-enabled":0,"spark-enabled":0},"jvm-options":{"additional-jvm-opts":["-Dcassandra.system_distributed_replication_dc_names=dc1","-Dcassandra.system_distributed_replication_per_dc=1"],"heap_size_young_generation":"1G","initial_heap_size":"1G","max_heap_size":"1G"}}'
    image: docker.io/datastax/cass-config-builder:1.0.4
    imagePullPolicy: IfNotPresent
    name: server-config-init
    resources:
      limits:
        cpu: "1"
        memory: 256M
      requests:
        cpu: "1"
        memory: 256M
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /config
      name: server-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  - args:
    - /bin/sh
    - -c
    - echo "$REAPER_JMX_USERNAME $REAPER_JMX_PASSWORD" > /config/jmxremote.password
      && echo "$SUPERUSER_JMX_USERNAME $SUPERUSER_JMX_PASSWORD" >> /config/jmxremote.password
    env:
    - name: REAPER_JMX_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: cassandra-reaper-jmx
    - name: REAPER_JMX_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: cassandra-reaper-jmx
    - name: SUPERUSER_JMX_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: cassandra-superuser
    - name: SUPERUSER_JMX_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: cassandra-superuser
    image: busybox
    imagePullPolicy: IfNotPresent
    name: jmx-credentials
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /config
      name: server-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  - env:
    - name: MEDUSA_MODE
      value: RESTORE
    - name: CQL_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: cassandra-medusa
    - name: CQL_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: cassandra-medusa
    - name: BACKUP_NAME
      value: backup-202109201001
    - name: RESTORE_KEY
      value: 89a0c85b-25e1-458f-84c4-f6505eea1c7b
    image: docker.io/k8ssandra/medusa:0.10.1
    imagePullPolicy: IfNotPresent
    name: medusa-restore
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/medusa
      name: cassandra-medusa
    - mountPath: /etc/cassandra
      name: server-config
    - mountPath: /var/lib/cassandra
      name: server-data
    - mountPath: /etc/podinfo
      name: podinfo
    - mountPath: /etc/medusa-secrets
      name: medusa-bucket-key
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-fzmss
      readOnly: true
  nodeName: single70
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 999
    runAsGroup: 999
    runAsUser: 999
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 120
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: server-data
    persistentVolumeClaim:
      claimName: server-data-cassandra-dc1-rack1-sts-0
  - emptyDir: {}
    name: cassandra-config
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
    name: podinfo
  - configMap:
      defaultMode: 420
      items:
      - key: medusa.ini
        path: medusa.ini
      name: cassandra-medusa
    name: cassandra-medusa
  - name: medusa-bucket-key
    secret:
      defaultMode: 420
      secretName: medusa-bucket-key
  - emptyDir: {}
    name: server-config
  - emptyDir: {}
    name: server-logs
  - name: encryption-cred-storage
    secret:
      defaultMode: 420
      secretName: dc1-keystore
  - name: default-token-fzmss
    secret:
      defaultMode: 420
      secretName: default-token-fzmss
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-09-21T10:25:12Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-09-21T10:25:53Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-09-21T10:25:53Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-09-21T10:24:49Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://a96a8ccd8ac332bc81860981a9e3d9dadb5267adcfb4cb7ebe37c4b6356e8f40
    image: docker.io/k8ssandra/cass-management-api:3.11.10-v0.1.25
    imageID: sha256:942f68352d0e5c0cd9c293ef4fcf54497dca7f8847a37c419ba8e8ea341753c2
    lastState: {}
    name: cassandra
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-09-21T10:25:12Z"
  - containerID: containerd://5007867fd49c6a522880f9690b14f5c5a332f01c1a6b9fa3fc92bef786784a27
    image: docker.io/k8ssandra/medusa:0.10.1
    imageID: sha256:7a55313927ae0825dae3b700ddef1bc5432c2b9fd152bc89a68422da69198df2
    lastState: {}
    name: medusa
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-09-21T10:25:12Z"
  - containerID: containerd://2251406ae1b82ccce49b57d1bea159473b38bb74cacb9c8d21a2455b5f7bf9a9
    image: docker.io/k8ssandra/system-logger:9c4c3692
    imageID: sha256:4dda0db106cb6543d0c7b7393f6164f03446dceb636d545be013b8e44f9cf4cc
    lastState: {}
    name: server-system-logger
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-09-21T10:25:13Z"
  hostIP: 10.130.42.26
  initContainerStatuses:
  - containerID: containerd://6bf190d337e88af59c9e880ef2b3a18a331503ebd1520aed5dd6c5ea4ffdb181
    image: docker.io/k8ssandra/cass-management-api:3.11.10-v0.1.25
    imageID: sha256:942f68352d0e5c0cd9c293ef4fcf54497dca7f8847a37c419ba8e8ea341753c2
    lastState: {}
    name: base-config-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://6bf190d337e88af59c9e880ef2b3a18a331503ebd1520aed5dd6c5ea4ffdb181
        exitCode: 0
        finishedAt: "2021-09-21T10:24:50Z"
        reason: Completed
        startedAt: "2021-09-21T10:24:50Z"
  - containerID: containerd://e911983d9839999e21580724ab7c3205e4e1015504c0bce9ca12fac27817f793
    image: docker.io/datastax/cass-config-builder:1.0.4
    imageID: sha256:907e52ff2f78bf63c4fe9225e0f75a571913b6a58e8a25baf3d58420b60b592f
    lastState: {}
    name: server-config-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://e911983d9839999e21580724ab7c3205e4e1015504c0bce9ca12fac27817f793
        exitCode: 0
        finishedAt: "2021-09-21T10:24:55Z"
        reason: Completed
        startedAt: "2021-09-21T10:24:51Z"
  - containerID: containerd://ed14a0375de39938d371b224b47633c77cc5f0a68e7a06e693487b355dc81d56
    image: docker.io/library/busybox:latest
    imageID: sha256:42b97d3c2ae95232263a04324aaf656dc80e7792dee6629a9eff276cdfb806c0
    lastState: {}
    name: jmx-credentials
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://ed14a0375de39938d371b224b47633c77cc5f0a68e7a06e693487b355dc81d56
        exitCode: 0
        finishedAt: "2021-09-21T10:24:56Z"
        reason: Completed
        startedAt: "2021-09-21T10:24:56Z"
  - containerID: containerd://3a2a3c9aeb18ee2ca5797a23088139d2fd6d2b634b4bbd4283e97b21ca847114
    image: docker.io/k8ssandra/medusa:0.10.1
    imageID: sha256:7a55313927ae0825dae3b700ddef1bc5432c2b9fd152bc89a68422da69198df2
    lastState: {}
    name: medusa-restore
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://3a2a3c9aeb18ee2ca5797a23088139d2fd6d2b634b4bbd4283e97b21ca847114
        exitCode: 0
        finishedAt: "2021-09-21T10:25:11Z"
        reason: Completed
        startedAt: "2021-09-21T10:24:57Z"
  phase: Running
  podIP: 10.0.251.114
  podIPs:
  - ip: 10.0.251.114
  qosClass: Burstable
  startTime: "2021-09-21T10:24:49Z"

k get cassandrarestore restore1 -o yaml -n cass-operator

apiVersion: cassandra.k8ssandra.io/v1alpha1
kind: CassandraRestore
metadata:
  annotations:
    meta.helm.sh/release-name: restore1
    meta.helm.sh/release-namespace: cass-operator
  creationTimestamp: "2021-09-20T21:06:50Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  managedFields:
  - apiVersion: cassandra.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
      f:spec:
        .: {}
        f:backup: {}
        f:cassandraDatacenter:
          .: {}
          f:clusterName: {}
          f:name: {}
        f:inPlace: {}
        f:shutdown: {}
    manager: Go-http-client
    operation: Update
    time: "2021-09-20T21:06:50Z"
  - apiVersion: cassandra.k8ssandra.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:finishTime: {}
        f:restoreKey: {}
        f:startTime: {}
    manager: manager
    operation: Update
    time: "2021-09-20T21:09:25Z"
  name: restore1
  namespace: cass-operator
  resourceVersion: "120006"
  uid: fc60bc9b-29a2-4209-8f23-49528c4af69d
spec:
  backup: backup-202109201001
  cassandraDatacenter:
    clusterName: dc1
    name: dc1
  inPlace: false
  shutdown: true
status:
  finishTime: "2021-09-21T10:16:32Z"
  restoreKey: 8c140e88-24f5-4bbc-ba16-2b3788e1fe98
  startTime: "2021-09-20T21:06:50Z"

Many thanks for your time in advance.

I have run the same test in a similar environment:

  1. Restored the cassandra db from a previous backup: backup-202109201440
  2. Added tuples to a table
  3. Rebooted the system (it’s a single node cluster)
  4. The added tuples are missing

In the medusa-restore logs I can see that, after the reboot, the cassandra dc pod restored the backup backup-202109201440 twice during init:

INFO:root:Finished restore of backup backup-202109201440
[2021-09-22 11:19:07,778] INFO: Finished restore of backup backup-202109201440

INFO:root:Finished restore of backup backup-202109201440
[2021-09-22 11:28:09,346] INFO: Finished restore of backup backup-202109201440

Those restores are unexpected and are causing data loss.
Could you please clarify what triggers the restore? As far as I can see, a simple reboot of the host triggers it.

Thanks
Andrea

In the “kubectl get pod -n cass-operator cassandra-dc1-rack1-sts-0 -o yaml” output you can see that both env variables are set:

    - name: BACKUP_NAME
      value: backup-202109201440
    - name: RESTORE_KEY
      value: 508ee3c2-05a9-48d2-a7b4-f8f12eb52f27

So it seems they were not removed after the restore in step 1)
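
For context, the check described earlier in the thread (a restore runs only when both variables are set and the saved key is missing or different) can be sketched roughly as below. This illustrates the described behaviour and is not the actual entry point script; the function name `should_restore` and the temporary file location are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the restore decision described earlier in this thread.
# Returns 0 (restore) or 1 (skip).
should_restore() {
    backup_name=$1
    restore_key=$2
    last_restore_file=$3

    # Both values must be set for a restore to be considered.
    [ -n "$backup_name" ] && [ -n "$restore_key" ] || return 1

    # No record of a previous restore: restore.
    [ -f "$last_restore_file" ] || return 0

    # Restore only if the recorded key differs from the requested one.
    [ "$(cat "$last_restore_file")" != "$restore_key" ]
}

tmp=$(mktemp -d)
key=508ee3c2-05a9-48d2-a7b4-f8f12eb52f27

# First start after the restore request: no key file yet, so a restore runs.
should_restore backup-202109201440 "$key" "$tmp/.last_restore" && echo "first start: restore"

# Once the key has been recorded, a restart with the same env vars is skipped.
echo "$key" > "$tmp/.last_restore"
should_restore backup-202109201440 "$key" "$tmp/.last_restore" || echo "restart: skip"
```

Under this logic, leaving the env variables set is harmless as long as the key file gets written after the restore.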

I ran “kubectl edit statefulset cassandra-dc1-rack1-sts -n cass-operator” to remove those env variables before rebooting, and now no restore is executed after a reboot.

@Andrea_Cucciarre are you also running 1.2.0?

Yes. I suspect the problem is in the docker_entrypoint: the file .last_restore is never created.

else
    echo "Restoring backup $BACKUP_NAME"
    exec python3 -m medusa.service.grpc.restore
    echo $RESTORE_KEY > $last_restore_file

The echo comes after an exec, which replaces the shell process with the Python process, so the echo never runs and the file .last_restore is never created.
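
A minimal, self-contained sketch of that behaviour, using `true` as a stand-in for the restore command (the marker file names are made up for the demo): the first shape mirrors the snippet above, the second runs the command as a child process and writes the marker only if it succeeds. This is one possible fix, not necessarily the one that will be applied upstream.

```shell
#!/bin/sh
# Demonstrates why a command placed after `exec` never runs.
marker_dir=$(mktemp -d)

# Buggy shape: exec replaces the shell, so the echo after it is unreachable.
sh -c 'exec true; echo key > "$1/buggy"' sh "$marker_dir"

# Fixed shape: run the command as a child, then write the marker on success.
sh -c 'true && echo key > "$1/fixed"' sh "$marker_dir"

if [ -e "$marker_dir/buggy" ]; then echo "buggy: marker written"; else echo "buggy: marker missing"; fi
if [ -e "$marker_dir/fixed" ]; then echo "fixed: marker written"; else echo "fixed: marker missing"; fi
```

Only the second shape leaves a marker file behind, which is what the restore check needs in order to skip the restore on the next start.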

@Andrea_Cucciarre

Thanks for pointing it out. It looks like the regression was introduced in Medusa 0.10.1 in this commit. I will create a ticket to get this fixed.

To close the loop on this discussion, I have created the following tickets: