Medusa Restore doesn't work

alicango · August 6, 2024, 1:22am

Hi,
I set up Medusa in my k8ssandra cluster using a serviceAccount. Medusa backup works well: it creates the backup folder structure for the cluster and puts the backup in the S3 bucket. But the restore job doesn’t start when I try to run a restore.
Here are my configuration and logs.
K8ssandraCluster.yaml

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: cassandra
  namespace: cassandra
spec:
  auth: true
  reaper:
    autoScheduling:
      enabled: true
  cassandra:
    telemetry: 
      prometheus:
        enabled: true
    serverVersion: "4.1.4"
    ...
    ...
    ...
    serviceAccount: my-k8ssandra-backup
  medusa:
    storageProperties:
      storageProvider: s3
      credentialsType: role-based
      storageSecretRef:
        name: ""
      bucketName: my-k8ssandra-backup-store
      prefix: test-dev-2
      region: eu-central-1
      secure: true
      maxBackupAge: 15

My k8ssandra cluster:

$ k get pods -n cassandra | grep dc1
cassandra-dc1-eu-central-1a-sts-0         3/3     Running  0    11h
cassandra-dc1-eu-central-1b-sts-0         3/3     Running  0    11h
cassandra-dc1-eu-central-1c-sts-0         3/3     Running  0    11h
cassandra-dc1-medusa-standalone-abcdefg   1/1     Running  0     8h
cassandra-dc1-reaper-hijklmn              1/1     Running  0     3d11h

The serviceAccount has been annotated:

$ k get serviceAccount my-k8ssandra-backup -n cassandra -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/my-k8ssandra-backup
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"my-k8ssandra-backup","namespace":"cassandra"}}
  name: my-k8ssandra-backup
  namespace: cassandra

IAM role trust relationship (the policy has been attached to the role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456789123:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX:sub": "system:serviceaccount:cassandra:my-k8ssandra-backup",
                    "oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}

IAM policy permissions:

{
    "Statement": [
        {
            "Action": "s3:ListBucket",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::my-k8ssandra-backup-store",
            "Sid": "K8ssandraStorageRead"
        },
        {
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::my-k8ssandra-backup-store/*",
            "Sid": "K8ssandraStorageWrite"
        }
    ],
    "Version": "2012-10-17"
}

Medusa container info:

$ k describe pod cassandra-dc1-eu-central-1a-sts-0 -n cassandra

EKS version:

$ kubectl version --short
Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.29.6-eks-db838b0

k8ssandra-operator version:

$ k get HelmRelease.helm.toolkit.fluxcd.io xxxxx -n flux-system
NAME     AGE    READY   STATUS
xxxxx    164d   True    Helm upgrade succeeded for release cassandra/xxxxxxxx.v2 with chart k8ssandra-operator@1.14.1

Backup is working properly:

$ k get MedusaBackup -n cassandra
NAME                   STARTED   FINISHED   NODES   FILES   SIZE       COMPLETED   STATUS
test-medusa-backup-1   11h       11h        3       9160    30.19 GB   3           SUCCESS

But when I create a MedusaRestoreJob, it doesn’t start, and I see errors in the logs of the standalone pod:

$ cat restoreBackup.yaml
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaRestoreJob
metadata:
  name: test-restore-backup-1
  namespace: cassandra
spec:
  cassandraDatacenter: dc1
  backup: test-medusa-backup-1

$ kubectl apply -f restoreBackup.yaml
medusarestorejob.medusa.k8ssandra.io/test-restore-backup-1 created

$ k get medusarestorejob -n cassandra
NAME                    STARTED   FINISHED   ERROR
test-restore-backup-1

These are the error logs:

$ k logs cassandra-dc1-medusa-standalone-abcdefg -n cassandra -f

MEDUSA_MODE = GRPC
sleeping for 0 sec
Starting Medusa gRPC service
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
[2024-08-06 00:59:33,909] INFO: Init service
[2024-08-06 00:59:33,911] INFO: Starting server. Listening on port 50051.
[2024-08-06 01:02:20,159] DEBUG: Loading storage_provider: s3
[2024-08-06 01:02:20,167] INFO: Using credentials CensoredCredentials(access_key_id=None, secret_access_key=*****, region=eu-central-1)
[2024-08-06 01:02:20,168] INFO: Connecting to s3 with args {}
[2024-08-06 01:02:20,239] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:02:20,239] WARNING: Having to make a new event loop unexpectedly
[2024-08-06 01:02:20,239] DEBUG: Using selector: EpollSelector
[2024-08-06 01:02:40,361] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:03:20,489] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:04:40,649] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:06:40,829] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:08:41,005] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:10:41,185] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:10:41,262] DEBUG: Disconnecting from S3...
[2024-08-06 01:10:41,263] ERROR: Unexpected [AttributeError] raised by servicer method [/Medusa/GetBackups]
Traceback (most recent call last):
  File "/home/cassandra/medusa/service/grpc/server.py", line 228, in GetBackups
    backups = get_backups(connected_storage, self.config, True)
  File "/home/cassandra/medusa/listing.py", line 26, in get_backups
    cluster_backups = sorted(
  File "/home/cassandra/medusa/storage/__init__.py", line 360, in list_cluster_backups
    node_backups = sorted(
  File "/home/cassandra/medusa/storage/__init__.py", line 181, in list_node_backups
    backup_index_blobs = self.list_backup_index_blobs()
  File "/home/cassandra/medusa/storage/__init__.py", line 272, in list_backup_index_blobs
    return self.storage_driver.list_objects(path)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/home/cassandra/.venv/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/cassandra/medusa/storage/abstract_storage.py", line 72, in list_objects
    objects = self.list_blobs(prefix=path)
  File "/home/cassandra/medusa/storage/abstract_storage.py", line 80, in list_blobs
    objects = loop.run_until_complete(self._list_blobs(prefix))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/cassandra/medusa/storage/s3_base_storage.py", line 250, in _list_blobs
    ).build_full_result()
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/paginate.py", line 479, in build_full_result
    for response in self:
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 682, in grpc._cython.cygrpc._handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 802, in _handle_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 547, in _handle_unary_unary_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 411, in _finish_handler_with_unary_response
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/cassandra/medusa/service/grpc/server.py", line 237, in GetBackups
    response.status = medusa_pb2.StatusType.UNKNOWN
AttributeError: Protocol message GetBackupsResponse has no "status" field.

k8ssandra-operator logs:

2024-08-06T01:10:41.265Z        ERROR   Failed to prepare restore       {"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob", "MedusaRestoreJob": {"name":"test-restore-backup-1","namespace":"cassandra"}, "namespace": "cassandra", "name": "test-restore-backup-1", "reconcileID": "2c51112c-5290-4a05-b73f-d7051d09f106", "medusarestorejob": "cassandra/test-restore-backup-1", "error": "failed to get backups: rpc error: code = Internal desc = Unexpected <class 'AttributeError'>: Protocol message GetBackupsResponse has no \"status\" field."}
github.com/k8ssandra/k8ssandra-operator/controllers/medusa.(*MedusaRestoreJobReconciler).Reconcile
        /workspace/controllers/medusa/medusarestorejob_controller.go:124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
2024-08-06T01:10:41.265Z        ERROR   Reconciler error        {"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob", "MedusaRestoreJob": {"name":"test-restore-backup-1","namespace":"cassandra"}, "namespace": "cassandra", "name": "test-restore-backup-1", "reconcileID": "2c51112c-5290-4a05-b73f-d7051d09f106", "error": "failed to get backups: rpc error: code = Internal desc = Unexpected <class 'AttributeError'>: Protocol message GetBackupsResponse has no \"status\" field."}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235

Is there some configuration I missed? How can I fix this issue?
Many thanks

alicango · August 6, 2024, 4:50pm

Can someone help me, please?

alicango · August 30, 2024, 1:26am

I think you don’t give support to users anymore. Right?

alexander · September 3, 2024, 6:22am

Hi @alicango, we still give support but some requests fall through the cracks as we also support cass-operator, medusa, reaper and other subprojects.
It looks like the Medusa standalone pod does not use the configured service account, and therefore cannot access the S3 bucket.
I recommend upgrading to a newer version of k8ssandra-operator, as we got rid of this standalone pod. The restore operation will then be carried out entirely by the sts pods, which shouldn’t have a problem accessing the bucket.
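
To illustrate that diagnosis, here is a minimal Python sketch. It assumes the standalone pod runs under the namespace's `default` service account. The thread does not show this, so treat it as a guess to verify with `kubectl get pod <standalone-pod> -n cassandra -o json`. The trust policy is the one posted above. The helper names `pod_service_account` and `trust_policy_allows` and the two pod dictionaries are hypothetical, and the check only looks at the `StringEquals` `:sub` condition; it is not a full IAM evaluation. Note also that the log's real failure is the `AccessDenied` on `ListObjectsV2`; the `AttributeError` about the `status` field is raised afterwards while handling that first exception.

```python
# Trust policy copied from the original post (IDs are the poster's placeholders).
OIDC = "oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX"
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::123456789123:oidc-provider/{OIDC}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{OIDC}:sub": "system:serviceaccount:cassandra:my-k8ssandra-backup",
                    f"{OIDC}:aud": "sts.amazonaws.com",
                }
            },
        }
    ],
}


def pod_service_account(pod: dict) -> tuple:
    """Return (namespace, serviceAccountName) from parsed `kubectl get pod -o json` output."""
    namespace = pod["metadata"]["namespace"]
    # Pods without an explicit service account run as "default".
    name = pod["spec"].get("serviceAccountName", "default")
    return namespace, name


def trust_policy_allows(policy: dict, namespace: str, service_account: str) -> bool:
    """Check only the StringEquals `:sub` condition of each Allow statement."""
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        conditions = statement.get("Condition", {}).get("StringEquals", {})
        for key, allowed in conditions.items():
            if not key.endswith(":sub"):
                continue
            # IAM condition values may be a single string or a list of strings.
            allowed_values = [allowed] if isinstance(allowed, str) else list(allowed)
            if subject in allowed_values:
                return True
    return False


# Hypothetical pod specs: the sts pod uses the configured account,
# the standalone pod is ASSUMED to use "default".
sts_pod = {
    "metadata": {"namespace": "cassandra"},
    "spec": {"serviceAccountName": "my-k8ssandra-backup"},
}
standalone_pod = {
    "metadata": {"namespace": "cassandra"},
    "spec": {"serviceAccountName": "default"},
}

print(trust_policy_allows(TRUST_POLICY, *pod_service_account(sts_pod)))
print(trust_policy_allows(TRUST_POLICY, *pod_service_account(standalone_pod)))
```

If the second check comes back negative for the real standalone pod, that pod cannot assume the role, which would match the `access_key_id=None` line and the `AccessDenied` error in the log.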
