Hi,
I set up Medusa in my k8ssandra cluster using a serviceAccount. Medusa backups work well: Medusa creates the backup folder structure for the cluster and uploads the backup to the S3 bucket. However, the restore job never starts when I try to restore.
Here are my configurations and logs:
K8ssandraCluster.yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: cassandra
  namespace: cassandra
spec:
  auth: true
  reaper:
    autoScheduling:
      enabled: true
  cassandra:
    telemetry:
      prometheus:
        enabled: true
    serverVersion: "4.1.4"
    ...
    ...
    ...
    serviceAccount: my-k8ssandra-backup
  medusa:
    storageProperties:
      storageProvider: s3
      credentialsType: role-based
      storageSecretRef:
        name: ""
      bucketName: my-k8ssandra-backup-store
      prefix: test-dev-2
      region: eu-central-1
      secure: true
      maxBackupAge: 15
My k8ssandra cluster:
$ k get pods -n cassandra | grep dc1
cassandra-dc1-eu-central-1a-sts-0         3/3   Running   0   11h
cassandra-dc1-eu-central-1b-sts-0         3/3   Running   0   11h
cassandra-dc1-eu-central-1c-sts-0         3/3   Running   0   11h
cassandra-dc1-medusa-standalone-abcdefg   1/1   Running   0   8h
cassandra-dc1-reaper-hijklmn              1/1   Running   0   3d11h
The serviceAccount has been annotated:
$ k get serviceAccount my-k8ssandra-backup -n cassandra -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/my-k8ssandra-backup
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"my-k8ssandra-backup","namespace":"cassandra"}}
  name: my-k8ssandra-backup
  namespace: cassandra
IAM role trust relationship (the policy has been attached to the role):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789123:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX:sub": "system:serviceaccount:cassandra:my-k8ssandra-backup",
          "oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
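To make explicit what this trust relationship accepts, here is a minimal Python sketch of the `:sub` check. It assumes plain exact matching for `StringEquals` and ignores everything else IAM evaluates (the `aud` condition, the OIDC provider, token validity). The helper name `subject_allowed` is my own, not an AWS API. It only illustrates that a pod running under any other service account in the namespace would not match; I have not verified which service account the standalone pod actually uses.

```python
# Trust policy copied from above (only the parts the check needs).
OIDC = "oidc.eks.eu-central-1.amazonaws.com/id/ABCDEFGHIJKLMNOPRSTUVYZX"
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    OIDC + ":sub": "system:serviceaccount:cassandra:my-k8ssandra-backup",
                    OIDC + ":aud": "sts.amazonaws.com",
                }
            },
        }
    ],
}


def subject_allowed(policy, namespace, service_account):
    """Hypothetical helper: True if an Allow statement's ':sub' condition equals this subject."""
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        equals = stmt.get("Condition", {}).get("StringEquals", {})
        for key, value in equals.items():
            values = value if isinstance(value, list) else [value]
            if key.endswith(":sub") and subject in values:
                return True
    return False


backup_sa_ok = subject_allowed(TRUST_POLICY, "cassandra", "my-k8ssandra-backup")
default_sa_ok = subject_allowed(TRUST_POLICY, "cassandra", "default")
print(backup_sa_ok, default_sa_ok)
```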
IAM policy permissions:
{
  "Statement": [
    {
      "Action": "s3:ListBucket",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-k8ssandra-backup-store",
      "Sid": "K8ssandraStorageRead"
    },
    {
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-k8ssandra-backup-store/*",
      "Sid": "K8ssandraStorageWrite"
    }
  ],
  "Version": "2012-10-17"
}
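As a sanity check on the policy itself, here is a rough Python sketch of how I read it. ListObjectsV2 is authorized by `s3:ListBucket` on the bucket ARN, so the first statement should cover the listing call. The evaluator below is a deliberate simplification (Allow statements only, exact action names, shell-style wildcard on the resource) and the function `is_allowed` is my own illustration, not how IAM is implemented.

```python
from fnmatch import fnmatchcase

BUCKET_ARN = "arn:aws:s3:::my-k8ssandra-backup-store"

# Same statements as the policy above.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "K8ssandraStorageRead", "Effect": "Allow",
         "Action": "s3:ListBucket", "Resource": BUCKET_ARN},
        {"Sid": "K8ssandraStorageWrite", "Effect": "Allow",
         "Action": ["s3:PutObject", "s3:GetObject",
                    "s3:DeleteObject", "s3:AbortMultipartUpload"],
         "Resource": BUCKET_ARN + "/*"},
    ],
}


def as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Simplified check: some Allow statement names the action and matches the resource."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        action_match = action in as_list(stmt["Action"])
        resource_match = any(
            fnmatchcase(resource, pattern) for pattern in as_list(stmt["Resource"])
        )
        if action_match and resource_match:
            return True
    return False


# The listing call from the logs targets the bucket itself.
list_ok = is_allowed(POLICY, "s3:ListBucket", BUCKET_ARN)
# Object reads target keys under the bucket.
get_ok = is_allowed(POLICY, "s3:GetObject", BUCKET_ARN + "/test-dev-2/index/backup_index")
print(list_ok, get_ok)
```

If this reading is right, the policy document would permit the listing, which makes me suspect the denied request is not being made with this role's credentials. That is only a guess on my side.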
Medusa container info:
$ k describe pod cassandra-dc1-eu-central-1a-sts-0 -n cassandra
EKS version:
$ kubectl version --short
Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.29.6-eks-db838b0
k8ssandra-operator version:
$ k get HelmRelease.helm.toolkit.fluxcd.io xxxxx -n flux-system
NAME    AGE    READY   STATUS
xxxxx   164d   True    Helm upgrade succeeded for release cassandra/xxxxxxxx.v2 with chart k8ssandra-operator@1.14.1
Backup is working properly:
$ k get MedusaBackup -n cassandra
NAME                   STARTED   FINISHED   NODES   FILES   SIZE       COMPLETED   STATUS
test-medusa-backup-1   11h       11h        3       9160    30.19 GB   3           SUCCESS
But when I create a MedusaRestoreJob, it doesn't start, and the standalone pod logs show errors:
$ cat restoreBackup.yaml
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaRestoreJob
metadata:
  name: test-restore-backup-1
  namespace: cassandra
spec:
  cassandraDatacenter: dc1
  backup: test-medusa-backup-1
$ kubectl apply -f restoreBackup.yaml
medusarestorejob.medusa.k8ssandra.io/test-restore-backup-1 created
$ k get medusarestorejob -n cassandra
NAME STARTED FINISHED ERROR
test-restore-backup-1
These are the error logs:
$ k logs cassandra-dc1-medusa-standalone-abcdefg -n cassandra -f
MEDUSA_MODE = GRPC
sleeping for 0 sec
Starting Medusa gRPC service
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
[2024-08-06 00:59:33,909] INFO: Init service
[2024-08-06 00:59:33,911] INFO: Starting server. Listening on port 50051.
[2024-08-06 01:02:20,159] DEBUG: Loading storage_provider: s3
[2024-08-06 01:02:20,167] INFO: Using credentials CensoredCredentials(access_key_id=None, secret_access_key=*****, region=eu-central-1)
[2024-08-06 01:02:20,168] INFO: Connecting to s3 with args {}
[2024-08-06 01:02:20,239] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:02:20,239] WARNING: Having to make a new event loop unexpectedly
[2024-08-06 01:02:20,239] DEBUG: Using selector: EpollSelector
[2024-08-06 01:02:40,361] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:03:20,489] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:04:40,649] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:06:40,829] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:08:41,005] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:10:41,185] DEBUG: [Storage] Listing objects in test-dev-2/index/backup_index
[2024-08-06 01:10:41,262] DEBUG: Disconnecting from S3...
[2024-08-06 01:10:41,263] ERROR: Unexpected [AttributeError] raised by servicer method [/Medusa/GetBackups]
Traceback (most recent call last):
  File "/home/cassandra/medusa/service/grpc/server.py", line 228, in GetBackups
    backups = get_backups(connected_storage, self.config, True)
  File "/home/cassandra/medusa/listing.py", line 26, in get_backups
    cluster_backups = sorted(
  File "/home/cassandra/medusa/storage/__init__.py", line 360, in list_cluster_backups
    node_backups = sorted(
  File "/home/cassandra/medusa/storage/__init__.py", line 181, in list_node_backups
    backup_index_blobs = self.list_backup_index_blobs()
  File "/home/cassandra/medusa/storage/__init__.py", line 272, in list_backup_index_blobs
    return self.storage_driver.list_objects(path)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/home/cassandra/.venv/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/cassandra/.venv/lib/python3.10/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/cassandra/medusa/storage/abstract_storage.py", line 72, in list_objects
    objects = self.list_blobs(prefix=path)
  File "/home/cassandra/medusa/storage/abstract_storage.py", line 80, in list_blobs
    objects = loop.run_until_complete(self._list_blobs(prefix))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/cassandra/medusa/storage/s3_base_storage.py", line 250, in _list_blobs
    ).build_full_result()
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/paginate.py", line 479, in build_full_result
    for response in self:
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 682, in grpc._cython.cygrpc._handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 802, in _handle_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 547, in _handle_unary_unary_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 411, in _finish_handler_with_unary_response
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/cassandra/medusa/service/grpc/server.py", line 237, in GetBackups
    response.status = medusa_pb2.StatusType.UNKNOWN
AttributeError: Protocol message GetBackupsResponse has no "status" field.
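To show how long the listing was retried before it failed, here is a small Python sketch that computes the gaps between the seven "Listing objects" lines in the log above. The timestamps are copied from the log; the rest is only arithmetic.

```python
from datetime import datetime

# Timestamps of the "[Storage] Listing objects" lines in the standalone pod log.
ATTEMPTS = [
    "2024-08-06 01:02:20,239",
    "2024-08-06 01:02:40,361",
    "2024-08-06 01:03:20,489",
    "2024-08-06 01:04:40,649",
    "2024-08-06 01:06:40,829",
    "2024-08-06 01:08:41,005",
    "2024-08-06 01:10:41,185",
]

times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f") for t in ATTEMPTS]

# Seconds between consecutive attempts, rounded to whole seconds.
gaps = [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Total time from the first attempt to the last one.
total_seconds = round((times[-1] - times[0]).total_seconds())

print(gaps)           # [20, 40, 80, 120, 120, 120]
print(total_seconds)  # 501
```

So the wait doubles from 20 s up to 120 s and then stays there, and the call is retried for a bit over eight minutes before the AccessDenied error surfaces. During that time the MedusaRestoreJob shows no status at all.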
k8ssandra-operator logs:
2024-08-06T01:10:41.265Z ERROR Failed to prepare restore {"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob", "MedusaRestoreJob": {"name":"test-restore-backup-1","namespace":"cassandra"}, "namespace": "cassandra", "name": "test-restore-backup-1", "reconcileID": "2c51112c-5290-4a05-b73f-d7051d09f106", "medusarestorejob": "cassandra/test-restore-backup-1", "error": "failed to get backups: rpc error: code = Internal desc = Unexpected <class 'AttributeError'>: Protocol message GetBackupsResponse has no \"status\" field."}
github.com/k8ssandra/k8ssandra-operator/controllers/medusa.(*MedusaRestoreJobReconciler).Reconcile
/workspace/controllers/medusa/medusarestorejob_controller.go:124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
2024-08-06T01:10:41.265Z ERROR Reconciler error {"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob", "MedusaRestoreJob": {"name":"test-restore-backup-1","namespace":"cassandra"}, "namespace": "cassandra", "name": "test-restore-backup-1", "reconcileID": "2c51112c-5290-4a05-b73f-d7051d09f106", "error": "failed to get backups: rpc error: code = Internal desc = Unexpected <class 'AttributeError'>: Protocol message GetBackupsResponse has no \"status\" field."}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
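Note that the operator only reports the AttributeError, while the standalone pod log shows the AccessDenied error underneath it. As far as I can tell, that is ordinary Python behaviour when an error handler itself fails. The sketch below reproduces the pattern with stand-in classes; `AccessDenied`, `Response`, `list_objects` and `get_backups` here are my own placeholders, not Medusa's code.

```python
class AccessDenied(Exception):
    """Stand-in for the botocore ClientError seen in the pod log."""


class Response:
    """Stand-in for a response message that has no 'status' field."""
    __slots__ = ("backups",)


def list_objects():
    # Stand-in for the S3 listing call that was denied.
    raise AccessDenied("ListObjectsV2: Access Denied")


def get_backups():
    response = Response()
    try:
        list_objects()
    except Exception:
        # The handler tries to record a status, but the field does not exist,
        # so a second exception replaces the first one for the caller.
        response.status = "UNKNOWN"
    return response


try:
    get_backups()
except AttributeError as err:
    surfaced = err              # what the caller (here: the operator) sees
    original = err.__context__  # the error that was being handled

print(type(surfaced).__name__, "<-", type(original).__name__)
```

The caller only receives the second exception. The first one stays attached as `__context__`, which is why the full traceback in the pod log is the place to look for the real cause.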
Is there some configuration I missed? How can I fix this issue?
Many thanks