Originally published at: Setting Up a Kubernetes CronJob to Create Medusa Backups - K8ssandra, Apache Cassandra® on Kubernetes
If you’ve been following the awesome workshop on the K8ssandra repository or just deployed your first cluster, you are probably wondering: How am I going to automate helm commands to run daily backups?
The good news is: you don’t have to.
As the title already mentions, we are going to use a Kubernetes CronJob to create the Medusa backup for us.
ServiceAccount / ClusterRole / ClusterRoleBinding
To do this, we will use this neat trick to access the API server directly from within a pod, using kubectl to apply a CassandraBackup manifest.
We first need to create a custom ServiceAccount, ClusterRole, and ClusterRoleBinding for the pod that will be created by our CronJob, so let’s do that first.
Create a file called medusa-backup-sa.yaml, and paste the following manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: medusa-backup
  namespace: cass-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: medusa-backup-role
rules:
  - apiGroups: ["cassandra.k8ssandra.io"]
    resources: ["cassandrabackups"]
    verbs: ["create", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: medusa-backup-rolebinding
subjects:
  - kind: ServiceAccount
    name: medusa-backup
    namespace: cass-operator
roleRef:
  kind: ClusterRole
  name: medusa-backup-role
  apiGroup: rbac.authorization.k8s.io
Change metadata.name and metadata.namespace to suit your needs, then save and apply:
kubectl apply -f medusa-backup-sa.yaml
The ClusterRole grants access only to the cassandrabackups resource in the cassandra.k8ssandra.io API group, with just enough permissions to create, get, and list backups.
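If you want to double-check those permissions before wiring up the CronJob, something like the sketch below may help. It assumes the ServiceAccount name and namespace from the manifest above, and that your own user is allowed to impersonate service accounts; check_backup_rbac is just a hypothetical helper name.

```shell
# Sketch: ask the API server what the medusa-backup ServiceAccount may do.
# Assumption: names and namespace match the manifest in this post.
check_backup_rbac() {
  sa="system:serviceaccount:cass-operator:medusa-backup"
  for verb in create get list delete; do
    # "kubectl auth can-i" prints yes or no and exits non-zero on "no",
    # so "|| true" keeps the loop going for every verb.
    answer=$(kubectl auth can-i "$verb" cassandrabackups.cassandra.k8ssandra.io \
      --as="$sa" --namespace cass-operator 2>/dev/null || true)
    echo "$verb: $answer"
  done
}
```

Running check_backup_rbac should report yes for create, get, and list, and no for delete, provided no other role binding grants the account extra permissions.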
We are going to set the schedule for the cron job to every day of the month at 1:20 AM, but you can set the schedule to anything that fits your environment.
I usually like to use Crontab.guru for this, because I just can’t remember the cron syntax.
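If, like me, you keep forgetting which field is which, a tiny shell sketch can label them. It uses the schedule from this post; the variable names are only illustrative.

```shell
# Sketch: split the schedule used in this post into its five cron fields.
schedule="20 1 */1 * *"

# read splits on whitespace; inside a heredoc the "*" is never glob-expanded.
read -r minute hour day_of_month month day_of_week <<EOF
$schedule
EOF

printf 'minute:       %s\n' "$minute"        # 20  -> at minute 20
printf 'hour:         %s\n' "$hour"          # 1   -> during hour 01
printf 'day of month: %s\n' "$day_of_month"  # */1 -> every day (same as *)
printf 'month:        %s\n' "$month"         # *   -> every month
printf 'day of week:  %s\n' "$day_of_week"   # *   -> any day of the week
```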
The CronJob itself will spawn a pod to execute the job that creates a backup called medusa-daily-yyyymmddhhmmss, for example, medusa-daily-20211007012059.
The CassandraBackup manifest will be printed to stdout using printf, which converts our \n sequences to new lines, and then piped to kubectl to apply the manifest from stdin.
The full command that will be executed from the pod is:
printf "apiVersion: cassandra.k8ssandra.io/v1alpha1\nkind: CassandraBackup\nmetadata:\n name: medusa-daily-timestamp\n namespace: cass-operator\nspec:\n name: medusa-daily-timestamp\n cassandraDatacenter: dc1" | sed "s/timestamp/$(date +%Y%m%d%H%M%S)/g" | kubectl apply -f -
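To see what that one-liner produces before anything is applied, you could render the manifest on its own. The sketch below is an equivalent rewrite using a heredoc instead of printf and sed, not the command the CronJob runs; render_backup_manifest is a hypothetical helper name, and the namespace and datacenter are the example values from this post.

```shell
# Sketch: render the CassandraBackup manifest without applying it.
# Assumption: cass-operator and dc1 are the example values from this post.
render_backup_manifest() {
  # Use the first argument as the timestamp, or the current time if omitted.
  timestamp=${1:-$(date +%Y%m%d%H%M%S)}
  cat <<EOF
apiVersion: cassandra.k8ssandra.io/v1alpha1
kind: CassandraBackup
metadata:
  name: medusa-daily-${timestamp}
  namespace: cass-operator
spec:
  name: medusa-daily-${timestamp}
  cassandraDatacenter: dc1
EOF
}

# Print the manifest for inspection.
render_backup_manifest
```

Piping the output to kubectl apply -f - would then create the backup, just like the original command.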
Medusa backup cronJob
Before applying the manifest below, verify the cassandraDatacenter value for the manifest. In this example we are using dc1, but yours may differ. The same goes for metadata.name and metadata.namespace.
When ready, save the manifest below as medusa-cronjob.yaml and deploy it with kubectl apply -f medusa-cronjob.yaml:
# batch/v1beta1 CronJob was removed in Kubernetes 1.25; use batch/v1 on 1.21 or later.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: k8ssandra-medusa-backup
  namespace: cass-operator
spec:
  # Every day at 1:20 AM
  schedule: "20 1 */1 * *"
  jobTemplate:
    spec:
      template:
        metadata:
          name: k8ssandra-medusa-backup
        spec:
          # ServiceAccount with create/get/list on cassandrabackups
          serviceAccountName: medusa-backup
          containers:
            - name: medusa-backup-cronjob
              image: bitnami/kubectl:1.17.3
              imagePullPolicy: IfNotPresent
              command:
                # Absolute path, so the shell is found regardless of the working directory
                - '/bin/bash'
                - '-c'
                - 'printf "apiVersion: cassandra.k8ssandra.io/v1alpha1\nkind: CassandraBackup\nmetadata:\n name: medusa-daily-timestamp\n namespace: cass-operator\nspec:\n name: medusa-daily-timestamp\n cassandraDatacenter: dc1" | sed "s/timestamp/$(date +%Y%m%d%H%M%S)/g" | kubectl apply -f -'
          restartPolicy: OnFailure
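You don’t have to wait until 1:20 AM to find out whether the CronJob works. One way to test it is to create a one-off Job from the CronJob template, roughly as sketched below. The CronJob name and namespace are the ones from this post; trigger_backup_now and the default job name are hypothetical.

```shell
# Sketch: run the CronJob's job template once, right now.
# Assumption: the CronJob is named k8ssandra-medusa-backup in cass-operator.
trigger_backup_now() {
  # Use the first argument as the Job name, or derive one from the epoch time.
  job_name=${1:-medusa-backup-manual-$(date +%s)}
  kubectl create job "$job_name" \
    --from=cronjob/k8ssandra-medusa-backup \
    --namespace cass-operator
}
```

Calling trigger_backup_now test-run would create a Job called test-run whose pod runs the same command as the scheduled one.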
Validation
Let’s analyze what happened when the k8ssandra-medusa-backup CronJob was triggered (note that the describe output at the end is only a snippet, not the entire result):
~> kubectl get cronjob
NAME                      SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
k8ssandra-medusa-backup   20 1 */1 * *   False     0        14h             14h
~> kubectl get job
NAME                                 COMPLETIONS   DURATION   AGE
k8ssandra-medusa-backup-1633569600   1/1           6s         14h
~> kubectl logs job/k8ssandra-medusa-backup-1633569600
cassandrabackup.cassandra.k8ssandra.io/medusa-daily-20211007012059 created
~> kubectl get cassandrabackups
NAME                          AGE
medusa-daily-20211007012059   14h
Output from the backup:
~> kubectl describe cassandrabackup/medusa-daily-20211007012059
—snippet—
Finish Time: 2021-10-07T01:21:08Z
Finished:
dev-k8ssandra-dc1-us-east-1c-sts-0
dev-k8ssandra-dc1-us-east-1b-sts-0
dev-k8ssandra-dc1-us-east-1a-sts-0
Start Time: 2021-10-07T01:21:00Z
—snippet—
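If you would rather script this check than read the describe output by eye, a rough sketch could look like the following. It assumes that a completed backup shows a Finish Time line with a timestamp, as in the snippet above, and that an unfinished one does not; backup_finished is a hypothetical helper name.

```shell
# Sketch: succeed only if the backup reports a finish time.
# Assumption: "Finish Time:" followed by a timestamp appears once the backup is done.
backup_finished() {
  kubectl describe "cassandrabackup/$1" --namespace cass-operator \
    | grep -Eq 'Finish Time:[[:space:]]+[0-9]'
}
```

For example, backup_finished medusa-daily-20211007012059 && echo "backup done" would print the message only for a finished backup.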
And there you go! I hope this post has helped you understand the capabilities of CronJobs and running restricted kubectl commands from a pod.
Let us know what you think of this approach to creating your backups by joining us on the K8ssandra Discord or K8ssandra Forum today. For exclusive posts on all things data, follow DataStax on Medium.
Resources
- DataStax Apache Cassandra on Kubernetes Workshop on GitHub
- Kubernetes Documentation: CronJobs
- Apache Cassandra Backup and Restore Tool on GitHub
- Kubernetes Documentation: Directly Accessing the REST API
- Cronitor Crontab Guru Editor
- NetIQ Documentation: Understanding Cron Syntax in the Job Scheduler