Setting Up a Kubernetes CronJob to Create Medusa Backups

Originally published at: Setting Up a Kubernetes CronJob to Create Medusa Backups - K8ssandra, Apache Cassandra® on Kubernetes

If you’ve been following the awesome workshop on the K8ssandra repository or just deployed your first cluster, you are probably wondering: how am I going to automate Helm commands to run daily backups?

The good news is: you don’t have to.

As the title already mentions, we are going to use a Kubernetes CronJob to create the Medusa backup for us.

ServiceAccount / ClusterRole / ClusterRoleBinding

To do this, we will use a neat trick: directly accessing the API server from within a pod, using kubectl to apply a CassandraBackup manifest.

We first need to create a custom ServiceAccount, ClusterRole and ClusterRoleBinding for the pod that will be created by our CronJob.

Create a file called medusa-backup-sa.yaml, and paste the following manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: medusa-backup
  namespace: cass-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: medusa-backup-role
rules:
  - apiGroups: ["cassandra.k8ssandra.io"]
    resources: ["cassandrabackups"]
    verbs: ["create", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: medusa-backup-rolebinding
subjects:
  - kind: ServiceAccount
    name: medusa-backup
    namespace: cass-operator
roleRef:
  kind: ClusterRole
  name: medusa-backup-role
  apiGroup: rbac.authorization.k8s.io

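Change the names and namespaces to suit your environment (the ClusterRole and ClusterRoleBinding are cluster-scoped, so only the ServiceAccount and the binding's subject carry a namespace), then save and apply: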

kubectl apply -f medusa-backup-sa.yaml

The ClusterRole grants access only to the cassandrabackups resource in the cassandra.k8ssandra.io API group, with just enough permissions to create, get and list backups.
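One way to double-check these permissions is kubectl auth can-i combined with impersonation. The sketch below only prints the commands, so it runs without a cluster. It assumes the ServiceAccount name and namespace from the manifest above; sa_identity and rbac_checks are hypothetical helper names introduced for this example, and using --as requires that your own user is allowed to impersonate.

```shell
# Build the user name Kubernetes assigns to a ServiceAccount.
# sa_identity is a hypothetical helper: namespace first, then name.
sa_identity() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

identity="$(sa_identity cass-operator medusa-backup)"

# Print one check per verb instead of running it (no cluster needed).
# Against a real cluster, create/get/list should answer "yes" and delete "no".
rbac_checks() {
  for verb in create get list delete; do
    echo "kubectl auth can-i $verb cassandrabackups.cassandra.k8ssandra.io --as=$identity -n cass-operator"
  done
}

rbac_checks
```

Pipe the printed lines to a shell, or paste them one at a time, once kubectl points at the right cluster.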

We are going to schedule the CronJob to run every day at 1:20 AM, but you can set the schedule to anything that fits your environment.

I usually like to use Crontab.guru for this, because I just can’t remember the cron syntax.
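For reference, a cron schedule is five whitespace-separated fields. The small sketch below splits the schedule used in the CronJob manifest in this post into those fields; describe_schedule is a hypothetical helper name, not part of any tool.

```shell
# Split a five-field cron expression into labeled parts.
# read does the word splitting, so "*" is never expanded as a filename glob.
describe_schedule() {
  echo "$1" | {
    read -r minute hour day_of_month month day_of_week
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
      "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
  }
}

describe_schedule "20 1 */1 * *"
# prints: minute=20 hour=1 day-of-month=*/1 month=* day-of-week=*
```

Here */1 steps through every day of the month, so the expression behaves the same as 20 1 * * *.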

The CronJob itself will spawn a pod to execute the job, which creates a backup named medusa-daily-yyyymmddhhmmss, for example medusa-daily-20211007012059.

The CassandraBackup manifest is printed to stdout with printf, which converts each \n to a newline. sed then replaces the timestamp placeholder with the current date and time, and the result is piped to kubectl, which applies the manifest from stdin.

The full command that will be executed from the pod is:

printf "apiVersion: cassandra.k8ssandra.io/v1alpha1\nkind: CassandraBackup\nmetadata:\n  name: medusa-daily-timestamp\n  namespace: cass-operator\nspec:\n  name: medusa-daily-timestamp\n  cassandraDatacenter: dc1" | sed "s/timestamp/$(date +%Y%m%d%H%M%S)/g" | kubectl apply -f -
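To preview the manifest before pointing it at a cluster, the same idea can be wrapped in a small function and run without kubectl. This is a sketch rather than the exact command from this post: it passes the timestamp through printf's %s instead of sed, and render_backup_manifest is a hypothetical name. The defaults (dc1, cass-operator) are simply the values used in this example.

```shell
# Render the CassandraBackup manifest to stdout without applying it.
# Arguments (both optional): datacenter name, namespace.
render_backup_manifest() {
  datacenter="${1:-dc1}"
  namespace="${2:-cass-operator}"
  stamp="$(date +%Y%m%d%H%M%S)"   # same format as medusa-daily-20211007012059
  printf 'apiVersion: cassandra.k8ssandra.io/v1alpha1\n'
  printf 'kind: CassandraBackup\n'
  printf 'metadata:\n  name: medusa-daily-%s\n  namespace: %s\n' "$stamp" "$namespace"
  printf 'spec:\n  name: medusa-daily-%s\n  cassandraDatacenter: %s\n' "$stamp" "$datacenter"
}

render_backup_manifest dc1 cass-operator
# To apply it for real: render_backup_manifest dc1 cass-operator | kubectl apply -f -
```

Computing the timestamp once and reusing it keeps metadata.name and spec.name identical even if the clock ticks over mid-command.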

Medusa backup CronJob

Before applying the manifest below, verify the cassandraDatacenter value for the manifest. In this example we are using dc1, but yours may differ. Same goes for metadata.name and metadata.namespace.

When ready, save the manifest below as medusa-cronjob.yaml and deploy it with kubectl apply -f medusa-cronjob.yaml:

# On Kubernetes 1.21 and later, use batch/v1 instead; batch/v1beta1 was removed in 1.25.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: k8ssandra-medusa-backup
  namespace: cass-operator
spec:
  schedule: "20 1 */1 * *"
  jobTemplate:
    spec:
      template:
        metadata:
          name: k8ssandra-medusa-backup
        spec:
          serviceAccountName: medusa-backup
          containers:
          - name: medusa-backup-cronjob
            image: bitnami/kubectl:1.17.3
            imagePullPolicy: IfNotPresent
            command:
             - '/bin/bash'
             - '-c'
             - 'printf "apiVersion: cassandra.k8ssandra.io/v1alpha1\nkind: CassandraBackup\nmetadata:\n  name: medusa-daily-timestamp\n  namespace: cass-operator\nspec:\n  name: medusa-daily-timestamp\n  cassandraDatacenter: dc1" | sed "s/timestamp/$(date +%Y%m%d%H%M%S)/g" | kubectl apply -f -'
          restartPolicy: OnFailure

Validation

Let’s analyze what happened when the k8ssandra-medusa-backup CronJob was triggered (note that the describe output at the end is only a snippet, not the entire result):

~> kubectl get cronjob
NAME                      SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
k8ssandra-medusa-backup   20 1 */1 * *   False     0        14h             14h

~> kubectl get job
NAME                                 COMPLETIONS   DURATION   AGE
k8ssandra-medusa-backup-1633569600   1/1           6s         14h

~> kubectl logs job/k8ssandra-medusa-backup-1633569600
cassandrabackup.cassandra.k8ssandra.io/medusa-daily-20211007012059 created

~> kubectl get cassandrabackups
NAME                          AGE
medusa-daily-20211007012059   14h

Output from the backup:

~> kubectl describe cassandrabackup/medusa-daily-20211007012059
--snippet--
Finish Time: 2021-10-07T01:21:08Z
Finished:
  dev-k8ssandra-dc1-us-east-1c-sts-0
  dev-k8ssandra-dc1-us-east-1b-sts-0
  dev-k8ssandra-dc1-us-east-1a-sts-0
Start Time: 2021-10-07T01:21:00Z
--snippet--
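As a side note, the numeric suffix of the Job name in the output above looks like a Unix timestamp. Assuming it is the scheduled run time in seconds (an assumption based on this output; newer Kubernetes versions may name Jobs differently), it can be decoded with date. The sketch tries the GNU form first and falls back to the BSD/macOS form; epoch_to_utc is a hypothetical helper name.

```shell
# Convert Unix seconds to an ISO-8601 UTC timestamp.
epoch_to_utc() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$1" +%Y-%m-%dT%H:%M:%SZ
}

job="k8ssandra-medusa-backup-1633569600"
suffix="${job##*-}"          # keep only what follows the last dash
epoch_to_utc "$suffix"
# prints: 2021-10-07T01:20:00Z
```

That lines up with the 20 1 schedule: the Job was scheduled for 01:20, and the backup itself ran from 01:21:00 to 01:21:08.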

And there you go! I hope this post has helped you understand the capabilities of CronJobs and running restricted kubectl commands from a pod.

Let us know what you think of this approach to creating your backups by joining us on the K8ssandra Discord or K8ssandra Forum today. For exclusive posts on all things data, follow DataStax on Medium.

Resources

  1. DataStax Apache Cassandra on Kubernetes Workshop on GitHub
  2. Kubernetes Documentation: CronJobs
  3. Apache Cassandra Backup and Restore Tool on GitHub
  4. Kubernetes Documentation: Directly Accessing the REST API
  5. Cronitor Crontab Guru Editor
  6. NetIQ Documentation: Understanding Cron Syntax in the Job Scheduler