K8ssandra cluster horizontal autoscaling

I have tried 3-4 different ways to use HPA on my cluster deployment. It shows up in Kubernetes as a StatefulSet rather than a Deployment, so I have been going that route. I have several metrics servers and they all respond fine to kubectl top commands and raw requests, but no matter what I try it always comes back with the same error:

k8ssandra 12s Warning FailedComputeMetricsReplicas horizontalpodautoscaler/keda-hpa-k8ssandra invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu
k8ssandra 12s Warning FailedGetResourceMetric horizontalpodautoscaler/keda-hpa-k8ssandra failed to get cpu utilization: missing request for cpu
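(For context, this error generally means at least one container in the targeted pods has no resources.requests.cpu set. The HPA averages CPU utilization over every container in the pod, so a single sidecar without a request breaks the metric even when the main container sets requests. A hedged illustration of a pod template that triggers it; container names are examples, though cass-operator does inject a server-system-logger sidecar into its pods:)

```yaml
# Sketch of a pod template where the HPA metric fails. The main
# container has a CPU request, but the sidecar does not -- the HPA
# then reports "missing request for cpu" for the whole pod.
containers:
  - name: cassandra
    resources:
      requests:
        cpu: 1000m
  - name: server-system-logger   # no resources.requests.cpu set
```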

I’ve tried with plain Kubernetes and with KEDA, using both Helm and standalone YAMLs.

Is there a guide somewhere that is specific to this deployment? Anyone with good experience getting this set up?

helm install -f k8ssandra-value.yaml k8ssandra k8ssandra/k8ssandra --namespace k8ssandra
Here is my Helm values file:
cassandra:
  version: "4.0.3"
  cassandraLibDirVolume:
    storageClass: kops-ssd-1-17
    size: 5Gi
  allowMultipleNodesPerWorker: true
  heap:
    size: 1G
    newGenSize: 1G
  resources:
    requests:
      cpu: 1000m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 5Gi
  datacenters:
    - name: dc1
      size: 3
      averageCpuUtilization: 50
      racks:
        - name: default
medusa:
  enabled: true
  bucketName: k8ssandra-bucket-dev
  storageSecret: medusa-bucket-key
  storage: s3
  storage_properties:
    region: ca-central-1
ingress:
  enabled: true
kube-prometheus-stack:
  grafana:
    adminUser: admin
    adminPassword: CrInkleB0rk!
stargate:
  enabled: true
  replicas: 1
  heapMB: 500
  cpuReqMillicores: 100
  cpuLimMillicores: 1000

Added the namespace to the hpa.yaml and it took off like a rabbit. I added a stress test and it spun up the maximum number of pods; then Cassandra stopped taking connections as the endpoint disappeared, and my Reaper pod went into backoff state and eventually stopped. So I will assume that cass-operator had no idea what was going on and stopped working as well, hence no endpoint.

I attempted a simple scale command to add one pod and had the same results. Which begs the question: do I have to use helm upgrade after adding nodes in the values file when I get CPU alerts, or is there a way to automate this so I can sleep at night?
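(For reference, the scaling path the chart does support is exactly that: bump the datacenter size in the values file and let cass-operator bootstrap and stream data to the new node. A sketch, showing only the field that changes:)

```yaml
# k8ssandra-value.yaml -- only the changed field; keep the rest as before
cassandra:
  datacenters:
    - name: dc1
      size: 4   # was 3; cass-operator adds one node and streams data to it
```

followed by the same command shape used for the install: `helm upgrade -f k8ssandra-value.yaml k8ssandra k8ssandra/k8ssandra --namespace k8ssandra`.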


By the way, here is my hpa.yaml file:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: k8ssandra
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: k8ssandra-dc1-default-sts
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 50
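(For context, the scaling math the HPA applies to a spec like the one above can be sketched as follows; the current utilization figure is hypothetical:)

```shell
# HPA control-loop core rule:
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
current_replicas=3        # minReplicas above
current_utilization=120   # hypothetical: pods averaging 120% of requested CPU
target_utilization=50     # targetAverageUtilization above
# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "$desired"   # 8
```

So a sustained spike well above target can jump several replicas at once (capped by maxReplicas), which is exactly what makes this dangerous for Cassandra, where each new node must stream data before it is useful.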

I am also looking for the answer!

@waterskibum Have you managed to make it work?

We do not support horizontal autoscaling for Cassandra pods. Scaling a Cassandra cluster can take a lot of time per node, as data needs to stream to the new ones (depending on the data size, we’re talking many hours to start a single new node). Adding/removing nodes too fast can have implications for data availability, since token movements are involved, and shouldn’t be handled automatically based on CPU metrics IMO.