Cannot ALTER KEYSPACE when Stargate is enabled

I ran some tests setting up a multi-cluster deployment with the K8ssandra Helm release 1.3.0.
With Stargate disabled, after following all the steps in this document, I am able to set up inter-cluster HA.
With Stargate enabled, I get the following error when executing the “alter keyspace” commands on dc1.
Error:

ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
ConfigurationException: Cannot alter RF while some endpoints
  are not in normal state (no range movements):
  [/10.151.29.188:7000, /10.151.29.107:7000, /10.151.25.144:7000, /10.151.29.129:7000, /10.151.25.166:7000]
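
For reference, the statement above can be generated per keyspace with a small helper. This is only a sketch: the function name `alter_keyspace_cql` is made up for illustration, and the extra keyspaces in the loop (`system_traces`, `system_distributed`) are an assumption about which system keyspaces are being altered, since only `system_auth` appears in the error above.

```python
def alter_keyspace_cql(keyspace, dc_replicas):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    dc_replicas maps datacenter name -> replication factor,
    e.g. {"dc1": 3, "dc2": 3}.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in dc_replicas.items()]
    options = ", ".join(parts)
    return f"ALTER KEYSPACE {keyspace}\n  WITH replication = {{{options}}};"


if __name__ == "__main__":
    # Keyspace list is an assumption; adjust it to the guide being followed.
    for ks in ("system_auth", "system_traces", "system_distributed"):
        print(alter_keyspace_cql(ks, {"dc1": 3, "dc2": 3}))
```

The helper only builds the text of the statement; it still has to be run through cqlsh and will hit the same ConfigurationException while the listed endpoints are not in normal state.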

The 10.151.25.x addresses belong to the Stargate pods in the primary cluster (dc1).

k8ssandra-dc1-stargate-5474f54959-6kbpm               1/1     Running   0          65m   10.151.25.144   aks-casspool-37043819-vmss000001   <none>           <none>
k8ssandra-dc1-stargate-5474f54959-kl472               1/1     Running   0          65m   10.151.25.109   aks-casspool-37043819-vmss000000   <none>           <none>
k8ssandra-dc1-stargate-5474f54959-kqb49               1/1     Running   0          65m   10.151.25.166   aks-casspool-37043819-vmss000002   <none>           <none>

The 10.151.29.x addresses belong to the Stargate pods in the secondary cluster (dc2).
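
To cross-check the endpoints in the error against a pod listing, a small script along these lines may help. It is a sketch under a few assumptions: the function names are hypothetical, the listing is expected in the default `kubectl get pods -o wide` column order (IP in the sixth column), and the sample data is copied from the dc1 output above with the trailing columns trimmed.

```python
import re

# Error text and dc1 pod listing copied from the post (trailing columns trimmed).
ERROR_TEXT = """\
ConfigurationException: Cannot alter RF while some endpoints
  are not in normal state (no range movements):
  [/10.151.29.188:7000, /10.151.29.107:7000, /10.151.25.144:7000, /10.151.29.129:7000, /10.151.25.166:7000]
"""

POD_LISTING = """\
k8ssandra-dc1-stargate-5474f54959-6kbpm  1/1  Running  0  65m  10.151.25.144  aks-casspool-37043819-vmss000001
k8ssandra-dc1-stargate-5474f54959-kl472  1/1  Running  0  65m  10.151.25.109  aks-casspool-37043819-vmss000000
k8ssandra-dc1-stargate-5474f54959-kqb49  1/1  Running  0  65m  10.151.25.166  aks-casspool-37043819-vmss000002
"""


def parse_endpoints(error_text):
    """Return the IPs that appear as /ip:port in the exception message."""
    return re.findall(r"/(\d{1,3}(?:\.\d{1,3}){3}):\d+", error_text)


def parse_pod_ips(listing):
    """Map pod IP -> pod name, assuming the IP is the sixth column."""
    pods = {}
    for line in listing.splitlines():
        cols = line.split()
        if len(cols) >= 6:
            pods[cols[5]] = cols[0]
    return pods


def match_endpoints(error_text, listing):
    """Split the endpoints into (matched {ip: pod}, unmatched [ip])."""
    pods = parse_pod_ips(listing)
    matched, unmatched = {}, []
    for ip in parse_endpoints(error_text):
        if ip in pods:
            matched[ip] = pods[ip]
        else:
            unmatched.append(ip)
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = match_endpoints(ERROR_TEXT, POD_LISTING)
    for ip, pod in matched.items():
        print(f"{ip} -> {pod}")
    print("not in this listing:", ", ".join(unmatched))
```

With the dc1 listing alone, the two 10.151.25.x endpoints resolve to Stargate pods and the three 10.151.29.x endpoints remain unmatched; feeding in the dc2 listing as well would be needed to resolve those.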

How could this happen? Thanks!

Thanks for the question, @yonglinh!

I’ve moved it into its own topic and will get the K8ssandra engineers to respond. Cheers! :beers:

Hi @yonglinh

Can you share logs from your Stargate pods?

Referring to k8ssandra/stargate.yaml at main · k8ssandra/k8ssandra · GitHub:

Does that mean Stargate still cannot support customized rack names? In the Reaper UI, I can see that the Stargate endpoint is not available. Is it caused by the wrong rack name “dc2/default”?

In my k8ssandra.yaml, there are three racks defined:

cassandra:
  # Version of Apache Cassandra to deploy
  version: 4.0.0

  # Configuration for the /var/lib/cassandra mount point
  cassandraLibDirVolume:
    # Azure provides this storage class on AKS clusters out of the box. Note we
    # are using `managed-premium` here as it has `volumeBindingMode:
    # WaitForFirstConsumer` which is important during scheduling.
    storageClass: managed-premium

    # The recommended live data size is 1 - 1.5 TB. A 2 TB volume supports this
    # much data along with room for compactions.
    size: 2048Gi

  additionalSeeds: []

  auth:
    superuser:
      secret: ${k8ssandra_superuser_secret}

  heap:
    size: 8G
    newGenSize: 8G

  resources:
    requests:
      cpu: 4000m
      memory: 32Gi
    limits:
      cpu: 4000m
      memory: 32Gi

  # This key defines the logical topology of your cluster. The rack names and
  # labels should be updated to reflect the Availability Zones where your AKS
  # cluster is deployed.
  datacenters:
  - name: dc2
    size: 3
    racks:
    - name: eastus2-1
      affinityLabels:
        agentpool: casspool
        topology.kubernetes.io/zone: eastus2-1
    - name: eastus2-2
      affinityLabels:
        agentpool: casspool
        topology.kubernetes.io/zone: eastus2-2
    - name: eastus2-3
      affinityLabels:
        agentpool: casspool
        topology.kubernetes.io/zone: eastus2-3

reaper:
  enabled: true
  autoschedule: true
  autoschedule_properties:
    initialDelayPeriod: PT15S
    periodBetweenPolls: PT10M
    timeBeforeFirstSchedule: PT5M
    scheduleSpreadPeriod: PT6H
    excludedKeyspaces: []
    excludedClusters: []

stargate:
  enabled: true
  replicas: 3 
  heapMB: 512
  cpuReqMillicores: 1000
  cpuLimMillicores: 1000

medusa:
  enabled: true
  multiTenant: true
  storage: azure_blobs
  bucketName: ${backup_blob_name}
  storageSecret: ${medusa_bucket_key}
  
kube-prometheus-stack:
  grafana:
    adminUser: ${k8ssandra_grafana_username}
    adminPassword: ${k8ssandra_grafana_pwd}

@jsanda @ErickRamirezAU

Does that mean Stargate still cannot support customized rack names?

Correct. I will create an issue to make racks configurable for Stargate.

In the Reaper UI, I can see that the Stargate endpoint is not available. Is it caused by the wrong rack name “dc2/default”?

It may be, but I am not certain. I am going to investigate and do some testing.

I tested with and without racks, as well as with and without auth. In every case I was not able to change the replication of keyspaces. Then I tested with Cassandra 3.11.10, and it worked fine. I went ahead and created the issue “Cannot alter replication of keyspaces in Cassandra 4.0 cluster when Stargate is enabled” (Issue #1026 · k8ssandra/k8ssandra · GitHub).

I created the issue “Make rack name for Stargate configurable” (Issue #1028 · k8ssandra/k8ssandra · GitHub) to make the rack name configurable for Stargate. As I mentioned in the issue, full rack support for Stargate will be addressed in the k8ssandra-operator project.

Thank you for following up on this! When will the two fixes be ready?
@jsanda

We are doing some backlog grooming now. They might make it into the 1.4.0 release. Stay tuned!