I am analyzing k8ssandra and noticed that Cassandra rack topology is implemented by defining NodeAffinityLabels in cassdc.yaml, and that it uses the topology.kubernetes.io/zone k8s label to assign pods to nodes.
I am an engineer working on k8ssandra. I’ll try to answer your questions. First, though, I should point out that you can use arbitrary labels for your racks. When you declare the racks in your manifest, it would look like this:
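Something along these lines (a sketch of just the racks section; the rack names and zone values are placeholders for whatever your environment uses):

```yaml
racks:
- name: rack1
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-a
- name: rack2
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-b
- name: rack3
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-c
```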
The affinityLabels property accepts a map of arbitrary keys/values.
Now for your questions…
In the case of racks, why did the community not go with TopologySpreadConstraints?
The rack topology stuff was implemented before I got involved with the project, but I think it is because the feature was developed before TopologySpreadConstraint was available.
What problem do you see if we use TopologySpreadConstraint to implement Rack Topology?
I don’t know. We would need to better understand how behavior would change.
If a node has enough resources (CPU, memory, disk), can we deploy multiple pods on the same node?
Yes. In the CassandraDatacenter spec, setting allowMultipleNodesPerWorker: true allows multiple Cassandra pods to be scheduled onto the same k8s worker node.
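For example (a sketch of the relevant part of the CassandraDatacenter spec; the size and resource values are placeholders):

```yaml
spec:
  size: 6
  allowMultipleNodesPerWorker: true
  # When packing multiple Cassandra pods onto a worker, explicit resource
  # requests/limits keep the scheduling predictable.
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "2"
      memory: 8Gi
```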
As per the logic in cass-operator, in the SplitRacks function in cassandradatacenter_types.go, pods will be distributed unevenly if Nodes=7, Racks=3, and RF=3; the topology will be [3, 2, 2].
What’s your recommendation for nodes, racks, and RF? I mean, is there any relationship between them when we have multiple racks?
Example: Nodes=7, Racks=3, RF=3, as above.
The number of racks should match the replication factor in the Keyspaces.
Should the number of pods be a multiple of the rack count?
cass-operator makes a best effort to create evenly distributed racks. In your example, it is not possible since the node count is not a multiple of the rack count.
Will the community consider using TopologySpreadConstraints instead of affinity in future releases?
Sure it is definitely possible. My suggestion is to keep the discussion going here on the forum. It would be great if you could provide some details and examples of how TopologySpreadConstraints differ from node affinity. Hopefully some sort of design doc could materialize from the discussion. With a design in hand, we can scope out work and get some tickets created.
I think k8s has documented it well. Sharing the k8s docs link here.
We could define two constraints under topologySpreadConstraints in the StatefulSet’s pod template, like this (a YAML sketch follows the list below):
Add Zone (Rack) Constraint:
- Set maxSkew to 1
- Set topologyKey to the zone label (e.g. topology.kubernetes.io/zone)
- Set whenUnsatisfiable to DoNotSchedule
- Set labelSelector to match the user-defined pod labels

Add Node Constraint:
- Set maxSkew to 1
- Set topologyKey to the node label (e.g. kubernetes.io/hostname)
- Set whenUnsatisfiable to DoNotSchedule
- Set labelSelector to match the user-defined pod labels
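A rough sketch of what that could look like in the StatefulSet’s pod template (the app: cassandra selector is only a placeholder; it would need to match the labels actually applied to the Cassandra pods):

```yaml
spec:
  template:
    spec:
      topologySpreadConstraints:
      # Spread pods evenly across zones (racks)
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: cassandra  # placeholder label
      # Spread pods evenly across worker nodes
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: cassandra  # placeholder label
```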
NOTE: Affinity/anti-affinity can be used together with TopologySpreadConstraints.
Just to note, if you are using cass-operator with the ValidatingAdmissionWebhook enabled, CassandraDatacenters with a size that is not a multiple of the total rack count will be rejected. Assuming the keyspace replication factor equals the rack count, an entire copy of the data will exist within each rack. I have personally experienced issues with this type of deployment when the number of nodes per rack is not consistent.