In Rack topology, why are Affinity rules preferred over TopologySpreadConstraints?

Hi Team,

I am analyzing k8ssandra and noticed that Cassandra rack topology is implemented by defining NodeAffinityLabels in cassdc.yaml, using the topology.kubernetes.io/zone k8s label to assign pods to nodes.

While searching on the internet, I came across two posts:
User Stories & TopologySpreadConstraint vs Affinity.

In User Stories, Issue 2 talks about even distribution of pods across multiple Availability Zones.

Now coming to my questions:

  1. In the case of racks, why did the community not go with TopologySpreadConstraints?
  2. What problems do you see if we use TopologySpreadConstraints to implement rack topology?
  3. If a node has enough resources (CPU, memory, disk), can we deploy multiple pods on the same node? If not, why?

Thank you in advance.


That’s a good question! Let me reach out to the engineers here and get them to respond to you. Cheers! :beers:

Hi @pushpendra_rajpoot

I am an engineer working on k8ssandra. I’ll try to answer your questions. First, though, I should point out that you can use arbitrary labels for your racks. When you declare the racks in your manifest, it looks like this:

racks:
- name: r1
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-b
    env: qa
- name: r2
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-a
    env: qa
- name: r3
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-c
    env: qa

The affinityLabels property accepts a map of arbitrary keys/values.
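
For context, labels declared this way constrain where the pods for that rack can be scheduled. A rack such as r1 above ends up with a node affinity roughly like the following (this is standard Kubernetes node affinity syntax, shown for illustration rather than the operator's exact generated output):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # each affinityLabels entry becomes a required node label match
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east1-b
        - key: env
          operator: In
          values:
          - qa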

Now for your questions…

In the case of racks, why did the community not go with TopologySpreadConstraints?

The rack topology support was implemented before I got involved with the project, but I think it was due to the feature being developed before TopologySpreadConstraints were available.

What problems do you see if we use TopologySpreadConstraints to implement rack topology?

I don’t know. We would need to better understand how the behavior would change.

If a node has enough resources (CPU, memory, disk), can we deploy multiple pods on the same node?

Yes. In the CassandraDatacenter spec, setting allowMultipleNodesPerWorker: true allows multiple Cassandra pods to be scheduled onto the same k8s worker node.
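
For illustration, here is a minimal sketch of a CassandraDatacenter with that flag set (the cluster/datacenter names, server version, and resource values are assumptions, not from this thread):

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: demo
  serverType: cassandra
  serverVersion: "3.11.10"
  size: 3
  # allow more than one Cassandra pod per k8s worker node
  allowMultipleNodesPerWorker: true
  # explicit requests/limits are a good idea when packing
  # multiple Cassandra pods onto one worker
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi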

Cheers

John

Thank you @jsanda for the quick reply.

As per the logic in cass-operator, in the SplitRacks function in cassandradatacenter_types.go, pods will be distributed unevenly if Nodes=7, Racks=3, and RF=3, so the topology will be [3, 2, 2].

  • Will the community consider using TopologySpreadConstraints instead of Affinity in future releases?
  • What’s your recommendation for nodes, racks & RF? I mean, is there any relationship between them when we have multiple racks?

Example:

  1. Should the number of racks match the replication factor of the keyspaces?
  2. Should the number of pods be a multiple of the rack count?

Thanks.

cass-operator makes a best effort to create evenly distributed racks. In your example that is not possible, since the node count is not a multiple of the rack count.
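
For example, keeping the size a multiple of the rack count lets the racks come out even (zone names here are illustrative):

spec:
  size: 9          # 9 pods / 3 racks = 3 Cassandra pods per rack
  racks:
  - name: r1
    affinityLabels:
      topology.kubernetes.io/zone: us-east1-a
  - name: r2
    affinityLabels:
      topology.kubernetes.io/zone: us-east1-b
  - name: r3
    affinityLabels:
      topology.kubernetes.io/zone: us-east1-c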

Will the community consider using TopologySpreadConstraints instead of Affinity in future releases?

Sure, it is definitely possible. My suggestion is to keep the discussion going here on the forum. It would be great if you could provide some details and examples of how TopologySpreadConstraints differ from node affinity. Hopefully some sort of design doc can materialize from the discussion. With a design in hand, we can scope out the work and get some tickets created.

I think k8s has documented it well. Sharing the k8s link here.

We could define two constraints under spec.topologySpreadConstraints in the StatefulSet’s pod template, like this (a YAML sketch follows the list):

  • Add a zone (rack) constraint:
    - Set maxSkew to 1
    - Set topologyKey to the zone label (topology.kubernetes.io/zone)
    - Set whenUnsatisfiable to DoNotSchedule
    - Set labelSelector as per user-defined labels
  • Add a node constraint:
    - Set maxSkew to 1
    - Set topologyKey to the hostname label (kubernetes.io/hostname)
    - Set whenUnsatisfiable to DoNotSchedule
    - Set labelSelector as per user-defined labels

NOTE: Affinity/anti-affinity can be used together with TopologySpreadConstraints.
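
Putting those two constraints together, a sketch of the pod spec might look like this (the app: cassandra label is just a placeholder; use whatever labels your Cassandra pods actually carry):

topologySpreadConstraints:
- maxSkew: 1
  # spread evenly across zones, i.e. racks
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: cassandra
- maxSkew: 1
  # spread evenly across individual worker nodes
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: cassandra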

Let me know if you need more info.


@pushpendra_rajpoot would you mind logging a feature request ticket so we at least have it on the board and it doesn’t get forgotten? Cheers! :beers:

Created a ticket in ‘Future Sprint’ (Backlog) under K8SSAND-506 - Pod Scheduling Control:

K8SSAND-788 - In Rack Topology, Replace Affinity Rules with TopologySpreadConstraint

Let me know if I mapped it to the wrong category/field or if any change is required.

Thanks.


Thanks! Much appreciated! :beers:

Just to note: if you are using cass-operator with the ValidatingAdmissionWebhook enabled, CRDs with a size that is not a multiple of the total rack count will be rejected. Assuming the keyspace replication factor equals the rack count, an entire copy of the data will exist within each rack. I have personally experienced issues with this type of deployment when the number of nodes per rack is not consistent.
