In Rack topology, why are Affinity rules preferred over TopologySpreadConstraints?

Hi Team,

I am analyzing k8ssandra and noticed that Cassandra rack topology is implemented by defining NodeAffinityLabels in cassdc.yaml, using the topology.kubernetes.io/zone k8s label to assign pods to nodes.

While searching on the internet, I came across two posts:
User Stories & TopologySpreadConstraint vs Affinity.

In User Stories, Issue 2 talks about even distribution of pods across multiple Availability Zones.

Now coming to my questions:

  1. In the case of racks, why did the community not go with TopologySpreadConstraints?
  2. What problems do you see if we use TopologySpreadConstraints to implement rack topology?
  3. If a node has enough resources (CPU, memory, disk), can we deploy multiple pods on the same node? If not, why?

Thank you in advance.


That’s a good question! Let me reach out to the engineers here and get them to respond to you. Cheers! :beers:

Hi @pushpendra_rajpoot

I am an engineer working on k8ssandra. I’ll try to answer your questions. First, though, I should point out that you can use arbitrary labels for your racks. When you declare the racks in your manifest, it looks like this:

racks:
- name: r1
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-b
    env: qa
- name: r2
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-a
    env: qa
- name: r3
  affinityLabels:
    topology.kubernetes.io/zone: us-east1-c
    env: qa

The affinityLabels property accepts a map of arbitrary keys/values.
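
For context, labels declared this way constrain where the pods for that rack can be scheduled. A rack such as r1 above ends up with a node affinity roughly like the following (this is standard Kubernetes node affinity syntax, shown for illustration rather than the operator's exact generated output):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # each affinityLabels entry becomes a required node label match
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east1-b
        - key: env
          operator: In
          values:
          - qa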

Now for your questions…

In the case of racks, why did the community not go with TopologySpreadConstraints?

The rack topology support was implemented before I got involved with the project, but I think it was due to the feature being developed before TopologySpreadConstraints were available.

What problems do you see if we use TopologySpreadConstraints to implement rack topology?

I don’t know. We would need to better understand how the behavior would change.

If a node has enough resources (CPU, memory, disk), can we deploy multiple pods on the same node?

Yes. In the CassandraDatacenter spec, setting allowMultipleNodesPerWorker: true allows multiple Cassandra pods to be scheduled onto the same k8s worker node.
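
For illustration, here is a minimal sketch of a CassandraDatacenter with that flag set (the cluster/datacenter names, server version, and resource values are assumptions, not from this thread):

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: demo
  serverType: cassandra
  serverVersion: "3.11.10"
  size: 3
  # allow more than one Cassandra pod per k8s worker node
  allowMultipleNodesPerWorker: true
  # explicit requests/limits are a good idea when packing
  # multiple Cassandra pods onto one worker
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi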

Cheers

John

Thank you @jsanda for the quick reply.

As per the logic in cass-operator, in the SplitRacks function in cassandradatacenter_types.go, pods will be distributed unevenly if Nodes=7, Racks=3, and RF=3, so the topology will be [3, 2, 2].

  • Will the community consider using TopologySpreadConstraints instead of Affinity in future releases?
  • What’s your recommendation for nodes, racks & RF? I mean, is there any relationship between them when we have multiple racks?

Example:

  1. Should the number of racks match the replication factor of the keyspaces?
  2. Should the number of pods be a multiple of the rack count?

Thanks.

cass-operator makes a best effort to create evenly distributed racks. In your example that is not possible, since the node count is not a multiple of the rack count.
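
For example, keeping the size a multiple of the rack count lets the racks come out even (zone names here are illustrative):

spec:
  size: 9          # 9 pods / 3 racks = 3 Cassandra pods per rack
  racks:
  - name: r1
    affinityLabels:
      topology.kubernetes.io/zone: us-east1-a
  - name: r2
    affinityLabels:
      topology.kubernetes.io/zone: us-east1-b
  - name: r3
    affinityLabels:
      topology.kubernetes.io/zone: us-east1-c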

Will the community consider using TopologySpreadConstraints instead of Affinity in future releases?

Sure, it is definitely possible. My suggestion is to keep the discussion going here on the forum. It would be great if you could provide some details and examples of how TopologySpreadConstraints differ from node affinity. Hopefully some sort of design doc can materialize from the discussion. With a design in hand, we can scope out the work and get some tickets created.

I think k8s has documented it well. Sharing the k8s link here.

We could define two constraints under spec.topologySpreadConstraints in the StatefulSet’s pod template, like this (a YAML sketch follows the list):

  • Add a zone (rack) constraint:
    - Set maxSkew to 1
    - Set topologyKey to the zone label (topology.kubernetes.io/zone)
    - Set whenUnsatisfiable to DoNotSchedule
    - Set labelSelector as per user-defined labels
  • Add a node constraint:
    - Set maxSkew to 1
    - Set topologyKey to the hostname label (kubernetes.io/hostname)
    - Set whenUnsatisfiable to DoNotSchedule
    - Set labelSelector as per user-defined labels

NOTE: Affinity/anti-affinity can be used together with TopologySpreadConstraints.
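
Putting those two constraints together, a sketch of the pod spec might look like this (the app: cassandra label is just a placeholder; use whatever labels your Cassandra pods actually carry):

topologySpreadConstraints:
- maxSkew: 1
  # spread evenly across zones, i.e. racks
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: cassandra
- maxSkew: 1
  # spread evenly across individual worker nodes
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: cassandra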

Let me know if you need more info.


@pushpendra_rajpoot would you mind logging a feature request ticket so we at least have it on the board and it doesn’t get forgotten? Cheers! :beers:

Created a ticket in ‘Future Sprint’ (Backlog) under K8SSAND-506 - Pod Scheduling Control:

K8SSAND-788 - In Rack Topology, Replace Affinity Rules with TopologySpreadConstraint

Let me know if I mapped it to the wrong category/field or if any change is required.

Thanks.


Thanks! Much appreciated! :beers:

Just to note: if you are using cass-operator with the ValidatingAdmissionWebhook enabled, CRDs with a size that is not a multiple of the total rack count will be rejected. Assuming the keyspace replication factor equals the rack count, an entire copy of the data will exist within each rack. I have personally experienced issues with this type of deployment when the number of nodes per rack is not consistent.
