K8ssandra-operator and Helm charts

k8ssandra 2.0 will be based on k8ssandra-operator. We will support Helm as one of the ways to install the operator. There are some questions we need to address with respect to users moving from 1.x to 2.0.

  • Do we support upgrades, migrations, both?
  • How will upgrades be done?

https://github.com/k8ssandra/k8ssandra/pull/1080 added a chart for k8ssandra-operator. The PR also made the new k8ssandra-operator chart a dependency of the k8ssandra chart. I think the idea is to eventually change the k8ssandra chart such that it installs k8ssandra-operator. This makes sense from an upgrade perspective. As a user who has installed k8ssandra 1.x, it makes sense that I would do a helm upgrade to upgrade to k8ssandra 2.0. It is arguably less intuitive for me to upgrade by doing a helm install of the k8ssandra-operator chart.

The k8ssandra-operator chart has as its only dependency k8ssandra-common. I think that this is wrong. The chart should also depend on the cass-operator chart; otherwise, helm install won’t work.
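To make the point concrete, here is a rough sketch of what the dependencies in the k8ssandra-operator Chart.yaml could look like with cass-operator added. The version numbers and the file:// repository paths are assumptions for illustration (they presume the charts sit side by side in the same repo); they are not the actual values from the PR.

```yaml
# Chart.yaml for the k8ssandra-operator chart (illustrative sketch only)
apiVersion: v2
name: k8ssandra-operator
version: 0.1.0                 # hypothetical version
dependencies:
  - name: k8ssandra-common
    version: 0.1.0             # hypothetical version
    repository: file://../k8ssandra-common   # assumed local path
  # Added so that a plain helm install of this chart also brings in cass-operator.
  - name: cass-operator
    version: 0.1.0             # hypothetical version
    repository: file://../cass-operator      # assumed local path
```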

For 2.0, the k8ssandra chart should just depend on the k8ssandra-common and k8ssandra-operator charts. All of the other dependencies are for 1.x and should be removed in 2.0. It is a fair question to ask whether we need both charts. Couldn’t we just put everything in the k8ssandra chart? The answer is yes, but by having a separate k8ssandra-operator chart we can more easily support development of both 1.x and 2.x which I think we may have to do for a bit.

The user will do a helm upgrade to the 2.0 k8ssandra chart, which will perform the necessary migration steps, including:

  • Install k8ssandra-operator
  • Create a K8ssandraCluster
    • Add the helm.sh/resource-policy: keep annotation so that Helm does not delete the object on upgrade, uninstall, or rollback.
    • Include the existing CassandraDatacenter
    • If there is a Stargate deployment, create a Stargate object and add it to the K8ssandraCluster
    • If there is a Reaper object, add it to the K8ssandraCluster (note that Reaper is not yet supported in k8ssandra-operator)
  • Upgrade cass-operator
  • Remove medusa-operator Deployment
  • Remove reaper-operator Deployment
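As a rough sketch of the K8ssandraCluster that the upgrade would create: the object name, datacenter name, and size below are placeholders, and the field layout assumes the k8ssandra.io/v1alpha1 API, which may not match what the operator exposes at the time of the migration.

```yaml
# K8ssandraCluster created once during the 1.x -> 2.0 upgrade (illustrative sketch)
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: k8ssandra               # placeholder name
  annotations:
    # Tell Helm not to delete this object on upgrade, uninstall, or rollback.
    helm.sh/resource-policy: keep
spec:
  cassandra:
    datacenters:
      - metadata:
          name: dc1             # placeholder; would match the existing CassandraDatacenter
        size: 3                 # placeholder; would match the existing datacenter size
```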

Upgrading from 1.x to 2.x is the only time the k8ssandra chart should create a K8ssandraCluster object. We are not going to manage K8ssandraCluster objects via Helm.

It also needs to add resource-policy: keep to the existing CassandraDatacenter and create a similar K8ssandraCluster which would end up with a similar CassandraDatacenter (and fix annotations / labels also). That way there are no hiccups for the running instance (we don’t want unnecessary restarts). As for Reaper / Medusa, we might go with just recreating them if that makes it easier (it shouldn’t impact anything really).

I’m not sure why k8ssandra-operator couldn’t be the one installing only k8ssandra-operator, it’s a single subchart just like cass-operator is these days. We can still keep the charts/k8ssandra as the binding glue which takes care of the dependencies. Maybe the user wants to only install k8ssandra-operator for Stargate and nothing else (we do support such at the moment). It’s not like our reaper-operator / medusa-operator enforce cass-operator dependency either, so this would be a change from our current approach.

Our big issue is of course the fact that we can’t have 2.0 as a Helm chart simultaneously with the 1.x series, since Helm does not understand such a structure sanely (without creating additional repos). As such, if we want k8ssandra/k8ssandra to also install k8ssandra-operator, it needs to be a subchart for now. But subcharts can come and go later; that’s not a big deal.

It also needs to add resource-policy: keep to the existing CassandraDatacenter and create a similar K8ssandraCluster which would end up with a similar CassandraDatacenter (and fix annotations / labels also). That way there are no hiccups for the running instance (we don’t want unnecessary restarts). As for Reaper / Medusa, we might go with just recreating them if that makes it easier (it shouldn’t impact anything really).

We definitely need to add the resource-policy annotation to the CassandraDatacenter. Thanks for pointing that out. I agree that we should be fine with recreating Reaper and Stargate. Medusa is deployed as part of the CassandraDatacenter so nothing special there should need to be done.

I’m not sure why k8ssandra-operator couldn’t be the one installing only k8ssandra-operator, it’s a single subchart just like cass-operator is these days. We can still keep the charts/k8ssandra as the binding glue which takes care of the dependencies. Maybe the user wants to only install k8ssandra-operator for Stargate and nothing else (we do support such at the moment). It’s not like our reaper-operator / medusa-operator enforce cass-operator dependency either, so this would be a change from our current approach.

Fair points, but as you noted this complicates having 1.x and 2.0 charts simultaneously. If we add the dependencies in the k8ssandra-operator chart, then it becomes much easier to support.

Without any doubt we’ll want to make a decision that makes it easier to maintain a 1.x line at the same time as a 2.x line. I think we can say with pretty high certainty that we’ll need to do that for some amount of time.

I guess a question to be asked is, would anyone ever want to do an install of the operator alone? Would/should

helm install k8ssandra/k8ssandra-operator

be a thing?

That question is perhaps complicated by the question of what we do right now. What I’d really like to do is make sure we preserve our flexibility in the k8ssandra/k8ssandra chart for later, when we really dig into the upgrade/migration topic. We’re starting to think about those things, but I think more research and prototyping is needed before we can come to a firm conclusion about the best way to do that.

So, in the meantime, what’s the safest way to preserve that future from breaking changes? Is that to instead support

helm install k8ssandra/k8ssandra-operator

itself?

I guess a question to be asked is, would anyone ever want to do an install of the operator alone?

If by alone you mean without required dependencies, the answer is definitely no, because the operator won’t be usable.

So, in the meantime, what’s the safest way to preserve that future from breaking changes? Is that to instead support

helm install k8ssandra/k8ssandra-operator

itself?

Yes, I believe this is the safest way.

That was a bit of a wandering train of thought in my last comment; I think I more meant this…

Once we have helm install k8ssandra/k8ssandra doing its magic by deploying the operator, would there then be any need for helm install k8ssandra/k8ssandra-operator?

My thinking would be yes, in the sense that at that point the k8ssandra/k8ssandra chart becomes a convenience wrapper, a bridge from the existing deployment mechanism that we’re used to to the new one. Although it would be a bit duplicative, so perhaps it isn’t really desired.

Does that question make more sense?

You are correct about k8ssandra/k8ssandra basically just being a bridge from 1.x to 2.x.

I did a bit of testing to confirm that the helm.sh/resource-policy annotation works as described. It does, but we would need to disable the cleaner pre-delete hook, since it deletes the CassandraDatacenter. I stubbed it out, added the annotation to my CassandraDatacenter, and it was left intact after helm uninstall.
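For reference, the annotation on the CassandraDatacenter looks roughly like the fragment below. The name dc1 is a placeholder, and only the metadata is shown; the spec stays whatever it already is.

```yaml
# Fragment of an existing CassandraDatacenter (metadata only, illustrative)
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1                      # placeholder name
  annotations:
    # With this set, helm uninstall leaves the object in place
    # (provided the cleaner pre-delete hook does not delete it first).
    helm.sh/resource-policy: keep
```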

Something occurred to me though about trying to automatically migrate the CassandraDatacenter into a newly created K8ssandraCluster. The K8ssandraCluster spec will define the desired state for the CassandraDatacenters that are part of it. This means we would need to figure out a way to take the spec from the CassandraDatacenter and redeclare it in the K8ssandraCluster spec; otherwise, k8ssandra-operator will overwrite the spec of the CassandraDatacenter. This isn’t possible right now since we don’t expose all of the CassandraDatacenter properties in the K8ssandraCluster spec. I need to think some more on this.

With respect to the initial alpha release, I think it makes the most sense to add the required dependencies in the k8ssandra-operator chart so that it is useable now and doesn’t interfere with any 1.x development. That will buy us time to figure out how the migration should be done without forcing a long term solution now.

So after some more discussion today, I think we’re left with the following proposal for the moment:

  • Enhance the current k8ssandra/k8ssandra-operator chart to include the necessary dependencies so that it can be installed independently
  • Remove the hooks from the k8ssandra/k8ssandra chart that expose the operator
  • Make later decisions about how to integrate the operator deployment into the k8ssandra/k8ssandra chart, if at all
  • We would for now be accepting that we may end up with a later orphaned k8ssandra/k8ssandra-operator chart (this risk can hopefully be mitigated by keeping the k8ssandra/k8ssandra-operator chart behind the --devel helm flag)
  • A lot of the decisions there will hinge on the migration/upgrade path deemed most reasonable for going from 1.x to 2.x
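On the --devel point: Helm skips pre-release chart versions unless --devel (or an explicit --version) is passed, so publishing the chart with a pre-release version would keep it out of the default install path. A minimal sketch, with a made-up version number:

```yaml
# Chart.yaml for the k8ssandra-operator chart (illustrative sketch only)
apiVersion: v2
name: k8ssandra-operator
# A SemVer pre-release suffix means a plain "helm install" will not select
# this version; users would have to opt in with --devel or --version.
version: 0.1.0-alpha.1           # hypothetical version
```

With that in place, helm install k8ssandra/k8ssandra-operator would only resolve the chart when --devel is added.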

What do we think about that?

Sounds reasonable to me!

Works for me as well :+1: