Establishing conventions with kustomize

Background

Both cass-operator (starting in v1.8.0) and k8ssandra-operator support installation/configuration via kustomize. There has been ongoing discussion about kustomize in https://github.com/k8ssandra/k8ssandra-operator/pull/144 but I think the discussion is a bit outside of the scope of the PR and is better suited for the forum.

I am going to reference the example from the operator-sdk docs. If you create an operator project with:

operator-sdk init --domain example.com --repo github.com/example/memcached-operator

and then generate a controller with:

operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller

a config directory is generated with several kustomization directories. It looks like this:

config
├── crd
│   ├── kustomization.yaml
│   ├── kustomizeconfig.yaml
│   └── patches
│       ├── cainjection_in_memcacheds.yaml
│       └── webhook_in_memcacheds.yaml
├── default
│   ├── kustomization.yaml
│   ├── manager_auth_proxy_patch.yaml
│   └── manager_config_patch.yaml
├── manager
│   ├── controller_manager_config.yaml
│   ├── kustomization.yaml
│   └── manager.yaml
├── manifests
│   └── kustomization.yaml
├── prometheus
│   ├── kustomization.yaml
│   └── monitor.yaml
├── rbac
│   ├── auth_proxy_client_clusterrole.yaml
│   ├── auth_proxy_role.yaml
│   ├── auth_proxy_role_binding.yaml
│   ├── auth_proxy_service.yaml
│   ├── kustomization.yaml
│   ├── leader_election_role.yaml
│   ├── leader_election_role_binding.yaml
│   ├── memcached_editor_role.yaml
│   ├── memcached_viewer_role.yaml
│   ├── role_binding.yaml
│   └── service_account.yaml
├── samples
│   ├── cache_v1alpha1_memcached.yaml
│   └── kustomization.yaml
└── scorecard
    ├── bases
    │   └── config.yaml
    ├── kustomization.yaml
    └── patches
        ├── basic.config.yaml
        └── olm.config.yaml

After running make manifests you will also wind up with config/rbac/role.yaml. Running make manifests will create/update role.yaml any time you modify kubebuilder RBAC annotations such as this one:

// +kubebuilder:rbac:groups=core,namespace="k8ssandra",resources=pods;secrets,verbs=get;list;watch

You will find this same directory structure in the cass-operator and k8ssandra-operator projects. For the purposes of this discussion we are primarily focused on the following directories:

  • crd
  • default
  • manager
  • rbac

We often refer to bases and overlays with kustomize. A base is a directory with a kustomization.yaml that includes resources, e.g., Deployment, to be configured and created.

An overlay is a directory with a kustomization.yaml that refers to other kustomization directories as it bases.

The crd, manager, and rbac directories are bases. I consider the default directory an overlay.

Questions

With some background covered I want to raise some questions around some of the discussion in https://github.com/k8ssandra/k8ssandra-operator/pull/144.

Where should non-generated kustomization directories live?
There was some discussion in the PR about whether additional kustomization directories should live under config or some other directory. I am in favor of keeping mostly everything under the config directory. There was a concern raised that this would complicate operator-sdk upgrades. I have not found that to be the case in my experience. The scaffolding of the config directory is a one-time operation with the exception of files that are modified by controller-gen. We don’t regenerate the scaffolding when upgrading operator-sdk.

More importantly the config directory is familiar to contributors as that is the convention used by kubebuilder/operator-sdk.

What bases should exist?
crd, manager, rbac, and webhook (not listed above but used in cass-operator eventually used in k8ssandra-operator) are simply defaults generated by the kubebuilder scaffolding. There is no hard and fast rule that a project has to have these and only these. With that said I think that these are sensible defaults and it makes sense to keep them.

As the projects evolve and as we refactor the kustomize-related code, we may introduce additional bases.

Where should bases be defined?
I am leaning towards creating a config/bases directory and moving all of the base directories under it. I think this would provide some clarity. This looks to be the approach with the istio-operator project. See here.

Where should overlays be defined
Similar to bases, I propose that overlays live under config/overlays.

What overlays should exist?
Both operator projects should have overlays for creating namespace-scoped and cluster-scoped deployments.

cass-operator should have a webhook overlay that includes cert-manager. We don’t need this yet in k8ssandra-operator since we haven’t implemented any webhooks for it.

k8ssandra-operator should have additional overlays for control-plane and data-plane deployments.

Then we need some combinations:

  • namespace-scoped, control-plane
  • namespace-scoped, data-plane
  • cluster-scoped, control-plane
  • cluster-scoped, data-plane

Should bases create namespaces?
In short, no. The scaffolding generates `config/manager/manager.yaml. This file includes:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    control-plane: controller-manager
  name: system

This creates problems as I have described in https://github.com/k8ssandra/cass-operator/issues/167.

I think that there should separate, dedicated bases should be used to create namespaces. This would of course be done in conjunction with overlays.

This article has some interesting things to say on the topic.

Firstly, I want to note that these conventions should not exclusively focus on k8ssandra-operator. We have four operators to consider here and should adopt a standard approach.

Namespaces are a constant bugbear and I’m happy with the proposals there.

I would prefer that the config/ directory remain as close to stock as possible because -

  1. It makes upgrades easier to keep track of. operator-sdk provides detailed update instructions for each version which rely on files being in certain places and resources having certain names. While we don’t rescaffold, we do need to modify the manifests as part of upgrade processes.
  2. It maximises the familiarity benefit of using a framework. We currently have 4 operators in play and the more we can transfer knowledge across them (and between our operators and other operators in the community) the easier it is for users + contributors to adopt.
  3. It offers flexibility for us and users to add more operator-sdk features as we move forward. E.g. right now we do not use the kube-rbac-proxy but some users might want to enable this.

There is no technical reason or convention that precludes treating the local operator’s config directory as a standalone base and then installing its dependancies via another base of which the local operator is a dependancy. I am in favour of this approach.

I am also in favour of keeping the customised base in a top level directory separate from the config directory, for the reasons given above. AFAIK an overlay should apply to a single base. Bases can call other bases (e.g. cass-operator dependancy for k8ssandra-operator) but overlays should not call multiple bases.

I am in favour of modifications to the stock config/ where these are documented in the README. Things like turning off unused features in kustomize.yaml files make sense. Adding image overrides in config/ or trying to install dependancies (like cass-operator) presents unacceptable risks which we could and should avoid.

Regarding overlays, I think there may be benefits to having two layers of overlays rather than having ‘combination’ overlays as suggested above. E.g. data-plane is an overlay on top of namespace-scoped. I think this is the more modular and idiomatic approach.

I am open to the idea of throwing the gates open to more extensive modifications. But in this case I recommend we clean out the configs directory to create a minimal set of configs. Right now we have subdirectories such as -

  • Webhook
  • kube-rbac-proxy
  • samples
  • cert-manager

Not all of these are used in all operators. In particular in reaper-operator there are a lot of dead manifests which would need to be removed. In my view, we can choose to either accept operator-sdk cruft on the basis that it is standardised cruft, or we need to do some very delicate surgery on the config directory and go our own way.

At this point I want to minimize the investments we have to make in the reaper-operator and medusa-operator repos since we are migrating them into k8ssandra-operator. That is why I focused only cass-operator and k8ssandra-operator.

About the config directory and operator-sdk upgrades…

We only have to modify the things we care about.

I already mentioned my motivation for trying to minimize changes in reaper-operator and medusa-operator.

With respect to contributors and other operators, here is a short list of operators from OperatorHub.io that do make changes in the config directory:

While this is not a huge list, it does show that it is not uncommon to make modifications in config.

First, it should be noted that config/default is an overlay not a base. Secondly and more importantly I will point out again K8SSAND-877 ⁃ Default kustomization should not create namespace · Issue #167 · k8ssandra/cass-operator · GitHub. The stock manifests, specifically config/manager.yaml present a problem for precisely this scenario.

Can you provide some concrete examples of risks?

I think that is more or less what I have in mind at this point. The directory structure might look something like this:

overlays
├── cluster-scoped
│   ├── control-plane
│   └── data-plane
└── namespace-scoped
    ├── control-plane
    └── data-plane

They are still a part of the ecosystem. For as long as they exist we should try to ensure that they are maintainable alongside the other operators. A planned deprecation (if that has been announced) doesn’t mean we can walk away from them tomorrow.

Figuring out which things we care about is not always straightforward, especially for new contributors. For example - I wasn’t expecting the reaper-operator to do leader election considering that we do not expect to run it in a replicated mode.

This is one of the kinds of risks I’m talking about. I would think following all the upgrade instructions is the lowest risk way and incurs the least cognitive overhead.

Another risk relates to the case I was experiencing yesterday. By including cass-operator in the configs/default kustomization, multiple leader-election-roles were instantiated in the same kustomization, which led to name clashes between the two operators. By building both out of a common aux-config/default base the issue disappears, because namePrefixes seem to be applied at a lower level in the kustomization hierarchy.

This alone is a compelling reason to keep config dedicated to the stock operator-sdk configs for the local operator.

It is exactly the same issue that you raised in this ticket which you’ve mentioned a few times. Using the design I’ve proposed (additional intermediate layer to install dependancies rather than piggybacking off config/default) eliminates these problems. Note that it does not save you from namespacing issues precisely because namespaces to not get namePrefixed AFAIK.

I’ve only looked at the VictoriaMetrics operator, but from what I can see there is no config/overlays/, no config/bases or anything like that. Deployment entrypoint seems to remain default. They do have a config/deployments which seems to be example deployments.

I’m just proposing to make that a top level folder. I don’t find the VicMetrics approach of having config/default called by config/deployments/default intuitive or reasonable. There should be one default.

So in the design I’m proposing it is a base for aux-config/default. But true, by default it is an overlay. This is where I really feel that talk of overlays and bases is kind of a distraction, and another reason I object to naming folders in this way - the second you create a new overlay referring to an item in config/overlays it now should be moved into config/bases? This seems less than ideal.

I mean, you’re violating “Package by feature, not layer” straight out of the gate.

I do note that there seems to be no implication in the kustomize docs to suggest that an overlay must reference a single base (as I previously claimed). My experience leads me to think of bases and overlays in these terms however, and I find that a more effective abstraction.

Also RE composing multi/single-namespace and data/control plane overlays - I think this is what kustomize components are intended to address. https://github.com/arrikto/kubernetes-enhancements/commit/35e9662eeb4e9b176269b031096a042ac21631f6

(And perhaps when I make comments on overlays I’m really talking about components. This is a high change area…)

Real Istio operaotr does not use config directory at all (this link won’t work as the forum won’t allow me to link): https : // github .com / istio/ istio/ tree/ master/ operator

As for the banzaicloud’s own Istio operator, they do use config/base, but it’s based on the old kustomize structure (they created the directory in 2019), when kustomize still used “bases” and not “resources” like these days.

These are not enough. There are also OLM related functionality which depend on the structure, such as manifests being integrated in to the bundle automatically (I didn’t see where this could be changed). The operator-sdk assumes the structure is what it is at the moment when it scaffolds / updates the bundle creation in Makefile. We will need these features later on.

manifests itself is a base / overlay depending how you want to see it.

This isn’t necessarily possible. Kustomize does not allow adding …/ as resource, that creates a loop. So while it sounds nice to have: overlays/with-namespace/data-plane and ./control-plane, neither data-plane nor control-plane can include overlays/with-namespace/ as their parent. So the actual kustomization.yaml has to live in some other directory.

The error message you’ll end up with is:

“Error: accumulating resources: accumulation err=‘accumulating resources from ‘…/’: ‘/Users/michael.burman/projects/datastax/cass-operator/config/default’ must resolve to a file’: cycle detected: candidate root ‘/Users/michael.burman/projects/datastax/cass-operator/config/default’ contains visited root ‘/Users/michael.burman/projects/datastax/cass-operator/config/default/with-namespace’”

@miles correct me if I am wrong but were the errors that you hit due to an older version of controller-gen and not the kustomize directory structure, resources, etc.?

Then that answers how some of the organization needs to be done :+1:

It seems to have been some interaction between the two. Upgrading to 0.6.1 of controller-gen fixed it, but restructuring the kustomizations also fixed it (and I think the changes I made there have value outside of the role names clashing - they fixed some additional issues you identified too.)

I am at a loss as to why the controller-gen annotation on the CRDs would cause an issue with clashing role names. It does not make sense to me as the same roles were in place irrespective of the annotation.