Need help setting up metrics


I installed a Kubernetes cluster with my own Prometheus and Grafana (without the operator) and K8ssandra.
I followed the doc “Apache Cassandra® metrics endpoints” from the K8ssandra documentation.

I am using this Prometheus config: dashboards/prometheus/prometheus.yaml from the datastax/metric-collector-for-apache-cassandra GitHub repository.

I can see metrics like collectd_mcac.* in Prometheus.

I installed the dashboards from dashboards/grafana/generated-dashboards in the same GitHub repository into Grafana.

However, the dashboards are not working as expected. One dashboard expects a metric named mcac_client_request_latency_total, but I don’t see it in Prometheus.

What is missing in my configuration?


Hi, we’re moving away from MCAC, so I’d recommend switching to the new metrics endpoint (unless you can’t).
In order to set up the dashboards properly, follow this guide and pay attention to the “When using MCAC” or “When using the new metrics endpoint” parts, depending on the metrics endpoint you’ve configured.

You shouldn’t need the Prometheus config you’re referring to if you’re using the new metrics endpoint (in which case the metric names shouldn’t start with mcac_…). Those metric names would work with the new dashboards straight away.

Note that the new metrics endpoint exposes metrics on port 9000, while MCAC exposes them on port 9103.
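If it helps, here’s a minimal sketch of a static scrape job for the new endpoint in a hand-managed Prometheus config; the target address is a placeholder, so substitute your pod’s IP or DNS name:

    scrape_configs:
      - job_name: "cassandra-new-endpoint"
        static_configs:
          # Placeholder target; the new endpoint listens on port 9000.
          - targets: ["<cassandra-pod-ip>:9000"]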

Nice Trolls de Troy avatar :wink:

I am using the old endpoint (port 9103). For the moment there is no answer on port 9000. So I tried to add this to my K8ssandra config:

        telemetry:
          prometheus:
            enabled: true

but I encountered an error saying that k8ssandra couldn’t find Prometheus.

With the guide you mention in your answer, can I use my own Prometheus/Grafana?


Hebus is the best!

Which version of k8ssandra-operator and Cassandra are you using?
That may explain why you’re not seeing the new endpoint.

Enabling telemetry.prometheus indeed requires the Prometheus operator to be installed. When enabled, k8ssandra-operator creates ServiceMonitor objects that allow Prometheus to discover and scrape the endpoints automatically.
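For reference, here’s a minimal sketch of where that flag lives in a K8ssandraCluster manifest (not a complete manifest; the cluster name is a placeholder):

    apiVersion: k8ssandra.io/v1alpha1
    kind: K8ssandraCluster
    metadata:
      name: demo                # placeholder name
    spec:
      cassandra:
        telemetry:
          prometheus:
            enabled: true       # the operator then creates the ServiceMonitor objects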

I must admit that I’ve never set up a custom Prometheus to scrape the Cassandra pods. Not entirely sure how to discover the new endpoints dynamically as the cluster expands :thinking:
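That said, one approach that might work with a hand-managed Prometheus is pod discovery through kubernetes_sd_configs. This is only a sketch, and the pod label it filters on is an assumption, so check the labels actually set on your Cassandra pods:

    scrape_configs:
      - job_name: "cassandra-pods"
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only Cassandra pods; this label is an assumption, adjust it
          # to whatever labels your pods actually carry.
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
            regex: cassandra
            action: keep
          # Rewrite the scrape address to target the new metrics endpoint on port 9000.
          - source_labels: [__meta_kubernetes_pod_ip]
            replacement: "$1:9000"
            target_label: __address__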

One alternative would be to rely on Vector instead, using the prometheus remote write sink component. But if you’re using MCAC, you’ll be missing a lot of relabellings, which would make our dashboards unfit for the metrics you’ll be writing.

So, the options you have:

  • Integrated way: use the Prometheus operator and rely on telemetry.prometheus.enabled to set up the scraping automatically, and use our “old” dashboards if you scrape MCAC, or the new ones if you have access to the new metrics endpoint.

  • Integrated way: use Vector in conjunction with the new metrics endpoint (more on setting up Vector here), and use our new dashboards as is. This allows you to keep your current Prometheus installation without requiring the prometheus-operator.

  • Custom way: move forward with what you have right now, and adjust the dashboards to what you’re getting. I would discourage this, as we’re going to fully remove MCAC in the near future. The exact relabellings that the dashboards expect can be found here. I’m also unsure how you’d be able to dynamically scrape newly added nodes.

It seems the second option is the best, so I tried some tests on port 9000.

If I do port forwarding to port 9000 on the Cassandra service, it works fine: I can see the metrics in my web browser (it also works on port 9103).

But if I connect from my Prometheus pod, port 9103 works while port 9000 does not (connection refused).

Is it because I haven’t installed Vector yet?

oooh I know what’s wrong here. It’s not related to Vector; it’s rather a “bug” we have where the new metrics endpoint only listens on localhost, preventing access from outside the pod.
You’ll find the solution here.

I was on the same topic :grin:
I am trying to change the config live with OpenLens, but I get:

Failed to save resource: "cassandra" is invalid: spec.cassandra.telemetry.cassandra.endpoint: Invalid value: "string": spec.cassandra.telemetry.cassandra.endpoint in body must be of type object: "string"

But I added exactly the same thing as in the GitHub discussion:

        telemetry:
          prometheus:
            enabled: true
          mcac:
            enabled: false
          cassandra:
            endpoint: ""

In fact it’s:

        telemetry:
          prometheus:
            enabled: true
          mcac:
            enabled: false
          cassandra:
            endpoint:
              address: ""

Perhaps I need to tweak the dashboard a little bit, for example the PROMETHEUS_DS variable?

I tweaked the dashboard variables a little bit and it seems to work. Now I need to set up Vector.

I use the Cassandra service name in the Prometheus config to scrape the metrics, but in the dashboards I only see one pod. Do you know if the Prometheus operator uses the Cassandra service?

That’s the thing: going through the service won’t allow Prometheus to scrape all the endpoints consistently.
The Prometheus operator discovers all the pods that have to be scraped thanks to the ServiceMonitor, which indicates the labels of the target pods. It then scrapes each pod separately, without going through a service.
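To illustrate, the ServiceMonitor that gets created looks roughly like this; the name, labels, and port name below are assumptions, not the exact values the operator generates:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: demo-dc1-service-monitor             # hypothetical name
    spec:
      selector:
        matchLabels:
          cassandra.datastax.com/cluster: demo   # assumed label on the metrics service
      endpoints:
        - port: prometheus                       # assumed port name
          interval: 30s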
Vector uses a push model, so you don’t need to worry about scraping each pod separately or updating the list of pods when the cluster scales.

Yes, we need to switch to the Prometheus operator, but not yet.

I tried to fetch metrics from the pod directly from the Prometheus pod, but it does not work.
Shouldn’t http://<pod ip>:9000 work?

I activated Vector, but I still have empty values in the dashboards for the metric called host_cpu_seconds_total. Do you know why?

For the missing host_* metrics, see here.
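In case that link goes stale: host_cpu_seconds_total is emitted by Vector’s host_metrics source, so something along these lines has to be present in the telemetry spec. This is only a sketch, assuming the same components structure as the snippet further down this thread, and the source then has to be fed (directly or through a transform) into your sink’s inputs:

    telemetry:
      vector:
        components:
          sources:
            # Vector's built-in host_metrics source emits the host_* series,
            # including host_cpu_seconds_total.
            - name: host_metrics
              type: host_metrics
              config: ""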


Sorry to disturb you again.
Do I need to choose a specific port for the sink (not 9000/9103) in my Vector configuration, like this?

  sinks:
    - name: prometheus_exporter
      type: prometheus_exporter
      inputs:
        - enrich_host_metrics
      config: |
        address = ""

I was recommending the prometheus remote write sink, not the exporter one. Otherwise you’re in the same situation, with a pull model.
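Sticking to the components format from your snippet, a remote write sink would look roughly like this. The sink name and the endpoint URL are placeholders, and your Prometheus has to be started with --web.enable-remote-write-receiver to accept the writes:

    sinks:
      - name: prometheus_rw              # hypothetical sink name
        type: prometheus_remote_write
        inputs:
          - enrich_host_metrics          # reuse whatever inputs your exporter had
        config: |
          # Placeholder URL; point this at your own Prometheus instance.
          endpoint = "http://<your-prometheus>:9090/api/v1/write"

With remote write, Vector pushes the metrics, so there’s no extra port to expose and nothing to scrape.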