However, the dashboards are not working as expected. In one dashboard, the Prometheus metric should be named mcac_client_request_latency_total, but I didn't see it in Prometheus.
Hi, we're moving away from MCAC, so I'd recommend switching to the new metrics endpoint (unless you can't).
In order to set up the dashboards properly, follow this guide and pay attention to the "When using MCAC" or "When using the new metrics endpoint" parts, depending on the metrics endpoint you've configured.
You shouldn't need the Prometheus config you're referring to if you're using the new metrics endpoint (in which case the metric names shouldn't start with mcac_…). Those metrics would work with the new dashboards straight away.
Note that the new metrics endpoint exposes its metrics on port 9000, while MCAC exposes them on port 9103.
Which version of k8ssandra-operator and Cassandra are you using?
That may explain why you’re not seeing the new endpoint.
Enabling telemetry.prometheus indeed requires having the Prometheus operator installed. When it's enabled, k8ssandra-operator will create ServiceMonitor objects that allow Prometheus to discover and scrape the endpoints automatically.
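For reference, enabling it on the K8ssandraCluster looks roughly like this (a minimal sketch using your cluster name from the error above; adapt the rest to your setup):

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: cassandra
spec:
  cassandra:
    telemetry:
      prometheus:
        enabled: true   # k8ssandra-operator will create the ServiceMonitor objects
```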
I must admit I've never set up a custom Prometheus to scrape the Cassandra pods, and I'm not entirely sure how to discover the new endpoints dynamically as the cluster expands.
One alternative would be to rely on Vector instead, using its prometheus remote write sink component. But if you're using MCAC, you'll be missing a lot of relabellings, which would make our dashboards unfit for the metrics you'll be writing.
So, the options you have:
Integrated way: Use the prometheus operator and rely on telemetry.prometheus.enabled to set up the scraping automatically, and use our “old” dashboards if you scrape MCAC, or the new ones if you have access to the new metrics endpoint
Integrated way: Use Vector in conjunction with the new metrics endpoint (more on setting up Vector here; see the sketch after this list), and use our new dashboards as is. This lets you keep your current Prometheus installation without requiring the prometheus-operator.
Custom way: Move forward with what you have right now, and adjust the dashboards to the metrics you're getting. I would discourage this, as we're going to fully remove MCAC in the near future. The exact relabellings the dashboards expect can be found here. I'm also unsure how you'd be able to dynamically scrape newly added nodes.
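To give an idea of the second option, here's a minimal Vector config sketch that scrapes the new metrics endpoint locally and pushes to Prometheus via remote write. The metrics path and the remote write URL are assumptions to adapt to your environment:

```yaml
# vector.yaml -- minimal sketch, not a drop-in config
sources:
  cassandra_metrics:
    type: prometheus_scrape
    # the new metrics endpoint, scraped locally on each Cassandra pod
    endpoints:
      - http://localhost:9000/metrics
sinks:
  prometheus:
    type: prometheus_remote_write
    inputs:
      - cassandra_metrics
    # assumption: your Prometheus accepts remote write at this URL
    endpoint: http://prometheus.monitoring.svc:9090/api/v1/write
```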
It seems that the second option is the best, so I tried to do some tests on port 9000.
If I do port forwarding to port 9000 on the cassandra service, it works fine: I can see the metrics in my web browser (it also works on port 9103).
But if I connect from my Prometheus pod, port 9103 works while port 9000 doesn't (connection refused).
Oooh, I know what's wrong here. It's not related to Vector; it's rather a "bug" we have where the new metrics endpoint listens on localhost only, preventing access from outside the pod.
You’ll find the solution here.
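In short, the workaround is to make the new metrics endpoint bind to a non-localhost address via the telemetry spec. A rough sketch of the change on the K8ssandraCluster (the address/port field names and values are assumptions on my side; the linked discussion has the exact form):

```yaml
spec:
  cassandra:
    telemetry:
      cassandra:
        endpoint:
          address: "0.0.0.0"   # assumption: bind to all interfaces instead of localhost
          port: "9000"         # assumption: keep the default metrics port
```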
I was working on the same topic.
I'm trying to change the config live with OpenLens, but I get:
Failed to save resource: K8ssandraCluster.k8ssandra.io "cassandra" is invalid: spec.cassandra.telemetry.cassandra.endpoint: Invalid value: "string": spec.cassandra.telemetry.cassandra.endpoint in body must be of type object: "string"
But I added exactly the same thing as in the GitHub discussion:
I tweaked the dashboard variables a little bit and it seems to work. Now I need to set up Vector.
I use the cassandra service name in the Prometheus config to scrape the metrics, but in the dashboards I see only one pod. Do you know if the Prometheus operator uses the cassandra service?
That's the thing: going through the service won't allow Prometheus to scrape all the endpoints consistently.
The Prometheus operator discovers all the pods that have to be scraped thanks to the ServiceMonitor, which selects the target service by label. Prometheus then scrapes each pod endpoint behind that service individually, without going through the service's cluster IP.
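For illustration, the generated ServiceMonitor looks roughly like this (the name and label values here are placeholders; check the object the operator actually creates in your namespace):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cassandra-dc1-metrics   # hypothetical name; the operator picks its own
spec:
  selector:
    matchLabels:
      cassandra.datastax.com/cluster: cassandra   # placeholder label; compare with the generated object
  endpoints:
    - port: metrics   # named port on the Cassandra service exposing the metrics endpoint
      interval: 30s
```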
Vector uses a push model, so you don't need to worry about scraping each pod separately or updating the list of targets when the cluster scales.