Wrong time in Cassandra metrics collector (mcac)

tah-mas · August 23, 2021, 10:34am

Hi all, I’m in the process of switching our production Cassandra cluster over to k8ssandra, but in order to do this I need to be able to observe the cluster and set up alerts etc. in Grafana.

After installing k8ssandra 1.3.0 with Cassandra 4.0.0 (on EKS, via Helm) along with Prometheus and Grafana, data flows into Prometheus, but the logs quickly fill up with out-of-order timestamp errors. See:

github.com/k8ssandra/k8ssandra

Prometheus "out-of-order timestamp" error due to metrics relabeling conflict

opened 12:41PM - 14 Jul 21 UTC

kien-truong

bug needs-triage

## Bug Report

**Describe the bug**
K8ssandra is currently exporting 2 metrics `collectd_mcac_histogram_count_total` and `collectd_mcac_meter_count_total`. However, using the default relabeling rules, these metrics are being relabeled to the same metrics. This will confuse Prometheus and can cause `out-of-order` timestamp error if the metrics have different timestamps. For example:

```
collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1"} 2 1626265466529
collectd_mcac_meter_count_total{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1"} 2 1626265466528
```

After relabelings

```
mcac_table_write_latency{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1", table="types", keyspace="system_schema"} 2 1626265466529
mcac_table_write_latency{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1", table="types", keyspace="system_schema"} 2 1626265466528
```

Because `1626265466528` is before `1626265466529` an `out-of-order timestamp error` will be reported.

**To Reproduce**
Steps to reproduce the behavior: Deploy k8ssandra with Prometheus integration using default configuration.

**Expected behavior**
No relabeling conflict and no error reported by Prometheus.

**Environment (please complete the following information):**

* Helm charts version info

```
NAME       NAMESPACE  REVISION  UPDATED                                  STATUS    CHART            APP VERSION
cassandra  cassandra  16        2021-07-07 16:14:03.704604036 +0700 +07  deployed  k8ssandra-1.2.0
```

* Kubernetes version information:

```
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:09:28Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:00:06Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
```

* Kubernetes cluster kind: `Baremetal`

┆Issue is synchronized with this [Jiraserver Bug](https://k8ssandra.atlassian.net/browse/K8SSAND-712) by [Unito](https://www.unito.io)
┆Issue Number: K8SSAND-712
┆Priority: Medium
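To make the collision described in that issue concrete, here is a rough Python sketch of what seems to be happening. It is only an illustration: the `relabel` function and its regex are hypothetical stand-ins for the default relabeling rules (not the actual k8ssandra configuration), and the out-of-order check is a simplified model of Prometheus rejecting a sample that is older than the last one accepted for the same series.

```python
import re

# Samples taken from the GitHub issue: (metric name, mcac label, value, timestamp in ms).
SAMPLES = [
    ("collectd_mcac_histogram_count_total",
     "org.apache.cassandra.metrics.table.write_latency.system_schema.types",
     2, 1626265466529),
    ("collectd_mcac_meter_count_total",
     "org.apache.cassandra.metrics.table.write_latency.system_schema.types",
     2, 1626265466528),
]

# Hypothetical stand-in for the default relabeling rule: the new metric name is
# derived from the mcac label only, so the original metric name is lost.
MCAC_TABLE = re.compile(r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)")


def relabel(name, mcac):
    """Return the series identity (new name, extra labels) after relabeling."""
    match = MCAC_TABLE.fullmatch(mcac)
    if match is None:
        return name, ()
    metric, keyspace, table = match.groups()
    return "mcac_table_" + metric, (("keyspace", keyspace), ("table", table))


def find_out_of_order(samples):
    """Simplified model: a sample older than the last accepted one is rejected."""
    last_seen = {}
    rejected = []
    for name, mcac, _value, ts in samples:
        series = relabel(name, mcac)
        if series in last_seen and ts < last_seen[series]:
            rejected.append((series[0], ts, last_seen[series]))
        else:
            last_seen[series] = ts
    return rejected


if __name__ == "__main__":
    for metric, ts, previous in find_out_of_order(SAMPLES):
        print(f"out-of-order: {metric} has {ts}, already saw {previous}")
```

Both source metrics end up with the same name and label set, so the second sample (1 ms older) looks like it went back in time on a single series.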

That on its own is not too bad, as we can still view the data in Grafana, but after a couple of days the MCAC timestamps start to lag behind (by up to 4 hours), which results in Prometheus dropping the samples and in empty Cassandra Grafana dashboards:
“Error on ingesting samples that are too old or are too far into the future” num_dropped=61638

See Remove timestamp from the metrics · Issue #43 · datastax/metric-collector-for-apache-cassandra · GitHub for more details about this.
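In case it helps anyone confirm the lag on their own cluster, this is roughly how it can be measured from a scraped metrics line. It is a minimal sketch that assumes the lines look like the ones in the GitHub issue quoted above (`name{labels} value timestamp_ms`); the helper name `sample_lag_seconds` is my own and is not part of MCAC or Prometheus.

```python
import time


def sample_lag_seconds(line, now_ms=None):
    """Return how far the sample timestamp is behind now_ms, in seconds.

    Returns None if the line carries no explicit timestamp.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Everything after the closing brace is "value [timestamp]"; lines without
    # labels are split after the metric name instead.
    if "}" in line:
        tail = line.rsplit("}", 1)[-1]
    else:
        tail = line.split(None, 1)[-1]
    fields = tail.split()
    if len(fields) != 2:
        return None
    return (now_ms - int(fields[1])) / 1000.0


if __name__ == "__main__":
    line = ('collectd_mcac_meter_count_total{mcac="org.apache.cassandra.metrics.'
            'table.write_latency.system_schema.types",instance="10.11.66.20"} '
            '2 1626265466528')
    print(sample_lag_seconds(line))
```

A lag that keeps growing between scrapes would match the behaviour described above, where the exported timestamps drift until Prometheus rejects the samples as too old.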

Are there any known workarounds for this? I can’t deploy to production without being able to monitor my cluster.

Thanks
