Hi all, I’m in the process of switching our production Cassandra cluster over to k8ssandra, but to do this I need to be able to observe the cluster and set up alerts etc. in Grafana.
After installing k8ssandra 1.3.0 and Cassandra 4.0.0 (on EKS with Helm) with Prometheus and Grafana, data flows into Prometheus, but the Prometheus logs quickly fill up with out-of-order timestamp errors. See:
opened 12:41PM - 14 Jul 21 UTC (labels: bug, needs-triage)
## Bug Report
**Describe the bug**
K8ssandra currently exports two metrics, `collectd_mcac_histogram_count_total` and `collectd_mcac_meter_count_total`. With the default relabeling rules, however, both are relabeled to the same metric name and label set, so Prometheus sees them as a single series. This can cause `out-of-order timestamp` errors when the two samples carry different timestamps.
For example:
```
collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1"} 2 1626265466529
collectd_mcac_meter_count_total{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1"} 2 1626265466528
```
After relabeling:
```
mcac_table_write_latency{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1", table="types", keyspace="system_schema"} 2 1626265466529
mcac_table_write_latency{mcac="org.apache.cassandra.metrics.table.write_latency.system_schema.types",instance="10.11.66.20",mcac_filtered="true",cluster="cluster",dc="dc1",rack="rack1", table="types", keyspace="system_schema"} 2 1626265466528
```
Because `1626265466528` is earlier than `1626265466529` but arrives after it in the same series, an `out-of-order timestamp` error will be reported.
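The collision can be reproduced in a few lines of Python. This is only a sketch for illustration: the `relabel` function and its regular expression are hypothetical stand-ins for the default relabeling rules (which are not reproduced here), and the check is a simplified model of how Prometheus rejects a sample that is older than the newest sample already stored for the same series.

```python
import re

# The two samples from the example above: (metric name, mcac label, value, timestamp in ms).
MCAC = "org.apache.cassandra.metrics.table.write_latency.system_schema.types"
samples = [
    ("collectd_mcac_histogram_count_total", MCAC, 2, 1626265466529),
    ("collectd_mcac_meter_count_total", MCAC, 2, 1626265466528),
]

# Hypothetical stand-in for the default relabeling: the new metric name is
# derived only from the mcac label, so the original metric name is lost.
PATTERN = re.compile(r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)")


def relabel(name, mcac):
    """Return the series identity (metric name, extra labels) after relabeling."""
    match = PATTERN.fullmatch(mcac)
    if match is None:
        return (name, ())
    metric, keyspace, table = match.groups()
    return ("mcac_table_" + metric, (("keyspace", keyspace), ("table", table)))


def find_out_of_order(samples):
    """Simplified model: flag a sample older than the newest one seen in its series."""
    newest = {}
    rejected = []
    for name, mcac, _value, timestamp in samples:
        series = relabel(name, mcac)
        previous = newest.get(series)
        if previous is not None and timestamp < previous:
            rejected.append((series, timestamp, previous))
        else:
            newest[series] = timestamp
    return rejected


rejected = find_out_of_order(samples)
for series, timestamp, previous in rejected:
    print(f"out-of-order: {series[0]} got {timestamp} after {previous}")
```

Both inputs end up in the single series `mcac_table_write_latency`, so the second sample is flagged. If the relabeled name kept something that distinguished histogram from meter, the two samples would land in separate series and neither would be flagged.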
**To Reproduce**
Steps to reproduce the behavior:
Deploy k8ssandra with Prometheus integration using default configuration.
**Expected behavior**
No relabeling conflict and no error reported by Prometheus.
**Environment (please complete the following information):**
* Helm charts version info
```
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cassandra cassandra 16 2021-07-07 16:14:03.704604036 +0700 +07 deployed k8ssandra-1.2.0
```
* Kubernetes version information:
```
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:09:28Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:00:06Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
```
* Kubernetes cluster kind: `Baremetal`
┆Issue is synchronized with this [Jiraserver Bug](https://k8ssandra.atlassian.net/browse/K8SSAND-712) by [Unito](https://www.unito.io)
┆Issue Number: K8SSAND-712
┆Priority: Medium
On its own that is not too bad, as we can still view the data in Grafana. After a couple of days, however, the MCAC timestamps start to lag behind (by up to 4 hours), which results in Prometheus dropping the samples and the Cassandra dashboards in Grafana going empty:
“Error on ingesting samples that are too old or are too far into the future” num_dropped=61638
See "Remove timestamp from the metrics" (Issue #43 in datastax/metric-collector-for-apache-cassandra on GitHub) for more details about this.
Are there any known workarounds for this? I can’t deploy to production without being able to monitor my cluster.
Thanks