Issue in Multiple Reaper instances with JMX accessible for all DCs

yonglinh · August 30, 2021, 4:39pm

Hi Experts,

I have set up Cassandra across two Kubernetes clusters, with one DC in each. A Reaper instance is installed on each cluster.
On cluster-a,

$ kubectl get pod -n k8ssandra
NAME                                                  READY   STATUS      RESTARTS   AGE
k8ssandra-cass-operator-597795986c-cnmlc              1/1     Running     0          6d3h
k8ssandra-crd-upgrader-job-k8ssandra-gllrt            0/1     Completed   0          2d23h
k8ssandra-dc1-eastus2-1-sts-0                         3/3     Running     0          2d23h
k8ssandra-dc1-eastus2-2-sts-0                         3/3     Running     0          2d23h
k8ssandra-dc1-eastus2-3-sts-0                         3/3     Running     0          2d23h
k8ssandra-grafana-679d495758-xr2mn                    2/2     Running     0          6d3h
k8ssandra-kube-prometheus-operator-7dcccdcc86-mmxf7   1/1     Running     0          6d3h
k8ssandra-medusa-operator-6f8d8595d8-n55v7            1/1     Running     2          6d3h
k8ssandra-reaper-6cd6649456-tp7w2                     1/1     Running     0          2d23h
k8ssandra-reaper-operator-58cf7c449d-j7p5j            1/1     Running     3          3d9h
prometheus-k8ssandra-kube-prometheus-prometheus-0     2/2     Running     1          6d3h

On cluster-b,

$ kubectl get pod -n k8ssandra
NAME                                                  READY   STATUS      RESTARTS   AGE
k8ssandra-cass-operator-597795986c-jqmn6              1/1     Running     0          3d14h
k8ssandra-crd-upgrader-job-k8ssandra-g24vt            0/1     Completed   0          77m
k8ssandra-dc2-eastus2-1-sts-0                         3/3     Running     0          6h29m
k8ssandra-dc2-eastus2-2-sts-0                         3/3     Running     0          6h28m
k8ssandra-dc2-eastus2-3-sts-0                         3/3     Running     0          6h27m
k8ssandra-grafana-679d495758-vthqn                    2/2     Running     0          3d14h
k8ssandra-kube-prometheus-operator-7dcccdcc86-cw8dq   1/1     Running     0          3d14h
k8ssandra-medusa-operator-6f8d8595d8-ztd68            1/1     Running     0          3d14h
k8ssandra-reaper-fb84449f6-fpk7b                      1/1     Running     0          6h26m
k8ssandra-reaper-operator-67fcffbc99-h49gk            1/1     Running     0          3d14h
prometheus-k8ssandra-kube-prometheus-prometheus-0     2/2     Running     1          3d14h

The Reaper instance on cluster-a works well. On cluster-b, Reaper repair jobs also run fine, but the web UI cannot retrieve the cluster information.

The logs show that the Reaper instance on cluster-b fails to connect to k8ssandra-dc1-service:7199.
Logs:

$ kubectl logs k8ssandra-reaper-fb84449f6-fpk7b -n k8ssandra -f
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ConfigurationException [Root exception is java.rmi.UnknownHostException: Unknown host: k8ssandra-dc1-service; nested exception is: 
        java.net.UnknownHostException: k8ssandra-dc1-service]
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at io.cassandrareaper.jmx.JmxProxyImpl.connectWithTimeout(JmxProxyImpl.java:304)
        at io.cassandrareaper.jmx.JmxProxyImpl.connect(JmxProxyImpl.java:242)
        ... 30 common frames omitted
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ConfigurationException [Root exception is java.rmi.UnknownHostException: Unknown host: k8ssandra-dc1-service; nested exception is: 
        java.net.UnknownHostException: k8ssandra-dc1-service]
        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
        at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
        at io.cassandrareaper.jmx.JmxProxyImpl.lambda$connectWithTimeout$0(JmxProxyImpl.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I think it is not surprising that the Reaper instance running on cluster-b cannot reach a service running on cluster-a by its service host name, since the log shows that name does not resolve from cluster-b.
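
To tell a name-resolution failure apart from a blocked port, a small probe along these lines could be run from a pod on cluster-b. This is only a sketch: the function name probe_jmx_endpoint is made up for illustration, and it assumes a Python interpreter is available in the pod, which may not be true of the Reaper image.

```python
import socket


def probe_jmx_endpoint(host: str, port: int = 7199, timeout: float = 3.0) -> str:
    """Classify host:port as 'unresolvable', 'unreachable', or 'reachable'.

    Hypothetical helper, not part of Reaper or K8ssandra.
    """
    try:
        # Name resolution is the step that fails in the Reaper log
        # (java.net.UnknownHostException).
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"

    # The name resolved; try a plain TCP connect to each returned address.
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "reachable"
        except OSError:
            continue
    return "unreachable"
```

Calling probe_jmx_endpoint("k8ssandra-dc1-service") from cluster-b would be expected to return "unresolvable" if the stack trace above reflects a DNS problem. A successful TCP connect only shows that the port is open, not that JMX authentication or the RMI handshake would succeed.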

The cluster table in reaper_db contains the following row, which should be replicated between dc1 and dc2.

k8ssandra-superuser@cqlsh:reaper_db> select * from cluster;

 name      | last_contact                    | partitioner                                 | properties                             | seed_hosts                | state
-----------+---------------------------------+---------------------------------------------+----------------------------------------+---------------------------+--------
 k8ssandra | 2021-08-30 00:00:00.000000+0000 | org.apache.cassandra.dht.Murmur3Partitioner | {"jmxPort":7199,"jmxCredentials":null} | {'k8ssandra-dc1-service'} | ACTIVE

(1 rows)

Any suggestions for this problem? Would setting “datacenterAvailability” to EACH work here? It defaults to “ALL” and currently cannot be changed through the Helm chart.

Thanks much!
