K8ssandra Forum

Issue with multiple Reaper instances and JMX access to all DCs

Hi Experts,

I have set up Cassandra across two Kubernetes clusters with two DCs: dc1 runs on cluster-a and dc2 runs on cluster-b. A Reaper instance is installed on each Kubernetes cluster.
On cluster-a:

$ kubectl get pod -n k8ssandra
NAME                                                  READY   STATUS      RESTARTS   AGE
k8ssandra-cass-operator-597795986c-cnmlc              1/1     Running     0          6d3h
k8ssandra-crd-upgrader-job-k8ssandra-gllrt            0/1     Completed   0          2d23h
k8ssandra-dc1-eastus2-1-sts-0                         3/3     Running     0          2d23h
k8ssandra-dc1-eastus2-2-sts-0                         3/3     Running     0          2d23h
k8ssandra-dc1-eastus2-3-sts-0                         3/3     Running     0          2d23h
k8ssandra-grafana-679d495758-xr2mn                    2/2     Running     0          6d3h
k8ssandra-kube-prometheus-operator-7dcccdcc86-mmxf7   1/1     Running     0          6d3h
k8ssandra-medusa-operator-6f8d8595d8-n55v7            1/1     Running     2          6d3h
k8ssandra-reaper-6cd6649456-tp7w2                     1/1     Running     0          2d23h
k8ssandra-reaper-operator-58cf7c449d-j7p5j            1/1     Running     3          3d9h
prometheus-k8ssandra-kube-prometheus-prometheus-0     2/2     Running     1          6d3h

On cluster-b:

$ kubectl get pod -n k8ssandra
NAME                                                  READY   STATUS      RESTARTS   AGE
k8ssandra-cass-operator-597795986c-jqmn6              1/1     Running     0          3d14h
k8ssandra-crd-upgrader-job-k8ssandra-g24vt            0/1     Completed   0          77m
k8ssandra-dc2-eastus2-1-sts-0                         3/3     Running     0          6h29m
k8ssandra-dc2-eastus2-2-sts-0                         3/3     Running     0          6h28m
k8ssandra-dc2-eastus2-3-sts-0                         3/3     Running     0          6h27m
k8ssandra-grafana-679d495758-vthqn                    2/2     Running     0          3d14h
k8ssandra-kube-prometheus-operator-7dcccdcc86-cw8dq   1/1     Running     0          3d14h
k8ssandra-medusa-operator-6f8d8595d8-ztd68            1/1     Running     0          3d14h
k8ssandra-reaper-fb84449f6-fpk7b                      1/1     Running     0          6h26m
k8ssandra-reaper-operator-67fcffbc99-h49gk            1/1     Running     0          3d14h
prometheus-k8ssandra-kube-prometheus-prometheus-0     2/2     Running     1          3d14h

The Reaper instance on cluster-a works well. On cluster-b, the Reaper repair job also runs fine, but the cluster information cannot be retrieved in the web UI.

The logs show that it fails to connect to k8ssandra-dc1-service:7199.
Logs:

$ kubectl logs k8ssandra-reaper-fb84449f6-fpk7b -n k8ssandra -f
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ConfigurationException [Root exception is java.rmi.UnknownHostException: Unknown host: k8ssandra-dc1-service; nested exception is: 
        java.net.UnknownHostException: k8ssandra-dc1-service]
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at io.cassandrareaper.jmx.JmxProxyImpl.connectWithTimeout(JmxProxyImpl.java:304)
        at io.cassandrareaper.jmx.JmxProxyImpl.connect(JmxProxyImpl.java:242)
        ... 30 common frames omitted
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ConfigurationException [Root exception is java.rmi.UnknownHostException: Unknown host: k8ssandra-dc1-service; nested exception is: 
        java.net.UnknownHostException: k8ssandra-dc1-service]
        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
        at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
        at io.cassandrareaper.jmx.JmxProxyImpl.lambda$connectWithTimeout$0(JmxProxyImpl.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I think it is not surprising that the Reaper instance running on cluster-b cannot reach the service running on cluster-a by its service host name, because Kubernetes Service names are resolved by each cluster's own DNS.
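To confirm that the failure happens at name resolution rather than in JMX itself, a check along these lines could be run from a pod on cluster-b that has Python available (an assumption; the Reaper image may not ship it). This is a minimal sketch: can_resolve and check_jmx_targets are hypothetical helpers, and the service:jmx:rmi:///jndi/rmi://host:port/jmxrmi string is the standard JMX-over-RMI URL form, which I assume Reaper builds in an equivalent way.

```python
import socket

# Standard JMX-over-RMI URL form used by JMX clients.
# Assumption: Reaper connects using an equivalent URL for each host.
JMX_URL_TEMPLATE = "service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


def can_resolve(host: str) -> bool:
    """Return True if the local resolver can map `host` to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Same class of failure as java.net.UnknownHostException in the log.
        return False


def check_jmx_targets(hosts, port=7199):
    """Map each JMX URL to whether its host name resolves from this machine.

    An unresolvable name fails before any TCP connection to the JMX port
    is attempted, so JMX credentials or firewall rules never come into play.
    """
    return {
        JMX_URL_TEMPLATE.format(host=host, port=port): can_resolve(host)
        for host in hosts
    }
```

For example, calling check_jmx_targets(["k8ssandra-dc1-service"]) from a pod on cluster-b would be expected to report False for that host, while the same call on cluster-a should report True.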

The cluster table in reaper_db contains the following row, which should be replicated between dc1 and dc2:

k8ssandra-superuser@cqlsh:reaper_db> select * from cluster;

 name      | last_contact                    | partitioner                                 | properties                             | seed_hosts                | state
-----------+---------------------------------+---------------------------------------------+----------------------------------------+---------------------------+--------
 k8ssandra | 2021-08-30 00:00:00.000000+0000 | org.apache.cassandra.dht.Murmur3Partitioner | {"jmxPort":7199,"jmxCredentials":null} | {'k8ssandra-dc1-service'} | ACTIVE

(1 rows)
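The row above explains where the failing host name comes from: both Reaper instances read the same replicated row, so the instance on cluster-b gets the same seed host as the one on cluster-a. The sketch below derives the JMX endpoint from the properties and seed_hosts values shown in the query output. It is a simplified model under my own assumptions (jmx_endpoints is a hypothetical helper, and it ignores that Reaper also discovers individual node addresses after the first connection).

```python
import json


def jmx_endpoints(properties_json: str, seed_hosts):
    """Derive the (host, port) pairs a Reaper instance would dial over JMX.

    properties_json is the text of the `properties` column and seed_hosts
    the `seed_hosts` set from reaper_db.cluster.
    """
    props = json.loads(properties_json)
    port = props.get("jmxPort", 7199)  # 7199 is Cassandra's default JMX port
    return sorted((host, port) for host in seed_hosts)


# Values copied from the `select * from cluster;` output above.
row_properties = '{"jmxPort":7199,"jmxCredentials":null}'
row_seed_hosts = {"k8ssandra-dc1-service"}
print(jmx_endpoints(row_properties, row_seed_hosts))
# [('k8ssandra-dc1-service', 7199)]
```

Because this row is shared through reaper_db, the only seed host either Reaper instance gets is a Service name that exists only in cluster-a.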

So, any suggestions for this problem? Would setting “datacenterAvailability” to EACH work here? “datacenterAvailability” defaults to “ALL” and currently cannot be modified from the Helm chart.
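To make the question concrete, here is a simplified model of which Cassandra nodes a single Reaper instance would need JMX access to under the two values mentioned above. It rests on my reading of Reaper's multi-DC behavior, not on Reaper's code: ALL is assumed to require JMX access to every node in every DC, and EACH to require access only to nodes in the instance's own DC, with instances coordinating through the shared reaper_db backend. The names DatacenterAvailability and jmx_targets are hypothetical, and whether Reaper under EACH would still try the stored dc1 seed host is not something this model answers.

```python
from enum import Enum


class DatacenterAvailability(Enum):
    # Only the two values discussed in this post are modeled (assumption).
    ALL = "ALL"
    EACH = "EACH"


def jmx_targets(mode, local_dc, nodes_by_dc):
    """Return the nodes one Reaper instance would need JMX access to."""
    if mode is DatacenterAvailability.ALL:
        # Assumed: every node in every DC must be reachable over JMX.
        return sorted(n for nodes in nodes_by_dc.values() for n in nodes)
    if mode is DatacenterAvailability.EACH:
        # Assumed: only nodes in the instance's own DC must be reachable.
        return sorted(nodes_by_dc.get(local_dc, []))
    raise ValueError(f"unsupported mode: {mode}")


# Pod names from the listings above: dc1 is on cluster-a, dc2 on cluster-b.
NODES = {
    "dc1": [f"k8ssandra-dc1-eastus2-{i}-sts-0" for i in (1, 2, 3)],
    "dc2": [f"k8ssandra-dc2-eastus2-{i}-sts-0" for i in (1, 2, 3)],
}

for mode in DatacenterAvailability:
    print(mode.value, jmx_targets(mode, "dc2", NODES))
```

Under these assumptions, the Reaper instance on cluster-b would need JMX access to all six pods with ALL, but only to the three dc2 pods with EACH.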

Thanks much!