Hi Experts,
I have set up a two-cluster Cassandra deployment with two DCs. A Reaper instance is installed on each cluster.
On cluster-a,
$ kubectl get pod -n k8ssandra
NAME                                                  READY   STATUS      RESTARTS   AGE
k8ssandra-cass-operator-597795986c-cnmlc              1/1     Running     0          6d3h
k8ssandra-crd-upgrader-job-k8ssandra-gllrt            0/1     Completed   0          2d23h
k8ssandra-dc1-eastus2-1-sts-0                         3/3     Running     0          2d23h
k8ssandra-dc1-eastus2-2-sts-0                         3/3     Running     0          2d23h
k8ssandra-dc1-eastus2-3-sts-0                         3/3     Running     0          2d23h
k8ssandra-grafana-679d495758-xr2mn                    2/2     Running     0          6d3h
k8ssandra-kube-prometheus-operator-7dcccdcc86-mmxf7   1/1     Running     0          6d3h
k8ssandra-medusa-operator-6f8d8595d8-n55v7            1/1     Running     2          6d3h
k8ssandra-reaper-6cd6649456-tp7w2                     1/1     Running     0          2d23h
k8ssandra-reaper-operator-58cf7c449d-j7p5j            1/1     Running     3          3d9h
prometheus-k8ssandra-kube-prometheus-prometheus-0     2/2     Running     1          6d3h
On cluster-b,
$ kubectl get pod -n k8ssandra
NAME                                                  READY   STATUS      RESTARTS   AGE
k8ssandra-cass-operator-597795986c-jqmn6              1/1     Running     0          3d14h
k8ssandra-crd-upgrader-job-k8ssandra-g24vt            0/1     Completed   0          77m
k8ssandra-dc2-eastus2-1-sts-0                         3/3     Running     0          6h29m
k8ssandra-dc2-eastus2-2-sts-0                         3/3     Running     0          6h28m
k8ssandra-dc2-eastus2-3-sts-0                         3/3     Running     0          6h27m
k8ssandra-grafana-679d495758-vthqn                    2/2     Running     0          3d14h
k8ssandra-kube-prometheus-operator-7dcccdcc86-cw8dq   1/1     Running     0          3d14h
k8ssandra-medusa-operator-6f8d8595d8-ztd68            1/1     Running     0          3d14h
k8ssandra-reaper-fb84449f6-fpk7b                      1/1     Running     0          6h26m
k8ssandra-reaper-operator-67fcffbc99-h49gk            1/1     Running     0          3d14h
prometheus-k8ssandra-kube-prometheus-prometheus-0     2/2     Running     1          3d14h
The Reaper instance on cluster-a works well. On cluster-b, Reaper repair jobs also work, but the cluster information cannot be retrieved in the web UI.
The logs show that it fails to connect to k8ssandra-dc1-service:7199.
Logs:
$ kubectl logs k8ssandra-reaper-fb84449f6-fpk7b -n k8ssandra -f
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ConfigurationException [Root exception is java.rmi.UnknownHostException: Unknown host: k8ssandra-dc1-service; nested exception is:
java.net.UnknownHostException: k8ssandra-dc1-service]
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at io.cassandrareaper.jmx.JmxProxyImpl.connectWithTimeout(JmxProxyImpl.java:304)
at io.cassandrareaper.jmx.JmxProxyImpl.connect(JmxProxyImpl.java:242)
... 30 common frames omitted
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ConfigurationException [Root exception is java.rmi.UnknownHostException: Unknown host: k8ssandra-dc1-service; nested exception is:
java.net.UnknownHostException: k8ssandra-dc1-service]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at io.cassandrareaper.jmx.JmxProxyImpl.lambda$connectWithTimeout$0(JmxProxyImpl.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
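To separate a name-resolution failure from a blocked port, a small check like the one below could be run from a pod in cluster-b's k8ssandra namespace. It is only a sketch using Python's standard library; the function names are made up for illustration, and it assumes a Python interpreter is available in whatever pod it is run from. A short service name such as k8ssandra-dc1-service is normally resolvable only through the DNS of the Kubernetes cluster that hosts the service, which would match the UnknownHostException above.

```python
import socket


def resolves(host: str) -> bool:
    """Return True if the local resolver maps host to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(host: str, port: int) -> str:
    """Report whether the failure is at the DNS step or the TCP step."""
    if not resolves(host):
        return f"{host}: name does not resolve from here"
    if not port_open(host, port):
        return f"{host}:{port}: name resolves but the port is not reachable"
    return f"{host}:{port}: reachable"
```

Calling `diagnose("k8ssandra-dc1-service", 7199)` from cluster-b would be expected to report the DNS case if the log above reflects the real cause.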
I think it is not surprising that the Reaper instance running on cluster-b cannot reach the service running on cluster-a by its service host name.
The cluster table in reaper_db contains the following data, which should be replicated between dc1 and dc2.
k8ssandra-superuser@cqlsh:reaper_db> select * from cluster;
name | last_contact | partitioner | properties | seed_hosts | state
-----------+---------------------------------+---------------------------------------------+----------------------------------------+---------------------------+--------
k8ssandra | 2021-08-30 00:00:00.000000+0000 | org.apache.cassandra.dht.Murmur3Partitioner | {"jmxPort":7199,"jmxCredentials":null} | {'k8ssandra-dc1-service'} | ACTIVE
(1 rows)
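As a rough illustration of why the Reaper in cluster-b ends up at the dc1 service name, the sketch below derives JMX addresses from the row shown above. It assumes Reaper combines seed_hosts with the jmxPort from properties into the conventional JMX-over-RMI URL form; that mapping is my assumption and is not taken from Reaper's source, and `jmx_targets` is a hypothetical helper.

```python
import json

# Conventional JMX-over-RMI service URL; assumed here, not confirmed from Reaper's code.
JMX_URL_TEMPLATE = "service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


def jmx_targets(seed_hosts, properties_json, default_port=7199):
    """Build one JMX URL per seed host, using jmxPort from the properties JSON."""
    props = json.loads(properties_json)
    port = props.get("jmxPort") or default_port
    return [JMX_URL_TEMPLATE.format(host=h, port=port) for h in sorted(seed_hosts)]


# Values copied from the reaper_db.cluster row above.
row = {
    "seed_hosts": {"k8ssandra-dc1-service"},
    "properties": '{"jmxPort":7199,"jmxCredentials":null}',
}

for url in jmx_targets(row["seed_hosts"], row["properties"]):
    print(url)  # service:jmx:rmi:///jndi/rmi://k8ssandra-dc1-service:7199/jmxrmi
```

Since the row is shared by both Reaper instances, the only seed host either of them sees is the dc1 service name.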
So, any suggestions on this problem? Will setting "datacenterAvailability" to EACH work here? It defaults to "ALL" and currently cannot be changed through the Helm chart.
Thanks much!