K8ssandra Forum

Local installation on VM: reaper and stargate are stuck

Hi all,

I am trying to set up K8ssandra on my laptop to evaluate it and start developing.
So far I have followed the steps listed in the documentation and have also seen the hints on resource requirements. My laptop is a standard Lenovo ThinkPad with 16GB RAM and an i7 CPU.
For the K8ssandra installation, I am starting a Vagrant VirtualBox VM with 16GB RAM and 8 CPUs.
Except for the Stargate and Reaper pods, everything is running as expected. Only these two will not come up. What puzzles me is that I cannot find an error in the Stargate log.
Reaper says “no datasource available”, which I think is related to the crashed Stargate pod. The only thing that comes to mind is the warning about RLIMIT_MEMLOCK. However, from what I have seen, Kubernetes currently provides no way to pass a ulimit option.
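
For reference, the memlock limit that a process actually sees can be read from inside the container. The following is a minimal sketch, assuming a Python interpreter is available there (the Stargate image may well not ship one); the helper names `memlock_limits` and `describe` are made up for illustration. Note that the log line in question is a WARN that only mentions possible swapping, so this check may turn out to be unrelated to the restarts.

```python
import resource

def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values of the current process, in bytes."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft, hard

def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to the word 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

soft, hard = memlock_limits()
print("RLIMIT_MEMLOCK soft:", describe(soft))
print("RLIMIT_MEMLOCK hard:", describe(hard))
```

A small soft limit here would match the "Unable to lock JVM memory (ENOMEM)" warning, but by itself it does not explain why the container is terminated.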

Do you have a tip for me?

Please find some dumps below. I apologize for pasting them here as plain text. If there is a preferred way to share large logs, please let me know.

node description:

Name:               ubuntu-focal
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ubuntu-focal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 172.28.128.20/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.96.223.128
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 08 Jun 2021 13:22:56 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ubuntu-focal
  AcquireTime:     <unset>
  RenewTime:       Wed, 09 Jun 2021 09:04:20 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 09 Jun 2021 09:01:22 +0000   Wed, 09 Jun 2021 09:01:22 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Wed, 09 Jun 2021 09:04:00 +0000   Tue, 08 Jun 2021 13:22:48 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 09 Jun 2021 09:04:00 +0000   Tue, 08 Jun 2021 13:22:48 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 09 Jun 2021 09:04:00 +0000   Tue, 08 Jun 2021 13:22:48 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Wed, 09 Jun 2021 09:04:00 +0000   Tue, 08 Jun 2021 13:25:48 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.2.15
  Hostname:    ubuntu-focal
Capacity:
  cpu:                8
  ephemeral-storage:  40593612Ki
  hugepages-2Mi:      0
  memory:             16397132Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  37411072758
  hugepages-2Mi:      0
  memory:             16294732Ki
  pods:               110
System Info:
  Machine ID:                 0c2486c2f10248ae90b28b59db14433a
  System UUID:                c66293b4-c852-4b9f-b252-db12d84911a2
  Boot ID:                    887b495f-3c10-49e7-ab27-b1897e84edd8
  Kernel Version:             5.4.0-73-generic
  OS Image:                   Ubuntu 20.04.2 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.7
  Kubelet Version:            v1.21.1
  Kube-Proxy Version:         v1.21.1
PodCIDR:                      10.96.0.0/24
PodCIDRs:                     10.96.0.0/24
Non-terminated Pods:          (19 in total)
  Namespace                   Name                                                 CPU Requests  CPU Limits   Memory Requests  Memory Limits    Age
  ---------                   ----                                                 ------------  ----------   ---------------  -------------    ---
  default                     dnsutils                                             0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  default                     k8ssandra-cass-operator-65fd67b6d8-9rmj5             0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  default                     k8ssandra-dc1-default-sts-0                          1100m (13%)   1100m (13%)  2159652Ki (13%)  2159652Ki (13%)  94m
  default                     k8ssandra-dc1-stargate-76c576f7fc-spbp4              200m (2%)     1 (12%)      512Mi (3%)       1Gi (6%)         19h
  default                     k8ssandra-grafana-584dfb486f-w6zrn                   0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  default                     k8ssandra-kube-prometheus-operator-85695ffb-mmccj    0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  default                     k8ssandra-reaper-7ffb485bb8-ggvl7                    0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  default                     k8ssandra-reaper-operator-b67dc8cdf-drvrh            0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  default                     prometheus-k8ssandra-kube-prometheus-prometheus-0    100m (1%)     100m (1%)    50Mi (0%)        50Mi (0%)        19h
  kube-system                 calico-kube-controllers-b656ddcfc-rllvh              0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  kube-system                 calico-node-qrzqh                                    250m (3%)     0 (0%)       0 (0%)           0 (0%)           19h
  kube-system                 coredns-558bd4d5db-jl7mp                             100m (1%)     0 (0%)       70Mi (0%)        170Mi (1%)       19h
  kube-system                 coredns-558bd4d5db-n2jwm                             100m (1%)     0 (0%)       70Mi (0%)        170Mi (1%)       19h
  kube-system                 etcd-ubuntu-focal                                    100m (1%)     0 (0%)       100Mi (0%)       0 (0%)           19h
  kube-system                 kube-apiserver-ubuntu-focal                          250m (3%)     0 (0%)       0 (0%)           0 (0%)           19h
  kube-system                 kube-controller-manager-ubuntu-focal                 200m (2%)     0 (0%)       0 (0%)           0 (0%)           19h
  kube-system                 kube-proxy-jb9p6                                     0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
  kube-system                 kube-scheduler-ubuntu-focal                          100m (1%)     0 (0%)       0 (0%)           0 (0%)           19h
  local-path-storage          local-path-provisioner-5696dbb894-d687f              0 (0%)        0 (0%)       0 (0%)           0 (0%)           19h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                2500m (31%)      2200m (27%)
  memory             2980900Ki (18%)  3607588Ki (22%)
  ephemeral-storage  100Mi (0%)       0 (0%)
  hugepages-2Mi      0 (0%)           0 (0%)
Events:              <none>

stargate description:

Name:         k8ssandra-dc1-stargate-76c576f7fc-spbp4
Namespace:    default
Priority:     0
Node:         ubuntu-focal/10.0.2.15
Start Time:   Tue, 08 Jun 2021 13:32:38 +0000
Labels:       app=k8ssandra-dc1-stargate
              pod-template-hash=76c576f7fc
Annotations:  cni.projectcalico.org/podIP: 10.96.223.164/32
              cni.projectcalico.org/podIPs: 10.96.223.164/32
Status:       Running
IP:           10.96.223.164
IPs:
  IP:           10.96.223.164
Controlled By:  ReplicaSet/k8ssandra-dc1-stargate-76c576f7fc
Init Containers:
  wait-for-cassandra:
    Container ID:  docker://3a43be530a3df15f4013c1b792cf344f6c8ac3a7226fbe3a2498eb48b91445bc
    Image:         alpine:3.12.2
    Image ID:      docker-pullable://alpine@sha256:a126728cb7db157f0deb377bcba3c5e473e612d7bafc27f6bb4e5e083f9f08c2
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
    Args:
      -c
      echo "Waiting for all Cassandra nodes in dc1 to finish bootstrapping..."
      while [ "$(nslookup k8ssandra-dc1-service.default.svc.cluster.local | grep Name | wc -l)" != "1" ]; do
          sleep 5
      done
      echo "Cassandra is ready. Starting Stargate..."
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 09 Jun 2021 07:28:43 +0000
      Finished:     Wed, 09 Jun 2021 07:35:25 +0000
    Ready:          True
    Restart Count:  2
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7fllr (ro)
Containers:
  k8ssandra-dc1-stargate:
    Container ID:   docker://a83523126279517058d5efb5783c27bad8ba01179858bd758f328eac62b6321f
    Image:          stargateio/stargate-3_11:v1.0.18
    Image ID:       docker-pullable://stargateio/stargate-3_11@sha256:0085bfb234056a8921c63b8bc251c40bc4bc215e3c9544ed80c9db7c2acb8215
    Ports:          8080/TCP, 8081/TCP, 8082/TCP, 8084/TCP, 8085/TCP, 8090/TCP, 9042/TCP, 8609/TCP, 7000/TCP, 7001/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 09 Jun 2021 09:03:31 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Wed, 09 Jun 2021 09:02:18 +0000
      Finished:     Wed, 09 Jun 2021 09:03:29 +0000
    Ready:          False
    Restart Count:  37
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      200m
      memory:   512Mi
    Liveness:   http-get http://:health/checker/liveness delay=30s timeout=10s period=10s #success=1 #failure=5
    Readiness:  http-get http://:health/checker/readiness delay=30s timeout=10s period=10s #success=1 #failure=5
    Environment:
      JAVA_OPTS:        -XX:+CrashOnOutOfMemoryError -Xms256M -Xmx256M
      CLUSTER_NAME:     k8ssandra
      CLUSTER_VERSION:  3.11
      SEED:             k8ssandra-seed-service.default.svc.cluster.local
      DATACENTER_NAME:  dc1
      RACK_NAME:        default
      ENABLE_AUTH:      true
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7fllr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-7fllr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  31m (x100 over 88m)    kubelet  Readiness probe failed: Get "http://10.96.223.164:8084/checker/readiness": dial tcp 10.96.223.164:8084: connect: connection refused
  Warning  BackOff    6m14s (x231 over 82m)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  77s (x121 over 88m)    kubelet  Liveness probe failed: Get "http://10.96.223.164:8084/checker/liveness": dial tcp 10.96.223.164:8084: connect: connection refused

stargate log:

Using environment for config
Running java -server -XX:+CrashOnOutOfMemoryError -Xms256M -Xmx256M -Dstargate.libdir=./stargate-lib -Djava.awt.headless=true -jar ./stargate-lib/stargate-starter-1.0.18.jar --cluster-name k8ssandra --cluster-version 3.11 --cluster-seed k8ssandra-seed-service.default.svc.cluster.local --listen 10.96.223.164 --dc dc1 --rack default --enable-auth
JAR DIR: ./stargate-lib
Loading persistence backend persistence-cassandra-3.11-1.0.18.jar
Installing bundle persistence-cassandra-3.11-1.0.18.jar
Installing bundle animal-sniffer-annotations-1.9.jar
Installing bundle asm-7.1.jar
Installing bundle asm-analysis-7.1.jar
Installing bundle asm-tree-7.1.jar
Installing bundle auth-api-1.0.18.jar
Installing bundle auth-jwt-service-1.0.18.jar
Installing bundle auth-table-based-service-1.0.18.jar
Installing bundle authnz-1.0.18.jar
Installing bundle commons-beanutils-1.9.4.jar
Installing bundle commons-collections-3.2.2.jar
Installing bundle commons-digester-2.1.jar
Installing bundle commons-logging-1.2.jar
Installing bundle commons-validator-1.7.jar
Installing bundle config-store-api-1.0.18.jar
Installing bundle config-store-yaml-1.0.18.jar
Installing bundle core-1.0.18.jar
Installing bundle cql-1.0.18.jar
Installing bundle graphqlapi-1.0.18.jar
Installing bundle health-checker-1.0.18.jar
Installing bundle org.apache.felix.scr-2.1.20.jar
Installing bundle org.apache.felix.scr.ds-annotations-1.2.10.jar
Installing bundle org.apache.felix.scr.generator-1.18.4.jar
Installing bundle org.osgi.compendium-4.2.0.jar
Installing bundle org.osgi.util.function-1.1.0.jar
Installing bundle org.osgi.util.promise-1.1.1.jar
Installing bundle persistence-api-1.0.18.jar
Installing bundle rate-limiting-global-1.0.18.jar
Installing bundle restapi-1.0.18.jar
Starting bundle io.stargate.db.cassandra_3_11
INFO  [main] 2021-06-09 08:56:07,639 BaseActivator.java:92 - Starting persistence-cassandra-3.11 ...
Starting bundle null
Starting bundle org.objectweb.asm
Starting bundle org.objectweb.asm.tree.analysis
Starting bundle org.objectweb.asm.tree
Starting bundle io.stargate.auth.api
INFO  [main] 2021-06-09 08:56:09,938 BaseActivator.java:92 - Starting authApiServer ...
Starting bundle io.stargate.auth.jwt
Starting bundle io.stargate.auth.table
INFO  [main] 2021-06-09 08:56:12,413 BaseActivator.java:92 - Starting authnTableBasedService and authzTableBasedServie ...
Starting bundle io.stargate.auth
Starting bundle org.apache.commons.commons-beanutils
Starting bundle org.apache.commons.collections
Starting bundle org.apache.commons.digester
Starting bundle org.apache.commons.logging
Starting bundle org.apache.commons.commons-validator
Starting bundle io.stargate.config.store.api
Starting bundle io.stargate.config.store.yaml
INFO  [main] 2021-06-09 08:56:14,666 BaseActivator.java:92 - Starting Config Store YAML ...
Starting bundle io.stargate.core
INFO  [main] 2021-06-09 08:56:14,674 BaseActivator.java:92 - Starting core services ...
INFO  [main] 2021-06-09 08:56:14,725 BaseActivator.java:173 - Registering core services as io.stargate.core.metrics.api.Metrics
INFO  [main] 2021-06-09 08:56:18,118 AbstractCassandraPersistence.java:100 - Initializing Apache Cassandra
INFO  [main] 2021-06-09 08:56:18,346 DatabaseDescriptor.java:381 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  [main] 2021-06-09 08:56:18,349 DatabaseDescriptor.java:439 - Global memtable on-heap threshold is enabled at 61MB
INFO  [main] 2021-06-09 08:56:18,351 DatabaseDescriptor.java:443 - Global memtable off-heap threshold is enabled at 61MB
WARN  [main] 2021-06-09 08:56:19,797 DatabaseDescriptor.java:579 - Only 31.111GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO  [main] 2021-06-09 08:56:19,953 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO  [main] 2021-06-09 08:56:19,955 DatabaseDescriptor.java:773 - Back-pressure is disabled with strategy null.
INFO  [main] 2021-06-09 08:56:20,138 GossipingPropertyFileSnitch.java:68 - Unable to load cassandra-topology.properties; compatibility mode disabled
INFO  [main] 2021-06-09 08:56:21,510 JMXServerUtils.java:246 - Configured JMX server at: service:jmx:rmi://0.0.0.0/jndi/rmi://0.0.0.0:7199/jmxrmi
INFO  [main] 2021-06-09 08:56:21,545 CassandraDaemon.java:489 - Hostname: k8ssandra-dc1-stargate-76c576f7fc-spbp4
INFO  [main] 2021-06-09 08:56:21,547 CassandraDaemon.java:496 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_252
INFO  [main] 2021-06-09 08:56:21,612 CassandraDaemon.java:497 - Heap size: 247.500MiB/247.500MiB
INFO  [main] 2021-06-09 08:56:21,615 CassandraDaemon.java:502 - Code Cache Non-heap memory: init = 2555904(2496K) used = 5547584(5417K) committed = 5570560(5440K) max = 251658240(245760K)
INFO  [main] 2021-06-09 08:56:21,617 CassandraDaemon.java:502 - Metaspace Non-heap memory: init = 0(0K) used = 24982216(24396K) committed = 26214400(25600K) max = -1(-1K)
INFO  [main] 2021-06-09 08:56:21,619 CassandraDaemon.java:502 - Compressed Class Space Non-heap memory: init = 0(0K) used = 3254264(3177K) committed = 3670016(3584K) max = 1073741824(1048576K)
INFO  [main] 2021-06-09 08:56:21,622 CassandraDaemon.java:502 - Eden Space Heap memory: init = 71630848(69952K) used = 71630848(69952K) committed = 71630848(69952K) max = 71630848(69952K)
INFO  [main] 2021-06-09 08:56:21,630 CassandraDaemon.java:502 - Survivor Space Heap memory: init = 8912896(8704K) used = 7103928(6937K) committed = 8912896(8704K) max = 8912896(8704K)
INFO  [main] 2021-06-09 08:56:21,637 CassandraDaemon.java:502 - Tenured Gen Heap memory: init = 178978816(174784K) used = 15222048(14865K) committed = 178978816(174784K) max = 178978816(174784K)
INFO  [main] 2021-06-09 08:56:21,640 CassandraDaemon.java:504 - Classpath: ./stargate-lib/stargate-starter-1.0.18.jar
INFO  [main] 2021-06-09 08:56:21,642 CassandraDaemon.java:506 - JVM Arguments: [-XX:+CrashOnOutOfMemoryError, -Xms256M, -Xmx256M, -Dstargate.libdir=./stargate-lib, -Djava.awt.headless=true]
WARN  [main] 2021-06-09 08:56:22,234 NativeLibrary.java:189 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN  [main] 2021-06-09 08:56:22,236 StartupChecks.java:136 - jemalloc shared library could not be preloaded to speed up memory allocations
INFO  [main] 2021-06-09 08:56:22,237 StartupChecks.java:176 - JMX is enabled to receive remote connections on port: 7199
INFO  [main] 2021-06-09 08:56:22,244 SigarLibrary.java:44 - Initializing SIGAR library
INFO  [main] 2021-06-09 08:56:22,420 SigarLibrary.java:57 - Could not initialize SIGAR library org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem; 
INFO  [main] 2021-06-09 08:56:22,422 SigarLibrary.java:185 - Sigar could not be initialized, test for checking degraded mode omitted.
INFO  [main] 2021-06-09 08:56:23,225 QueryProcessor.java:116 - Initialized prepared statement caches with 10 MB (native) and 10 MB (Thrift)
INFO  [main] 2021-06-09 08:56:27,813 ColumnFamilyStore.java:427 - Initializing system.IndexInfo
INFO  [main] 2021-06-09 08:56:36,754 ColumnFamilyStore.java:427 - Initializing system.batches
INFO  [main] 2021-06-09 08:56:37,046 ColumnFamilyStore.java:427 - Initializing system.paxos
INFO  [main] 2021-06-09 08:56:37,349 ColumnFamilyStore.java:427 - Initializing system.local
INFO  [main] 2021-06-09 08:56:37,717 ColumnFamilyStore.java:427 - Initializing system.peers
INFO  [main] 2021-06-09 08:56:38,050 ColumnFamilyStore.java:427 - Initializing system.peer_events
INFO  [main] 2021-06-09 08:56:38,519 ColumnFamilyStore.java:427 - Initializing system.range_xfers
INFO  [main] 2021-06-09 08:56:38,915 ColumnFamilyStore.java:427 - Initializing system.compaction_history
INFO  [main] 2021-06-09 08:56:39,617 ColumnFamilyStore.java:427 - Initializing system.sstable_activity
INFO  [main] 2021-06-09 08:56:40,140 ColumnFamilyStore.java:427 - Initializing system.size_estimates
INFO  [main] 2021-06-09 08:56:40,712 ColumnFamilyStore.java:427 - Initializing system.available_ranges
INFO  [main] 2021-06-09 08:56:41,158 ColumnFamilyStore.java:427 - Initializing system.transferred_ranges
INFO  [main] 2021-06-09 08:56:41,744 ColumnFamilyStore.java:427 - Initializing system.views_builds_in_progress
INFO  [main] 2021-06-09 08:56:42,413 ColumnFamilyStore.java:427 - Initializing system.built_views
INFO  [main] 2021-06-09 08:56:42,750 ColumnFamilyStore.java:427 - Initializing system.hints
INFO  [main] 2021-06-09 08:56:43,146 ColumnFamilyStore.java:427 - Initializing system.batchlog
INFO  [main] 2021-06-09 08:56:43,728 ColumnFamilyStore.java:427 - Initializing system.prepared_statements
INFO  [main] 2021-06-09 08:56:44,521 ColumnFamilyStore.java:427 - Initializing system.schema_keyspaces
INFO  [main] 2021-06-09 08:56:44,946 ColumnFamilyStore.java:427 - Initializing system.schema_columnfamilies
INFO  [main] 2021-06-09 08:56:45,176 ColumnFamilyStore.java:427 - Initializing system.schema_columns
INFO  [main] 2021-06-09 08:56:45,551 ColumnFamilyStore.java:427 - Initializing system.schema_triggers
INFO  [main] 2021-06-09 08:56:45,846 ColumnFamilyStore.java:427 - Initializing system.schema_usertypes
INFO  [main] 2021-06-09 08:56:46,434 ColumnFamilyStore.java:427 - Initializing system.schema_functions
INFO  [main] 2021-06-09 08:56:46,720 ColumnFamilyStore.java:427 - Initializing system.schema_aggregates
INFO  [main] 2021-06-09 08:56:46,726 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO  [main] 2021-06-09 08:56:46,743 ClientState.java:102 - Using io.stargate.db.cassandra.impl.StargateQueryHandler as query handler for native protocol queries (as requested with -Dcassandra.custom_query_handler_class)
INFO  [main] 2021-06-09 08:56:47,112 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
INFO  [main] 2021-06-09 08:56:47,641 ColumnFamilyStore.java:427 - Initializing system_schema.keyspaces
INFO  [main] 2021-06-09 08:56:48,136 ColumnFamilyStore.java:427 - Initializing system_schema.tables
INFO  [main] 2021-06-09 08:56:49,094 ColumnFamilyStore.java:427 - Initializing system_schema.columns
INFO  [main] 2021-06-09 08:56:49,633 ColumnFamilyStore.java:427 - Initializing system_schema.triggers
INFO  [main] 2021-06-09 08:56:49,978 ColumnFamilyStore.java:427 - Initializing system_schema.dropped_columns
INFO  [main] 2021-06-09 08:56:50,415 ColumnFamilyStore.java:427 - Initializing system_schema.views
INFO  [main] 2021-06-09 08:56:50,753 ColumnFamilyStore.java:427 - Initializing system_schema.types
INFO  [main] 2021-06-09 08:56:51,150 ColumnFamilyStore.java:427 - Initializing system_schema.functions
INFO  [main] 2021-06-09 08:56:51,554 ColumnFamilyStore.java:427 - Initializing system_schema.aggregates
INFO  [main] 2021-06-09 08:56:51,927 ColumnFamilyStore.java:427 - Initializing system_schema.indexes
INFO  [main] 2021-06-09 08:56:51,937 ViewManager.java:137 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
INFO  [MemtableFlushWriter:1] 2021-06-09 08:56:54,946 CacheService.java:100 - Initializing key cache with capacity of 12 MBs.
INFO  [MemtableFlushWriter:1] 2021-06-09 08:56:55,008 CacheService.java:122 - Initializing row cache with capacity of 0 MBs
INFO  [MemtableFlushWriter:1] 2021-06-09 08:56:55,014 CacheService.java:151 - Initializing counter cache with capacity of 6 MBs
INFO  [MemtableFlushWriter:1] 2021-06-09 08:56:55,017 CacheService.java:162 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  [CompactionExecutor:2] 2021-06-09 08:56:56,113 BufferPool.java:234 - Global buffer pool is enabled, when pool is exhausted (max is 61.000MiB) it will allocate on heap
INFO  [main] 2021-06-09 08:56:56,534 StorageService.java:639 - Populating token metadata from system tables
INFO  [main] 2021-06-09 08:56:57,014 StorageService.java:646 - Token metadata: 
INFO  [pool-9-thread-1] 2021-06-09 08:56:57,814 AutoSavingCache.java:174 - Completed loading (74 ms; 8 keys) KeyCache cache
INFO  [main] 2021-06-09 08:56:58,119 CommitLog.java:142 - No commitlog files found; skipping replay
INFO  [main] 2021-06-09 08:56:58,132 StorageService.java:639 - Populating token metadata from system tables
INFO  [main] 2021-06-09 08:56:58,333 StorageService.java:646 - Token metadata: 
INFO  [main] 2021-06-09 08:57:00,218 QueryProcessor.java:163 - Preloaded 0 prepared statements
INFO  [main] 2021-06-09 08:57:00,222 StorageService.java:657 - Cassandra version: 3.11.6
INFO  [main] 2021-06-09 08:57:00,224 StorageService.java:658 - Thrift API version: 20.1.0
INFO  [main] 2021-06-09 08:57:00,226 StorageService.java:659 - CQL supported versions: 3.4.4 (default: 3.4.4)
INFO  [main] 2021-06-09 08:57:00,233 StorageService.java:661 - Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
INFO  [main] 2021-06-09 08:57:00,740 IndexSummaryManager.java:87 - Initializing index summary manager with a memory pool size of 12 MB and a resize interval of 60 minutes
INFO  [main] 2021-06-09 08:57:00,844 MessagingService.java:750 - Starting Messaging Service on /10.96.223.164:7000 (eth0)
WARN  [main] 2021-06-09 08:57:00,941 SystemKeyspace.java:1130 - No host ID found, created e2dc618e-8884-47d7-ad1d-513cb20bf97c (Note: This should happen exactly once per node).
INFO  [main] 2021-06-09 08:57:01,218 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [HANDSHAKE-k8ssandra-seed-service.default.svc.cluster.local/10.96.223.171] 2021-06-09 08:57:01,321 OutboundTcpConnection.java:561 - Handshaking version with k8ssandra-seed-service.default.svc.cluster.local/10.96.223.171
INFO  [main] 2021-06-09 08:57:02,280 StorageService.java:743 - Loading persisted ring state
INFO  [main] 2021-06-09 08:57:02,286 StorageService.java:871 - Starting up server gossip
INFO  [MigrationStage:1] 2021-06-09 08:57:05,915 ViewManager.java:137 - Not submitting build tasks for views in keyspace system_auth as storage service is not initialized
INFO  [MigrationStage:1] 2021-06-09 08:57:07,160 ColumnFamilyStore.java:427 - Initializing system_auth.resource_role_permissons_index
INFO  [MigrationStage:1] 2021-06-09 08:57:08,752 ColumnFamilyStore.java:427 - Initializing system_auth.role_members
INFO  [MigrationStage:1] 2021-06-09 08:57:09,918 ColumnFamilyStore.java:427 - Initializing system_auth.role_permissions
INFO  [MigrationStage:1] 2021-06-09 08:57:11,133 ColumnFamilyStore.java:427 - Initializing system_auth.roles
INFO  [main] 2021-06-09 08:57:11,609 AuthCache.java:177 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2021-06-09 08:57:11,619 StorageService.java:733 - Not joining ring as requested. Use JMX (StorageService->joinRing()) to initiate ring joining
INFO  [StorageServiceShutdownHook] 2021-06-09 08:57:11,631 HintsService.java:209 - Paused hints dispatch
INFO  [main] 2021-06-09 08:57:11,633 Gossiper.java:1780 - Waiting for gossip to settle...
WARN  [StorageServiceShutdownHook] 2021-06-09 08:57:11,634 Gossiper.java:1655 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2021-06-09 08:57:11,638 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.96.223.164] 2021-06-09 08:57:11,712 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2021-06-09 08:57:12,165 HintsService.java:209 - Paused hints dispatch

It’s not readily obvious to me what the problem is. Could you please post the following?

$ kubectl get nodes
$ kubectl get pods

And a kubectl describe on the Reaper pod please?

It is probably also worth a look at the C* logs; unfortunately, nothing obvious stands out in the Stargate log.
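
One thing that can be read off the describe output for the Stargate pod: the last exit code is 143, which by the usual 128+N convention means the process was stopped by signal 15 (SIGTERM) rather than crashing on its own. The liveness probe is configured with delay=30s, period=10s and #failure=5. Below is a rough sketch of that arithmetic, assuming the kubelet sends its first probe about when the initial delay has elapsed and restarts the container once the failure threshold is reached (real timing has some jitter). The function names are hypothetical; the timestamps are the Started/Finished values from the "Last State" section.

```python
from datetime import datetime

def signal_from_exit_code(exit_code):
    """Return the signal number if the exit code follows the 128+N convention, else None."""
    return exit_code - 128 if exit_code > 128 else None

def seconds_until_liveness_restart(initial_delay, period, failure_threshold):
    """Earliest time after container start at which the last allowed probe failure can occur."""
    return initial_delay + (failure_threshold - 1) * period

def observed_lifetime(started, finished):
    """Seconds between two timestamps in the format printed by kubectl describe."""
    fmt = "%a, %d %b %Y %H:%M:%S %z"
    return (datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)).total_seconds()

sig = signal_from_exit_code(143)                    # exit code from "Last State"
budget = seconds_until_liveness_restart(30, 10, 5)  # delay=30s period=10s #failure=5
lifetime = observed_lifetime("Wed, 09 Jun 2021 09:02:18 +0000",
                             "Wed, 09 Jun 2021 09:03:29 +0000")
print(sig, budget, lifetime)  # prints: 15 70 71.0
```

The 70 s budget sits very close to the 71 s the container actually lived, and the Stargate log ends while it is still waiting for gossip to settle. So it may be worth checking whether Stargate simply needs longer to start on this VM than the liveness probe allows. That is only a reading of the numbers, not a confirmed cause.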

Hi @Hanno_Scharwaechter,

could you also post the Helm values file that you used for your K8ssandra install?

Hi guys,

thank you so much for your quick responses. I am happy to provide the requested logs.

reaper description:

Name:         k8ssandra-reaper-7ffb485bb8-ggvl7
Namespace:    default
Priority:     0
Node:         ubuntu-focal/10.0.2.15
Start Time:   Tue, 08 Jun 2021 13:51:46 +0000
Labels:       app.kubernetes.io/managed-by=reaper-operator
              pod-template-hash=7ffb485bb8
              reaper.cassandra-reaper.io/reaper=k8ssandra-reaper
Annotations:  cni.projectcalico.org/podIP: 10.96.223.166/32
              cni.projectcalico.org/podIPs: 10.96.223.166/32
Status:       Running
IP:           10.96.223.166
IPs:
  IP:           10.96.223.166
Controlled By:  ReplicaSet/k8ssandra-reaper-7ffb485bb8
Containers:
  reaper:
    Container ID:   docker://534f1ee660cbc87540d0a4e2ac1c5f5e10ca9ac994144c5cb18427d9851e705d
    Image:          docker.io/thelastpickle/cassandra-reaper:2.2.2
    Image ID:       docker-pullable://thelastpickle/cassandra-reaper@sha256:752f041e7c933602b052e8c319aef8c9770cd8073a739b7d35233cd230c3eb62
    Ports:          8080/TCP, 8081/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Wed, 09 Jun 2021 09:01:41 +0000
      Finished:     Wed, 09 Jun 2021 09:03:09 +0000
    Ready:          False
    Restart Count:  40
    Liveness:       http-get http://:8081/healthcheck delay=45s timeout=1s period=15s #success=1 #failure=3
    Readiness:      http-get http://:8081/healthcheck delay=45s timeout=1s period=15s #success=1 #failure=3
    Environment:
      REAPER_STORAGE_TYPE:              cassandra
      REAPER_ENABLE_DYNAMIC_SEED_LIST:  false
      REAPER_CASS_CONTACT_POINTS:       [k8ssandra-dc1-service]
      REAPER_AUTH_ENABLED:              false
      REAPER_JMX_AUTH_USERNAME:         <set to the key 'username' in secret 'k8ssandra-reaper-jmx'>  Optional: false
      REAPER_JMX_AUTH_PASSWORD:         <set to the key 'password' in secret 'k8ssandra-reaper-jmx'>  Optional: false
      REAPER_CASS_AUTH_USERNAME:        <set to the key 'username' in secret 'k8ssandra-reaper'>      Optional: false
      REAPER_CASS_AUTH_PASSWORD:        <set to the key 'password' in secret 'k8ssandra-reaper'>      Optional: false
      REAPER_CASS_AUTH_ENABLED:         true
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ngzbk (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-ngzbk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  BackOff    5m49s (x252 over 87m)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  48s (x80 over 94m)     kubelet  Liveness probe failed: Get "http://10.96.223.166:8081/healthcheck": dial tcp 10.96.223.166:8081: connect: connection refused

reaper logs:

INFO   [2021-06-09 09:01:56,462] [main] i.d.s.DefaultServerFactory - Registering jersey handler with root path prefix: / 
INFO   [2021-06-09 09:01:56,476] [main] i.d.s.DefaultServerFactory - Registering admin handler with root path prefix: / 
INFO   [2021-06-09 09:01:56,477] [main] i.d.a.AssetsBundle - Registering AssetBundle with name: assets for path /webui/* 
INFO   [2021-06-09 09:01:56,793] [main] i.c.ReaperApplication - initializing runner thread pool with 15 threads and 2 repair runners 
INFO   [2021-06-09 09:01:56,793] [main] i.c.ReaperApplication - initializing storage of type: cassandra 
INFO   [2021-06-09 09:01:57,224] [main] c.d.d.core - DataStax Java driver 3.10.1 for Apache Cassandra 
INFO   [2021-06-09 09:01:57,257] [main] c.d.d.c.GuavaCompatibility - Detected Guava >= 19 in the classpath, using modern compatibility layer 
INFO   [2021-06-09 09:01:59,148] [main] c.d.d.c.ClockFactory - Using native clock to generate timestamps. 
INFO   [2021-06-09 09:02:00,125] [main] c.d.d.c.NettyUtil - Did not find Netty's native epoll transport in the classpath, defaulting to NIO. 
INFO   [2021-06-09 09:02:03,142] [main] c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'dc1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor) 
INFO   [2021-06-09 09:02:03,152] [main] c.d.d.c.Cluster - New Cassandra host k8ssandra-dc1-service/10.96.223.171:9042 added 
INFO   [2021-06-09 09:02:05,261] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-09 09:02:05,262] [main] i.c.s.CassandraStorage - Starting db migration from 21 to 27… 
INFO   [2021-06-09 09:02:06,130] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-09 09:02:08,028] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='insert into schema_migration(applied_successful, version, script_name, script, executed_at) values(?, ?, ?, ?, ?)' 
WARN   [2021-06-09 09:02:08,040] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='INSERT INTO schema_migration_leader (keyspace_name, leader, took_lead_at, leader_hostname) VALUES (?, ?, dateOf(now()), ?) IF NOT EXISTS USING TTL 300' 
WARN   [2021-06-09 09:02:08,114] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='DELETE FROM schema_migration_leader where keyspace_name = ? IF leader = ?' 
ERROR  [2021-06-09 09:02:09,342] [main] i.c.ReaperApplication - Storage is not ready yet, trying again to connect shortly... 
org.cognitor.cassandra.migration.MigrationException: Error during migration of script 022_cluster_states.cql while executing 'ALTER TABLE cluster ADD state text;'
	at org.cognitor.cassandra.migration.Database.execute(Database.java:269)
	at java.util.Collections$SingletonList.forEach(Collections.java:4824)
	at org.cognitor.cassandra.migration.MigrationTask.migrate(MigrationTask.java:68)
	at io.cassandrareaper.storage.CassandraStorage.migrate(CassandraStorage.java:362)
	at io.cassandrareaper.storage.CassandraStorage.initializeCassandraSchema(CassandraStorage.java:293)
	at io.cassandrareaper.storage.CassandraStorage.initializeAndUpgradeSchema(CassandraStorage.java:250)
	at io.cassandrareaper.storage.CassandraStorage.<init>(CassandraStorage.java:238)
	at io.cassandrareaper.ReaperApplication.initializeStorage(ReaperApplication.java:480)
	at io.cassandrareaper.ReaperApplication.tryInitializeStorage(ReaperApplication.java:303)
	at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:181)
	at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:98)
	at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
	at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
	at io.dropwizard.cli.Cli.run(Cli.java:78)
	at io.dropwizard.Application.run(Application.java:93)
	at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:117)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid column name state because it conflicts with an existing column
	at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
	at org.cognitor.cassandra.migration.Database.executeStatement(Database.java:277)
	at org.cognitor.cassandra.migration.Database.execute(Database.java:261)
	... 15 common frames omitted
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid column name state because it conflicts with an existing column
	at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)
	at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:215)
	at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
	at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:61)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:1011)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:814)
	at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1287)
	at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1205)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at com.datastax.driver.core.InboundTrafficMeter.channelRead(InboundTrafficMeter.java:38)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1304)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:921)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:135)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)```

**k8ssandra.yaml:**

cassandra:
  version: "3.11.10"
  cassandraLibDirVolume:
    storageClass: local-path
    size: 5Gi
  allowMultipleNodesPerWorker: true
  heap:
    size: 500M
    newGenSize: 200M
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 2Gi
  datacenters:
  - name: dc1
    size: 1
    racks:
    - name: default
kube-prometheus-stack:
  grafana:
    adminUser: admin
    adminPassword: admin123
stargate:
  enabled: true
  replicas: 1
  heapMB: 256
  cpuReqMillicores: 200
  cpuLimMillicores: 1000

And here is the remaining output.

get nodes:

NAME           STATUS   ROLES                  AGE   VERSION
ubuntu-focal   Ready    control-plane,master   24h   v1.21.1

get pods coming…

I suspect there could be some schema disagreements due to Stargate starting at the same time Reaper performs the schema migration.

Could you uninstall your Helm release and redeploy K8ssandra with 0 Stargate replicas? Once everything is up and running (including Reaper), modify your Helm values to set 1 Stargate replica and upgrade your release: helm upgrade <release-name> k8ssandra/k8ssandra -f <values-file.yaml>
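
The staged rollout described here can be sketched as a short Python helper that only builds the Helm command lines and prints them, so nothing is executed by accident. This is a rough sketch under assumptions: the release name `k8ssandra` and the file name `k8ssandra.yaml` are placeholders, and overriding `stargate.replicas` with `--set` is one possible way to toggle Stargate instead of editing the values file by hand.

```python
import shlex

CHART = "k8ssandra/k8ssandra"


def staged_rollout(release, values_file):
    """Return the helm invocations for a two-phase rollout, in order."""
    return [
        # Start from a clean slate.
        ["helm", "uninstall", release],
        # Phase 1: deploy everything with zero Stargate replicas.
        ["helm", "install", release, CHART,
         "-f", values_file, "--set", "stargate.replicas=0"],
        # Phase 2: run only once all pods, including Reaper, are Ready.
        ["helm", "upgrade", release, CHART,
         "-f", values_file, "--set", "stargate.replicas=1"],
    ]


# Placeholder release and values-file names; adjust to your setup.
for argv in staged_rollout("k8ssandra", "k8ssandra.yaml"):
    print(shlex.join(argv))
```

Between phase 1 and phase 2, wait until the pod list shows Reaper as Running and Ready, so that its schema migration has finished before Stargate joins.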


Hi Alex,

I followed your proposal and started k8ssandra without stargate. Until now, it looks very good. All pods (except stargate) are up. I am surprised how fast this came up. I'll watch it for a while and then add stargate. I'll let you know soon.

Hi Alex,
it seems you were right regarding stargate. I have now upgraded the installation. Unfortunately, stargate and reaper are still restarting.
Here is the “get pods” result:

NAME                                                READY   STATUS             RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
dnsutils                                            1/1     Running            4          24h   10.96.223.174   ubuntu-focal   <none>           <none>
k8ssandra-cass-operator-65fd67b6d8-xqscr            1/1     Running            0          23m   10.96.223.191   ubuntu-focal   <none>           <none>
k8ssandra-crd-upgrader-job-k8ssandra-z7nsk          0/1     Completed          0          14m   10.96.223.131   ubuntu-focal   <none>           <none>
k8ssandra-dc1-default-sts-0                         2/2     Running            0          23m   10.96.223.132   ubuntu-focal   <none>           <none>
k8ssandra-dc1-stargate-76c576f7fc-2cx5p             0/1     CrashLoopBackOff   6          14m   10.96.223.139   ubuntu-focal   <none>           <none>
k8ssandra-grafana-584dfb486f-27s65                  2/2     Running            0          23m   10.96.223.190   ubuntu-focal   <none>           <none>
k8ssandra-kube-prometheus-admission-patch-p5m2h     0/1     Completed          5          14m   10.96.223.141   ubuntu-focal   <none>           <none>
k8ssandra-kube-prometheus-operator-85695ffb-cwkrw   1/1     Running            0          23m   10.96.223.134   ubuntu-focal   <none>           <none>
k8ssandra-reaper-7ffb485bb8-b6c27                   0/1     CrashLoopBackOff   7          15m   10.96.223.129   ubuntu-focal   <none>           <none>
k8ssandra-reaper-operator-b67dc8cdf-kgfhs           1/1     Running            0          23m   10.96.223.189   ubuntu-focal   <none>           <none>
prometheus-k8ssandra-kube-prometheus-prometheus-0   2/2     Running            1          23m   10.96.223.142   ubuntu-focal   <none>           <none>

The stargate log looks pretty much the same as before. To me it seems the process is somehow being terminated. No clue…

I believe exit code 143 can be related to memory issues. Can you run a dmesg -T on the ubuntu node? Another option is to change the Stargate heap from 256 to 512.
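
As a cross-check on the exit code: by shell convention, a status above 128 means the process was ended by a signal whose number is the status minus 128. So 143 maps to signal 15 (SIGTERM), while 137 maps to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal Python sketch of that decoding (the function name is made up for illustration):

```python
import signal


def decode_exit_code(code):
    """Translate a container exit code into a human-readable reason."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(decode_exit_code(143))
print(decode_exit_code(137))
```

A 143 on its own therefore indicates a termination request, which can come for example from the kubelet after a failed liveness probe. The dmesg check is still useful to rule out memory pressure.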

Hey guys,
I tried to start stargate with a 512M heap, but without effect.
To summarize, starting k8ssandra without stargate works like a charm, at least from what I can observe from “get pods”.
The moment I start stargate, both reaper and stargate go into CrashLoopBackOff.
I ran dmesg. It is a lot of output (I've cut a bit from the top). I wasn't able to identify anything there. Maybe it helps you.
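
When scanning a long dmesg dump for memory problems, the lines of interest are the kernel's OOM-killer messages, which contain phrases such as "Out of memory" or "oom-kill". A minimal Python sketch of such a filter follows. The function name and pattern are illustrative assumptions, not an exhaustive list of kernel messages, and the two sample lines are copied from the dump in this post:

```python
import re

# Phrases the kernel typically logs when the OOM killer acts.
OOM_PATTERN = re.compile(r"out of memory|oom-kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the dmesg lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


sample = [
    "[    7.068505] Freeing initrd memory: 26724K",
    "[    7.374968] loop: module loaded",
]
hits = find_oom_lines(sample)
print(f"{len(hits)} OOM-related line(s) found")
```

To check a full log, the same function can be fed the lines of a file saved with dmesg -T.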

dmesg:

[    6.094705] *** VALIDATE bpf ***
[    6.098713] VFS: Disk quotas dquot_6.6.0
[    6.103025] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    6.109532] *** VALIDATE ramfs ***
[    6.112796] *** VALIDATE hugetlbfs ***
[    6.117554] AppArmor: AppArmor Filesystem Enabled
[    6.122662] pnp: PnP ACPI init
[    6.126207] pnp 00:00: Plug and Play ACPI device, IDs PNP0303 (active)
[    6.126523] pnp 00:01: Plug and Play ACPI device, IDs PNP0f03 (active)
[    6.128346] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
[    6.131414] pnp: PnP ACPI: found 3 devices
[    6.169490] thermal_sys: Registered thermal governor 'fair_share'
[    6.169493] thermal_sys: Registered thermal governor 'bang_bang'
[    6.174850] thermal_sys: Registered thermal governor 'step_wise'
[    6.180212] thermal_sys: Registered thermal governor 'user_space'
[    6.185660] thermal_sys: Registered thermal governor 'power_allocator'
[    6.196054] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    6.210281] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    6.215543] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    6.221209] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    6.227192] pci_bus 0000:00: resource 7 [mem 0xe0000000-0xfdffffff window]
[    6.235982] NET: Registered protocol family 2
[    6.254422] tcp_listen_portaddr_hash hash table entries: 8192 (order: 5, 131072 bytes, linear)
[    6.264139] TCP established hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    6.276411] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    6.291405] TCP: Hash tables configured (established 131072 bind 65536)
[    6.297341] UDP hash table entries: 8192 (order: 6, 262144 bytes, linear)
[    6.303588] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes, linear)
[    6.310261] NET: Registered protocol family 1
[    6.316422] NET: Registered protocol family 44
[    6.320340] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    6.326259] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    6.332628] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    6.340378] PCI: CLS 0 bytes, default 64
[    6.343894] Trying to unpack rootfs image as initramfs...
[    7.068505] Freeing initrd memory: 26724K
[    7.074614] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    7.083976] software IO TLB: mapped [mem 0xdbff0000-0xdfff0000] (64MB)
[    7.092425] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x396d5809302, max_idle_ns: 881590667022 ns
[    7.106404] clocksource: Switched to clocksource tsc
[    7.112344] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    7.123163] check: Scanning for low memory corruption every 60 seconds
[    7.134594] Initialise system trusted keyrings
[    7.141406] Key type blacklist registered
[    7.147939] workingset: timestamp_bits=36 max_order=22 bucket_order=0
[    7.164569] zbud: loaded
[    7.171102] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    7.181066] fuse: init (API version 7.31)
[    7.187047] *** VALIDATE fuse ***
[    7.191550] *** VALIDATE fuse ***
[    7.196882] Platform Keyring initialized
[    7.212457] Key type asymmetric registered
[    7.217206] Asymmetric key parser 'x509' registered
[    7.222837] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 244)
[    7.231729] io scheduler mq-deadline registered
[    7.240324] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    7.249845] intel_idle: Please enable MWAIT in BIOS SETUP
[    7.250409] ACPI: AC Adapter [AC] (off-line)
[    7.255330] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    7.263570] ACPI: Power Button [PWRF]
[    7.268201] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[    7.276460] ACPI: Sleep Button [SLPF]
[    7.284422] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    7.290624] battery: ACPI: Battery Slot [BAT0] (battery present)
[    7.325015] 00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    7.353212] Linux agpgart interface v0.103
[    7.374968] loop: module loaded
[    7.379705] ata_piix 0000:00:01.1: version 2.13
[    7.382353] scsi host0: ata_piix
[    7.387922] scsi host1: ata_piix
[    7.391547] ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0xd000 irq 14
[    7.399922] ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0xd008 irq 15
[    7.410121] libphy: Fixed MDIO Bus: probed
[    7.415549] tun: Universal TUN/TAP device driver, 1.6
[    7.421804] PPP generic driver version 2.4.2
[    7.427629] VFIO - User Level meta-driver version: 0.3
[    7.434815] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    7.441961] ehci-pci: EHCI PCI platform driver
[    7.446732] ehci-platform: EHCI generic platform driver
[    7.452475] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    7.459302] ohci-pci: OHCI PCI platform driver
[    7.464456] ohci-platform: OHCI generic platform driver
[    7.470469] uhci_hcd: USB Universal Host Controller Interface driver
[    7.478206] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
[    7.492294] serio: i8042 KBD port at 0x60,0x64 irq 1
[    7.498376] serio: i8042 AUX port at 0x60,0x64 irq 12
[    7.505112] mousedev: PS/2 mouse device common for all mice
[    7.516636] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[    7.525742] rtc_cmos rtc_cmos: registered as rtc0
[    7.525816] rtc_cmos rtc_cmos: alarms up to one day, 114 bytes nvram
[    7.543631] i2c /dev entries driver
[    7.547784] device-mapper: uevent: version 1.0.3
[    7.553599] device-mapper: ioctl: 4.41.0-ioctl (2019-09-16) initialised: dm-devel@redhat.com
[    7.561840] platform eisa.0: Probing EISA bus 0
[    7.566055] platform eisa.0: EISA: Cannot allocate resource for mainboard
[    7.572198] platform eisa.0: Cannot allocate resource for EISA slot 1
[    7.579587] platform eisa.0: Cannot allocate resource for EISA slot 2
[    7.587707] platform eisa.0: Cannot allocate resource for EISA slot 3
[    7.595298] platform eisa.0: Cannot allocate resource for EISA slot 4
[    7.603119] platform eisa.0: Cannot allocate resource for EISA slot 5
[    7.610194] platform eisa.0: Cannot allocate resource for EISA slot 6
[    7.618310] platform eisa.0: Cannot allocate resource for EISA slot 7
[    7.625790] platform eisa.0: Cannot allocate resource for EISA slot 8
[    7.632425] platform eisa.0: EISA: Detected 0 cards
[    7.639034] intel_pstate: CPU model not supported
[    7.647268] ledtrig-cpu: registered to indicate activity on CPUs
[    7.654395] intel_pmc_core intel_pmc_core.0:  initialized
[    7.661736] drop_monitor: Initializing network drop monitor service
[    7.669535] NET: Registered protocol family 10
[    7.713984] Segment Routing with IPv6
[    7.718848] NET: Registered protocol family 17
[    7.725172] Key type dns_resolver registered
[    7.735392] RAS: Correctable Errors collector initialized.
[    7.741817] IPI shorthand broadcast: enabled
[    7.746728] sched_clock: Marking stable (6933480893, 813127102)->(7983741226, -237133231)
[    7.757543] registered taskstats version 1
[    7.762781] Loading compiled-in X.509 certificates
[    7.772425] Loaded X.509 cert 'Build time autogenerated kernel key: 514fdcf1b3e83a75fa120b4c37fbaecef65ad977'
[    7.785554] Loaded X.509 cert 'Canonical Ltd. Live Patch Signing: 14df34d1a87cf37625abec039ef2bf521249b969'
[    7.797891] Loaded X.509 cert 'Canonical Ltd. Kernel Module Signing: 88f752e560a1e0737e31163a466ad7b70a850c19'
[    7.807816] zswap: loaded using pool lzo/zbud
[    7.813874] Key type ._fscrypt registered
[    7.818997] Key type .fscrypt registered
[    7.863146] Key type big_key registered
[    7.923157] Key type encrypted registered
[    7.933211] AppArmor: AppArmor sha1 policy hashing enabled
[    7.940189] ima: No TPM chip found, activating TPM-bypass!
[    7.951920] ima: Allocated hash algorithm: sha1
[    7.959182] ima: No architecture policies found
[    7.966726] evm: Initialising EVM extended attributes:
[    7.974822] evm: security.selinux
[    7.980029] evm: security.SMACK64
[    7.985079] evm: security.SMACK64EXEC
[    7.990815] evm: security.SMACK64TRANSMUTE
[    7.996833] evm: security.SMACK64MMAP
[    8.003911] evm: security.apparmor
[    8.009501] evm: security.ima
[    8.014693] evm: security.capability
[    8.020617] evm: HMAC attrs: 0x1
[    8.028594] PM:   Magic number: 9:737:735
[    8.036252] rtc_cmos rtc_cmos: setting system clock to 2021-06-09T13:42:57 UTC (1623246177)
[    8.059979] Freeing unused decrypted memory: 2040K
[    8.071077] Freeing unused kernel image memory: 2732K
[    8.092049] Write protecting the kernel read-only data: 22528k
[    8.108840] Freeing unused kernel image memory: 2008K
[    8.128042] Freeing unused kernel image memory: 1136K
[    8.206010] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    8.212302] x86/mm: Checking user space page tables
[    8.270788] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    8.277646] Run /init as init process
[    8.705625] Fusion MPT base driver 3.04.20
[    8.711473] Copyright (c) 1999-2008 LSI Corporation
[    8.723881] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    8.750981] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    8.789251] cryptd: max_cpu_qlen set to 1000
[    8.795290] Fusion MPT SPI Host driver 3.04.20
[    8.850365] mptbase: ioc0: Initiating bringup
[    8.867378] AVX2 version of gcm_enc/dec engaged.
[    8.886414] AES CTR mode by8 optimization enabled
[    8.965653] ioc0: LSI53C1030 A0: Capabilities={Initiator}
[    9.026160] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
[    9.264507] scsi host2: ioc0: LSI53C1030 A0, FwRev=00000000h, Ports=1, MaxQ=256, IRQ=20
[    9.445616] scsi 2:0:0:0: Direct-Access     VBOX     HARDDISK         1.0  PQ: 0 ANSI: 5
[    9.575916] scsi target2:0:0: Beginning Domain Validation
[    9.588658] scsi target2:0:0: Domain Validation skipping write tests
[    9.595330] scsi target2:0:0: Ending Domain Validation
[    9.601852] scsi target2:0:0: asynchronous
[    9.609439] scsi 2:0:1:0: Direct-Access     VBOX     HARDDISK         1.0  PQ: 0 ANSI: 5
[   10.223914] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 02:54:0e:20:46:53
[   10.230759] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[   10.410586] scsi target2:0:1: Beginning Domain Validation
[   10.426706] scsi target2:0:1: Domain Validation skipping write tests
[   10.436194] scsi target2:0:1: Ending Domain Validation
[   10.443681] scsi target2:0:1: asynchronous
[   10.472088] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   10.473816] sd 2:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[   10.493650] sd 2:0:0:0: [sda] Write Protect is off
[   10.496899] sd 2:0:1:0: [sdb] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB)
[   10.500865] sd 2:0:0:0: [sda] Mode Sense: 04 00 10 00
[   10.510524] sd 2:0:1:0: Attached scsi generic sg1 type 0
[   10.510919] sd 2:0:0:0: [sda] Incomplete mode parameter data
[   10.518063] sd 2:0:1:0: [sdb] Write Protect is off
[   10.524499] sd 2:0:0:0: [sda] Assuming drive cache: write through
[   10.530269] sd 2:0:1:0: [sdb] Mode Sense: 04 00 10 00
[   10.538991] sd 2:0:1:0: [sdb] Incomplete mode parameter data
[   10.546274] sd 2:0:1:0: [sdb] Assuming drive cache: write through
[   10.773310]  sda: sda1
[   10.782766] sd 2:0:0:0: [sda] Attached SCSI disk
[   10.881431] sd 2:0:1:0: [sdb] Attached SCSI disk
[   11.332017] e1000 0000:00:08.0 eth1: (PCI:33MHz:32-bit) 08:00:27:3d:63:a0
[   11.338520] e1000 0000:00:08.0 eth1: Intel(R) PRO/1000 Network Connection
[   11.350552] e1000 0000:00:03.0 enp0s3: renamed from eth0
[   11.374129] e1000 0000:00:08.0 enp0s8: renamed from eth1
[   13.480628] raid6: avx2x4   gen()  4588 MB/s
[   13.532539] raid6: avx2x4   xor()  2784 MB/s
[   13.584349] raid6: avx2x2   gen()  3963 MB/s
[   13.636103] raid6: avx2x2   xor()  2856 MB/s
[   13.687792] raid6: avx2x1   gen()  3972 MB/s
[   13.740463] raid6: avx2x1   xor()  3065 MB/s
[   13.791866] raid6: sse2x4   gen()  2036 MB/s
[   13.845135] raid6: sse2x4   xor()  1405 MB/s
[   13.895847] raid6: sse2x2   gen()  1992 MB/s
[   13.948058] raid6: sse2x2   xor()  1528 MB/s
[   14.000028] raid6: sse2x1   gen()  1976 MB/s
[   14.052135] raid6: sse2x1   xor()  1132 MB/s
[   14.058563] raid6: using algorithm avx2x4 gen() 4588 MB/s
[   14.064541] raid6: .... xor() 2784 MB/s, rmw enabled
[   14.070200] raid6: using avx2x2 recovery algorithm
[   14.082012] xor: automatically using best checksumming function   avx       
[   14.094328] async_tx: api initialized (async)
[   14.386974] Btrfs loaded, crc32c=crc32c-intel
[   14.519409] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[   14.527781] EXT4-fs (sda1): write access will be enabled during recovery
[   14.692648] EXT4-fs (sda1): recovery complete
[   14.698055] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[   15.325750] systemd[1]: Inserted module 'autofs4'
[   15.374393] systemd[1]: systemd 245.4-4ubuntu3.6 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[   15.393276] systemd[1]: Detected virtualization oracle.
[   15.397929] systemd[1]: Detected architecture x86-64.
[   15.433857] systemd[1]: Set hostname to <ubuntu-focal>.
[   16.324167] systemd[1]: Configuration file /lib/systemd/system/kubelet.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[   16.347574] systemd[1]: Configuration file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[   16.407873] systemd[1]: /lib/systemd/system/docker.service:48: Unknown key name 'LimitMEMLOCK' in section 'Install', ignoring.
[   16.669053] systemd[1]: Created slice system-modprobe.slice.
[   16.678460] systemd[1]: Created slice system-serial\x2dgetty.slice.
[   16.687611] systemd[1]: Created slice User and Session Slice.
[   16.704171] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[   16.714575] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[   16.725557] systemd[1]: Reached target User and Group Name Lookups.
[   16.734009] systemd[1]: Reached target Slices.
[   16.740431] systemd[1]: Reached target Swap.
[   16.746230] systemd[1]: Reached target System Time Set.
[   16.753366] systemd[1]: Reached target System Time Synchronized.
[   16.762489] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[   16.771968] systemd[1]: Listening on LVM2 poll daemon socket.
[   16.780210] systemd[1]: Listening on multipathd control socket.
[   16.788817] systemd[1]: Listening on Syslog Socket.
[   16.805401] systemd[1]: Listening on fsck to fsckd communication Socket.
[   16.815483] systemd[1]: Listening on initctl Compatibility Named Pipe.
[   16.825023] systemd[1]: Listening on Journal Audit Socket.
[   16.845714] systemd[1]: Listening on Journal Socket (/dev/log).
[   16.854330] systemd[1]: Listening on Journal Socket.
[   16.862232] systemd[1]: Listening on Network Service Netlink Socket.
[   16.870607] systemd[1]: Listening on udev Control Socket.
[   16.889338] systemd[1]: Listening on udev Kernel Socket.
[   16.901510] systemd[1]: Mounting Huge Pages File System...
[   16.925430] systemd[1]: Mounting POSIX Message Queue File System...
[   16.941286] systemd[1]: Mounting Kernel Debug File System...
[   16.956304] systemd[1]: Mounting Kernel Trace File System...
[   16.989580] systemd[1]: Starting Journal Service...
[   17.004880] systemd[1]: Starting Set the console keyboard layout...
[   17.021406] systemd[1]: Starting Create list of static device nodes for the current kernel...
[   17.049850] systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
[   17.071520] systemd[1]: Starting Load Kernel Module drm...
[   17.093068] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
[   17.110995] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[   17.127185] systemd[1]: Starting File System Check on Root Device...
[   17.158801] systemd[1]: Starting Load Kernel Modules...
[   17.186812] systemd[1]: Starting udev Coldplug all Devices...
[   17.254848] systemd[1]: Starting Uncomplicated firewall...
[   17.291308] systemd[1]: Mounted Huge Pages File System.
[   17.303694] systemd[1]: Mounted POSIX Message Queue File System.
[   17.315073] systemd[1]: Mounted Kernel Debug File System.
[   17.337251] systemd[1]: Mounted Kernel Trace File System.
[   17.350165] systemd[1]: Finished Create list of static device nodes for the current kernel.
[   17.363570] systemd[1]: Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
[   17.386194] systemd[1]: modprobe@drm.service: Succeeded.
[   17.395547] systemd[1]: Finished Load Kernel Module drm.
[   17.405111] systemd[1]: Started Journal Service.
[   17.566084] EXT4-fs (sda1): re-mounted. Opts: (null)
[   17.650197] systemd-journald[421]: Received client request to flush runtime journal.
[   19.071278] vboxguest: loading out-of-tree module taints kernel.
[   19.086387] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   19.086885] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/LNXVIDEO:00/input/input5
[   19.209419] vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
[   19.210007] input: Unspecified device as /devices/pci0000:00/0000:00:04.0/input/input6
[   19.223106] vboxguest: Successfully loaded version 6.1.16_Ubuntu
[   19.223260] vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
[   19.223265] vboxguest: Successfully loaded version 6.1.16_Ubuntu (interface 0x00010004)
[   20.407826] alua: device handler registered
[   20.417201] emc: device handler registered
[   20.434789] rdac: device handler registered
[   21.093621] audit: type=1400 audit(1623246190.555:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=613 comm="apparmor_parser"
[   21.093647] audit: type=1400 audit(1623246190.555:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=607 comm="apparmor_parser"
[   21.093662] audit: type=1400 audit(1623246190.555:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=607 comm="apparmor_parser"
[   21.097895] audit: type=1400 audit(1623246190.559:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=608 comm="apparmor_parser"
[   21.097919] audit: type=1400 audit(1623246190.559:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=608 comm="apparmor_parser"
[   21.097929] audit: type=1400 audit(1623246190.559:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=608 comm="apparmor_parser"
[   21.103676] audit: type=1400 audit(1623246190.563:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine" pid=610 comm="apparmor_parser"
[   21.103692] audit: type=1400 audit(1623246190.563:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=610 comm="apparmor_parser"
[   21.109939] audit: type=1400 audit(1623246190.571:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=609 comm="apparmor_parser"
[   21.116862] audit: type=1400 audit(1623246190.579:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=612 comm="apparmor_parser"
[   25.337241] ISO 9660 Extensions: Microsoft Joliet Level 3
[   25.353266] ISO 9660 Extensions: RRIP_1991A
[   26.856862] e1000: enp0s8 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   26.869629] e1000: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   26.874447] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
[   26.874617] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[   33.484155] vboxsf: g_fHostFeatures=0x80000007 g_fSfFeatures=0x0 g_uSfLastFunction=20
[   33.485250] vboxsf: Successfully loaded version 6.1.16_Ubuntu
[   33.485445] vboxsf: Successfully loaded version 6.1.16_Ubuntu on 5.4.0-73-generic SMP mod_unload modversions  (LINUX_VERSION_CODE=0x5046a)
[   33.488155] vboxsf: SHFL_FN_MAP_FOLDER failed for '/vagrant': share not found
[   33.488203] vbsf_read_super_aux err=-6
[   36.363328] 13:43:25.813552 main     VBoxService 6.1.16_Ubuntu r140961 (verbosity: 0) linux.amd64 (Dec 17 2020 22:06:23) release log
               13:43:25.813563 main     Log opened 2021-06-09T13:43:25.813526000Z
[   36.363838] 13:43:25.828165 main     OS Product: Linux
[   36.364128] 13:43:25.828488 main     OS Release: 5.4.0-73-generic
[   36.364360] 13:43:25.828690 main     OS Version: #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021
[   36.364725] 13:43:25.828920 main     Executable: /usr/sbin/VBoxService
               13:43:25.828925 main     Process ID: 793
               13:43:25.828929 main     Package type: LINUX_64BITS_GENERIC (OSE)
[   36.475478] 13:43:25.939760 main     6.1.16_Ubuntu r140961 started. Verbose level = 0
[   36.519626] 13:43:25.979991 main     vbglR3GuestCtrlDetectPeekGetCancelSupport: Supported (#1)
[   38.206229] aufs 5.4.3-20200302
[   42.103134] kauditd_printk_skb: 20 callbacks suppressed
[   42.103140] audit: type=1400 audit(1623246211.566:32): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=926 comm="apparmor_parser"
[   47.066111] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   47.077214] Bridge firewalling registered
[   47.212754] bpfilter: Loaded bpfilter_umh pid 1110
[   47.214565] Started bpfilter
[   48.515673] Initializing XFRM netlink socket
[   72.186658] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[  137.600013] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[  137.600216] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[  137.600626] IPVS: ipvs loaded.
[  137.826663] IPVS: [rr] scheduler registered.
[  137.947199] IPVS: [wrr] scheduler registered.
[  138.004952] IPVS: [sh] scheduler registered.
[  178.280765] ipip: IPv4 and MPLS over IPv4 tunneling driver
[ 1260.502864] IPv6: ADDRCONF(NETDEV_CHANGE): cali67cb074b4d9: link becomes ready
[ 1827.948625] IPv6: ADDRCONF(NETDEV_CHANGE): calia87155bc7f9: link becomes ready
[ 4099.725746] rcu: INFO: rcu_sched self-detected stall on CPU
[ 4099.751472] rcu: 	5-...0: (1 GPs behind) idle=92e/1/0x4000000000000002 softirq=406741/406743 fqs=2 
[ 4099.766610] 	(t=16863 jiffies g=904881 q=141)
[ 4099.766620] Sending NMI from CPU 5 to CPUs 2:
[ 4099.767651] NMI backtrace for cpu 2 skipped: idling at native_halt+0xd/0x10
[ 4099.767661] NMI backtrace for cpu 5
[ 4099.767673] CPU: 5 PID: 120924 Comm: calico-node Tainted: G           O      5.4.0-73-generic #82-Ubuntu
[ 4099.767677] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 4099.767680] Call Trace:
[ 4099.767687]  <IRQ>
[ 4099.767707]  dump_stack+0x6d/0x8b
[ 4099.767722]  ? lapic_can_unplug_cpu+0x80/0x80
[ 4099.767730]  nmi_cpu_backtrace.cold+0x14/0x53
[ 4099.767743]  nmi_trigger_cpumask_backtrace+0xe8/0xf0
[ 4099.767754]  arch_trigger_cpumask_backtrace+0x19/0x20
[ 4099.767762]  rcu_dump_cpu_stacks+0x99/0xcb
[ 4099.767769]  rcu_sched_clock_irq.cold+0x1d9/0x3c1
[ 4099.767816]  update_process_times+0x2c/0x60
[ 4099.767826]  tick_sched_handle+0x29/0x60
[ 4099.767834]  tick_sched_timer+0x3d/0x80
[ 4099.767843]  __hrtimer_run_queues+0xf7/0x270
[ 4099.767850]  ? tick_sched_do_timer+0x60/0x60
[ 4099.767861]  hrtimer_interrupt+0x109/0x220
[ 4099.767872]  smp_apic_timer_interrupt+0x71/0x140
[ 4099.767879]  apic_timer_interrupt+0xf/0x20
[ 4099.767884]  </IRQ>
[ 4099.767912] RIP: 0010:page_remove_rmap+0x26/0x310
[ 4099.767920] Code: 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 <f6> 40 18 01 74 74 40 84 f6 0f 85 2d 01 00 00 f0 83 47 30 ff 78 0b
[ 4099.767924] RSP: 0018:ffffba1f87103a58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 4099.767932] RAX: ffffe0a8ca7449c0 RBX: 0000000000000000 RCX: 8000000000000867
[ 4099.767936] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe0a8ca7449c0
[ 4099.767940] RBP: ffffba1f87103a78 R08: ffffe0a8ca7449c0 R09: 0000000000000000
[ 4099.767944] R10: 0000000000000001 R11: ffff8cc69ffc9000 R12: ffffe0a8ca7449c0
[ 4099.767948] R13: 00007f01b036c000 R14: 800000029d127867 R15: 00007f01b036d000
[ 4099.767963]  ? page_remove_rmap+0x51/0x310
[ 4099.767971]  ? tlb_flush_mmu+0x3a/0x140
[ 4099.767979]  zap_pte_range.isra.0+0x273/0x7f0
[ 4099.767985]  ? __switch_to_asm+0x34/0x70
[ 4099.767997]  unmap_page_range+0x2da/0x4a0
[ 4099.768008]  unmap_single_vma+0x7f/0xf0
[ 4099.768016]  unmap_vmas+0x79/0xf0
[ 4099.768026]  exit_mmap+0xb4/0x1b0
[ 4099.768039]  mmput+0x5d/0x130
[ 4099.768047]  do_exit+0x31a/0xaf0
[ 4099.768056]  ? default_wake_function+0x12/0x20
[ 4099.768065]  do_group_exit+0x47/0xb0
[ 4099.768076]  get_signal+0x169/0x890
[ 4099.768090]  do_signal+0x34/0x6c0
[ 4099.768103]  ? ep_show_fdinfo+0x90/0x90
[ 4099.768113]  ? __x64_sys_futex+0x13f/0x170
[ 4099.768126]  exit_to_usermode_loop+0xbf/0x160
[ 4099.768136]  do_syscall_64+0x163/0x190
[ 4099.768148]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4099.768156] RIP: 0033:0x4726a3
[ 4099.768164] Code: 8d 6c 24 20 48 8b 44 24 30 48 89 04 24 48 8b 4c 24 38 48 89 4c 24 08 48 c7 44 24 10 1a 00 00 00 e8 22 05 f9 ff 80 7c 24 18 00 <75> 10 31 c0 88 44 24 40 48 8b 6c 24 20 48 83 c4 28 c3 48 8b 44 24
[ 4099.768168] RSP: 002b:00007f01abffec40 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
[ 4099.768174] RAX: fffffffffffffe00 RBX: 000000c000078c00 RCX: 00000000004726a3
[ 4099.768178] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000c000078d48
[ 4099.768182] RBP: 00007f01abffec88 R08: 0000000000000000 R09: 0000000000000000
[ 4099.768186] R10: 0000000000000000 R11: 0000000000000286 R12: 000000c0004c6f90
[ 4099.768190] R13: 000000c0001827e0 R14: 000000c0001822a0 R15: 0000000000000055
[ 4099.780268] 15:48:54.491953 timesync vgsvcTimeSyncWorker: Radical host time change: 3 537 407 000 000ns (HostNow=1 623 253 734 439 000 000 ns HostLast=1 623 250 197 032 000 000 ns)
[ 4109.781288] 15:49:04.494131 timesync vgsvcTimeSyncWorker: Radical guest time change: 3 474 154 372 000ns (GuestNow=1 623 253 744 494 097 000 ns GuestLast=1 623 250 270 339 725 000 ns fSetTimeLastLoop=true )
[ 4560.170381] hrtimer: interrupt took 323647 ns

Hmm, it doesn’t look like it was OOM-killed. Have you checked the logs in the server-system-logger container of the k8ssandra-dc1-default-sts-0 pod?

By chance, have you tried bringing up Stargate and then Reaper once everything is settled?

Another option is increasing the probe delays for Stargate. Depending on the size of your schema and your resource constraints, Stargate might not be able to settle gossip in time.
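
To get a feel for how much startup time a given initial delay buys, here is a rough sketch in Python. It assumes the Kubernetes defaults of periodSeconds=10 and failureThreshold=3 (the chart may set different values), and it ignores probe timeouts and scheduling jitter. The `Probe` class is purely illustrative and not part of any k8ssandra or Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Simplified model of a Kubernetes liveness probe (illustrative only)."""
    initial_delay_s: int
    period_s: int = 10          # assumption: Kubernetes default periodSeconds
    failure_threshold: int = 3  # assumption: Kubernetes default failureThreshold

    def earliest_restart_s(self) -> int:
        # The first probe runs no earlier than initial_delay_s after container
        # start; the container is restarted once failure_threshold consecutive
        # probes have failed, one probe per period.
        return self.initial_delay_s + (self.failure_threshold - 1) * self.period_s


if __name__ == "__main__":
    for delay in (30, 60, 90, 120):
        budget = Probe(initial_delay_s=delay).earliest_restart_s()
        print(f"initial delay {delay:>3}s -> earliest restart after about {budget}s")
```

Under these assumptions, an initial delay of 60 seconds gives the container roughly 80 seconds to start answering its liveness probe. Treat the numbers as an estimate to compare against the observed startup time, not as exact kubelet behaviour.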

This is only the second time I’ve ever seen this, and I don’t know how it gets to this point, so we’ll need to track this down.

/cc @alexander , @jsanda

I think @dwettlaufer's suggestion on tuning the probes could be spot on.
@Hanno_Scharwaechter, I used your values file locally on a Kind cluster, and while I had no issue with Reaper, the Stargate pod was restarted 6 times, with logs similar to yours, before finally reaching the Ready state.
I doubled the default initial delay of both the liveness and readiness probes to 60 seconds and deployed again: Stargate went into the Ready state without any restarts.

Stargate has a bad habit of declaring a new schema version when it starts, so I can see how having it restart several times can drive Reaper crazy while Reaper is migrating its schema.

Let us know how it works with longer probe initial delays.

Hi Alex, dwettlaufer,

I also think dwettlaufer has a point. A VM on a laptop will simply not provide enough computing power to start everything up in time. I’ll give it a try and let you know.

Hi guys,

Sorry, this took me a while. Unfortunately, no white smoke yet. I tried launching Stargate with increased liveness and readiness delays (60, 90, and 120 seconds), but without effect. It’s really strange; I was pretty sure this was the right track. Actually, I still believe that it is related to some inter-process sync. Please find some dumps below (I split them due to their size):
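
For comparison with the probe delays tried here, the time span covered by the stargate log below can be worked out from its timestamps. This is a small Python sketch that uses the first timestamped line and the first StorageServiceShutdownHook line, both copied from that log. The helper name `log_time` is made up for illustration.

```python
import re
from datetime import datetime

# Matches the Cassandra-style timestamp, e.g. "2021-06-10 09:53:25,948".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def log_time(line: str) -> datetime:
    """Extract the timestamp from one log line (illustrative helper)."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


# Two lines copied verbatim from the stargate log in this post.
first_line = (
    "INFO  [main] 2021-06-10 09:53:25,948 BaseActivator.java:92 - "
    "Starting persistence-cassandra-3.11 ..."
)
shutdown_line = (
    "INFO  [StorageServiceShutdownHook] 2021-06-10 09:55:56,425 "
    "HintsService.java:209 - Paused hints dispatch"
)

elapsed = (log_time(shutdown_line) - log_time(first_line)).total_seconds()
print(f"first log line to shutdown hook: {elapsed:.3f} s")
```

This gives about 150 seconds between the first timestamped line and the shutdown hook. On its own it does not say why the JVM was shut down; it only gives a figure to hold against the configured probe delays.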

stargate log:

Using environment for config
Running java -server -XX:+CrashOnOutOfMemoryError -Xms1024M -Xmx1024M -Dstargate.libdir=./stargate-lib -Djava.awt.headless=true -jar ./stargate-lib/stargate-starter-1.0.18.jar --cluster-name k8ssandra --cluster-version 3.11 --cluster-seed k8ssandra-seed-service.default.svc.cluster.local --listen 10.96.223.188 --dc dc1 --rack default --enable-auth
JAR DIR: ./stargate-lib
Loading persistence backend persistence-cassandra-3.11-1.0.18.jar
Installing bundle persistence-cassandra-3.11-1.0.18.jar
Installing bundle animal-sniffer-annotations-1.9.jar
Installing bundle asm-7.1.jar
Installing bundle asm-analysis-7.1.jar
Installing bundle asm-tree-7.1.jar
Installing bundle auth-api-1.0.18.jar
Installing bundle auth-jwt-service-1.0.18.jar
Installing bundle auth-table-based-service-1.0.18.jar
Installing bundle authnz-1.0.18.jar
Installing bundle commons-beanutils-1.9.4.jar
Installing bundle commons-collections-3.2.2.jar
Installing bundle commons-digester-2.1.jar
Installing bundle commons-logging-1.2.jar
Installing bundle commons-validator-1.7.jar
Installing bundle config-store-api-1.0.18.jar
Installing bundle config-store-yaml-1.0.18.jar
Installing bundle core-1.0.18.jar
Installing bundle cql-1.0.18.jar
Installing bundle graphqlapi-1.0.18.jar
Installing bundle health-checker-1.0.18.jar
Installing bundle org.apache.felix.scr-2.1.20.jar
Installing bundle org.apache.felix.scr.ds-annotations-1.2.10.jar
Installing bundle org.apache.felix.scr.generator-1.18.4.jar
Installing bundle org.osgi.compendium-4.2.0.jar
Installing bundle org.osgi.util.function-1.1.0.jar
Installing bundle org.osgi.util.promise-1.1.1.jar
Installing bundle persistence-api-1.0.18.jar
Installing bundle rate-limiting-global-1.0.18.jar
Installing bundle restapi-1.0.18.jar
Starting bundle io.stargate.db.cassandra_3_11
INFO  [main] 2021-06-10 09:53:25,948 BaseActivator.java:92 - Starting persistence-cassandra-3.11 ...
Starting bundle null
Starting bundle org.objectweb.asm
Starting bundle org.objectweb.asm.tree.analysis
Starting bundle org.objectweb.asm.tree
Starting bundle io.stargate.auth.api
INFO  [main] 2021-06-10 09:53:28,550 BaseActivator.java:92 - Starting authApiServer ...
Starting bundle io.stargate.auth.jwt
Starting bundle io.stargate.auth.table
INFO  [main] 2021-06-10 09:53:33,014 BaseActivator.java:92 - Starting authnTableBasedService and authzTableBasedServie ...
Starting bundle io.stargate.auth
Starting bundle org.apache.commons.commons-beanutils
Starting bundle org.apache.commons.collections
Starting bundle org.apache.commons.digester
Starting bundle org.apache.commons.logging
Starting bundle org.apache.commons.commons-validator
Starting bundle io.stargate.config.store.api
Starting bundle io.stargate.config.store.yaml
INFO  [main] 2021-06-10 09:53:35,514 BaseActivator.java:92 - Starting Config Store YAML ...
Starting bundle io.stargate.core
INFO  [main] 2021-06-10 09:53:35,522 BaseActivator.java:92 - Starting core services ...
INFO  [main] 2021-06-10 09:53:35,551 BaseActivator.java:173 - Registering core services as io.stargate.core.metrics.api.Metrics
INFO  [main] 2021-06-10 09:53:41,012 AbstractCassandraPersistence.java:100 - Initializing Apache Cassandra
INFO  [main] 2021-06-10 09:53:41,256 DatabaseDescriptor.java:381 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  [main] 2021-06-10 09:53:41,259 DatabaseDescriptor.java:439 - Global memtable on-heap threshold is enabled at 247MB
INFO  [main] 2021-06-10 09:53:41,261 DatabaseDescriptor.java:443 - Global memtable off-heap threshold is enabled at 247MB
WARN  [main] 2021-06-10 09:53:42,849 DatabaseDescriptor.java:579 - Only 31.027GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO  [main] 2021-06-10 09:53:43,110 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO  [main] 2021-06-10 09:53:43,113 DatabaseDescriptor.java:773 - Back-pressure is disabled with strategy null.
INFO  [main] 2021-06-10 09:53:43,335 GossipingPropertyFileSnitch.java:68 - Unable to load cassandra-topology.properties; compatibility mode disabled
INFO  [main] 2021-06-10 09:53:44,926 JMXServerUtils.java:246 - Configured JMX server at: service:jmx:rmi://0.0.0.0/jndi/rmi://0.0.0.0:7199/jmxrmi
INFO  [main] 2021-06-10 09:53:45,041 CassandraDaemon.java:489 - Hostname: k8ssandra-dc1-stargate-68b459c4fc-mj2ph
INFO  [main] 2021-06-10 09:53:45,047 CassandraDaemon.java:496 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_252
INFO  [main] 2021-06-10 09:53:45,061 CassandraDaemon.java:497 - Heap size: 989.875MiB/989.875MiB
INFO  [main] 2021-06-10 09:53:45,066 CassandraDaemon.java:502 - Code Cache Non-heap memory: init = 2555904(2496K) used = 5704064(5570K) committed = 5767168(5632K) max = 251658240(245760K)
INFO  [main] 2021-06-10 09:53:45,068 CassandraDaemon.java:502 - Metaspace Non-heap memory: init = 0(0K) used = 25141928(24552K) committed = 26214400(25600K) max = -1(-1K)
INFO  [main] 2021-06-10 09:53:45,074 CassandraDaemon.java:502 - Compressed Class Space Non-heap memory: init = 0(0K) used = 3270392(3193K) committed = 3670016(3584K) max = 1073741824(1048576K)
INFO  [main] 2021-06-10 09:53:45,078 CassandraDaemon.java:502 - Eden Space Heap memory: init = 286326784(279616K) used = 149008312(145515K) committed = 286326784(279616K) max = 286326784(279616K)
INFO  [main] 2021-06-10 09:53:45,079 CassandraDaemon.java:502 - Survivor Space Heap memory: init = 35782656(34944K) used = 0(0K) committed = 35782656(34944K) max = 35782656(34944K)
INFO  [main] 2021-06-10 09:53:45,083 CassandraDaemon.java:502 - Tenured Gen Heap memory: init = 715849728(699072K) used = 19289936(18837K) committed = 715849728(699072K) max = 715849728(699072K)
INFO  [main] 2021-06-10 09:53:45,111 CassandraDaemon.java:504 - Classpath: ./stargate-lib/stargate-starter-1.0.18.jar
INFO  [main] 2021-06-10 09:53:45,118 CassandraDaemon.java:506 - JVM Arguments: [-XX:+CrashOnOutOfMemoryError, -Xms1024M, -Xmx1024M, -Dstargate.libdir=./stargate-lib, -Djava.awt.headless=true]
WARN  [main] 2021-06-10 09:53:46,025 NativeLibrary.java:189 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN  [main] 2021-06-10 09:53:46,028 StartupChecks.java:136 - jemalloc shared library could not be preloaded to speed up memory allocations
INFO  [main] 2021-06-10 09:53:46,034 StartupChecks.java:176 - JMX is enabled to receive remote connections on port: 7199
INFO  [main] 2021-06-10 09:53:46,052 SigarLibrary.java:44 - Initializing SIGAR library
INFO  [main] 2021-06-10 09:53:46,347 SigarLibrary.java:57 - Could not initialize SIGAR library org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem; 
INFO  [main] 2021-06-10 09:53:46,351 SigarLibrary.java:185 - Sigar could not be initialized, test for checking degraded mode omitted.
INFO  [main] 2021-06-10 09:53:47,122 QueryProcessor.java:116 - Initialized prepared statement caches with 10 MB (native) and 10 MB (Thrift)
INFO  [main] 2021-06-10 09:53:52,827 ColumnFamilyStore.java:427 - Initializing system.IndexInfo
INFO  [main] 2021-06-10 09:54:05,224 ColumnFamilyStore.java:427 - Initializing system.batches
INFO  [main] 2021-06-10 09:54:05,607 ColumnFamilyStore.java:427 - Initializing system.paxos
INFO  [main] 2021-06-10 09:54:06,480 ColumnFamilyStore.java:427 - Initializing system.local
INFO  [main] 2021-06-10 09:54:06,841 ColumnFamilyStore.java:427 - Initializing system.peers
INFO  [main] 2021-06-10 09:54:07,218 ColumnFamilyStore.java:427 - Initializing system.peer_events
INFO  [main] 2021-06-10 09:54:07,649 ColumnFamilyStore.java:427 - Initializing system.range_xfers
INFO  [main] 2021-06-10 09:54:08,122 ColumnFamilyStore.java:427 - Initializing system.compaction_history
INFO  [main] 2021-06-10 09:54:08,657 ColumnFamilyStore.java:427 - Initializing system.sstable_activity
INFO  [main] 2021-06-10 09:54:09,316 ColumnFamilyStore.java:427 - Initializing system.size_estimates
INFO  [main] 2021-06-10 09:54:10,010 ColumnFamilyStore.java:427 - Initializing system.available_ranges
INFO  [main] 2021-06-10 09:54:10,609 ColumnFamilyStore.java:427 - Initializing system.transferred_ranges
INFO  [main] 2021-06-10 09:54:11,750 ColumnFamilyStore.java:427 - Initializing system.views_builds_in_progress
INFO  [main] 2021-06-10 09:54:12,343 ColumnFamilyStore.java:427 - Initializing system.built_views
INFO  [main] 2021-06-10 09:54:12,913 ColumnFamilyStore.java:427 - Initializing system.hints
INFO  [main] 2021-06-10 09:54:13,422 ColumnFamilyStore.java:427 - Initializing system.batchlog
INFO  [main] 2021-06-10 09:54:14,523 ColumnFamilyStore.java:427 - Initializing system.prepared_statements
INFO  [main] 2021-06-10 09:54:15,124 ColumnFamilyStore.java:427 - Initializing system.schema_keyspaces
INFO  [main] 2021-06-10 09:54:15,719 ColumnFamilyStore.java:427 - Initializing system.schema_columnfamilies
INFO  [main] 2021-06-10 09:54:16,353 ColumnFamilyStore.java:427 - Initializing system.schema_columns
INFO  [main] 2021-06-10 09:54:17,031 ColumnFamilyStore.java:427 - Initializing system.schema_triggers
INFO  [main] 2021-06-10 09:54:17,925 ColumnFamilyStore.java:427 - Initializing system.schema_usertypes
INFO  [main] 2021-06-10 09:54:18,760 ColumnFamilyStore.java:427 - Initializing system.schema_functions
INFO  [main] 2021-06-10 09:54:19,375 ColumnFamilyStore.java:427 - Initializing system.schema_aggregates
INFO  [main] 2021-06-10 09:54:19,388 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO  [main] 2021-06-10 09:54:19,415 ClientState.java:102 - Using io.stargate.db.cassandra.impl.StargateQueryHandler as query handler for native protocol queries (as requested with -Dcassandra.custom_query_handler_class)
INFO  [main] 2021-06-10 09:54:20,209 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
INFO  [main] 2021-06-10 09:54:20,743 ColumnFamilyStore.java:427 - Initializing system_schema.keyspaces
INFO  [main] 2021-06-10 09:54:21,435 ColumnFamilyStore.java:427 - Initializing system_schema.tables
INFO  [main] 2021-06-10 09:54:21,979 ColumnFamilyStore.java:427 - Initializing system_schema.columns
INFO  [main] 2021-06-10 09:54:22,310 ColumnFamilyStore.java:427 - Initializing system_schema.triggers
INFO  [main] 2021-06-10 09:54:22,732 ColumnFamilyStore.java:427 - Initializing system_schema.dropped_columns
INFO  [main] 2021-06-10 09:54:23,157 ColumnFamilyStore.java:427 - Initializing system_schema.views
INFO  [main] 2021-06-10 09:54:23,651 ColumnFamilyStore.java:427 - Initializing system_schema.types
INFO  [main] 2021-06-10 09:54:24,155 ColumnFamilyStore.java:427 - Initializing system_schema.functions
INFO  [main] 2021-06-10 09:54:24,627 ColumnFamilyStore.java:427 - Initializing system_schema.aggregates
INFO  [main] 2021-06-10 09:54:25,065 ColumnFamilyStore.java:427 - Initializing system_schema.indexes
INFO  [main] 2021-06-10 09:54:25,078 ViewManager.java:137 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
INFO  [MemtableFlushWriter:1] 2021-06-10 09:54:30,410 CacheService.java:100 - Initializing key cache with capacity of 49 MBs.
INFO  [MemtableFlushWriter:1] 2021-06-10 09:54:30,448 CacheService.java:122 - Initializing row cache with capacity of 0 MBs
INFO  [MemtableFlushWriter:1] 2021-06-10 09:54:30,507 CacheService.java:151 - Initializing counter cache with capacity of 24 MBs
INFO  [MemtableFlushWriter:1] 2021-06-10 09:54:30,522 CacheService.java:162 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  [CompactionExecutor:2] 2021-06-10 09:54:32,720 BufferPool.java:234 - Global buffer pool is enabled, when pool is exhausted (max is 247.000MiB) it will allocate on heap
INFO  [main] 2021-06-10 09:54:33,439 StorageService.java:639 - Populating token metadata from system tables
INFO  [main] 2021-06-10 09:54:34,142 StorageService.java:646 - Token metadata: 
INFO  [pool-9-thread-1] 2021-06-10 09:54:35,016 AutoSavingCache.java:174 - Completed loading (22 ms; 8 keys) KeyCache cache
INFO  [main] 2021-06-10 09:54:35,437 CommitLog.java:142 - No commitlog files found; skipping replay
INFO  [main] 2021-06-10 09:54:35,439 StorageService.java:639 - Populating token metadata from system tables
INFO  [main] 2021-06-10 09:54:35,612 StorageService.java:646 - Token metadata: 
INFO  [main] 2021-06-10 09:54:37,233 QueryProcessor.java:163 - Preloaded 0 prepared statements
INFO  [main] 2021-06-10 09:54:37,235 StorageService.java:657 - Cassandra version: 3.11.6
INFO  [main] 2021-06-10 09:54:37,237 StorageService.java:658 - Thrift API version: 20.1.0
INFO  [main] 2021-06-10 09:54:37,238 StorageService.java:659 - CQL supported versions: 3.4.4 (default: 3.4.4)
INFO  [main] 2021-06-10 09:54:37,241 StorageService.java:661 - Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
INFO  [main] 2021-06-10 09:54:37,849 IndexSummaryManager.java:87 - Initializing index summary manager with a memory pool size of 49 MB and a resize interval of 60 minutes
INFO  [main] 2021-06-10 09:54:37,951 MessagingService.java:750 - Starting Messaging Service on /10.96.223.188:7000 (eth0)
WARN  [main] 2021-06-10 09:54:38,080 SystemKeyspace.java:1130 - No host ID found, created bae8eff3-e53b-459a-b615-4e83a6872752 (Note: This should happen exactly once per node).
INFO  [main] 2021-06-10 09:54:38,251 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [HANDSHAKE-k8ssandra-seed-service.default.svc.cluster.local/10.96.223.158] 2021-06-10 09:54:38,533 OutboundTcpConnection.java:561 - Handshaking version with k8ssandra-seed-service.default.svc.cluster.local/10.96.223.158
INFO  [main] 2021-06-10 09:54:39,442 StorageService.java:743 - Loading persisted ring state
INFO  [main] 2021-06-10 09:54:39,449 StorageService.java:871 - Starting up server gossip
INFO  [HANDSHAKE-k8ssandra-seed-service.default.svc.cluster.local/10.96.223.158] 2021-06-10 09:54:40,925 OutboundTcpConnection.java:561 - Handshaking version with k8ssandra-seed-service.default.svc.cluster.local/10.96.223.158
INFO  [GossipStage:1] 2021-06-10 09:54:41,742 Gossiper.java:1127 - Node /10.96.223.158 is now part of the cluster
WARN  [GossipTasks:1] 2021-06-10 09:54:41,855 FailureDetector.java:278 - Not marking nodes down due to local pause of 7331698835 > 5000000000
INFO  [GossipStage:1] 2021-06-10 09:54:42,743 TokenMetadata.java:497 - Updating topology for /10.96.223.158
INFO  [GossipStage:1] 2021-06-10 09:54:42,748 TokenMetadata.java:497 - Updating topology for /10.96.223.158
INFO  [GossipStage:1] 2021-06-10 09:54:42,937 Gossiper.java:1089 - InetAddress /10.96.223.158 is now UP
INFO  [MigrationStage:1] 2021-06-10 09:54:44,309 ViewManager.java:137 - Not submitting build tasks for views in keyspace system_auth as storage service is not initialized
INFO  [MigrationStage:1] 2021-06-10 09:54:45,452 ColumnFamilyStore.java:427 - Initializing system_auth.resource_role_permissons_index
INFO  [MigrationStage:1] 2021-06-10 09:54:46,834 ColumnFamilyStore.java:427 - Initializing system_auth.role_members
INFO  [MigrationStage:1] 2021-06-10 09:54:49,016 ColumnFamilyStore.java:427 - Initializing system_auth.role_permissions
INFO  [MigrationStage:1] 2021-06-10 09:54:50,250 ColumnFamilyStore.java:427 - Initializing system_auth.roles
INFO  [main] 2021-06-10 09:54:50,635 AuthCache.java:177 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2021-06-10 09:54:50,720 StorageService.java:733 - Not joining ring as requested. Use JMX (StorageService->joinRing()) to initiate ring joining
INFO  [main] 2021-06-10 09:54:50,726 Gossiper.java:1780 - Waiting for gossip to settle...
INFO  [InternalResponseStage:1] 2021-06-10 09:54:55,695 ColumnFamilyStore.java:427 - Initializing reaper_db.cluster
INFO  [InternalResponseStage:1] 2021-06-10 09:54:56,620 ColumnFamilyStore.java:427 - Initializing reaper_db.diagnostic_event_subscription
INFO  [InternalResponseStage:1] 2021-06-10 09:54:57,402 ColumnFamilyStore.java:427 - Initializing reaper_db.leader
INFO  [InternalResponseStage:1] 2021-06-10 09:54:58,275 ColumnFamilyStore.java:427 - Initializing reaper_db.node_metrics_v1
INFO  [main] 2021-06-10 09:54:58,736 Gossiper.java:1811 - No gossip backlog; proceeding
INFO  [main] 2021-06-10 09:54:59,635 CassandraDaemon.java:550 - Not starting native transport as requested. Use JMX (StorageService->startNativeTransport()) or nodetool (enablebinary) to start it
INFO  [main] 2021-06-10 09:54:59,636 CassandraDaemon.java:556 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO  [InternalResponseStage:1] 2021-06-10 09:54:59,724 ColumnFamilyStore.java:427 - Initializing reaper_db.node_metrics_v3
INFO  [InternalResponseStage:1] 2021-06-10 09:55:01,355 ColumnFamilyStore.java:427 - Initializing reaper_db.node_operations
INFO  [InternalResponseStage:1] 2021-06-10 09:55:02,732 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_run
INFO  [InternalResponseStage:1] 2021-06-10 09:55:04,347 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_run_by_cluster
INFO  [InternalResponseStage:1] 2021-06-10 09:55:05,086 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_run_by_cluster_v2
INFO  [InternalResponseStage:1] 2021-06-10 09:55:05,753 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_run_by_unit
INFO  [InternalResponseStage:1] 2021-06-10 09:55:06,719 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_schedule_by_cluster_and_keyspace
INFO  [InternalResponseStage:1] 2021-06-10 09:55:07,748 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_schedule_v1
INFO  [InternalResponseStage:1] 2021-06-10 09:55:08,553 ColumnFamilyStore.java:427 - Initializing reaper_db.repair_unit_v1
INFO  [InternalResponseStage:1] 2021-06-10 09:55:09,849 ColumnFamilyStore.java:427 - Initializing reaper_db.running_reapers
INFO  [InternalResponseStage:1] 2021-06-10 09:55:11,649 ColumnFamilyStore.java:427 - Initializing reaper_db.running_repairs
INFO  [InternalResponseStage:1] 2021-06-10 09:55:12,533 ColumnFamilyStore.java:427 - Initializing reaper_db.schema_migration
INFO  [InternalResponseStage:1] 2021-06-10 09:55:13,883 ColumnFamilyStore.java:427 - Initializing reaper_db.schema_migration_leader
INFO  [InternalResponseStage:1] 2021-06-10 09:55:15,928 ColumnFamilyStore.java:427 - Initializing reaper_db.snapshot
INFO  [InternalResponseStage:1] 2021-06-10 09:55:17,132 ColumnFamilyStore.java:427 - Initializing system_traces.events
INFO  [InternalResponseStage:1] 2021-06-10 09:55:19,258 ColumnFamilyStore.java:427 - Initializing system_traces.sessions
INFO  [InternalResponseStage:1] 2021-06-10 09:55:20,842 ColumnFamilyStore.java:427 - Initializing system_distributed.parent_repair_history
INFO  [InternalResponseStage:1] 2021-06-10 09:55:22,189 ColumnFamilyStore.java:427 - Initializing system_distributed.repair_history
INFO  [InternalResponseStage:1] 2021-06-10 09:55:23,038 ColumnFamilyStore.java:427 - Initializing system_distributed.view_build_status
INFO  [main] 2021-06-10 09:55:26,389 ColumnFamilyStore.java:427 - Initializing stargate_system.local
INFO  [main] 2021-06-10 09:55:28,175 ColumnFamilyStore.java:427 - Initializing stargate_system.peers
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
INFO  [main] 2021-06-10 09:55:30,231 BaseActivator.java:173 - Registering persistence-cassandra-3.11 as io.stargate.db.Persistence
INFO  [main] 2021-06-10 09:55:30,237 BaseActivator.java:180 - Started persistence-cassandra-3.11
INFO  [main] 2021-06-10 09:55:30,247 ConfigStoreActivator.java:57 - Creating Config Store YAML for config file location: /etc/stargate/stargate-config.yaml 
INFO  [main] 2021-06-10 09:55:31,854 BaseActivator.java:173 - Registering Config Store YAML as io.stargate.config.store.api.ConfigStore
INFO  [main] 2021-06-10 09:55:31,910 BaseActivator.java:180 - Started Config Store YAML
INFO  [main] 2021-06-10 09:55:31,912 BaseActivator.java:173 - Registering core services as com.codahale.metrics.health.HealthCheckRegistry
INFO  [main] 2021-06-10 09:55:31,914 BaseActivator.java:180 - Started core services
Starting bundle io.stargate.cql
INFO  [main] 2021-06-10 09:55:34,186 BaseActivator.java:92 - Starting CQL ...
Starting bundle io.stargate.graphql
INFO  [main] 2021-06-10 09:55:36,808 BaseActivator.java:92 - Starting GraphQL ...
Starting bundle io.stargate.health
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [bundle://03599a46-ec55-4726-91ae-d0f42b95f0d3_20.0:82/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [bundle://03599a46-ec55-4726-91ae-d0f42b95f0d3_20.0:100/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
INFO  [main] 2021-06-10 09:55:39,487 BaseActivator.java:92 - Starting healthchecker ...
Starting bundle org.apache.felix.scr
Starting bundle null
Starting bundle null
Starting bundle osgi.cmpn
Starting bundle org.osgi.util.function
Starting bundle org.osgi.util.promise
Starting bundle io.stargate.db
INFO  [main] 2021-06-10 09:55:39,809 BaseActivator.java:92 - Starting DB services ...
INFO  [main] 2021-06-10 09:55:39,819 BaseActivator.java:173 - Registering DB services as io.stargate.db.Persistence
INFO  [main] 2021-06-10 09:55:40,981 CqlImpl.java:60 - Netty using native Epoll event loop
INFO  [main] 2021-06-10 09:55:41,726 Server.java:154 - Using Netty Version: []
INFO  [main] 2021-06-10 09:55:41,734 Server.java:155 - Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO  [Service Thread] 2021-06-10 09:55:43,309 GCInspector.java:285 - MarkSweepCompact GC in 593ms.  Eden Space: 99880112 -> 0; Survivor Space: 5213752 -> 0; Tenured Gen: 48222904 -> 73111640
INFO  [Service Thread] 2021-06-10 09:55:43,338 StatusLogger.java:47 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO  [main] 2021-06-10 09:55:44,334 BaseActivator.java:180 - Started CQL
INFO  [main] 2021-06-10 09:55:44,338 BaseActivator.java:173 - Registering DB services as io.stargate.db.datastore.DataStoreFactory
INFO  [main] 2021-06-10 09:55:44,353 HealthCheckerActivator.java:43 - Starting healthchecker....
INFO  [main] 2021-06-10 09:55:46,301 Version.java:21 - HV000001: Hibernate Validator null
INFO  [main] 2021-06-10 09:55:48,447 FileUtil.java:249 - No oshi.properties file found from ClassLoader sun.misc.Launcher$AppClassLoader@70dea4e
INFO  [main] 2021-06-10 09:55:48,451 FileUtil.java:249 - No oshi.properties file found from ClassLoader sun.misc.Launcher$AppClassLoader@70dea4e
INFO  [main] 2021-06-10 09:55:51,507 Log.java:169 - Logging initialized @161043ms to org.eclipse.jetty.util.log.Slf4jLog
INFO  [StorageServiceShutdownHook] 2021-06-10 09:55:56,425 HintsService.java:209 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-06-10 09:55:56,524 Gossiper.java:1655 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2021-06-10 09:55:56,527 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.96.223.188] 2021-06-10 09:55:56,537 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2021-06-10 09:55:58,530 HintsService.java:209 - Paused hints dispatch

reaper log:

INFO   [2021-06-10 09:51:48,617] [main] i.d.s.DefaultServerFactory - Registering jersey handler with root path prefix: / 
INFO   [2021-06-10 09:51:48,643] [main] i.d.s.DefaultServerFactory - Registering admin handler with root path prefix: / 
INFO   [2021-06-10 09:51:48,645] [main] i.d.a.AssetsBundle - Registering AssetBundle with name: assets for path /webui/* 
INFO   [2021-06-10 09:51:48,941] [main] i.c.ReaperApplication - initializing runner thread pool with 15 threads and 2 repair runners 
INFO   [2021-06-10 09:51:48,941] [main] i.c.ReaperApplication - initializing storage of type: cassandra 
INFO   [2021-06-10 09:51:49,128] [main] c.d.d.core - DataStax Java driver 3.10.1 for Apache Cassandra 
INFO   [2021-06-10 09:51:49,155] [main] c.d.d.c.GuavaCompatibility - Detected Guava >= 19 in the classpath, using modern compatibility layer 
INFO   [2021-06-10 09:51:50,194] [main] c.d.d.c.ClockFactory - Using native clock to generate timestamps. 
INFO   [2021-06-10 09:51:50,629] [main] c.d.d.c.NettyUtil - Did not find Netty's native epoll transport in the classpath, defaulting to NIO. 
INFO   [2021-06-10 09:51:56,480] [main] c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'dc1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor) 
INFO   [2021-06-10 09:51:56,485] [main] c.d.d.c.Cluster - New Cassandra host k8ssandra-dc1-service/10.96.223.158:9042 added 
INFO   [2021-06-10 09:52:05,184] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-10 09:52:05,187] [main] i.c.s.CassandraStorage - Starting db migration from 22 to 27… 
INFO   [2021-06-10 09:52:06,260] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-10 09:52:07,942] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='insert into schema_migration(applied_successful, version, script_name, script, executed_at) values(?, ?, ?, ?, ?)' 
WARN   [2021-06-10 09:52:08,037] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='INSERT INTO schema_migration_leader (keyspace_name, leader, took_lead_at, leader_hostname) VALUES (?, ?, dateOf(now()), ?) IF NOT EXISTS USING TTL 300' 
WARN   [2021-06-10 09:52:08,133] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='DELETE FROM schema_migration_leader where keyspace_name = ? IF leader = ?' 
INFO   [2021-06-10 09:52:08,647] [main] o.c.c.m.MigrationTask - Migrated keyspace reaper_db to version 23 
INFO   [2021-06-10 09:52:09,487] [main] i.c.s.CassandraStorage - Migrated keyspace reaper_db to version 23 
INFO   [2021-06-10 09:52:09,746] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-10 09:52:11,322] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='insert into schema_migration(applied_successful, version, script_name, script, executed_at) values(?, ?, ?, ?, ?)' 
WARN   [2021-06-10 09:52:11,336] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='INSERT INTO schema_migration_leader (keyspace_name, leader, took_lead_at, leader_hostname) VALUES (?, ?, dateOf(now()), ?) IF NOT EXISTS USING TTL 300' 
WARN   [2021-06-10 09:52:11,350] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='DELETE FROM schema_migration_leader where keyspace_name = ? IF leader = ?' 
INFO   [2021-06-10 09:52:20,942] [main] o.c.c.m.MigrationTask - Migrated keyspace reaper_db to version 24 
INFO   [2021-06-10 09:52:21,159] [main] i.c.s.CassandraStorage - Migrated keyspace reaper_db to version 24 
INFO   [2021-06-10 09:52:21,274] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-10 09:52:23,253] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='insert into schema_migration(applied_successful, version, script_name, script, executed_at) values(?, ?, ?, ?, ?)' 
WARN   [2021-06-10 09:52:23,285] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='INSERT INTO schema_migration_leader (keyspace_name, leader, took_lead_at, leader_hostname) VALUES (?, ?, dateOf(now()), ?) IF NOT EXISTS USING TTL 300' 
WARN   [2021-06-10 09:52:23,349] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='DELETE FROM schema_migration_leader where keyspace_name = ? IF leader = ?' 
INFO   [2021-06-10 09:52:26,440] [main] o.c.c.m.MigrationTask - Migrated keyspace reaper_db to version 25 
INFO   [2021-06-10 09:52:26,548] [main] i.c.s.CassandraStorage - Migrated keyspace reaper_db to version 25 
INFO   [2021-06-10 09:52:26,709] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-10 09:52:27,976] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='insert into schema_migration(applied_successful, version, script_name, script, executed_at) values(?, ?, ?, ?, ?)' 
WARN   [2021-06-10 09:52:28,021] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='INSERT INTO schema_migration_leader (keyspace_name, leader, took_lead_at, leader_hostname) VALUES (?, ?, dateOf(now()), ?) IF NOT EXISTS USING TTL 300' 
WARN   [2021-06-10 09:52:28,050] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='DELETE FROM schema_migration_leader where keyspace_name = ? IF leader = ?' 
INFO   [2021-06-10 09:52:32,225] [main] o.c.c.m.MigrationTask - Migrated keyspace reaper_db to version 26 
INFO   [2021-06-10 09:52:32,458] [main] i.c.s.CassandraStorage - Migrated keyspace reaper_db to version 26 
INFO   [2021-06-10 09:52:32,571] [main] o.c.c.m.MigrationRepository - Found 12 migration scripts 
WARN   [2021-06-10 09:52:34,754] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='insert into schema_migration(applied_successful, version, script_name, script, executed_at) values(?, ?, ?, ?, ?)' 
WARN   [2021-06-10 09:52:34,877] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='INSERT INTO schema_migration_leader (keyspace_name, leader, took_lead_at, leader_hostname) VALUES (?, ?, dateOf(now()), ?) IF NOT EXISTS USING TTL 300' 
WARN   [2021-06-10 09:52:34,959] [clustername-worker-0] c.d.d.c.Cluster - Re-preparing already prepared query is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once. Query='DELETE FROM schema_migration_leader where keyspace_name = ? IF leader = ?' 
INFO   [2021-06-10 09:52:39,434] [main] o.c.c.m.MigrationTask - Migrated keyspace reaper_db to version 27 
INFO   [2021-06-10 09:52:39,645] [main] i.c.s.CassandraStorage - Migrated keyspace reaper_db to version 27 
WARN   [2021-06-10 09:52:39,656] [main] i.c.s.c.Migration016 - altering every table to set `dclocal_read_repair_chance` to zero… 
WARN   [2021-06-10 09:52:39,679] [main] i.c.s.c.Migration016 - alter every table to set `dclocal_read_repair_chance` to zero completed. 
INFO   [2021-06-10 09:52:39,686] [main] i.c.s.c.Migration021 - Altering node_metrics_v1 to use TWCS... 
ERROR  [2021-06-10 09:52:51,908] [main] i.c.s.c.Migration021 - Failed altering metrics tables to TWCS 
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: k8ssandra-dc1-service/10.96.223.158:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [k8ssandra-dc1-service/10.96.223.158:9042] Timed out waiting for server response))
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:83)
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
	at io.cassandrareaper.storage.cassandra.Migration021.migrate(Migration021.java:52)
	at io.cassandrareaper.storage.CassandraStorage.initializeCassandraSchema(CassandraStorage.java:297)
	at io.cassandrareaper.storage.CassandraStorage.initializeAndUpgradeSchema(CassandraStorage.java:250)
	at io.cassandrareaper.storage.CassandraStorage.<init>(CassandraStorage.java:238)
	at io.cassandrareaper.ReaperApplication.initializeStorage(ReaperApplication.java:480)
	at io.cassandrareaper.ReaperApplication.tryInitializeStorage(ReaperApplication.java:303)
	at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:181)
	at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:98)
	at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
	at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
	at io.dropwizard.cli.Cli.run(Cli.java:78)
	at io.dropwizard.Application.run(Application.java:93)
	at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:117)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: k8ssandra-dc1-service/10.96.223.158:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [k8ssandra-dc1-service/10.96.223.158:9042] Timed out waiting for server response))
	at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:283)
	at com.datastax.driver.core.RequestHandler.access$1200(RequestHandler.java:61)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:375)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.retry(RequestHandler.java:563)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.processRetryDecision(RequestHandler.java:545)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:987)
	at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1633)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
INFO   [2021-06-10 09:52:51,936] [main] i.c.s.c.Migration024 - Altering node_metrics_v3 to use TWCS... 

kubectl get pods without stargate:

NAME                                                READY   STATUS      RESTARTS   AGE
dnsutils                                            1/1     Running     8          44h
k8ssandra-cass-operator-65fd67b6d8-q5b4b            1/1     Running     0          9m28s
k8ssandra-crd-upgrader-job-k8ssandra-zdtv7          0/1     Completed   0          80m
k8ssandra-dc1-default-sts-0                         2/2     Running     0          8m36s
k8ssandra-grafana-584dfb486f-rnh8n                  2/2     Running     0          9m28s
k8ssandra-kube-prometheus-operator-85695ffb-kcppp   1/1     Running     0          9m28s
k8ssandra-reaper-7ffb485bb8-dzswb                   0/1     Running     1          104s
k8ssandra-reaper-operator-b67dc8cdf-hh7kv           1/1     Running     0          9m28s
prometheus-k8ssandra-kube-prometheus-prometheus-0   2/2     Running     1          9m10s

kubectl get pods with stargate:

NAME                                                READY   STATUS      RESTARTS   AGE
dnsutils                                            1/1     Running     8          44h
k8ssandra-cass-operator-65fd67b6d8-q5b4b            1/1     Running     0          27m
k8ssandra-crd-upgrader-job-k8ssandra-qlfcj          0/1     Completed   0          12m
k8ssandra-dc1-default-sts-0                         2/2     Running     0          26m
k8ssandra-dc1-stargate-68b459c4fc-mj2ph             0/1     Running     3          11m
k8ssandra-grafana-584dfb486f-rnh8n                  2/2     Running     0          27m
k8ssandra-kube-prometheus-operator-85695ffb-kcppp   1/1     Running     0          27m
k8ssandra-reaper-7ffb485bb8-dzswb                   0/1     Running     8          19m
k8ssandra-reaper-operator-b67dc8cdf-hh7kv           1/1     Running     0          27m
prometheus-k8ssandra-kube-prometheus-prometheus-0   2/2     Running     1          27m
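
For anyone comparing the two listings, here is a rough sketch that picks out the pods that are not fully ready or have restarted. It is not part of k8ssandra or kubectl; the function name `flag_unhealthy` is made up for illustration, and it assumes the default `kubectl get pods` column layout shown above (NAME, READY, STATUS, RESTARTS, AGE). The sample is a subset of the second listing.

```python
def flag_unhealthy(table: str):
    """Return (name, ready, restarts) for pods that are not fully ready or have restarted."""
    flagged = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a pod row
        name, ready, status, restarts = fields[:4]
        if status == "Completed":
            continue  # finished jobs legitimately show 0/1
        ready_now, ready_total = (int(x) for x in ready.split("/"))
        if ready_now < ready_total or int(restarts) > 0:
            flagged.append((name, ready, int(restarts)))
    return flagged


sample = """\
NAME                                                READY   STATUS      RESTARTS   AGE
k8ssandra-crd-upgrader-job-k8ssandra-qlfcj          0/1     Completed   0          12m
k8ssandra-dc1-default-sts-0                         2/2     Running     0          26m
k8ssandra-dc1-stargate-68b459c4fc-mj2ph             0/1     Running     3          11m
k8ssandra-reaper-7ffb485bb8-dzswb                   0/1     Running     8          19m
"""

for row in flag_unhealthy(sample):
    print(row)
```

On this sample it reports only the stargate and reaper pods, which matches what the listing shows by eye.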

One more update: I rolled back to the release without stargate. There, reaper comes up and stays up. There is only an NPE during the health check, but this seemed to be irrelevant. Then I upgraded again, so that stargate is added back. Reaper still stays up, i.e. no restarts. Only stargate is restarting constantly. Its log is pretty much the same as the one posted above.

Do you happen to have the logs from the server-system-logger container of k8ssandra-dc1-default-sts-0 during this time?

I’d also like to try this locally with Vagrant. What flavor of k8s are you using on that VM (e.g. KIND, minikube, k3s, …)? Was the values file above the only change you made to k8ssandra?
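
In case it helps with collecting the container logs asked about above, here is a minimal sketch that assembles the `kubectl logs` invocation. The helper name `logs_command` is hypothetical; the pod and container names are taken from this thread, and `-c`, `-n` and `--previous` are standard `kubectl logs` options. The namespace is left out because it is not stated in the thread.

```python
import shlex


def logs_command(pod, container, namespace=None, previous=False):
    """Build the argv list for `kubectl logs` on one container of a pod."""
    cmd = ["kubectl", "logs", pod, "-c", container]
    if namespace:
        cmd += ["-n", namespace]  # only needed outside the current namespace
    if previous:
        cmd.append("--previous")  # logs of the last terminated instance
    return cmd


# Cassandra system log from the server-system-logger sidecar
print(shlex.join(logs_command("k8ssandra-dc1-default-sts-0", "server-system-logger")))

# Log of the stargate container instance that ran before the latest restart
print(shlex.join(logs_command("k8ssandra-dc1-stargate-68b459c4fc-mj2ph", "stargate", previous=True)))
```

The container name `stargate` in the second call is an assumption; `kubectl describe pod` on the stargate pod lists the actual container names. `--previous` is useful for a pod that keeps restarting, since the current instance's log may not yet contain the failure.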