Monitoring network partitions with Prometheus

Hi there,

I’m trying to find a way to monitor network partitions. I can consistently reproduce the error on the frontend side, where I get the message that Mnesia has experienced a network partition, but none of the online examples I found seem to work for Prometheus monitoring.

This one, for example, shows no difference between my RabbitMQ in a normal state and in a network partition: cluster-operator/observability/prometheus/rules/rabbitmq/insufficient-established-erlang-distribution-links.yml at v2.2.0 (rabbitmq/cluster-operator on GitHub)

Other available metrics, such as rabbitmq_unreachable_cluster_peers_count, also show no difference between my clusters in a normal state and the one that has a network partition. The only way I can get this metric to update is by restarting a member of the cluster.

Any idea how I can monitor this?

Hello,

Use rabbitmq-diagnostics (CLI). Run:
rabbitmq-diagnostics cluster_status
Look for the "Network Partitions" section and export the result to Prometheus.

Or use the RabbitMQ HTTP API. Look for the "partitions" field and export it to Prometheus the same way, with a collector enabled.

I'm not sure, but you can try this. Please also check that the metrics are coming in on port 9090, i.e. that the Prometheus target exists and its status is up.
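
For the API route, here is a rough sketch of what the check could look like. It assumes the management plugin is enabled and that GET /api/nodes returns one object per node with a "partitions" list. The base URL, the RABBITMQ_USER and RABBITMQ_PASS variables, and the function names are placeholders of mine, not something from this thread, and jq must be installed.

```shell
#!/usr/bin/env bash
# Sketch only: count nodes that report a partition via the management HTTP API.
# Assumptions: management plugin enabled, credentials supplied through the
# hypothetical RABBITMQ_USER / RABBITMQ_PASS variables, jq available.

# Reads the JSON array from GET /api/nodes on stdin and prints how many
# nodes have a non-empty "partitions" list.
count_partitioned_nodes() {
  jq '[.[] | select((.partitions // []) | length > 0)] | length'
}

# Fetches /api/nodes from the base URL given as the first argument.
fetch_nodes() {
  curl -sf -u "${RABBITMQ_USER}:${RABBITMQ_PASS}" "$1/api/nodes"
}

# Runs only when a base URL is passed, e.g. http://localhost:15672
if [ "$#" -ge 1 ]; then
  fetch_nodes "$1" | count_partitioned_nodes
fi
```

A result greater than 0 would mean at least one node sees a partition; that number can then be exported the same way as the CLI result.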

thanks

Hello,

Thank you for your suggestion, it works well with the CLI!

For anyone who might have the same issue, here is the script I used:

# Namespace of the pod running this script, read from the mounted service account.
NAMESPACE=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)

# First RabbitMQ pod found in that namespace.
POD=$(kubectl get pods -n "${NAMESPACE}" -l app.kubernetes.io/component=rabbitmq -o jsonpath='{.items[0].metadata.name}')

# Cluster status as JSON.
OUTPUT=$(kubectl exec "$POD" -n "${NAMESPACE}" -c rabbitmq -- rabbitmq-diagnostics cluster_status --silent --formatter json)

# Number of entries under "partitions"; 0 when the cluster is healthy.
PARTITIONS_COUNT=$(echo "$OUTPUT" | jq '.partitions | length')

if [ "$PARTITIONS_COUNT" -gt 0 ]; then
  VALUE=1
else
  VALUE=0
fi

# RABBITMQ_CLUSTER is expected to be set in the environment.
cat <<EOF > /tmp/metrics.txt
# HELP rabbitmq_network_partition RabbitMQ network partition detected
# TYPE rabbitmq_network_partition gauge
rabbitmq_network_partition{namespace="${NAMESPACE}",rabbitmq_cluster="${RABBITMQ_CLUSTER}"} ${VALUE}
EOF

# Push the gauge to the Pushgateway.
curl -s --data-binary @/tmp/metrics.txt "http://prometheus-prometheus-pushgateway.prometheus.svc.cluster.local:9091/metrics/job/rabbitmq_partition_check/namespace/${NAMESPACE}/cluster/${RABBITMQ_CLUSTER}"
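
If it helps anyone, the metric rendering can also be pulled into a small function so it can be checked without a cluster and piped straight to curl instead of going through /tmp/metrics.txt. This is only a sketch: the function name render_partition_metric and the PUSHGATEWAY_URL variable are my own placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: render the partition gauge in Prometheus text format.
# Usage: render_partition_metric NAMESPACE CLUSTER PARTITIONS_COUNT
render_partition_metric() {
  local namespace="$1"
  local cluster="$2"
  local count="$3"
  local value=0

  # Collapse the partition count to a 0/1 flag, as in the script above.
  if [ "$count" -gt 0 ]; then
    value=1
  fi

  printf '# HELP rabbitmq_network_partition RabbitMQ network partition detected\n'
  printf '# TYPE rabbitmq_network_partition gauge\n'
  printf 'rabbitmq_network_partition{namespace="%s",rabbitmq_cluster="%s"} %s\n' \
    "$namespace" "$cluster" "$value"
}

# Possible usage (PUSHGATEWAY_URL is a hypothetical variable holding the
# full push URL); curl reads the request body from stdin with "@-":
# render_partition_metric "$NAMESPACE" "$RABBITMQ_CLUSTER" "$PARTITIONS_COUNT" |
#   curl -s --data-binary @- "$PUSHGATEWAY_URL"
```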