always remember

Nothing is foolproof to a sufficiently talented fool... Make something
idiot proof, and the world will simply make a bigger idiot.

Monitor Pending Connections – Zen/Zevenet Load Balancers

In my working environment, we use (rather extensively) ZenLB (or as they are now know, Zevenet) Load Balancers. In production systems, sometimes the back-ends of an infrastructure, or the “real servers” behind the load balancers, can become unresponsive for whatever reason. A typical one that I see quite often is when using clustered MS Exchange Client Access servers behind a load balanced pool. IIS may lock up on one or multiple CAS’s causing the connections coming in from clients to be stored at LB level as “pending”.

This is fine, but in my experience, once the Zevenet LB racks up 1500+ pending connections on one of its farms, it quickly exhausts it’s available memory.

The following check is called by the Nagios NRPE agent installed locally on the LB (It’s just Debian 8 afterall)

#!/bin/bash
#
# ZenLB Pending/Established Connection Tracking v1.0 - Dave Byrne
#
hour=`date +%H`
pending=`cat /proc/net/nf_conntrack |grep SYN_SENT |grep dport='443|80' |wc -l`
established=`cat /proc/net/nf_conntrack |grep ESTABLISHED |grep dport='443|80' |wc -l`

if [ $pending -gt 5 ]
   then
      printf "CRITICAL - Pending connections above threshold! Pending: $pending -- Established: $establishedn"
   exit 2
elif [ $established -eq 0 ] && [ $hour -ge 8 ] && [ $hour -le 23 ];
   then
      printf "CRITICAL - No established connections! Pending: $pending -- Established: $establishedn"
   exit 2
else
      printf "OK - Pending connections at acceptable level. Pending: $pending -- Established: $establishedn"
   exit 0
fi

The check will go CRITICAL if pending connections across ANY of the farms goes above 5. It will also go CRITICAL is the established connections drops to 0 (probably bad). But I have limited this to a certain time frame, as I appreciate that there may well be 0 established connections at 4am!!

-Dave

dave / August 21, 2017 / Code, Nagios Monitoring