More work in zabbix: we got alerts a few ... / 2012-09-07

More work in Zabbix: we got alerts a few times for load averages > 5. But on a 48-core system used by people doing calculations, that isn't a very useful trigger. My solution is to start monitoring the number of CPUs (normally a very boring number) and create a new trigger:
{Template_geo_linux:system.cpu.load[all,avg1].last(0)}/{Template_geo_linux:system.cpu.num.last(0)}>3
This makes a lot more sense: a load of more than 3 times the number of cores is an issue, both on a 1-core (virtual) machine and on a 48-core calculating monster. On some of those calculation servers a load of less than 10 means some model crashed and a scientist will be trying to restart it.
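
To make the arithmetic behind that trigger concrete, here is a minimal Python sketch of the same comparison done locally rather than in Zabbix. The function names and the `LOAD_PER_CORE_LIMIT` constant are made up for illustration; only the threshold of 3 comes from the trigger. It assumes a Unix-like system, since `os.getloadavg()` is not available elsewhere.

```python
import os

# Threshold from the trigger: load per core greater than 3.
LOAD_PER_CORE_LIMIT = 3.0


def load_per_core(load_avg1: float, cpu_count: int) -> float:
    """Return the 1-minute load average divided by the number of cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg1 / cpu_count


def is_overloaded(load_avg1: float, cpu_count: int,
                  limit: float = LOAD_PER_CORE_LIMIT) -> bool:
    """Mirror the trigger condition: load / cores > limit."""
    return load_per_core(load_avg1, cpu_count) > limit


if __name__ == "__main__":
    # Read the live values from the local machine (Unix-like systems only).
    avg1, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load {avg1:.2f} on {cores} cores: "
          f"overloaded={is_overloaded(avg1, cores)}")
```

With these definitions a load of 6 counts as overloaded on a 1-core machine but not on a 48-core one, where the limit is only crossed above a load of 144.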

And we can now set a trigger on any change in the number of cores, up or down. That would be interesting.
{Template_geo_linux:system.cpu.num.abschange(0)}>0
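
What a "did it change" check compares can be sketched in a few lines of Python. The helpers below are hypothetical stand-ins named after the Zabbix trigger functions `change` (signed difference between the last and previous value) and `abschange` (absolute difference); they are not Zabbix code. The point of the sketch is that a signed difference above zero only catches cores being added, while the absolute difference also catches cores disappearing.

```python
def change(previous: int, current: int) -> int:
    """Signed difference between the latest and the previous value."""
    return current - previous


def abschange(previous: int, current: int) -> int:
    """Absolute difference between the latest and the previous value."""
    return abs(current - previous)


def core_count_changed(previous: int, current: int) -> bool:
    """True when the core count went up or down between two checks."""
    return abschange(previous, current) > 0


if __name__ == "__main__":
    # Example values only: a 48-core machine that drops to 24 cores.
    print(change(48, 24) > 0)          # signed check misses the decrease
    print(core_count_changed(48, 24))  # absolute check catches it
```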


Koos van den Hout, reachable as koos+website@idefix.net. PGP encrypted e-mail preferred. PGP key: 5BA9 368B E6F3 34E4.