More work in zabbix: we got alerts a few ... / 2012-09-07

More work in Zabbix: we got alerts a few times for load averages > 5. But on a 48-core system used by people running calculations, that isn't a very useful trigger. My solution is to start monitoring the number of CPUs (normally a very boring number) and create a new trigger:
{Template_geo_linux:system.cpu.load[all,avg1].last(0)}/{Template_geo_linux:system.cpu.num.last(0)}>3
This makes a lot more sense: a load of more than 3 times the number of cores is an issue, both on a 1-core (virtual) machine and on a 48-core calculating monster. On some of those calculation servers a load of less than 10 means some model crashed and a scientist will be trying to restart it.
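To make the arithmetic concrete, here is a minimal Python sketch of the same comparison outside Zabbix. The function name `load_is_excessive` and its parameters are made up for illustration; the real check is the Zabbix trigger above, and the threshold of 3 is simply the value used there.

```python
def load_is_excessive(load_avg1: float, num_cpus: int, threshold: float = 3.0) -> bool:
    """Return True when the 1-minute load per core exceeds the threshold.

    Hypothetical helper mirroring the trigger expression:
    system.cpu.load[all,avg1] / system.cpu.num > 3
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return load_avg1 / num_cpus > threshold


# A load of 6 is a problem on a 1-core virtual machine...
print(load_is_excessive(6.0, 1))
# ...but nothing special on a 48-core calculation server.
print(load_is_excessive(6.0, 48))
```

The old fixed trigger (load > 5) would have fired in both cases; dividing by the core count makes one trigger usable across the whole template.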

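And we can now set a trigger on any change in the number of cores. That would be interesting. Note that change() is signed, so comparing it with >0 would only fire when cores are added; abschange() also catches a core disappearing:
{Template_geo_linux:system.cpu.num.abschange(0)}>0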
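The difference between a signed and an absolute change matters for a trigger that compares against 0. A small Python sketch, assuming the signed change is "last value minus previous value" and the absolute change is its magnitude; the helper names below are illustrative stand-ins, not Zabbix code:

```python
def change(previous: int, last: int) -> int:
    """Signed difference between the last and the previous value."""
    return last - previous


def abschange(previous: int, last: int) -> int:
    """Absolute difference between the last and the previous value."""
    return abs(last - previous)


# A 48-core machine comes back from a reboot with only 24 cores visible.
previous_cores, last_cores = 48, 24

print(change(previous_cores, last_cores) > 0)     # signed test misses the drop
print(abschange(previous_cores, last_cores) > 0)  # absolute test catches it
```

A comparison of the signed change with >0 only reacts when the count goes up, while the absolute change reacts in both directions.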
