Ages ago I added scripts to our Zabbix install to monitor a 3ware RAID
controller for RAID failures. But at the moment we have a RAID with a disk
in an error state while the RAID unit is still listed as 'optimal'.
So I changed the measuring script:
#!/bin/sh
sudo /usr/local/sbin/tw_cli '/c0 show drivestatus' | grep '^p' | awk ' $2 != "OK" { print } ' | wc -l
This now counts the number of disks that do not report 'OK' as their status.
For the unit, the drive status currently looks like this:
# /usr/local/sbin/tw_cli '/c0 show drivestatus'
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 931.51 GB 1953525168 xxxxxxxx
p1 OK u0 931.51 GB 1953525168 xxxxxxxx
p2 OK u0 931.51 GB 1953525168 xxxxxxxx
p3 OK u0 931.51 GB 1953525168 xxxxxxxx
p4 OK u0 931.51 GB 1953525168 xxxxxxxx
p5 OK u0 931.51 GB 1953525168 xxxxxxxx
p6 OK u0 931.51 GB 1953525168 xxxxxxxx
p7 OK u0 931.51 GB 1953525168 xxxxxxxx
p8 OK u0 931.51 GB 1953525168 xxxxxxxx
p9 OK u0 931.51 GB 1953525168 xxxxxxxx
p10 OK u0 931.51 GB 1953525168 xxxxxxxx
p11 OK u0 931.51 GB 1953525168 xxxxxxxx
p12 OK u0 931.51 GB 1953525168 xxxxxxxx
p13 DEVICE-ERROR u0 931.51 GB 1953525168 xxxxxxxx
p14 OK u0 931.51 GB 1953525168 xxxxxxxx
p15 OK u0 931.51 GB 1953525168 xxxxxxxx
But the overall state is still 'Opt' (the NotOpt column shows 0):
# /usr/local/sbin/tw_cli 'show'
Ctl Model (V)Ports Drives Units NotOpt RRate VRate BBU
------------------------------------------------------------------------
c0 9650SE-16ML 16 16 1 0 1 1 OK
So the test had to change.
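
For what it's worth, the grep | awk | wc pipeline in the script above could probably be collapsed into a single awk call. The following is only a sketch, not what is running on our box: the function name count_bad_drives is made up for illustration, and it assumes the status is always the second column on lines starting with 'p' followed by a port number, as in the output shown earlier. The sample rows are copied from that output so the function can be tried without a controller.

```sh
#!/bin/sh
# Hypothetical helper: count drives whose status column is not "OK".
# Reads 'tw_cli /c0 show drivestatus'-style output on stdin.
count_bad_drives() {
    # Only port lines (p0, p1, ...) are considered; n + 0 prints 0
    # instead of an empty string when every drive is OK.
    awk '/^p[0-9]+/ && $2 != "OK" { n++ } END { print n + 0 }'
}

# Try it on a few rows taken from the drive status listing
count_bad_drives <<'EOF'
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p12 OK u0 931.51 GB 1953525168 xxxxxxxx
p13 DEVICE-ERROR u0 931.51 GB 1953525168 xxxxxxxx
p14 OK u0 931.51 GB 1953525168 xxxxxxxx
EOF
```

On the real machine the input would come from the controller instead of the sample rows, i.e. sudo /usr/local/sbin/tw_cli '/c0 show drivestatus' piped into count_bad_drives.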