I enhanced the zabbix system monitoring to also work on aacraid-based controllers. A Google search found me "How to check the health of an Adaptec RAID array", which shows that the right command-line tool is nowadays arcconf, which can be found at Adaptec support for RAID products. Select the right type and click through a few times until you find the storage manager downloads (not the drivers!). The latest 'adaptec storage manager' includes 'arcconf'. After installing, arcconf produces a lot of output, but the line I am interested in is easy to find:

    # /usr/StorMan/arcconf GETCONFIG 1 | grep Defunct
    Defunct disk drive count : 0

which is exactly what I want. Again a special UserParameter in zabbix_agentd.conf:

    UserParameter=aacraid.okdisk,/etc/zabbix/external/aacraid.okdisk

A script to do the actual work:

    #!/bin/sh
    # aacraid.okdisk
    sudo /usr/StorMan/arcconf GETCONFIG 1 | awk ' /Defunct disk drive count/ { print $6 } '

And a change in sudoers to allow this. Allowing /usr/StorMan/arcconf as is did not work because of the capitals, but a more general rule helped. Now I can check for the number of disks with problems and warn accordingly (0 disks with problems is ok, 1 disk is warning, > 1 is disaster).
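The awk one-liner above depends on the count being the sixth whitespace-separated field. A slightly more defensive variant could split on the colon instead and signal failure when the line is missing altogether. This is only a sketch: the function name defunct_count is made up for illustration, and it assumes the line keeps the "Defunct disk drive count : N" shape shown above.

```shell
#!/bin/sh
# Sketch only: defunct_count is a hypothetical helper name, not part of arcconf or zabbix.
# It reads `arcconf GETCONFIG 1` output on stdin and prints the defunct drive count.
defunct_count() {
    awk -F: '
        /Defunct disk drive count/ {
            gsub(/[ \t]/, "", $2)    # strip the padding around the number
            print $2
            found = 1
        }
        END { if (!found) exit 1 }   # no matching line: print nothing, signal failure
    '
}

# Intended use inside /etc/zabbix/external/aacraid.okdisk:
#   sudo /usr/StorMan/arcconf GETCONFIG 1 | defunct_count
```

Splitting on the colon means the result does not change if the label ever gains or loses a word, and the non-zero exit status makes a missing line distinguishable from a real count of 0.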
As part of the work on system monitoring I am looking into monitoring RAID units. The beta-ict department uses a number of RAID units and data gets replicated between buildings. I want a warning when a disk goes down. The 3ware disk controller has a nice web interface, but I can't (easily) integrate that into zabbix. What I did was install the tw_cli command line utility from the 3ware LSI raid controller site (look up your type of controller, find 'support and downloads' and you will see cli utils for lots of unix versions), which makes life easy:
    # tw_cli show

    Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
    ------------------------------------------------------------------------
    c0    9650SE-16ML  16        15       1       1       1       1      OK

What I want to know is the number of not-optimal disks (yes, indeed one is broken at the moment and needs replacement). That I can monitor in zabbix when I pick up the value with a script:

    #!/bin/sh
    # /etc/zabbix/external/3ware.okdisk
    sudo /usr/local/sbin/tw_cli show | awk ' /^c0/ { print $6 } '

Root access goes via sudo, which means a line in /etc/sudoers which allows /usr/local/sbin/tw_cli from the zabbix user, and the right setting in zabbix_agentd.conf to bind this script to a user parameter:

    UserParameter=3ware.okdisk,/etc/zabbix/external/3ware.okdisk

Now I can program a trigger on the output: 0 is ok, 1 is warning, > 1 is disaster. I added an extra action on the trigger to mail the output of tw_cli '/c0 show' to the admins so we know which disk is broken.
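The script above only looks at controller c0. On a machine with more than one 3ware controller, the same idea could be extended by summing the NotOpt column over every controller line. A minimal sketch, assuming additional controllers show up as c1, c2, ... with the same column layout as the c0 line above (I only have the single-controller output to go on); notopt_total is a made-up helper name:

```shell
#!/bin/sh
# Sketch only: notopt_total is a hypothetical helper name.
# Reads `tw_cli show` output on stdin and sums the NotOpt column (field 6)
# over every controller line (c0, c1, ...), printing 0 when none match.
notopt_total() {
    awk '/^c[0-9]/ { total += $6 } END { print total + 0 }'
}

# Intended use inside /etc/zabbix/external/3ware.okdisk:
#   sudo /usr/local/sbin/tw_cli show | notopt_total
```

The `total + 0` in the END block forces a numeric 0 when no controller line was seen, so the item always gets a number back.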
Now to do the same for adaptec (aacraid) based raids.
At work one of my main projects at the moment is improving monitoring for beta-ict. I am used to mon at the computer science department, but that is showing its age a bit and I wanted to try something newer. The main requirement for the monitoring system was that it could monitor both system variables (free disk space, free memory, system load, whether certain needed processes are running) and service availability (is the network available, is ldap available, are web servers up and not giving out weird error messages).
I chose zabbix. It has an interesting approach: it measures variables, stores results and trends and then you can do stuff with the stored data. Such as monitoring whether certain thresholds aren't crossed, so you can do your normal tests. Or more complicated monitoring of trends or changes. But you can also make graphs of that same data. And you can use the triggers to make nice long-term availability reports.
One thing I learned is that the suggestion in the manual to use a recent version of postgres (>= 8.3) is to be taken seriously. With 8.0, the server running zabbix regularly got up to a load of 10 when adding new systems to be monitored, and historic monitoring data was lost for certain time periods. Dumping the database, installing postgres 8.4, importing the data again and continuing with the same setup made everything a lot faster, and no data has been lost since.
What is also interesting is the option to use remote proxies to gather data from otherwise firewalled networks and the option to split servers / services into groups. Eventually we may give the 1st-line servicedesk their own view of our zabbix server where they can view whether main services are available so they are aware of troubles before they need to ask us.
It seems the Turkish provider ttnet.tr fell off the Internet for a few hours today. Since we volunteered ntp.cs.uu.nl for tr.pool.ntp.org the drop in traffic was very, very noticeable.
First peak at 5000 packets/second ntp traffic seen on ntp.cs.uu.nl. Still going strong under this load.
We volunteered ntp.cs.uu.nl for extra capacity for the Turkish ntp pool, and the results are quite visible in the ntp.cs.uu.nl statistics. Suddenly peaks are near 5000 packets per second. But ntpd (and the freebsd kernel) deal with it without problems.
The sensors at home are updated with data from the new disk. The cause of the relatively high temperatures is that 3 disks (2 PATA and 1 new SATA) are in one cage together. I hope to rearrange the disks so the airflow improves and they cool better. I might need a slightly longer SATA cable to make that happen.
Maybe related to the construction work at home or to problems with the DSL network to my provider, but 29 October was a day of intermittent DSL problems. And indeed, the resulting line quality graph looks 'interesting'.
Some measurable growth in IPv6 traffic at the Amsterdam Internet Exchange: they broke the 2 Gbit/s limit for IPv6 traffic (after rrdtool rounding ;)). Compared to the total traffic flow (764 Gbit/s) this is still a very small drop, roughly a quarter of a percent, but there is growth in there. On to more and more applications, dns entries and traffic! Source: AMS-IX hits 2 Gbps IPv6 traffic - Fix6
The lightning detector had another active night.. and this time I slept right through it. Nothing like the high numbers on 26 May 2009 but still a peak.
When I bought some 1-wire sensors a while ago at Hobby boards I included the lightning detector in the order. I installed it indoors in the attic where it also counts the switch of the fluorescent lights, so it will probably work better in an outdoor weather station further away from interference. But, in the early hours of today there was a heavy thunderstorm over this country and it counted like crazy. The stated sensitivity is about 80 kilometers:
"this lightning detector will be able to pick up lightning more than 50 miles away"

With the 75000 lightning strikes reported in the Netherlands for that night, the numbers don't look that strange.
Free ups test! It seems the power company decided not to deliver at all for 9 minutes. Interesting is that they don't mention a failure on their own website.
I got reminded of my Alcatel stats again, and some google searches and some combining of clues (the logical place would be in the 'td call' command) led me to the right answer at "DMTv7 für Speedtouch 516 536 546 585 608 706 716 780" to get the dsl linestats from a Speedtouch 546/546i: :td call cmd="tdsl getData all". Trying it:

    * ____/ *
    ------------------------------------------------------------------------
    =>:td call cmd="tdsl getData all"
    =====================DISCLAIMER======================
    Access to expert commands is intended for qualified personnel only.
    ==================END=OF=DISCLAIMER==================
    Vendor Information
    phyType=2
    phyMjVerNum=4
    phyMnVerNum=0
    phyVerStr=B2pBT004.d15b
    drvMjVerNum=14
    drvMnVerNum=20482
    drvVerStr=15b

And suddenly lots of data, including the signal/noise ratio per carrier. So after I stopped gathering Alcatel Speedtouch graphs in 2005 because I switched to a Speedtouch 546i, I can now gather the stats again and create the graphs daily. In the meantime gnuplot changed a bit, but with some tweaking of the plotscript I now have the first S/N graph of the 546i as I want it.
I took some time to work on the house 1-wire network today.. and blew up the serial to 1-wire interface in the process. I think there is a voltage difference between house ground (water pipes) and 1-wire ground. While testing whether the 1-wire counter I was going to use for the electricity counting responded to the LED in the electricity meter, I touched a metal part of the counter to a water pipe hiding behind another pipe. So, still no success on measuring electricity and no new house temperature readings either.

I did put in an extension of the 1-wire network from the attic to the cupboard beneath the stairs where the electricity meter lives. I used the 'isdn' sockets on the end of the long 1-wire connection, so as a side-effect I moved one temperature sensor from the top of the server to the 'wine rack' area and updated the sensors page. It is a different location temperature-wise, so I started new statistics for this sensor.

I also looked at options for placing a temperature sensor in the living room. The cable to the thermostat is thoroughly cemented in, so I can't place a wire alongside that cable. I'll probably use the hole for an extra television-coax cable to get a wire for a temperature sensor from the crawlspace to the living room.

I already ordered a replacement serial 1-wire interface. I hope that is the only component that was damaged.
I decided to start monitoring the electricity usage in the house. Using 1-wire sounds the most logical to me as I am already using that to monitor temperatures. I found a description by Jon00 using a Velleman MK120 kit, which sounds quite compatible with my level of electronics knowledge and my budget. So I went to the local electronics shop, Radio Centrum, and bought the Velleman MK120. I asked about a 1-wire counter but they don't sell 1-wire equipment (yet?). Well, a counter is something I can order from Hobby boards. Probably together with some other 1-wire stuff to make it an interesting order.
Yesterday I found some time to install the new 1-wire sensors in a place where I am interested in the temperatures: the attic where the home server gosper lives and started fetching data into rrdtool databases. The assorted sensors at home page now shows some of the available temperatures. Sensor 2 lies in the open area right below the top of the roof.
Happy new year! I used the christmas period to do an upgrade I have been planning for a while: change the mainboard of the home server gosper to a newer (better: less old) one. A few hours with a screwdriver did the job: it now is an AMD Athlon 1400. Everything works after a few bits of tweaking, including updated mainboard temperature sensors.
At work we now graph several temperatures in the serverroom (results are not public). We joked (or not..) last Friday that we could add a lot of sensors inside and outside the serverroom (that is where my thinking about 1-wire systems came in again) and have someone research this micro-climate and correlate the micro-climate with the ntp statistics. We did see the influence of the cold wind from the east on the pll stats of several ntp servers.
Some environment sensors at home are now public. Started with the environment sensors of the home server gosper which are the easiest. Other stuff will be added if and when certain monitoring projects go from being a wild idea to delivering real data. Ok, I did order some temperature sensors and a 1-wire controller from Hobby Boards 1-wire solutions.
At work I "took over" a fourfold temperature sensor, Quozl's Temperature Sensor. It got me interested in the 1-wire system for sensors. Applications like Thermd and DigiTemp make it possible to log all kinds of environmental data easily. I'm seriously considering getting a simple 1-wire interface for the server at home so I can monitor several inside temperatures (the cheapest to monitor and the most interesting to me) by just stringing some cheap phone wires and hooking up sensors. Yet another network, although this one would be simpler to maintain.