News items for tag work - Koos van den Hout

2010-08-11 (#) 3 weeks ago
I enhanced the zabbix system monitoring to also work on aacraid based controllers. A Google search turned up How to check the health of an Adaptec RAID array, which shows that the right command-line tool is nowadays arcconf, which can be found at Adaptec support for RAID products. Select the right controller type and click through a few times until you find the storage manager downloads (not the drivers!). The latest 'adaptec storage manager' includes 'arcconf'. Once installed, arcconf produces a lot of output, but the line I am interested in is easy to find:
# /usr/StorMan/arcconf GETCONFIG 1 | grep Defunct
   Defunct disk drive count                 : 0
which is exactly what I want. Again a special UserParameter in zabbix_agentd.conf:
UserParameter=aacraid.okdisk,/etc/zabbix/external/aacraid.okdisk
A script to do the actual work:
#!/bin/sh
# aacraid.okdisk

sudo /usr/StorMan/arcconf GETCONFIG 1 | awk ' /Defunct disk drive count/ { print $6 } '
And a change in sudoers to allow this. Allowing /usr/StorMan/arcconf as-is did not work because of the capitals, but a more general rule helped. Now I can check for the number of disks with problems and warn accordingly (0 disks with problems is ok, 1 disk is warning, > 1 is disaster).
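One caveat with the script above: if arcconf fails or its output format changes, awk prints nothing and the agent has no number to report. A minimal sketch of a more defensive variant follows. The function name defunct_count and the fallback value -1 are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: read arcconf GETCONFIG output on stdin and print the
# defunct disk count, or -1 (an assumed sentinel) when the line is missing.
defunct_count() {
    awk '
        /Defunct disk drive count/ { print $NF; found = 1 }
        END { if (!found) print -1 }
    '
}

# Example with the output line shown above instead of a live controller:
printf '   Defunct disk drive count                 : 0\n' | defunct_count
```

In the real script the input would come from sudo /usr/StorMan/arcconf GETCONFIG 1 piped into defunct_count.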
2010-08-10 (#) 3 weeks ago
As part of the work on system monitoring I am looking into monitoring RAID units. The beta-ict department uses a number of raid units and data gets replicated between buildings.

I want a warning when a disk goes down. The 3ware disk controller has a nice webinterface but I can't integrate that (easily..) into zabbix. What I did was install the tw_cli command line utility from the 3ware LSI raid controller site (look up your type of controller, find 'support and downloads' and you will see cli utils for lots of unix versions), which makes life easy:

# tw_cli show

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9650SE-16ML  16        15       1       1       1       1      OK
What I want to know is the number of not-optimal disks (yes, indeed one is broken at the moment and needs replacement). That I can monitor in zabbix, when I pick up the value with a script:
#!/bin/sh
# /etc/zabbix/external/3ware.okdisk

sudo /usr/local/sbin/tw_cli show | awk ' /^c0/ { print $6 } '
Root access goes via sudo, which means a line in /etc/sudoers that allows the zabbix user to run /usr/local/sbin/tw_cli, plus the right setting in zabbix_agentd.conf to bind this script to a user parameter:
UserParameter=3ware.okdisk,/etc/zabbix/external/3ware.okdisk

Now I can program a trigger on the output: 0 is ok, 1 is warning, > 1 is disaster. I added an extra action on the trigger to mail the output of tw_cli '/c0 show' to the admins so we know which disk is broken.
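
The trigger itself lives in zabbix, but the thresholds are easy to check from the command line too. A minimal sketch, assuming a hypothetical helper named severity that is not part of the original setup:

```shell
#!/bin/sh
# Hypothetical helper mirroring the trigger levels described above:
# 0 not-optimal disks is ok, 1 is a warning, more than 1 is a disaster.
severity() {
    case "$1" in
        ''|*[!0-9]*) echo "unknown" ;;   # not a plain non-negative integer
        *)
            if [ "$1" -eq 0 ]; then echo "ok"
            elif [ "$1" -eq 1 ]; then echo "warning"
            else echo "disaster"
            fi ;;
    esac
}

severity 0
severity 1
severity 3
```

Treating empty or non-numeric input as "unknown" is a design choice of this sketch, so that a broken tw_cli call is not silently reported as ok.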

Now to do the same for adaptec (aacraid) based raids.
2010-07-30 (#) 1 month ago
XKCD: University Website, Randall Munroe, licensed under Creative Commons Attribution-NonCommercial.
The XKCD: University Website comic is brilliant. And extremely recognizable. It is exactly the disconnect I now hear passing by in the distance at Utrecht University. But it is just as much what I heard more than 10 years ago at the Hogeschool van Utrecht. Long before that there were projects that started from what people would be looking for, but around 2000 that all went overboard. In its place came monster websites that were supposed to contain everything that, according to the communications staff, visitors of the website might ever ask for. Those visitors would obediently start at the homepage and click their way through all kinds of hierarchical structures. Linking directly to the right subject is, according to those communications people, definitely not the intention either.

I am glad this annoyance is over for me and that I can now only laugh about it from the far sidelines. And, as a user of the website, of course can't find what I am looking for. But Google helps with that.

Update: Also reported in the Digitale U-Blad under the title Web.. some sense? in response to the UU webpresence project, which has already been characterised as Web Absence.
2010-07-30 (#) 1 month ago
At work one of my main projects at the moment is improving monitoring for beta-ict. I am used to mon at the computer science department but that shows its age a bit and I wanted to try something newer.

The choice in monitoring system was mainly for something which could monitor both system variables (free disk space, free memory, system load, whether certain needed processes were running) and service availability (is the network available, is ldap available, are web servers up and not giving out weird error messages).

I chose zabbix. It has an interesting approach: it measures variables, stores results and trends and then you can do stuff with the stored data. Such as monitoring whether certain thresholds aren't crossed, so you can do your normal tests. Or more complicated monitoring of trends or changes. But you can also make graphs of that same data. And you can use the triggers to make nice long-term availability reports.

One thing I learned is that the suggestion in the manual to use a recent version of postgres (>= 8.3) is to be taken seriously. With 8.0 the server running zabbix regularly got up to a load of 10 when adding new systems to be monitored, and historic monitoring data was lost for certain time periods. Dumping the database, installing postgres 8.4, importing the data again and continuing with the same setup made everything lots faster, and no data has been lost since.

What is also interesting is the option to use remote proxies to gather data from otherwise firewalled networks and the option to split servers / services into groups. Eventually we may give the 1st-line servicedesk their own view of our zabbix server where they can view whether main services are available so they are aware of troubles before they need to ask us.

2010-07-10 (#) 1 month ago
Available now: repocafe, our subversion self-service webinterface.
2010-07-05 (#) 1 month ago
Slowly but surely the subversion self-service webinterface we developed at work is turning into a 2.0 version which will be available as open source. I must say "my boss developed", he did most of the coding. I just threw ideas, designs and criticism at him :)

It was our original plan to open-source it, and this plan was woken up again when we got a request about the availability of the source code. Lots of work was done to make structures more flexible and remove hardcoded dependencies on internal infrastructure.

One of the bigger design issues was a good name! For historical reasons we couldn't use the name repoman which was good wordplay on repository-manager in itself. We settled on repocafe. Available for download Real Soon Now™.
2010-06-17 (#) 2 months ago
In my work I naturally also deal a lot with the university's IT projects. Today I read a wonderful philosophical reflection on the student tracking system: God bestaat en zijn naam is OSIRIS (God exists and his name is OSIRIS).
2010-03-17 (#) 5 months ago
One of our users at work reported today that he noticed the 'Previous Versions' tab in windows explorer being active and showing what we think of as the snapshots of the NetApp fileserver. I tried it myself on the windows 2008 terminal server and it works as it should. As my boss noted this is a very important step: having snapshots available is one thing, but having them available in the standard interface which (experienced) windows users can use makes quite a difference. Helpdesk page about filesystem snapshots updated.
2010-03-11 (#) 5 months ago
With all the upcoming changes at work we decided to outsource spam filtering to the surfnet mailfilter service. They get paid to keep the filtering up to date daily and we will soon have less time for it. Until now it was of course always our 'own' mail setup and we could keep the spam stats ourselves, and we lose that.
We first switched students.cs.uu.nl and this morning cs.uu.nl. In the logs of the student mail server I suddenly noticed that the smtpd rate limiting (anvil) of postfix was triggering on the surfnet mailfilters, so I added the surfnet mailfilter addresses to the smtpd_client_event_limit_exceptions setting in postfix. At cs.uu.nl we use postfix, ever since the days when it was still called vmailer. In sendmail I could set different rate limits for those IP blocks, but postfix apparently only has the options default and no rate limits.
2010-03-04 (#) 6 months ago
It seems the Turkish provider ttnet.tr fell off the Internet for a few hours today. Since we volunteered ntp.cs.uu.nl for tr.pool.ntp.org the drop in traffic was very, very noticeable.
2010-03-01 (#) 6 months ago
First peak at 5000 packets/second ntp traffic seen on ntp.cs.uu.nl. Still going strong under this load.
2010-02-24 (#) 6 months ago
Lots of phishing attempts for webmail accounts are flying by; at the moment it seems popular to use webform hosters to ask for account credentials. I seem to miss a part of these. Probably my spamfilters being too good or something. But at work there are some people who know I am interested in new and recurring strains of Internet abuse, so I still get interesting stuff forwarded to investigate. The latest catch advertised a dot.tk domain which inlined a webform from a tripod hosted site. That form was a copy of an emailmeform.com form, used emailmeform.com to process it, and redirected to a generic thankyou form by a New Zealand printer supplies company. It takes a bit of tracing and trying to solve such a puzzle and notify all parties about their role in the abuse.
2010-02-18 (#) 6 months ago
No license to rdesktop for me: I recently got a really weird error from rdesktop:
koos@leek:~$ rdesktop -M -g 1200x900 -d something terminalserver
Autoselected keyboard map en-us
disconnect: No valid license available.
Some searching found me: License to rdesktop. Indeed, setting a different hostname from my own hostname helps:
koos@leek:~$ rdesktop -M -g 1200x900 -d something -n leeks terminalserver
Autoselected keyboard map en-us
/users/koos/.rdesktop/licence.leeks.new: Permission denied
WARNING: Remote desktop does not support colour depth 24; falling back to 16
The license file error has to do with another workaround. But maybe running out of licenses for 'leek' is because I never give licenses back. Why is all this software so busy making sure money is made for its maker and not busy helping the user?
2010-02-01 (#) 7 months ago
We volunteered ntp.cs.uu.nl for extra capacity for the Turkish ntp pool, and the results are quite visible in the ntp.cs.uu.nl statistics. Suddenly peaks are near 5000 packets per second. But ntpd (and the freebsd kernel) deal with it without problems.
2010-01-15 (#) 7 months ago
I upgraded ntpd on ntp.cs.uu.nl from 4.2.4 to 4.2.6 and suddenly I notice in the output that this has changed the stratum from 2 to 1.
$ ntpq -c rv ntp.cs.uu.nl
status=011d leap_none, sync_atomic, 1 event, event_13,
version="ntpd 4.2.6@1.2089-o Fri Jan 15 14:31:14 UTC 2010 (1)",
processor="i386", system="FreeBSD/5.4-RELEASE-p13", leap=00, stratum=1,
precision=-19, rootdelay=0.000, rootdisp=1.456, refid=PPS,
reftime=cefb066f.cbe638ff  Fri, Jan 15 2010 16:21:19.796,
clock=cefb0693.889dd5ee  Fri, Jan 15 2010 16:21:55.533, peer=7047, tc=6,
mintc=3, offset=-0.001, frequency=15.448, sys_jitter=0.002,
clk_jitter=0.001, clk_wander=0.002
Which matches the peer list where the PPS stratum is now 0:
$ ntpq -c peer ntp.cs.uu.nl
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*huygens.cs.uu.n .PPS.            1 u   23   64  377    0.197    0.009   0.258
+stardate.cs.uu. .PPS.            1 u   13   64  377    0.998   -0.058   0.033
+tijger.phys.uu. metronoom.dmz.c  2 u   15   64  376    0.599    0.004   0.185
 LOCAL(0)        .LOCL.          10 l  627   64    0    0.000    0.000   0.002
oPPS(0)          .PPS.            0 l   49   64  377    0.000   -0.002   0.002
 NTP.MCAST.NET   .MCST.          16 u    -   64    0    0.000    0.000   0.002
I guess some definition of PPS input has changed. Now I wonder how much more ntp traffic this will cause.
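To notice such a change without reading the full output, the stratum can be pulled out of the ntpq -c rv text. A minimal sketch, assuming a hypothetical helper named rv_stratum and the output layout shown above:

```shell
#!/bin/sh
# Hypothetical helper: pull the stratum value out of "ntpq -c rv" output (stdin).
rv_stratum() {
    sed -n 's/.*stratum=\([0-9][0-9]*\).*/\1/p'
}

# Example using one line of the output shown above:
printf '%s\n' 'processor="i386", system="FreeBSD/5.4-RELEASE-p13", leap=00, stratum=1,' | rv_stratum
```

Against the live server this would be ntpq -c rv ntp.cs.uu.nl piped into rv_stratum.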
2010-01-06 (#) 7 months ago
I tried to use the --filter option in rsync but I was a bit baffled by the syntax; the manpage is nice but I couldn't understand it. I wanted certain directories completely, other directories excluded by default, and certain files in one directory but not all. After some trial and error and talking to the teddybear:
rsync -rvv --progress /home/koos/rsyncsource/ /home/koos/rsyncdest --filter='merge /home/koos/rsyncfilter'
And in the filter file name things to include and exclude:
+ /wel/
- /niet/
+ /random/file
- /random/*
And the result is what I want:
$ ~/bin/testrsync 
building file list ... 
[sender] showing directory wel because of pattern /wel/
[sender] hiding directory niet because of pattern /niet/
[sender] hiding file random/niet because of pattern /random/*
[sender] showing file random/file because of pattern /random/file
7 files to consider
delta-transmission disabled for local transfer or --whole-file
random/
random/file
           0 100%    0.00kB/s    0:00:00 (xfer#1, to-check=4/7)
wel/
wel/file1
           0 100%    0.00kB/s    0:00:00 (xfer#2, to-check=2/7)
wel/file2
           0 100%    0.00kB/s    0:00:00 (xfer#3, to-check=1/7)
wel/file4
           0 100%    0.00kB/s    0:00:00 (xfer#4, to-check=0/7)
total: matches=0  hash_hits=0  false_alarms=0 data=0

sent 319 bytes  received 126 bytes  890.00 bytes/sec
total size is 0  speedup is 0.00
Now to do this on a filesystem with 151000 files.
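The order of the rules matters here: rsync checks them top to bottom and acts on the first match, so the include for /random/file has to come before the exclude of /random/*. A minimal sketch that reproduces the test in a scratch directory follows. The function name filter_demo and the temporary layout are assumptions for illustration, and the demo only runs when rsync is installed.

```shell
#!/bin/sh
# Hypothetical reproduction of the test above in a scratch directory.
# Prints the files that end up in the destination.
filter_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/wel" "$work/src/niet" "$work/src/random" "$work/dest"
    touch "$work/src/wel/file1" "$work/src/niet/file" \
          "$work/src/random/file" "$work/src/random/niet"

    # First match wins: include /random/file before excluding /random/*.
    cat > "$work/filter" <<'EOF'
+ /wel/
- /niet/
+ /random/file
- /random/*
EOF

    rsync -r --filter="merge $work/filter" "$work/src/" "$work/dest" || return 1
    (cd "$work/dest" && find . -type f | sort)
    rm -rf "$work"
}

if command -v rsync >/dev/null 2>&1; then
    filter_demo
fi
```

Swapping the last two rules in the filter file would hide random/file as well, because the exclude would then match first.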
2010-01-01 (#) 8 months ago
Y2.01K problem: SpamAssassin had a rule since 2006 that e-mail with a date in the 'far future' was likely spam. The 'far future' was defined as 2010-2099. So today that rule started firing, leading to missed e-mail. Documentation for SpamAssassin Rule: FH_DATE_PAST_20XX. Time for an update there...
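The failure mode is easy to reproduce with a pattern that treats any year from 2010 to 2099 as 'far future'. The sketch below illustrates the idea with grep; the function name date_in_far_future and the exact pattern are assumptions, not a copy of the SpamAssassin rule.

```shell
#!/bin/sh
# Hypothetical check: does a Date header contain a year in 2010-2099?
# The pattern is an illustrative assumption, not the actual rule text.
date_in_far_future() {
    printf '%s\n' "$1" | grep -Eq '(^|[^0-9])20[1-9][0-9]([^0-9]|$)'
}

for d in 'Thu, 31 Dec 2009 23:59:59 +0100' 'Fri, 01 Jan 2010 00:00:01 +0100'; do
    if date_in_far_future "$d"; then
        echo "$d: flagged"
    else
        echo "$d: not flagged"
    fi
done
```

A rule like this is perfectly sensible in 2006 and starts hitting every legitimate message the moment the calendar reaches 2010.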
2009-12-11 (#) 8 months ago
It's that xsnow time of year. I wanted to compile it for our students and staff to use and found a major Makefile and a real Imakefile (remember those?):
$ wc Makefile  Imakefile 
  957  2413 26799 Makefile
    7    21   172 Imakefile
Trying to find the 'real' problem I managed to reduce all that to:
xsnow: xsnow.o toon_root.o
        gcc -o xsnow xsnow.o toon_root.o -lm -lXpm -L/usr/X11R6/lib
imake gave us somewhat overkill Makefiles...
2009-11-24 (#) 9 months ago
I was replacing ssl certificates on a lot of servers and got it working everywhere except on our ldap server. The SSL certificate chain wasn't given out so there was no link between a trusted root and the certificate on the server. I had it configured:
TLSCACertificateFile /etc/openldap/ssl/cacert.pem
TLSCertificateFile /etc/openldap/ssl/servercrt.pem
TLSCertificateKeyFile /etc/openldap/ssl/serverkey.pem
With the certificate in servercrt.pem and the intermediate certificates in cacert.pem. But that was a config from an older server where OpenLDAP uses the OpenSSL libraries (libssl). The newer ldap server uses the GnuTLS libraries (libgnutls), which really need:
TLSCertificateFile /etc/openldap/ssl/servercrt.pem
TLSCertificateKeyFile /etc/openldap/ssl/serverkey.pem
With the server certificate and the entire chain together in servercrt.pem. Something to keep in mind, so I documented it on our internal wiki.
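Building such a combined file is a matter of concatenating the PEM files, server certificate first and intermediates after it. A minimal sketch, assuming a hypothetical helper named build_chain and made-up input file names:

```shell
#!/bin/sh
# Hypothetical helper: write the server certificate followed by the
# intermediate certificates into one PEM file.
# Usage: build_chain OUTPUT SERVER_CERT INTERMEDIATE...
build_chain() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Example (input file names are assumptions):
#   build_chain /etc/openldap/ssl/servercrt.pem server.pem intermediate.pem
```

Each input file should end with a newline, otherwise the END and BEGIN markers of adjacent certificates end up on one line.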
2009-11-16 (#) 9 months ago
Power failure this morning at work, which left us not in the dark (enough emergency lighting) but with a completely silent server room. When the power came back we had some hours of work to get everything up and running again. The worst problem was with a number of Xen based virtual hosts: some CentOS upgrade had suddenly created a network device virbr0, which uses NAT and a local dhcp pool, and enslaved all xen domU network interfaces under that bridge. They had no access to the 'real' network because NAT was not set up, so their NFS root mount failed. The details on virbr0:
virbr0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          inet addr:192.168.122.1  Bcast:192.168.122.255
A bit hard to disable, but in the end ifconfig virbr0 down ; brctl delbr virbr0 gets rid of the weird bridge, and all domUs start after that.
Koos van den Hout, E-mail koos+web@kzdoos.xs4all.nl. PGP key DSS/1024 0xF0D7C263 RSS