I found the probable cause of the not so great power saving: when I installed the first new disk I also updated the bios. And the message I get when trying to load the powernow-k8 cpu driver is:

    powernow-k8: Found 1 AMD Athlon(tm) Dual Core Processor 4850e processors (2 cpu cores) (version 2.20.00)
    powernow-k8: MP systems not supported by PSB BIOS structure
    powernow-k8: MP systems not supported by PSB BIOS structure

So the cpu keeps running at maximum speed without throttling. Searching for the error message finds Ubuntu Bug #33116: powernow-k8 refuses to load and Ubuntu Bug #398109: powernow-k8: Your BIOS does not provide ACPI _PSS objects in a way that Linux understands. Both suggest that I need to check the bios settings to enable "Cool'n'Quiet", enable ACPI APIC and disable the MCP61 ACPI HPET Table. That's planned for the next hardware changes.
I noticed that the new Western Digital WD15EADS disk spun down way too fast. After some serious testing I found: when I set the "Advanced Power Management" level (using hdparm -B) to 127 or less, the "standby (spindown) timeout" (set using hdparm -S) is ignored and the drive spins down after about 8 seconds of inactivity. That is way too soon when playing a movie: with mplayer the movie stalls about every 10 seconds because a new bit of movie has to be read from disk, which causes another start/stop. The smartctl start/stop counter goes up at the same rate. Feels like a firmware bug to me, or a difference of opinion between hdparm and the disk. But the hdparm report suggests that these settings should work on the disk:

    ATA device, with non-removable media
    Model Number: WDC WD15EADS-00S2B0
    Firmware Revision: 04.05G04
    Standby timer values: spec'd by Standard, with device specific minimum
    Advanced power management level: 126
    * Power Management feature set

I asked Western Digital customer help about this but the first (standard?) answer is from Support for WD products in LINUX or UNIX, which comes down to "we don't support anything else than jumper settings for these operating systems".

A lot of further searching with google suggests to me that the 'IntelliPark' feature is causing the drive to park its heads after 8 seconds of inactivity, which is not a useful default when streaming video from it with a reasonable cache. And the 'Load Cycle Count' will go up fast, which may result in the drive reaching the 'suggested maximum' within a year. I don't need to test the warranty that fast.
As a workaround I set the Advanced Power Management level back to 128 and installed spindown, a utility which watches the disk activity from userspace and issues a spindown command when no activity was measured over the configured period of time (it reads /proc/diskstats, so for linux this is activity at the device level). Now the disk spins down when the filesystems have been idle for 10 minutes, which is a lot more usable.
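The idea behind that kind of userspace watching can be sketched in a few lines of Python. This is not the code of the spindown utility, just a minimal illustration: the names io_counters and IdleTracker are made up for this example, and it assumes the usual /proc/diskstats layout where the fourth and eighth columns are the completed reads and writes.

```python
def io_counters(diskstats_text, device):
    """Return (reads_completed, writes_completed) for a device, or None.

    Assumes the usual /proc/diskstats layout: major, minor, name,
    reads completed, reads merged, sectors read, ms reading,
    writes completed, ...
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    return None


class IdleTracker:
    """Remember when the I/O counters last changed (hypothetical helper)."""

    def __init__(self, idle_limit_s):
        self.idle_limit_s = idle_limit_s
        self.last_counters = None
        self.last_change = None

    def update(self, counters, now):
        """Feed a new sample; return True once the disk was idle long enough."""
        if counters != self.last_counters:
            # Something was read or written since the previous sample.
            self.last_counters = counters
            self.last_change = now
            return False
        return now - self.last_change >= self.idle_limit_s
```

A polling loop would read /proc/diskstats every so often, feed the counters and the current time to update(), and issue the spindown command when it returns True. With idle_limit_s set to 600 that matches the 10 minutes used here.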
Update: Official answer from Western Digital customer help is that it's not possible to change this 8 second timeout. So I'll stick to the spindown solution.
The resulting power save from adding a new sata disk, moving the data and removing the old pata disks is not spectacular (yet): the 5 pata disks (all with activated automatic spindown) had the UPS at a 40% load, the current 2 sata and 2 pata disks (also with automatic spindown) have the UPS at a 42% load. It'll be interesting to see what happens when the 2 pata disks can be removed. The main original idea was to save a bit of power and make the system less complicated, let's see if that first part works out in the end.
Update: Found the cause of the not so great power saving: probably the recent bios update.
Filesystems have been moved to the new huge sata disk in home server greenblatt and I found time this evening to remove three old ones. There may be a race condition in the startup scripts where lvm2 is not completely up and running when the filesystems are mounted from the fstab but I saw that happen only once.
The new disk in the homeserver greenblatt was another case of a disk not wanting to go to sleep after the set period. Some searching found two answers: spindown, a daemon to monitor disks for inactivity and spin them down with sg_start --stop or hdparm -y. But the other answer was a better answer: hdparm standby timeout not working for WD raptors? has as answer:

    * I also know of quite a number of drives where hdparm -B settings override the -S settings, even if you set the -S settings after the hdparm -B settings. You could try combinations with various values of hdparm -B, especially 1 and 255.

And the manpage of hdparm has this bit:

    -B Set Advanced Power Management feature, if the drive supports it. A low value means aggressive power management and a high value means better performance. Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive (not all drives support disabling it, but most do).

Default on the WD drives is indeed 128, which does not permit spindown on idle. I changed it to 127, see if that helps. I prefer it if the drives decide for themselves when to spin down.
Update: Yes, the changed advanced power management setting helps, now the drive spins down when not in use.
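The value ranges from that manpage text are easy to get wrong by one, so here they are as a small Python sketch. The function name describe_apm_level is made up for this example, and it only encodes what the quoted manpage says. An individual drive may still behave differently.

```python
def describe_apm_level(level):
    """Map an hdparm -B value to what the quoted manpage text says about it."""
    if level == 255:
        return "APM disabled"
    if 1 <= level <= 127:
        return "spin-down permitted"
    if 128 <= level <= 254:
        return "spin-down not permitted"
    raise ValueError("hdparm -B takes a value from 1 through 255")
```

So the WD default of 128 lands in "spin-down not permitted" and the change to 127 is the smallest step that moves the drive into "spin-down permitted".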
The sensors at home are updated with data from the new disk. The cause of the relatively high temperatures is that 3 disks (2 pata and 1 new sata) are in one cage together. I hope to rearrange disks so the airflow improves and they cool better. I might need a bit longer sata cable to make that happen.
Work on home server greenblatt: time for fewer disks with more storage. So I bought two sata disks, one huge one to store the camera archive and scratch files, and one for the system and home directories. The choice for two disks is so the one with the camera archive and the scratch files can fall asleep when not in use, to save a bit of power. Installing both new disks at once wasn't going to happen due to space and cabling considerations, so I started with the big one. When that one is done I can remove three pata disks from the system. I also updated the system bios to the latest version, which made the system clock a lot more stable: ntpd now runs without having to use tickadj. Bios updates are easy these days: this bios can update itself from a USB stick. I chose logical volume management (lvm2) again for managing the big disks so it will be easy to expand storage when needed without getting a big tree of filesystem mounts.
The tapedrive-with-changer on the homeserver found itself in a wedged state with at the bottom of the dmesg output:

    [105715.017656] ch 0:0:1:1: Attempting to queue a TARGET RESET message
    [105715.017658] CDB: 0x1b 0x20 0x0 0x0 0x2 0x0
    [105715.017663] ch 0:0:1:1: Command not found
    [105715.017664] aic7xxx_dev_reset returns 0x2002
    [105718.936191] target0:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)

And still no access to the scsi tape drive. But, there is a bigger hammer nowadays named sg_reset which can fix this:

    # mt -f /dev/nst0 status
    /dev/nst0: Input/output error
    # sg_reset -b /dev/sg0
    sg_reset: starting bus reset
    sg_reset: completed bus reset
    # mt -f /dev/nst0 status
    SCSI 2 tape drive:
    File number=0, block number=0, partition=0.
    Tape block size 0 bytes. Density code 0x25 (DDS-3).
    Soft error count since last status=0
    General status bits on (41010000):
     BOT ONLINE IM_REP_EN

and it's back, not needing a reboot. The list of options says it all:

    Usage: sg_reset [-b] [-d] [-h] [-V] DEVICE
      where: -b       attempt a SCSI bus reset
             -d       attempt a SCSI device reset
             -h       attempt a host adapter reset
             -V       print version string then exit
       {if no switch given then check if reset underway}
    To reset use '-d' first, if that is unsuccessful, then use '-b', then '-h'
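The escalation order from that usage text ('-d' first, then '-b', then '-h') could be scripted roughly like this. It is only a sketch: escalate_reset and the is_healthy callback are names made up for this example, and the only sg_reset switches used are the ones listed in the usage output above.

```python
import subprocess

# Device reset first, then bus reset, then host adapter reset,
# following the advice at the end of the sg_reset usage text.
RESET_ORDER = ["-d", "-b", "-h"]


def escalate_reset(device, is_healthy, run=subprocess.call):
    """Try bigger and bigger resets; return the switch that helped, or None.

    is_healthy is a caller-supplied function (hypothetical) that reports
    whether the tape drive answers again after a reset attempt.
    """
    for switch in RESET_ORDER:
        run(["sg_reset", switch, device])
        if is_healthy():
            return switch
    return None
```

For the case above the health check could be something that runs mt -f /dev/nst0 status and looks at whether it still fails with an Input/output error. I have not tested that part, so treat it as an assumption.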
Yesterday evening I installed a 6-tape DDS-3 changer in the homeserver greenblatt and activated the latest ubuntu kernel updates. The tape changer works great, but the mISDN drivers got confused: the 'loading drivers' stage at boot loads them without the right parameters, so they don't find the hardware, and I can't unload them because that causes a kernel panic. Workaround: remove the mISDN drivers, reboot the system, reinstall the mISDN drivers and let /etc/init.d/mISDN load the drivers in the correct way.
Free ups test! It seems the power company decided not to deliver at all for 9 minutes. Interesting is that they don't mention a failure on their own website.
I like my home server usually boring and stable, but virus prevention should be at the bleeding edge, especially when it handles mail for multiple domains where other people can receive it. So I don't like messages in the clamav logfile:

    WARNING: Your ClamAV installation is OUTDATED!
    WARNING: Local version: 0.92.1 Recommended version: 0.95

Using the Ubuntu backports I was able to get a newer version of clamav running. I updated the home server greenblatt documentation with the exact details of just using the clamav backport and no other backports.
The Virtual Bookcase is back online too, and mail is flowing again for all the domains. Lots of typing, checking and everything to move the stuff to the home server. But, finished (I think).
Update: and all the web statistics are working again and updated. Finished?
And it is back! Idefix 4 broke in a major way: the power supply let out the magic smoke, and the hosting company called me to let me know the server was smelling funny and did not want to start up at all. Since the end of idefix 4 in a rack was near anyway, the decision was made to move the server home. There I used another power supply to get access to my data again. The old power supply was a 300 Watt unit, which seems way underpowered for a dual Xeon system. My best guess is that the instability the system had came from the power supply anyway. So, time to move more domains home. Content from idefix.net is now here at home and virtualbookcase will be next when I find time. I had started migrating Camp Wireless so I finished that migration fast. Mail is diverted to a different place so I have a bit of time to configure all the mailing lists and other things.
Ok, the imap storage for asterisk voicemail works like the proverbial charm. I needed some work on the home dialplan and setup before I could test it, but I was able to leave a message in the home mailbox, see it stored in the voicemail imap box, and retrieve and delete it using a telephone connected to the ISDN port via the VoicemailMain application. The access number for voicemail is now set to 0140-1233 to (sort of) stay in line with the Dutch numbering plan. There is no customer-service at 0140-1200 planned...
Ok, got that bit fixed too: asterisk uses imap as storage backend for voicemail. In modules.conf:

    noload => app_voicemail_odbc.so
    noload => app_voicemail.so
    load => app_voicemail_imap.so

This is with the ubuntu package recompiled to use misdn, so the selection of voicemail storage is a question of which .so to load. In voicemail.conf:

    [general]
    imapserver=koos.idefix.net
    imapfolder=INBOX.calls

    [default]
    9911 => 19999,House mailbox,,,Tz=european|imapuser=housemail|imappassword=S3cr1t

Now voicemail is saved only on the imap-server, so I can view it with Thunderbird. Or use the asterisk voicemail application to retrieve and delete it. That bit is not tested yet.

After all the testing of drivers including heavy torture it's now time to set up a dialplan for the home pbx. Rule 1 of playing with the phones at home is that normal dialing still has to work so my wife can call the numbers without having to dial '0' for an outside line or other tricks, and that the phone in the living room rings when a call comes in. So I have to set up a 'number plan' which allows for special things but also makes all normal numbers work as they should. Solution: I use the 0140 area code, which is reserved (in the Netherlands) for test-numbers for the telecom provider. I am my own telecom provider so I can divert 0140 and do stuff with it, like provide voicemail or internal dialing.
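The routing rule behind that number plan is simple enough to write down as a Python sketch. This is not the asterisk dialplan itself, only the decision it has to make. The function name route_call and the three return values are made up for this example, and the voicemail number is the 0140-1233 access number mentioned elsewhere on this page.

```python
VOICEMAIL_NUMBER = "01401233"  # 0140-1233 without the dash
INTERNAL_PREFIX = "0140"       # test-number range, diverted at home


def route_call(dialed):
    """Decide where a dialed number goes in this (hypothetical) number plan."""
    digits = dialed.replace("-", "").replace(" ", "")
    if digits == VOICEMAIL_NUMBER:
        return "voicemail"
    if digits.startswith(INTERNAL_PREFIX):
        return "internal"
    # Everything else is a normal number and goes outside untouched,
    # so nobody has to dial '0' for an outside line.
    return "outside"
```

The point of the design is the last branch: anything that is not in the 0140 range behaves exactly like it did before the pbx was there.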
I took the plunge and migrated from the old homeserver gosper to the new homeserver greenblatt. The physical migration was several hours of de-installing and installing hardware in the big tower case. Most software came up as planned, some minor nits to fix after stuff started running. Most statistics were only fixed after I got things running again, but the assorted sensors at home are available again.
A busy weekend filled with being available to fix the network for Nwerc 2008. But the network decided to behave and most of the work was just in making all the computers return to their normal state as soon as the competition was over. Funny how thoroughly re-imaging all the systems helps availability: this morning we have a 100% availability of student computers. That does not happen very often: 86 computers with an issue where they shut down their network card from time to time will usually show a few missing ones.

The other thing I managed to find some time for was work on the new home server greenblatt. In the previous week I spent some evenings migrating all my nameservices to a new configuration where the old homeserver is primary, and I duplicated this structure to greenblatt so this stuff will keep running once I swap systems. In the weekend I copied and tested all the configs for web sites running at home such as webcam.idefix.net. Lots of little details show up in the configs which don't work out of the box. I do use the ubuntu package for apache2 now because everything I want in a webserver is available in the package. And trying to squeeze the last bit of optimization from it is not necessary with gigahertzes aplenty and upstream bandwidth probably the first bottleneck.
Sunday evening I finally had time to look at the new home server greenblatt and I tried to get the sitecom dc-105 isdn card in NT (network termination) mode connected to the fixed line (outside) isdn port of the fritz!box 7170. It took a bit of work as a lot of documentation about the mISDN drivers mentions NT mode but the needed cable isn't very well documented. I finally found it, chapter 2.2 of the PBX4Linux manual. By itself the crossed cable did not work (and the fritz!box is good at diagnosing problems with SIP dialing, but just goes 'meh' when ISDN dialing fails). I didn't need the fancy solution with power, but I had to look for a while for termination resistors. I remembered the sitecom dc-105 isdn card had some jumpers near the ISDN port. Those are indeed 100 ohm ISDN termination resistors. Nowhere to be found in any of the manuals of the dc-105 online.

After setting those jumpers it all started working. At first the dialtone sounded weird but that was caused by

    [general]
    country=us

in indications.conf. Changing this to

    [general]
    country=nl

made it suddenly sound a lot more familiar: KPN style. Now the test calls are running again via the modem connected to the fritz!box.
Conclusion: the jumpers on the sitecom dc-105 are isdn termination jumpers and can help to make an NT mode cross cable work.
I have done serious testing of the mISDN drivers. So far in TE mode (terminal equipment). Ingredients for testing were an analog phone and analog modem connected to a fritzbox connected to the ISDN card and another asterisk testserver quite willing to play hours of music on hold or very short answers to calls. The fritzbox does not have the built-in answering machine I expected but the modem was also very good in dialing out on command (I had to dig up how to make cu dial in to another system again) or in listening for a ring, answering the phone and giving it up after a while. The driver worked, in the first few hours of testing I saw 2 spurious kernel messages. What I am worried about is the memory use, I think it slowly leaks kernel memory.

Tomorrow I'll be at the meeting of the network group which means the other test Asterisk server will be unavailable anyway. After I return I'll have a look at changing the setup to NT mode and see how things work then.
I just realised I have the hardware (I think) for some serious call handling testing: the 7170 fritz!box as TE connected to the ISDN card with mISDN drivers in NT mode. The 7170 has a built-in answering machine (either by default or after upgrading to experimental firmware). Using the auto-calling features of Asterisk I could just constantly set up calls and either let the caller hang up (terminate the call while the answering machine still holds the line) or let the called party hang up (let the answering machine hang up) and see what happens. Originating calls on the other side (not asterisk) is also not too complicated: just hook up a modem to the analog port of the fritz!box and a few well-placed ATDT strings should do the trick.