2023-06-24 Time to replace half of a mirrored disk (again)
Error messages like this make me fix things fast:

    Jun 24 13:42:59 conway kernel: [6925745.388604] sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
    Jun 24 13:42:59 conway kernel: [6925745.389388] sd 0:0:0:0: [sda] tag#6 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
    Jun 24 13:42:59 conway kernel: [6925745.390157] print_req_error: I/O error, dev sda, sector 616464
    Jun 24 13:42:59 conway kernel: [6925745.390923] md: super_written gets error=10
    Jun 24 13:42:59 conway kernel: [6925745.391705] md/raid1:md127: Disk failure on sda3, disabling device.
    Jun 24 13:42:59 conway kernel: [6925745.391705] md/raid1:md127: Operation continuing on 1 devices.
    Jun 24 13:42:59 conway mdadm[2559]: Fail event detected on md device /dev/md127, component device /dev/sda3

The part that makes me go 'hmmm' is that this was another Kingston A400 SSD, just like the one that failed in December 2021, for which I ordered a replacement from a different brand. Since that disk failed under warranty it was replaced with another Kingston A400, which I still had available in its packaging. So that is now in use and the failed SSD is removed. I wonder how long that replacement disk will work fine.

I did all the bits to replace the disk and recreate the software raid mirror. This worked fine, and all my work to make sure the system can boot from either disk of the mirror worked.
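A failed mirror half like this also shows up in /proc/mdstat, where the status line of a degraded array reports fewer active members than expected (for example [2/1]). Below is a minimal Python sketch of a check for that. The function name and the sample text are made up for illustration; the sample is not actual output from conway.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for md arrays that miss members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[n/m]": n members expected, m active.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded


# Hypothetical /proc/mdstat contents, written by hand for this example.
SAMPLE = """\
Personalities : [raid1]
md127 : active raid1 sdb3[1] sda3[0](F)
      976630464 blocks super 1.2 [2/1] [_U]
      bitmap: 3/8 pages [12KB], 64KB chunk

md0 : active raid1 sdb2[1] sda2[0]
      523264 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a real system the text would come from reading /proc/mdstat; mdadm's own monitor mode already sends the Fail event shown above, so this is only a way to double-check the state after swapping the disk.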
2022-11-03 It seems the rcu_sched messages stopped after I reseated SATA cables
In the beginning of October I shut down the home server conway and reseated the SATA cables in the hope of having fewer problems with timeouts, then started the whole system again to also fix other problems. About a month later I think this worked: I haven't seen an rcu_sched message since reseating the cables.
2022-10-12 Peeking a bit at Kea DHCP server
Yesterday I learned that ISC DHCP server will be end of life at the end of this year. For a package I started using around 1998 with one of the first versions I expected a bit more announcement time. At the same time I'm so used to using ISC DHCP server in my home network that I never subscribed to any mailing list or other announcements about it. It's just there, and I can configure it to do what I want, including supporting PXE booting systems for installation or diagnostics, supporting special DHCP options for APC AP7920 rackmount power distribution units, and serving all the virtual LANs of my home network.

ISC suggests using Kea DHCP server to replace it in most server implementations. Kea DHCP server should be able to get a lot of configuration data from databases and allow for dynamic updates of the configuration. That is an improvement over ISC DHCP server as it is at the moment, which needs a full restart for every change.

So time to peek at Kea DHCP server. I don't think ISC DHCP server will be unavailable after 31 December 2022, but I don't expect updates anymore, and when a good replacement is normalized I expect ISC DHCP server to slowly fall away from Linux distributions.

Currently Kea is not even available for Debian or Devuan stable or oldstable, strangely enough. I wonder what happened there. But there are distribution packages for Debian buster at Cloudsmith - Repositories - ISC - Internet Systems Consortium (isc) - kea-2-3 (kea-2-3) - Packages / format:deb. Time to install the latest and let apt fix the dependencies:

    koos@testrouter:~$ sudo dpkg -i isc-kea-dhcp4_2.3.1-isc20220928105532_amd64.deb isc-kea-dhcp6_2.3.1-isc20220928105532_amd64.deb isc-kea-common_2.3.1-isc20220928105532_amd64.deb
    Selecting previously unselected package isc-kea-dhcp4.
    (Reading database ... 46609 files and directories currently installed.)
    Preparing to unpack isc-kea-dhcp4_2.3.1-isc20220928105532_amd64.deb ...
    Unpacking isc-kea-dhcp4 (2.3.1-isc20220928105532) ...
    Selecting previously unselected package isc-kea-dhcp6.
    Preparing to unpack isc-kea-dhcp6_2.3.1-isc20220928105532_amd64.deb ...
    Unpacking isc-kea-dhcp6 (2.3.1-isc20220928105532) ...
    Selecting previously unselected package isc-kea-common.
    Preparing to unpack isc-kea-common_2.3.1-isc20220928105532_amd64.deb ...
    Unpacking isc-kea-common (2.3.1-isc20220928105532) ...
    dpkg: dependency problems prevent configuration of isc-kea-dhcp4:
     isc-kea-dhcp4 depends on libboost-system1.67.0; however:
      Package libboost-system1.67.0 is not installed.
    [..]
    koos@testrouter:~$ sudo apt install -f
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Correcting dependencies... Done
    The following additional packages will be installed:
      libboost-system1.67.0 liblog4cplus-1.1-9 libmariadb3 libpq5 mariadb-common mysql-common
    The following NEW packages will be installed:
      libboost-system1.67.0 liblog4cplus-1.1-9 libmariadb3 libpq5 mariadb-common mysql-common
    0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
    3 not fully installed or removed.
    Need to get 760 kB of archives.
    After this operation, 4,001 kB of additional disk space will be used.
    [..]

Looking at the sample configuration makes me think I can do this with a text-based configuration (it's actually JSON) and get it going fast. For my home network that is probably the best solution. Kea does have options to use MariaDB or PostgreSQL backends for storage, which does look really nice for my home network but at the same time adds a dependency and a layer of complexity. I can see IPAM systems totally going to Kea DHCP and giving a full interface for managing the databases directly, including APIs for adding and removing objects as they are added in other systems.
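Since the Kea configuration is JSON, it can be generated from a script. The Python sketch below builds a minimal DHCPv4 configuration with one subnet, one pool and a default router, using the in-memory/file ("memfile") lease backend rather than a database. The interface name, the addresses (documentation range 192.0.2.0/24) and the lifetime are assumptions for illustration, not values from my network, and the helper name is made up.

```python
import json


def minimal_kea_dhcp4(interface, subnet, pool, router, lifetime=3600):
    """Build a small Kea DHCPv4 configuration as a Python dict."""
    return {
        "Dhcp4": {
            # Interface(s) the server listens on.
            "interfaces-config": {"interfaces": [interface]},
            # Leases kept in a local file, no database dependency.
            "lease-database": {"type": "memfile", "persist": True},
            "valid-lifetime": lifetime,
            "subnet4": [
                {
                    "id": 1,
                    "subnet": subnet,
                    "pools": [{"pool": pool}],
                    "option-data": [{"name": "routers", "data": router}],
                }
            ],
        }
    }


# All values below are placeholders.
config = minimal_kea_dhcp4(
    interface="eth0",
    subnet="192.0.2.0/24",
    pool="192.0.2.100 - 192.0.2.200",
    router="192.0.2.1",
)
config_text = json.dumps(config, indent=2)
print(config_text)
```

The result could be written to a file and checked with the server's configuration test option (kea-dhcp4 -t) before using it; where the packaged configuration file lives depends on the package, so that path is left out here.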
2022-10-09 I moved the 1-wire interface to a Raspberry Pi
After the problems with detaching and reattaching the USB 1-wire interface on a kvm virtual machine (needed to fix an interference issue) showed up again, I decided to move the USB 1-wire interface to a different machine, one where kvm virtualisation isn't in the mix. The closest available machine that can deal with the 1-wire interface is a Raspberry Pi which also has other monitoring tasks. This move worked fine and the 1-wire temperatures are showing up again in influxdb. I decided not to update the rrdtool temperature database. I will have to find time to migrate the rrdtool history to influxdb. Ideally there will be some aggregation for older measurements, but I'd like an "infinite" archive of a daily average.
2022-09-24 Can't live-attach a USB device to a kvm virtual host after upgrades
I have a DS2490 USB 1-wire interface on the home server conway which is rerouted to one of the virtual machines so that that virtual machine can read the sensors on the 1-wire network. This rerouting works when the machine is started: the DS2490 USB 1-wire interface shows up in the virtual machine fine. From time to time this DS2490 USB 1-wire interface gets confused when I am transmitting on the radio, so the solution is to detach it from the virtual machine, unplug it from the server, plug it in again and attach it to the virtual machine again. Today this had to be done and I got an unexpected error message:

    root@conway:~# virsh attach-device --live gosper /etc/onewire-for-gosper.xml
    error: Failed to attach device from /etc/onewire-for-gosper.xml
    error: internal error: unable to execute QEMU command 'device_add': failed to find host usb device 2:8

In logfile /var/log/libvirt/libvirtd.log:

    2022-09-24 21:16:38.655+0000: 10923: error : qemuMonitorJSONCheckError:395 : internal error: unable to execute QEMU command 'device_add': failed to find host usb device 2:8

To be complete about it: usb device 2:8 is exactly the right one!

    root@conway:~# lsusb | grep 2490
    Bus 002 Device 008: ID 04fa:2490 Dallas Semiconductor DS1490F 2-in-1 Fob, 1-Wire adapter

This seems to be new since I upgraded the homeserver to Devuan beowulf, giving me versions:

    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                                  Version         Architecture Descripti
    +++-=====================================-===============-============-=========
    ii  libvirt-clients                       5.0.0-4+deb10u1 amd64        Programs
    ii  libvirt-daemon                        5.0.0-4+deb10u1 amd64        Virtualiz
    un  libvirt-daemon-driver-storage-gluster                              (no descr
    un  libvirt-daemon-driver-storage-rbd                                  (no descr
    un  libvirt-daemon-driver-storage-zfs                                  (no descr
    ii  libvirt-daemon-system                 5.0.0-4+deb10u1 amd64        Libvirt d
    ii  libvirt-glib-1.0-0:amd64              1.0.0-1         amd64        libvirt G
    ii  libvirt0:amd64                        5.0.0-4+deb10u1 amd64        library f
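The "2:8" in the QEMU error is the bus and device number that lsusb prints as "Bus 002 Device 008", and the same numbers form the device node path under /dev/bus/usb. A small Python sketch of that mapping follows, as a way to compare the three notations in a script; the function and key names are made up for this example.

```python
import re

# Matches lines such as "Bus 002 Device 008: ID 04fa:2490 Some description".
LSUSB_LINE = re.compile(
    r"^Bus (?P<bus>\d{3}) Device (?P<device>\d{3}): "
    r"ID (?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})\s*(?P<name>.*)$"
)


def parse_lsusb_line(line):
    """Split one lsusb line into its numbers and derived notations."""
    match = LSUSB_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an lsusb device line: {line!r}")
    bus = int(match["bus"])
    device = int(match["device"])
    return {
        "vendor": match["vendor"],
        "product": match["product"],
        "name": match["name"],
        # Notation used in the QEMU error message.
        "qemu_address": f"{bus}:{device}",
        # Device node that udev creates for this bus/device pair.
        "node": f"/dev/bus/usb/{bus:03d}/{device:03d}",
    }


info = parse_lsusb_line(
    "Bus 002 Device 008: ID 04fa:2490 Dallas Semiconductor DS1490F 2-in-1 Fob, 1-Wire adapter"
)
print(info["qemu_address"], info["node"])
```

Note that the device number changes every time the adapter is unplugged and plugged in again, so anything derived from it is only valid until the next replug.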
First idea: AppArmor
The first search result that came up was Bug #1552241 “libvirt-bin apparmor settings for usb host device” : Bugs : libvirt package : Ubuntu. So I tried changing the /etc/apparmor.d/abstractions/libvirt-qemu file. After a few tries and reading the warnings in the rest of the file, I checked whether AppArmor was the source of the problem by completely disabling it. The error did not go away, so I reverted the libvirt-qemu rules to the original settings, restarted AppArmor and kept debugging.
Second idea: usb rights
Based on QEMU USB passthrough broken after Ubuntu 18.04 upgrade I added udev rules to make sure group libvirt-qemu had read and write rights on the usb device, with /lib/udev/rules.d/51-qemu-usb-passthrough.rules containing:

    SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="04fa", ATTRS{idProduct}=="2490", MODE="0664", GROUP="libvirt-qemu"

Then reloading the udev rules:

    root@conway:~# udevadm control --reload-rules

And verifying the resulting rule:

    root@conway:~# udevadm test -a -p $(udevadm info -q path -n /dev/bus/usb/002/008)
    calling: test
    version 3.2.9
    This program is for debugging only, it does not run any program
    specified by a RUN key. It may show incorrect results, because
    some values may be different, or not available at a simulation run.
    [..]
    GROUP 110 /lib/udev/rules.d/51-qemu-usb-passthrough.rules:1
    MODE 0664 /lib/udev/rules.d/51-qemu-usb-passthrough.rules:1
    handling device node '/dev/bus/usb/002/008', devnum=c189:135, mode=0664, uid=0, gid=110
    [..]

Indeed the right group id, but still the same error message when trying the attach-device command.
Interesting find: it's specific to the virtual machine that had the device before
Small update: I can attach the USB device to a different host and detach it from that host again. I just can't attach it to the 'original' host again. I also posted this question on serverfault: Can't live-attach a USB device to a kvm virtual host again after upgrades.

Update: After a complete reboot of the homeserver the USB 1-wire interface worked again (as I could imagine). But after another interference problem it's now in the same state again. I did change the definition in both the virthost configuration and the xml file from managed='no' to managed='yes' before the reboot, but that hasn't helped. Contents of the /etc/onewire-for-gosper.xml file now:

    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x04fa'/>
        <product id='0x2490'/>
      </source>
    </hostdev>
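The hostdev definition above selects the device by vendor and product id, not by bus and device number, which is why the same file keeps working after a replug. As a sketch of how such a file could be generated for other devices, the Python below builds the same structure with the standard library; the function name is made up, and the output differs cosmetically from the hand-written file (double quotes, no indentation).

```python
import xml.etree.ElementTree as ET


def usb_hostdev_xml(vendor_id, product_id, managed=True):
    """Return a libvirt <hostdev> element for a USB device as a string."""
    hostdev = ET.Element(
        "hostdev",
        {
            "mode": "subsystem",
            "type": "usb",
            "managed": "yes" if managed else "no",
        },
    )
    source = ET.SubElement(hostdev, "source")
    # libvirt expects the ids as 0x-prefixed hexadecimal strings.
    ET.SubElement(source, "vendor", {"id": f"0x{vendor_id:04x}"})
    ET.SubElement(source, "product", {"id": f"0x{product_id:04x}"})
    return ET.tostring(hostdev, encoding="unicode")


# Ids of the DS2490 1-wire adapter from the lsusb output.
onewire_xml = usb_hostdev_xml(0x04FA, 0x2490)
print(onewire_xml)
```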
2022-09-04 Minecraft java edition has issues with IPv6 and CPU
Our child plays minecraft regularly. The start was with the Microsoft minecraft edition but recently the java edition became available too without paying again. I have set up the bedrock server for the Microsoft minecraft edition to make it possible to play with other people outside the house. So the most recent request was to do this for the java edition too. I don't know much about minecraft but I can do enough with just some websearching and finding a howto. So I started with How to Set Up a Dedicated Minecraft Server on Linux which seems to be a way to try to sell dedicated servers, but I have enough server hardware here at home so I just used the same virtual machine which ran the minecraft bedrock server. It turned out the default-jdk resulted in openjdk-11 getting installed and this resulted in not being able to run the latest minecraft java server. I switched to openjdk-17-jre-headless because I only need the runtime and I never want to run the graphical stuff, so that saved a lot in needed libraries and other overhead. The server started fine, but the minecraft java edition couldn't connect to it when trying to connect by name, and gave no usable error message. That's a different rant. I checked on the server side and saw the listening socket in dual-stack mode. With tcpdump I soon found out the minecraft java edition starts with the IPv4 address and gives up when that fails. The solution was to remove the IPv4 address (A record) from the name, flush the dns cache and after that it worked. This does mean that when friends want to connect that are behind ISPs that only support legacy Internet addresses they will have a different problem.
Read the rest of Minecraft java edition has issues with IPv6 and CPU
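To see which address families a server name currently resolves to (and so whether a client that only tries IPv4 has anything to try), the resolver results can be grouped by family. This is a minimal Python sketch; the function names are made up, the addresses in the example are documentation addresses, and 25565 is the default port of the java edition server.

```python
import socket


def addresses_by_family(addrinfo):
    """Group getaddrinfo()-style results into IPv4 and IPv6 address lists."""
    grouped = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue
        address = sockaddr[0]
        if address not in grouped[label]:
            grouped[label].append(address)
    return grouped


def lookup(hostname, port=25565):
    """Resolve a name with the system resolver and group the results."""
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return addresses_by_family(results)


# Hand-written example data in the shape getaddrinfo() returns.
example = [
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 25565)),
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::10", 25565, 0, 0)),
]
print(addresses_by_family(example))
```

Calling lookup() with the real server name after removing the A record and flushing the dns cache should leave the IPv4 list empty, assuming the local resolver no longer has the old answer cached.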
2022-07-07 Upgraded the homeserver OS to devuan beowulf and replaced the UPS battery
A few days ago I noticed some interesting messages in the apcupsd log:

    2022-07-04 10:14:15 +0200 Battery disconnected.
    2022-07-04 10:16:24 +0200 Battery reattached.
    2022-07-04 10:19:53 +0200 Battery disconnected.
    2022-07-04 10:20:40 +0200 Battery reattached.

Checking the UPS statistics showed me the battery charge was dropping to about 7 % of the capacity while the mains power was available. Since the battery was over 5 years old I ordered a new one to replace it.

This battery was scheduled to arrive Wednesday at the start of the afternoon and I wanted to do an upgrade of the Linux distribution on the main homeserver conway anyway, because devuan ascii is already 'oldoldstable' (but still getting updates).

The homeserver uses 2 disks with the main lvm volume in a raid-1. The /boot and /boot/efi filesystems are mirrored by hand with the idea to end with a working boot even when 1 disk is missing.

After the shutdown and replacing the UPS battery I switched the server on again and I was greeted by a grub prompt and nothing to boot. After a few tries I got the system booting again, and after that I went searching for what went wrong. Eventually I found out the file /boot/efi/EFI/devuan/grub.cfg pointed at a missing filesystem. I found out the best way to fix this is with

    # dpkg-reconfigure grub-efi-amd64

both with /dev/sda and /dev/sdb filesystems on /boot and /boot/efi.
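The disconnect/reattach pairs in the apcupsd log at the start of this entry can be turned into durations with a few lines of Python, which makes it easier to spot whether the problem is getting worse over time. This is a sketch under the assumption that the events file only uses the line format shown above; the function name is made up.

```python
import re
from datetime import datetime

# "2022-07-04 10:14:15 +0200 Battery disconnected."
EVENT = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\s+(.*)$")


def battery_outages(log_text):
    """Return (start, seconds) for each disconnect/reattach pair."""
    outages = []
    disconnected_at = None
    for line in log_text.splitlines():
        match = EVENT.match(line.strip())
        if match is None:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")
        message = match.group(2)
        if message.startswith("Battery disconnected"):
            disconnected_at = when
        elif message.startswith("Battery reattached") and disconnected_at:
            seconds = (when - disconnected_at).total_seconds()
            outages.append((disconnected_at, seconds))
            disconnected_at = None
    return outages


# The four log lines from this entry.
LOG = """\
2022-07-04 10:14:15 +0200 Battery disconnected.
2022-07-04 10:16:24 +0200 Battery reattached.
2022-07-04 10:19:53 +0200 Battery disconnected.
2022-07-04 10:20:40 +0200 Battery reattached.
"""

for start, seconds in battery_outages(LOG):
    print(start.isoformat(), f"{seconds:.0f} s")
```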
2022-06-15 Grafana 9.0.0 available, and downgraded back to 8.5.6 and back up...
I saw an upgrade of Grafana available, which turned out to be 9.0.0. When upgrading to 9.0.0 I get...

    An unexpected error happened
    TypeError: Object(...) is not a function
    t@[..]public/plugins/grafana-clock-panel/module.js:2:15615
    WithTheme(undefined)

So maybe the grafana-clock-panel plugin isn't compatible with 9.0.0 somehow. Downgrading to 8.5.6 and reloading everything makes it work again.

Update: I checked the grafana-clock-panel plugin and noticed it hadn't been updated. So I did that update and retried grafana 9.0.0, and that made everything run smoothly again.
2022-05-09 Grafana alerts working again
After reverting to Grafana 8.4.7 for a while because alerts were failing in Grafana 8.5.0, I had a look at the available version today and saw version 8.5.2. I assumed the problem with DatasourceNoData errors was fixed by now and did the upgrade. Indeed the alerts are seeing data fine now and I trust they will work when needed.
2022-04-23 Grafana alerts failing in 8.5.0
I installed Grafana from their debian repository, so I get updates via the normal apt update / apt dist-upgrade process. Since upgrading to version 8.5.0 the alerts were all firing because of 'DatasourceNoData' errors. According to Alert Rule returned no data (after upgrade to 8.5.0) #48128 other people are seeing this too. For now I downgraded to version 8.4.7 where things work fine and I'll see if a newer version shows up.