2022-08-28 Maintenance for the pi4raz igate / learning about esp32 power requirements
The aprs server at aprs.pa4tw.nl has been down since last Thursday. I used that server for both the weather station and the igate. For the weather station the change was one word in a script. For the igate I had to remember how to change this with the Arduino development environment set up to support the esp32 board. The easiest way seemed to be to do it from the computer, but every time the igate got past setup and started running it crashed and rebooted itself. I spent a lot of time looking for answers, added debug statements all over the code and ended up in the WiFi initialization code as the place of the crash. And that was the hint: according to Crash when trying to connect to wifi - Issue #3935 - espressif/arduino-esp32 this is a sign of a power shortage. This is purely my fault: the pi4raz igate design calls for an external power supply. The solution was to go back to the separate USB power supply and not use a USB hub connected to the computer. Now the igate is running again and visible on the APRS network: track PE4KH-10 on aprs.fi.
2022-07-07 Upgraded the homeserver OS to devuan beowulf and replaced the UPS battery
A few days ago I noticed some interesting messages in the apcupsd log:

    2022-07-04 10:14:15 +0200 Battery disconnected.
    2022-07-04 10:16:24 +0200 Battery reattached.
    2022-07-04 10:19:53 +0200 Battery disconnected.
    2022-07-04 10:20:40 +0200 Battery reattached.

Checking the UPS statistics showed me the battery charge was dropping to about 7 % of the capacity while mains power was available. Since the battery was over 5 years old I ordered a new one to replace it. This battery was scheduled to arrive Wednesday at the start of the afternoon, and I wanted to upgrade the Linux distribution on the main homeserver conway anyway because devuan ascii is already 'oldoldstable' (but still getting updates). The homeserver uses 2 disks with the main lvm volume in a raid-1. The /boot and /boot/efi filesystems are mirrored by hand, with the idea of ending up with a working boot even when 1 disk is missing. After the shutdown and replacing the UPS battery I switched the server on again and was greeted by a grub prompt and nothing to boot. After a few tries I got the system booting again, and then I went searching for what went wrong. Eventually I found out the file /boot/efi/EFI/devuan/grub.cfg pointed at a missing filesystem. I found out the best way to fix this is with

    # dpkg-reconfigure grub-efi-amd64

run once with the /dev/sda filesystems mounted on /boot and /boot/efi and once with the /dev/sdb ones.
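A quick way to spot this kind of mismatch is to compare the UUID in the EFI stub grub.cfg with the filesystems that actually exist. This is a minimal sketch, assuming the stub contains a `search.fs_uuid <uuid> root ...` line as Debian-style grub-efi packages usually write it. The function names `grub_stub_uuid` and `check_grub_stub` are made up for this example.

```shell
#!/bin/sh
# Print the filesystem UUID that an EFI stub grub.cfg searches for.
# Assumption: the stub has a line like "search.fs_uuid <uuid> root hd0,gpt2".
grub_stub_uuid() {
    awk '$1 == "search.fs_uuid" { print $2; exit }' "$1"
}

# Report whether that UUID exists under a by-uuid directory.
# The second argument defaults to /dev/disk/by-uuid and is only
# overridable so the function can be tried out on sample data.
check_grub_stub() {
    cfg=$1
    byuuid=${2:-/dev/disk/by-uuid}
    uuid=$(grub_stub_uuid "$cfg")
    if [ -z "$uuid" ]; then
        echo "no search.fs_uuid line in $cfg"
        return 2
    fi
    if [ -e "$byuuid/$uuid" ]; then
        echo "ok: $uuid exists"
    else
        echo "missing: $uuid not found in $byuuid"
        return 1
    fi
}
```

On the homeserver this would be called as `check_grub_stub /boot/efi/EFI/devuan/grub.cfg`, once for each hand-mirrored copy of /boot/efi.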
2022-06-05 Having multiple wsjt-x instances available from CQRLOG
I'm currently also making some contacts with a special event station call and I wanted to separate the wsjt-x history for my normal call from the history for the special event station call, just like I split the log databases in CQRLOG. For the non-amateurradio persons: I have my own callsign, PE4KH, which is linked to me. It is also possible to have one extra temporary callsign. Those are usually linked to an event or some other reason for a 'special' callsign. Temporary callsigns in the Netherlands have either the digit 6 or more than one digit. There is an option for multiple profiles in wsjt-x, but those only cover the settings (including callsign) and not the logging location. This means all profiles share the same history and will show the same countries as 'new' or 'already contacted'. When I was looking at the options for starting wsjt-x with different settings I noticed this option in the help:

    -r --rig-name <rig-name>  Where <rig-name> is for multi-instance support.

With this option, all the logging is in ~/.local/share/WSJT-X - <rig-name>/ which is what I want. The next challenge is to start wsjt-x with the extra commandline parameter from CQRLOG. It seems the 'path to wsjt-x' setting doesn't accept commandline parameters. So I created a script ~/bin/ses-wsjtx with:

    #!/bin/sh
    /usr/bin/wsjtx -r ses

I changed the 'path to wsjt-x' setting to /home/koos/bin/ses-wsjtx and now I get what I want.
2022-03-18 Using grafana for alerting too
I've been playing with grafana for about a year, since I started updating my statistics gathering, and I keep seeing new options and updates in grafana. Grafana recently got some new options for alerting and I am trying a few of those. Alerts for things that are a real problem and can cause other problems are a good start. Based on some earlier problems I keep an eye on filesystems that are over 90% full. Today I read Three DDoS attacks on my personal website (found via Three DDoS attacks on my personal website : r/homelab on reddit) and this made me wonder about overloads on my webserver. The easiest way to detect problems with web serving I could think of is to look at the queue size in haproxy, which is monitored in influxdb/grafana anyway for nice graphs of website traffic. I did have one period with too high queues for the backend webservers, but that was when the backend server was completely broken due to a filesystem problem, so there was a logical reason. It would be nice if I could iterate alerts, like 'for the root filesystem of every monitored system'. Or at least copy them, changing only the system name in the rules and alerts.
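The filesystem condition itself is simple enough to check outside grafana too. As a rough sketch (not the grafana rule), the function below filters `df -P` output for filesystems above a threshold. The name `over_threshold` is invented here, and it assumes mount points without spaces.

```shell
#!/bin/sh
# Read "df -P" output on stdin and print every mount point whose use
# percentage is above the given threshold (default 90).
# Assumption: mount points contain no spaces, so column 6 is the mount point.
over_threshold() {
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 > limit + 0) print $6, pct "%"
    }'
}

# Typical use on a live system:
#   df -P | over_threshold 90
```

Running this over ssh in a loop over hostnames would give the 'for every monitored system' iteration, at the price of living outside the grafana alerting setup.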
2022-03-10 Dear linux kernel, I know what I want with nomodeset
Just noted on bootup of a virtual machine:

    Mar 10 19:42:14 turing kernel: [ 0.181861] You have booted with nomodeset. This means your GPU drivers are DISABLED
    Mar 10 19:42:14 turing kernel: [ 0.181862] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
    Mar 10 19:42:14 turing kernel: [ 0.181862] Unless you actually understand what nomodeset does, you should reboot without enabling it

It's a virtual machine which does server tasks. Anything more than 80x25 VGA text mode is pure overkill. It's currently the default card in qemu (Cirrus CLGD 5446 PCI VGA card), I could try the virtio VGA card to see if that saves on memory/cpu.
2022-02-23 Filtering logs to only get relevant reports
I want to know if something goes wrong, but with the number of (virtual) servers here at home it is not possible to check all logs constantly. So the main machines use logcheck to find the interesting error messages and the rest gets filtered out. Ideally that leaves no messages, but I do want to know about patterns that indicate attacks, so I constantly get messages about ssh attack attempts and weird nameserver requests or misconfigured nameserver responses. Recently I've been checking the resulting reports carefully again and noticed some more patterns that could be filtered. And I found two misconfigurations that I solved. Normally those misconfigurations would drown in the noise of the log, only to be found if I was looking for something else. Now they started to stand out after filtering out a lot of messages that are to be expected.
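Before adding a new ignore pattern it helps to see what would still get through. A small sketch, assuming the rules are a file with one extended regular expression per line (the format logcheck ignore files use) and applying them with plain grep. The function name `remaining_lines` and the sample lines in the usage note are made up.

```shell
#!/bin/sh
# Print the lines of a log file that would still be reported after
# applying a file of ignore patterns (one extended regex per line).
# Note: an empty line in the rules file matches everything, so the
# rules file should not contain blank lines.
remaining_lines() {
    rules=$1
    log=$2
    grep -E -v -f "$rules" "$log"
}
```

For example, a rule like `^[A-Z][a-z]{2} [ :0-9]{11} [._[:alnum:]-]+ cron\[[0-9]+\]: ` would hide routine cron lines while a kernel message on the same host still shows up.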
2021-12-28 I tried to upgrade my laptop to an SSD.. and failed
After fixing the server hardware I had some time during the Christmas holidays to look at my laptop, a Dell. It's getting a bit aged (originally from January 2016) and especially the disk is getting slow. Due to the upgrade of SSD storage in the homeserver I still have two 240 gigabyte solid state drives, so I tried to migrate the laptop to one of those. This was interesting in a number of ways: there are two operating systems to migrate, Linux and Windows 10, and the harddisk is 500 gigabyte, so a fair amount of cleanup was needed before everything would fit in 240 gigabyte. I thought the harddisk was 320 gigabyte, so the downgrade from 500 to 240 gigabyte was worse than I expected. I did some reading on migrating Windows 10 to an SSD and found out I needed a cloning tool. Navigating between subscriptions and expensive versions I found Macrium Reflect, which according to How to Copy Your Windows Installation to an SSD - PCMag should be able to do this. I have an external USB to IDE/SATA interface which is great for this kind of work, so the SSD started in that slot. First Windows didn't want to delete the EFI partition from the GPT partition table. Since the original disk has an msdos partition table and the laptop doesn't have UEFI firmware I booted Linux and created partitions as I wanted them with the right type. After that I created the Linux swapspace and filesystem and copied all Linux data to the filesystem. Then it turned out Macrium Reflect would not copy Windows 10 partitions to existing partitions, so I had to delete the two Windows 10 partitions. I have no idea why, but this laptop has a Dell partition, a Windows partition named RECOVERY and a Windows partition named OS. Deleting the two Windows partitions on the target disk also made the Linux swap and root filesystem disappear, without any question whether that was a good idea. After that it took several hours to copy the Windows filesystems.
After that was done I used the Windows disk and partition manager to resize the big partition to leave space for the Linux installation. I booted Linux again, created the swap partition and root filesystem again and copied the data again. At least rsync with the right options is faster than Macrium Reflect. After that I tried to install grub on the new disk with the right options and did the first test boot of the new disk. Open laptop underside, take out disk carrier, swap disk, put the disk carrier back in and close the laptop again. No dice: grub stopped really early. I did more searching and found I needed to use

    grub-install /dev/sdb --skip-fs-probe --boot-directory=/mnt/newinstall/boot

so time to remove the new drive again, revert to the old one, rerun grub-install with those options, remove the old drive, insert the new drive and try again. This time the menu showed what I wanted, but I got an error about accessing the disk by uuid. After that I also tried Windows on the SSD, but that gave an error that it needed the Windows recovery boot. So again back to the old disk and looking at options for creating a recovery boot USB stick. The 'Create recovery disk' program was busy with disk i/o for about 15 minutes and then reported the USB stick for recovery has to be at least 16 gigabytes, which I didn't have available. At this point I gave up. This process took most of the afternoon and it started to feel frustrating.
2021-12-27 Raid-1 on the homeserver rebuilt
After seeing read errors on one disk in the raid-1 of the homeserver I ordered a replacement SSD of a different brand and exactly the same size. It arrived today, and I did the work to replace the suspect disk. First I set the old disk as failed and removed it from the array, and noted the complete serial number on a piece of paper to make sure I would remove the faulty disk. After that the server was shut down, disconnected from a lot of cables and dragged from the homerack in the attic so I could work on it. It took a while to open the side with the SSDs (below the mainboard), and with two identical SSDs it was a 50% chance which one to remove. After removing the disk tray and unscrewing the SSD from it I was able to read the physical label on the underside, and I had guessed right. After that the new disk was installed, the case closed again and dragged back to its place, and the cables connected again. After boot everything came up fine. I then partitioned the new disk, added it to the raid-1 again and set up the EFI and Linux boot partitions on the disk. The last step was to set up the boot menu with efibootmgr to mark both disks as bootable.
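The sequence above can be written down as a plan before touching anything. The sketch below only prints commands and runs none of them. Everything specific in it is an assumption for illustration: the array member being partition 3, the EFI system partition being partition 1, a GPT partition table copied with sgdisk, and the loader path next to the grub.cfg under EFI/devuan. The function name `raid_swap_plan` is made up.

```shell
#!/bin/sh
# Print (not run) the commands for swapping one member of a raid-1.
# Assumptions: partition 3 is the raid member, partition 1 is the EFI
# system partition, and the disks use GPT. Adjust before using any of it.
raid_swap_plan() {
    md=$1      # the array, for example /dev/md127
    good=$2    # the disk that stays, for example /dev/sda
    bad=$3     # the disk being replaced, for example /dev/sdb
    cat <<EOF
# before shutdown: drop the suspect member
mdadm --manage $md --fail ${bad}3
mdadm --manage $md --remove ${bad}3
# after fitting the new disk: copy the GPT layout, then randomize GUIDs
sgdisk -R $bad $good
sgdisk -G $bad
# add the new member and watch the resync
mdadm --manage $md --add ${bad}3
cat /proc/mdstat
# register the second disk in the EFI boot menu
efibootmgr -c -d $bad -p 1 -L "devuan ($bad)" -l '\\EFI\\devuan\\grubx64.efi'
EOF
}
```

Calling `raid_swap_plan /dev/md127 /dev/sda /dev/sdb` prints the list for review. Keeping it as a printed plan means the device names get a second look before anything destructive runs.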
2021-12-21 New ssd for the homeserver ordered
I noticed syslog messages I don't like:

    [17200683.290921] md: data-check of RAID array md127
    [17200683.291277] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
    [17200683.291619] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
    [17200683.291935] md: using 128k window, over a total of 937253184k.
    [17201245.784689] ata2.00: exception Emask 0x0 SAct 0x1fe00000 SErr 0x0 action 0x0
    [17201245.785175] ata2.00: irq_stat 0x40000008
    [17201245.785465] ata2.00: failed command: READ FPDMA QUEUED
    [17201245.785766] ata2.00: cmd 60/80:a8:00:52:51/00:00:0c:00:00/40 tag 21 ncq dma 65536 in
                               res 41/40:20:60:52:51/00:00:0c:00:00/00 Emask 0x409 (media error) <F>
    [17201245.786402] ata2.00: status: { DRDY ERR }
    [17201245.786737] ata2.00: error: { UNC }
    [17201245.787281] ata2.00: configured for UDMA/133
    [17201245.787619] sd 1:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [17201245.787966] sd 1:0:0:0: [sdb] tag#21 Sense Key : Medium Error [current]
    [17201245.788317] sd 1:0:0:0: [sdb] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
    [17201245.788689] sd 1:0:0:0: [sdb] tag#21 CDB: Read(10) 28 00 0c 51 52 00 00 00 80 00
    [17201245.789123] blk_update_request: I/O error, dev sdb, sector 206656096
    [17201245.789530] ata2: EH complete

And a number of other errors on sdb. Time to replace it! I ordered a new ssd. This time a different brand. The current configuration has 2 Kingston drives with very close serial numbers, so maybe the other drive will give similar issues soon. The check of the raid1 mirror was also showing differences. I'm waiting for the replacement ssd to show up, and at that moment I will remove the suspect ssd from the array and replace it. Update 2021-12-24: Writing about the order helped speed things up: I just received notification the replacement ssd is being sent. It will not show up until after Christmas.
I also noticed the problematic Kingston still has warranty, so maybe I can get a replacement for that one too. They came in about 1.5 years ago when I upgraded the storage on the homeserver.
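The numbers in the log above hang together, which can be checked with a bit of arithmetic. In a SCSI Read(10) command, bytes 2 to 5 hold the starting block and bytes 7 and 8 the number of blocks. A small sketch that decodes the CDB from the log (the function name `read10_range` is made up):

```shell
#!/bin/sh
# Decode a SCSI Read(10) CDB given as ten two-digit hex bytes.
# Byte 0 is the opcode (28), bytes 2-5 are the starting LBA and
# bytes 7-8 are the transfer length in blocks.
read10_range() {
    start=$((0x$3$4$5$6))
    count=$((0x$8$9))
    echo "$start $count"
}

# The CDB from the log line: Read(10) 28 00 0c 51 52 00 00 00 80 00
read10_range 28 00 0c 51 52 00 00 00 80 00
# prints "206656000 128"
```

So the request was for 128 blocks starting at 206656000, and the reported bad sector 206656096 is block 96 of that request. Assuming 512-byte sectors, 128 blocks is also the 65536 bytes shown in the 'dma 65536' part of the ata2.00 line.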
2021-11-22 Resizing a filesystem through several layers
For work I use a supplied laptop with Windows 10. For some of my work I want to have a Linux environment available, so I have VirtualBox with a Linux virtual machine running. And because of some of the work I do on that Linux virtual machine I use full-disk encryption. The installation was done with the encrypted lvm setting. Resizing the filesystem because it was getting full turned out to be a lot of steps! After stopping the virtual machine I wanted to resize the disk from the VirtualBox media manager, but that gave an error. After that I tried the commandline, giving about the same error:

    > "\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium rotterdam.vdi --resize 32768
    0%...
    Progress state: VBOX_E_NOT_SUPPORTED
    VBoxManage.exe: error: Failed to resize medium
    VBoxManage.exe: error: Resizing to new size 34359738368 is not yet supported for medium 'C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam.vdi'
    VBoxManage.exe: error: Details: code VBOX_E_NOT_SUPPORTED (0x80bb0009), component MediumWrap, interface IMedium
    VBoxManage.exe: error: Context: "enum RTEXITCODE __cdecl handleModifyMedium(struct HandlerArg *)" at line 816 of file VBoxManageDisk.cpp

It turns out the .vdi is the wrong type for dynamic resizing. Solution: clone it! The new .vdi will have the dynamic type automatically, and there is a "before" .vdi now on disk to revert to if anything goes wrong.

    > "\Program Files\Oracle\VirtualBox\VBoxManage.exe" showhdinfo rotterdam.vdi
    UUID:           f832b0b4-8738-491d-bd9c-291d755a4af7
    Parent UUID:    base
    State:          created
    Type:           normal (base)
    Location:       C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam.vdi
    Storage format: VDI
    Format variant: fixed default
    Capacity:       26067 MBytes
    Size on disk:   26070 MBytes
    Encryption:     disabled
    Property:       AllocationBlockSize=1048576
    In use by VMs:  rotterdam (UUID: 2454dadb-a82d-4d74-bbea-8dcf2b2d1bf1)

    > "\Program Files\Oracle\VirtualBox\VBoxManage.exe" clonehd rotterdam.vdi rotterdam-2.vdi
    0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
    Clone medium created in format 'VDI'. UUID: 835e2f75-c19d-4e98-865e-d7acf1359fc7

    > "\Program Files\Oracle\VirtualBox\VBoxManage.exe" showhdinfo rotterdam-2.vdi
    UUID:           835e2f75-c19d-4e98-865e-d7acf1359fc7
    Parent UUID:    base
    State:          created
    Type:           normal (base)
    Location:       C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam-2.vdi
    Storage format: VDI
    Format variant: dynamic default
    Capacity:       26067 MBytes
    Size on disk:   26069 MBytes
    Encryption:     disabled
    Property:       AllocationBlockSize=1048576

    > "\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium rotterdam-2.vdi --resize 32768
    0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

I moved the old .vdi out of the way, added the new .vdi to the virtual machine and started it again. This worked fine, but the root volume wasn't any bigger (yet). Next steps: enlarge the extended partition and the Linux partition in it on disk using parted. You really have to know what you are doing here, so I'm not just going to give a cut-and-paste sample. Now I can resize the encrypted and mounted volume! With the right passphrase.

    # cryptsetup resize /dev/mapper/sda5_crypt

And grow the 'physical' (ahem) volume:

    # pvresize /dev/mapper/sda5_crypt

Resize the logical volume:

    # lvextend /dev/rotterdam-vg/root -l +1674

And finally resize the mounted filesystem:

    # resize2fs /dev/mapper/rotterdam--vg-root

And the filesystem has grown, and looks good in a fsck on the next boot. So solid state disk → Windows filesystem → vdi file → VirtualBox → disk in Linux virtual machine → partition → lukscrypt → logical volume manager → volume → filesystem.
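The +1674 in the lvextend step is a number of extents, not megabytes. A rough sanity check of where such a number comes from, assuming the LVM default extent size of 4 MiB (the real value and the exact free count are in the vgdisplay output). The function name `extents_gained` is made up.

```shell
#!/bin/sh
# Estimate how many LVM extents a disk growth adds.
# Assumption: 4 MiB physical extents (the LVM default) unless a third
# argument gives another size in MiB.
extents_gained() {
    old_mib=$1
    new_mib=$2
    pe_mib=${3:-4}
    echo $(( (new_mib - old_mib) / pe_mib ))
}

# Disk capacity before and after the resize, in MBytes as VBoxManage reports it
extents_gained 26067 32768
# prints "1675"
```

That is within one extent of the 1674 that was actually free, and a small difference like that is plausible from partition alignment. When the plan is simply to hand all free space to one volume, `lvextend -l +100%FREE` avoids counting extents at all.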