News items for tag linux - Koos van den Hout

2022-07-07 Upgraded the homeserver OS to devuan beowulf and replaced the UPS battery
A few days ago I noticed some interesting messages in the apcupsd log:
2022-07-04 10:14:15 +0200  Battery disconnected.
2022-07-04 10:16:24 +0200  Battery reattached.
2022-07-04 10:19:53 +0200  Battery disconnected.
2022-07-04 10:20:40 +0200  Battery reattached.
Checking the UPS statistics showed me the battery charge was dropping to about 7 % of the capacity while the mains power was available. Since the battery was over 5 years old I ordered a new one to replace it.

This battery was scheduled to arrive Wednesday at the start of the afternoon and I wanted to do an upgrade of the Linux distribution on the main homeserver conway anyway because devuan ascii is already 'oldoldstable' (but still getting updates).

The homeserver uses 2 disks with the main lvm volume in a raid-1. The /boot and /boot/efi filesystems are mirrored by hand with the idea to end with a working boot even when 1 disk is missing.

After the shutdown and replacing the UPS battery I switched the server on again and was greeted by a grub prompt with nothing to boot. After a few tries I got the system booting again and went searching for what went wrong. Eventually I found out the file /boot/efi/EFI/devuan/grub.cfg pointed at a missing filesystem. The best way I found to fix this is with
# dpkg-reconfigure grub-efi-amd64
run twice: once with the /dev/sda filesystems mounted on /boot and /boot/efi, and once with the /dev/sdb ones.
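
A check along these lines could catch a stale stub before the next reboot. It is a minimal sketch, assuming the stub /boot/efi/EFI/devuan/grub.cfg contains a `search.fs_uuid <uuid> ...` line, as Debian-style grub-efi installs usually write. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of grub or Devuan.

# Print the filesystem UUID that a grub stub config searches for.
# Assumes a line of the form: search.fs_uuid <uuid> root ...
stub_uuid() {
    awk '$1 == "search.fs_uuid" { print $2; exit }' "$1"
}

# Read known UUIDs on stdin (one per line); succeed if $1 is among them.
uuid_known() {
    grep -qxF -- "$1"
}
```

On the real system, as root, `blkid -s UUID -o value | uuid_known "$(stub_uuid /boot/efi/EFI/devuan/grub.cfg)" || echo "stub points at a missing filesystem"` would flag the problem. The same check can be repeated with the other disk's EFI partition mounted.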

I was hoping the complete upgrade would make my rcu_sched problems, which have caused serious trouble before, go away. They haven't.

Again I see this in a virtual machine:
[62988.027890] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[62988.036492] rcu:     0-...!: (1 GPs behind) idle=656/1/0x4000000000000002 softirq=592140/600579 fqs=0
[62988.036943] rcu:     (detected by 0, t=2 jiffies, g=2877673, q=701)
[62988.037327] NMI backtrace for cpu 0
But this time I see on the hardware:
[63178.224120] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[63178.224211] ata1.00: failed command: FLUSH CACHE EXT
[63178.224255] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 3
                        res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[63178.224351] ata1.00: status: { DRDY }
[63178.224379] ata1: hard resetting link
[63183.576100] ata1: link is slow to respond, please be patient (ready=0)
[63183.696118] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[63183.696333] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT1.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
[63183.696400] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT1.SPT0._GTF, AE_NOT_FOUND (20180810/psparse-516)
[63183.696597] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT1.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
[63183.696634] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT1.SPT0._GTF, AE_NOT_FOUND (20180810/psparse-516)
[63183.696717] ata1.00: configured for UDMA/133
[63183.696732] ata1.00: retrying FLUSH 0xea Emask 0x4
[63183.696772] ata1: EH complete
which suggests to me I should try whether using a different channel from SATA1 would change things.
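
To compare before and after moving the disk to another SATA port, the failed commands can be counted per ATA device. This is a rough sketch that only looks at the "failed command" lines shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<device> <count>" for every ATA device with "failed command" lines.
count_ata_failures() {
    sed -n 's/.*\(ata[0-9][0-9]*\.[0-9][0-9]*\): failed command:.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage would be `dmesg | count_ata_failures`. If the count follows the disk to the new port, the disk is the more likely suspect; if it stays with ata1, the port or cable is.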

2022-06-05 Having multiple wsjt-x instances available from CQRLOG
I'm currently also doing some contacts with a special event station call and I wanted to separate the wsjt-x history for my normal call from the history for the special event station call, just like I split the log databases in CQRLOG.

For the non-amateurradio persons: I have my own callsign, PE4KH which is linked to me. It is also possible to have one extra temporary callsign. Those are usually linked to an event or some other reason for a 'special' callsign. Temporary callsigns in the Netherlands have either the digit 6 or more than one digit.

There is an option for multiple profiles in wsjt-x but those are just for the settings (including callsign) but not for the logging location. This means all different profiles share the same history and will show the same countries as 'new' or 'already contacted'.

When I was looking at the options for starting wsjt-x with different settings I noticed this option in the help: -r, --rig-name <rig-name> "Where <rig-name> is for multi-instance support." With this option, all the logging is in ~/.local/share/WSJT-X - <rig-name>/ which is what I want.

The next challenge is to start wsjt-x with the extra commandline parameter from CQRLOG. It seems the 'path to wsjt-x' setting doesn't accept commandline parameters. So I created a script ~/bin/ses-wsjtx with:
#!/bin/sh
# Start wsjt-x with its own rig name so settings, logs and history stay separate
exec /usr/bin/wsjtx -r ses "$@"
I changed the 'path to wsjt-x' setting to /home/koos/bin/ses-wsjtx and now I get what I want.
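
If more than one special call needs its own history, a single script could take the rig name from its own file name. This is a sketch under that assumption: the naming scheme `<rig>-wsjtx` and both function names are hypothetical, and the log directory follows the path mentioned above.

```shell
#!/bin/sh
# Hypothetical variant: derive the rig name from the script's own name,
# so a copy or symlink called "ses-wsjtx" starts wsjtx with "-r ses".
rig_from_name() {
    name=${1##*/}                    # strip the directory part
    printf '%s\n' "${name%-wsjtx}"   # strip the "-wsjtx" suffix
}

# Directory where wsjt-x keeps its logs for that rig name
log_dir() {
    printf '%s\n' "$HOME/.local/share/WSJT-X - $1"
}

# Only start wsjtx when this file is run under a *-wsjtx name
case "${0##*/}" in
    *-wsjtx) exec /usr/bin/wsjtx -r "$(rig_from_name "$0")" "$@" ;;
esac
```

With this saved once and symlinked as ~/bin/ses-wsjtx, the 'path to wsjt-x' setting in CQRLOG stays the same as before.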

2022-03-18 Using grafana for alerting too
I've been playing with grafana for about a year since starting with updating my statistics gathering and I keep seeing new options and updates in grafana.

Grafana recently got some new options for alerting and I am trying a few of those. Alerts for things that are a real problem and can cause other problems are a good start. Based on some earlier problems I keep an eye on some filesystems that are over 90% full.
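
The same 90% threshold can be tried out outside grafana before turning it into an alert rule. The following is only a shell sketch of the condition, not how grafana evaluates it; the function name is hypothetical and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print
# "<mountpoint> <use%>" for filesystems above the given percentage.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "95%" -> "95"
        if (use + 0 > limit) print $6, $5
    }'
}
```

Running `df -P | over_threshold 90` lists the filesystems that would trigger the alert right now.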

Today I read 'Three DDoS attacks on my personal website', found via the r/homelab subreddit, and this made me wonder about overloads on my webserver. The easiest way to detect problems with web serving I could think of is to look at the queue size in haproxy, which is monitored in influxdb/grafana anyway for nice graphs of website traffic.

I did have a time with too high queues for backend webservers. But that was when the backend server was completely broken due to a filesystem problem so that was a logical reason.

It would be nice if I could iterate alerts, like 'for the root filesystem of every monitored system'. Or at least copy them changing only the system name in the rules and alerts.

2022-03-10 Dear linux kernel, I know what I want with nomodeset
Just noted on bootup of a virtual machine:
Mar 10 19:42:14 turing kernel: [    0.181861] You have booted with nomodeset. This means your GPU drivers are DISABLED
Mar 10 19:42:14 turing kernel: [    0.181862] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
Mar 10 19:42:14 turing kernel: [    0.181862] Unless you actually understand what nomodeset does, you should reboot without enabling it
It's a virtual machine which does server tasks. Anything more than 80x25 VGA text mode is pure overkill. It's currently the default card in qemu (Cirrus CLGD 5446 PCI VGA card), I could try the virtio VGA card to see if that saves on memory/cpu.

2022-02-23 Filtering logs to only get relevant reports
I want to know if something goes wrong but with the number of (virtual) servers here at home it is not possible to check all logs constantly. So the main machines use logcheck to find the interesting error messages and the rest gets filtered out.

Ideally that leaves no messages, but I do want to know about patterns that indicate attacks so I do get messages constantly about ssh attack attempts and weird nameserver requests or misconfigured nameserver responses.

Recently I've been checking the resulting reports again carefully and noticed some more patterns that could be filtered. And I found two misconfigurations that I solved. Normally those misconfigurations would drown in the noise of the log, only to be found if I was looking for something else. Now it started to stand out after filtering out a lot of messages that are to be expected.
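
Before adding a new ignore pattern it helps to see which lines would still get through. logcheck rules are extended regular expressions, so plain grep can approximate the filtering. This is a sketch with a hypothetical function name; the rules file passed in must not contain blank lines or comments, since grep would treat those as patterns.

```shell
#!/bin/sh
# Hypothetical helper: print the log lines on stdin that are NOT matched
# by any extended regular expression in the rules file $1, which is
# roughly what would end up in a report.
unfiltered() {
    grep -E -v -f "$1"
}
```

For example `unfiltered my-new-rules < /var/log/syslog | less`, where my-new-rules is a scratch file holding the candidate patterns.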

2021-12-28 I tried to upgrade my laptop to an SSD.. and failed
After fixing the server hardware I had some time due to the Christmas holidays to look at my laptop, a Dell. It's getting a bit aged (originally from January 2016) and especially the disk is getting slow. Due to the upgrade of SSD storage in the homeserver I still have two 240 gigabyte solid state drives. So I tried to migrate the laptop to one of those solid state drives. Which was interesting in a number of ways: there are two operating systems to migrate: Linux and Windows 10 and the harddisk is 500 gigabyte, so 240 gigabyte would need an amount of cleanup before all could be moved.

I thought the harddisk was 320 gigabyte, so the downgrade from 500 to 240 gigabyte was worse than I expected.

I did some reading on migrating Windows 10 to an SSD and found out I needed a cloning tool. Navigating between subscriptions and expensive versions I found Macrium Reflect which according to How to Copy Your Windows Installation to an SSD - PCMag should be able to do this.

I have an external USB to IDE/SATA interface which is great for this kind of work. So the SSD started in that slot.

First windows didn't want to delete the EFI partition from the GPT partition table. Since the original disk has an msdos partition table and the laptop doesn't have UEFI firmware I booted linux and created partitions as I wanted them with the right type.

After that I created the Linux swapspace and filesystem and copied all Linux data to the filesystem.

After that the Macrium Reflect tool would not copy Windows 10 partitions to existing partitions so I had to delete the two Windows 10 partitions. I have no idea why, but this laptop has a Dell partition, a windows partition named RECOVERY and a windows partition named OS. Deleting the two windows partitions on the target disk also made the linux swap and root filesystem disappear without any questions whether that was a good idea.

After that it was several hours to copy the windows filesystems. After that was done I used the windows disk and partition manager to resize the big partition to leave space for the linux installation.

I booted Linux again, created the swap partitions and root filesystem again and copied the data again. At least rsync with the right options is faster than Macrium Reflect.

After that I tried to install grub on the new disk with the right options and did the first test boot of the new disk. Open laptop underside, take out disk carrier, swap disk, put the disk carrier back in and close the laptop again.

No dice: grub stopped really early. I did more searching and found I needed to use grub-install /dev/sdb --skip-fs-probe --boot-directory=/mnt/newinstall/boot so time to remove the new drive again, revert to the old, rerun grub with those options, remove old drive, insert new drive and try again. This time the menu showed what I wanted, but I got an error about accessing the disk by uuid.

After that I also tried windows on the SSD but that gave an error it needed the Windows recovery boot.

So again back to the old disk and looking at options for creating a recovery boot USB stick. The 'Create recovery disk' program was busy with disk i/o for about 15 minutes and reported the USB stick for recovery has to be at least 16 Gigabytes which I didn't have available.

At this point I gave up. This process took most of the afternoon and it started to feel frustrating.

2021-12-27 Raid-1 on the homeserver rebuilt
After seeing read errors on one disk in the raid-1 of the homeserver I ordered a replacement SSD of a different brand and exactly the same size. It arrived today, and I did the work to replace the suspect disk.

First I set the old disk as failed and removed it from the array. I also noted the complete serial number on a piece of paper to make sure I would remove the faulty disk.

After that the server was shut down, disconnected from a lot of cables, dragged from the homerack in the attic and I worked on it. It took a while to open the side with the SSDs (below the mainboard) and with two exactly the same SSDs it was a 50% chance which one to remove. After removing the disk tray and unscrewing the SSD from the disk tray I was able to read the physical label on the underside and I guessed right.

After that the new disk was installed, the case closed again and dragged back to its place and cables connected again. After boot it all came up fine.

After bootup I partitioned the new disk, added it to the raid-1 again and set up the EFI and Linux boot partitions on the disk.

Last step was to set up the boot menu with efibootmgr to set both disks as bootable.
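
The steps above can be written down as a dry run that only prints commands. This is one possible way to do it, not necessarily the commands that were used here: copying the partition table with sgdisk, the partition numbers and the loader path \EFI\devuan\grubx64.efi are all assumptions, and copying the contents of the EFI and /boot partitions by hand is left out.

```shell
#!/bin/sh
# Dry run only: prints the commands, does not execute them.
# Arguments: array, good disk, disk being swapped, raid partition number,
# EFI partition number. All values passed in are placeholders.
plan_replace() {
    md=$1 good=$2 swap=$3 raidpart=$4 efipart=$5
    cat <<EOF
# before shutdown: drop the suspect disk from the array
mdadm $md --fail $swap$raidpart
mdadm $md --remove $swap$raidpart
# after the physical swap: copy the partition table from the good disk
sgdisk --replicate=$swap $good
sgdisk --randomize-guids $swap
mdadm $md --add $swap$raidpart
# register the second EFI system partition in the firmware boot menu
efibootmgr --create --disk $swap --part $efipart --label "devuan ($swap)" --loader '\\EFI\\devuan\\grubx64.efi'
EOF
}
```

Calling `plan_replace /dev/md127 /dev/sda /dev/sdb 3 1` prints the plan for review. Each line should be checked against the real partition layout before anything is run as root.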

2021-12-21 New ssd for the homeserver ordered
I noticed syslog messages I don't like:
[17200683.290921] md: data-check of RAID array md127
[17200683.291277] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[17200683.291619] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[17200683.291935] md: using 128k window, over a total of 937253184k.
[17201245.784689] ata2.00: exception Emask 0x0 SAct 0x1fe00000 SErr 0x0 action 0x0
[17201245.785175] ata2.00: irq_stat 0x40000008
[17201245.785465] ata2.00: failed command: READ FPDMA QUEUED
[17201245.785766] ata2.00: cmd 60/80:a8:00:52:51/00:00:0c:00:00/40 tag 21 ncq dma 65536 in
                           res 41/40:20:60:52:51/00:00:0c:00:00/00 Emask 0x409 (media error) <F>
[17201245.786402] ata2.00: status: { DRDY ERR }
[17201245.786737] ata2.00: error: { UNC }
[17201245.787281] ata2.00: configured for UDMA/133
[17201245.787619] sd 1:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[17201245.787966] sd 1:0:0:0: [sdb] tag#21 Sense Key : Medium Error [current] 
[17201245.788317] sd 1:0:0:0: [sdb] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
[17201245.788689] sd 1:0:0:0: [sdb] tag#21 CDB: Read(10) 28 00 0c 51 52 00 00 00 80 00
[17201245.789123] blk_update_request: I/O error, dev sdb, sector 206656096
[17201245.789530] ata2: EH complete
And a number of other errors on sdb. Time to replace it! I ordered a new ssd. This time a different brand. Current configuration is with 2 Kingston drives with very close serial numbers, so maybe the other drive will give similar issues soon.

The check of the raid1 mirror was also showing differences. I'm waiting for the replacement ssd to show up, and at that moment I will remove the suspect ssd from the array and replace it.
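
The differences found by a data-check are counted by the md driver in sysfs. A small sketch for reading that counter follows; the function name is hypothetical and the optional second argument only exists so it can be tried against a copy of the sysfs tree.

```shell
#!/bin/sh
# Print the mismatch count the md driver recorded during the last check.
# $1 is the array name (md127 in the log above); $2 is an optional
# sysfs root, defaulting to /sys.
md_mismatches() {
    cat "${2:-/sys}/block/$1/md/mismatch_cnt"
}
```

On the server this would be `md_mismatches md127`. A new check can be started as root with `echo check > /sys/block/md127/md/sync_action`.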

Update 2021-12-24: Writing about the order helped speed things up: I just received notification the replacement ssd is being sent. Which will not show up until after Christmas. I also noticed the problematic Kingston still has warranty, so maybe I can get a replacement for that one too. They came in about 1.5 years ago when I upgraded the storage on the homeserver.

2021-11-22 Resizing a filesystem through several layers
For work I use a supplied laptop with Windows 10. For some of my work I want to have a Linux environment available, so I have VirtualBox with a Linux virtual machine running. Because of some of the work I do on that Linux virtual machine I use full-disk encryption, and the installation was done with the encrypted lvm setting.

Resizing the filesystem because it was getting full turned out to be a lot of steps! After stopping the virtual machine I wanted to resize the disk from the VirtualBox media manager but that gave an error. After that I tried the commandline, giving about the same error:
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium rotterdam.vdi --resize 32768
0%...
Progress state: VBOX_E_NOT_SUPPORTED
VBoxManage.exe: error: Failed to resize medium
VBoxManage.exe: error: Resizing to new size 34359738368 is not yet supported for medium 'C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam.vdi'
VBoxManage.exe: error: Details: code VBOX_E_NOT_SUPPORTED (0x80bb0009), component MediumWrap, interface IMedium
VBoxManage.exe: error: Context: "enum RTEXITCODE __cdecl handleModifyMedium(struct HandlerArg *)" at line 816 of file VBoxManageDisk.cpp
It turns out the .vdi is the wrong type for dynamic resizing. Solution: clone it! The new .vdi will have the dynamic type automatically and there is a "before" .vdi now on disk to revert to if anything goes wrong.
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" showhdinfo rotterdam.vdi
UUID:           f832b0b4-8738-491d-bd9c-291d755a4af7
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam.vdi
Storage format: VDI
Format variant: fixed default
Capacity:       26067 MBytes
Size on disk:   26070 MBytes
Encryption:     disabled
Property:       AllocationBlockSize=1048576
In use by VMs:  rotterdam (UUID: 2454dadb-a82d-4d74-bbea-8dcf2b2d1bf1)
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" clonehd rotterdam.vdi rotterdam-2.vdi
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Clone medium created in format 'VDI'. UUID: 835e2f75-c19d-4e98-865e-d7acf1359fc7
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" showhdinfo rotterdam-2.vdi
UUID:           835e2f75-c19d-4e98-865e-d7acf1359fc7
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam-2.vdi
Storage format: VDI
Format variant: dynamic default
Capacity:       26067 MBytes
Size on disk:   26069 MBytes
Encryption:     disabled
Property:       AllocationBlockSize=1048576
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium rotterdam-2.vdi --resize 32768
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
I moved the old .vdi out of the way and added the new .vdi to the virtual machine and started it again. This worked fine, but the root volume wasn't any bigger (yet). Next steps: enlarge the extended partition and the Linux partition in it on disk using parted. You really have to know what you are doing here, so I'm not just going to give a cut-and-paste sample.

Now I can resize the encrypted and mounted volume! With the right passphrase.
# cryptsetup resize /dev/mapper/sda5_crypt
And grow the 'physical' (ahem) volume:
# pvresize /dev/mapper/sda5_crypt
Resize the logical volume:
# lvextend /dev/rotterdam-vg/root -l +1674
And finally resize the mounted filesystem:
# resize2fs /dev/mapper/rotterdam--vg-root
And the filesystem has grown, and looks good in a fsck on the next boot.
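
As a sanity check on the extent count passed to lvextend: the disk grew from 26067 to 32768 MBytes, and assuming the LVM default extent size of 4 MiB (not stated above) that is at most about 1675 extents. The sketch below only does that arithmetic and gives an upper bound; partition alignment and metadata can cost an extent, and the exact figure is what vgdisplay reports as free.

```shell
#!/bin/sh
# Whole LVM extents that fit in the growth from old to new size.
# Sizes are in MiB; the extent size defaults to 4 MiB (an assumption).
extents_for_growth() {
    old_mb=$1 new_mb=$2 extent_mb=${3:-4}
    echo $(( (new_mb - old_mb) / extent_mb ))
}

extents_for_growth 26067 32768   # prints 1675
```

Using `lvextend -l +100%FREE` instead of a fixed number avoids the arithmetic when the whole free space should go to one volume.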

So solid state disk → Windows filesystem → vdi file → VirtualBox → disk in Linux virtual machine → partition → lukscrypt → logical volume manager → volume → filesystem.

2021-11-20 Trying to get DKIM running
My recent issues with getting my e-mail delivered made me look at DKIM signing of outgoing e-mail messages. To not break things I have started testing this with outgoing e-mail from camp-wireless.com, which normally publishes that it doesn't send mail at all, so the first step was to change that policy by changing the MX record and SPF record.

I started reading into configuring sendmail with dkim and found OpenDKIM which can work as a sendmail milter.

Based on How to configure DKIM & SPF & DMARC on Sendmail for multiple domains on CentOS 7 I took the same steps for my Devuan installation.

In Devuan (and probably Debian/Ubuntu) there is an opendkim package for the service and an opendkim-tools package for the associated tools. I needed the second one to get the opendkim-genkey command. I can imagine keys being generated/managed on a different system than the actual signing server.

After configuring this for camp-wireless.com including generating a keypair and publishing the public key via DNS I started sending test messages but had no luck. It turned out the sending host has to be in the InternalHosts table of opendkim. I added the address ranges and after that things started to work.

After fixing that I got the results I wanted:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=camp-wireless.com;
        s=gosper; t=1637408594;
        bh=YewDlohOT9RvALNQw4cVukwSpmAm5tXtGWJxLDUJZa4=;
        h=To:From:Subject:Date:From;
        b=GGMEeCY5xmgFDBQ5NzgZfAVvyr+ctBKOTGpwMqq1W/tgJYMY8WyzaM5XfEiWijGKr
        abBN5WLbiyoXsd62lNVxcDOBUYWzkOnwZCw5WgdlzZJSIxgRdnWMQLxL1E9BJdudwR
        zriX1/vAaR34RFM1kiSVp0dqa98/Kxfdp2DPPRDsAVJ6sdxqz1YHD4odveDcLEQQZv
        jUMNPVmQps90mZORtdKtOOWQP0RYkZvmjNsJZuwIrRkFvUzOmAVT6MDDf4kZ35lbes
        oAp0me8tQgoffNLRQpO7akSKhbh1Kn5fAv50WILhM0rK/ChkWqvOrcfgIwbSSPduzM
        DI1w23jCnwaKQ==
And a verification:
Authentication-Results: xs4all.nl; spf=pass smtp.mailfrom=camp-wireless.com;
dkim=pass header.d=camp-wireless.com
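
The published key can also be checked without waiting for a receiving server's verdict. The public key lives in a TXT record named after the selector and the domain, which are gosper and camp-wireless.com in the signature above. The helper name in this sketch is made up.

```shell
#!/bin/sh
# DNS name under which the public key for a selector/domain is published
dkim_record_name() {
    printf '%s._domainkey.%s\n' "$1" "$2"
}
```

With that, `dig +short TXT "$(dkim_record_name gosper camp-wireless.com)"` shows what the rest of the world sees, and `opendkim-testkey -d camp-wireless.com -s gosper -vvv` from opendkim-tools compares the record with the configured private key.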
I was wondering about roaming users who authenticate to my mailserver and send messages that way. In a first test those messages get signed too. That means I can start signing mail from idefix.net and other production domain names!

Koos van den Hout, reachable as koos+website@idefix.net. PGP encrypted e-mail preferred. PGP key 5BA9 368B E6F3 34E4 (local copy, via keyservers).

My opinions are my own, what I write is protected by copyrights. Some publications contain an explicit license statement which allows sharing without asking permission.
Other webprojects: Camp Wireless, wireless Internet access at campsites, The Virtual Bookcase, book reviews
This page generated by $Id: newstag.cgi,v 1.37 2022/02/15 21:48:19 koos Exp $ in 0.026112 seconds.