Upgraded the homeserver OS to devuan beowulf and replaced the UPS battery / 2022-07-07

A few days ago I noticed some interesting messages in the apcupsd log:
2022-07-04 10:14:15 +0200  Battery disconnected.
2022-07-04 10:16:24 +0200  Battery reattached.
2022-07-04 10:19:53 +0200  Battery disconnected.
2022-07-04 10:20:40 +0200  Battery reattached.
The UPS statistics showed the battery charge dropping to about 7% of capacity while mains power was available. Since the battery was over 5 years old, I ordered a replacement.
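A minimal sketch of how these two checks could be scripted. It assumes `apcaccess status` prints a line of the form `BCHARGE  : 7.0 Percent` and that the events log uses the message text shown above. The function names and the log path in the usage comment are hypothetical.

```shell
# Print the battery charge percentage from apcaccess-style output on stdin.
# Assumption: the charge is reported as "BCHARGE  : <number> Percent".
battery_charge() {
    awk -F': *' '$1 ~ /^BCHARGE[ \t]*$/ { split($2, a, " "); print a[1] }'
}

# Count "Battery disconnected." events in an apcupsd events log on stdin.
count_disconnects() {
    grep -c 'Battery disconnected\.'
}

# Usage on a live system (the log path is an assumption, check apcupsd.conf):
#   apcaccess status | battery_charge
#   count_disconnects < /var/log/apcupsd.events
```

Both functions read from stdin, so they can be tried on saved output before pointing them at the running daemon.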

The battery was scheduled to arrive early Wednesday afternoon, and I wanted to upgrade the Linux distribution on the main homeserver conway anyway, because devuan ascii is already 'oldoldstable' (although it still gets updates).

The homeserver uses 2 disks, with the main lvm volume on a raid-1. The /boot and /boot/efi filesystems are mirrored by hand, the idea being that the system can still boot when 1 disk is missing.
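Because copies mirrored by hand can drift apart unnoticed, a small check like the following could flag differences. This is only a sketch: the function name `trees_match` and the mount point in the usage comment are hypothetical, and it assumes the second disk's copy is mounted somewhere for comparison.

```shell
# Return 0 when two directory trees have identical contents, non-zero otherwise.
trees_match() {
    diff -rq "$1" "$2" >/dev/null 2>&1
}

# Usage (the second mount point is an assumption):
#   trees_match /boot/efi /mnt/efi-sdb || echo "EFI copies differ"
```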

After shutting down and replacing the UPS battery I switched the server on again and was greeted by a grub prompt with nothing to boot. After a few tries I got the system booting again, and then I went looking for what had gone wrong. Eventually I found that the file /boot/efi/EFI/devuan/grub.cfg pointed at a missing filesystem. The best fix I found is to run
# dpkg-reconfigure grub-efi-amd64
twice: once with the /dev/sda filesystems mounted on /boot and /boot/efi, and once with the /dev/sdb ones.
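To catch a stale pointer like this before the next reboot, something along these lines might help. It is a sketch under the assumption that the small grub.cfg on the EFI partition names its target with a `search.fs_uuid <uuid> ...` line, as the Debian-style grub-efi packages normally write it. The function names are hypothetical.

```shell
# Print the filesystem UUID that a grub.cfg stub searches for (read from stdin).
# Assumption: the stub contains a line starting with "search.fs_uuid <uuid>".
grub_stub_uuid() {
    awk '$1 == "search.fs_uuid" { print $2; exit }'
}

# Return 0 if a filesystem with the given UUID is currently visible to udev.
uuid_present() {
    [ -e "/dev/disk/by-uuid/$1" ]
}

# Usage:
#   uuid=$(grub_stub_uuid < /boot/efi/EFI/devuan/grub.cfg)
#   uuid_present "$uuid" || echo "grub.cfg points at missing filesystem $uuid"
```

Running the check against the EFI partition of each disk in turn would show whether both copies point at a filesystem that still exists.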

I was hoping the complete upgrade would make my rcu_sched problems go away. They have caused serious trouble before, but they are still here.

Again I see this in a virtual machine:
[62988.027890] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[62988.036492] rcu:     0-...!: (1 GPs behind) idle=656/1/0x4000000000000002 softirq=592140/600579 fqs=0
[62988.036943] rcu:     (detected by 0, t=2 jiffies, g=2877673, q=701)
[62988.037327] NMI backtrace for cpu 0
But this time I see this on the hardware:
[63178.224120] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[63178.224211] ata1.00: failed command: FLUSH CACHE EXT
[63178.224255] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 3
                        res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[63178.224351] ata1.00: status: { DRDY }
[63178.224379] ata1: hard resetting link
[63183.576100] ata1: link is slow to respond, please be patient (ready=0)
[63183.696118] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[63183.696333] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT1.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
[63183.696400] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT1.SPT0._GTF, AE_NOT_FOUND (20180810/psparse-516)
[63183.696597] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT1.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
[63183.696634] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT1.SPT0._GTF, AE_NOT_FOUND (20180810/psparse-516)
[63183.696717] ata1.00: configured for UDMA/133
[63183.696732] ata1.00: retrying FLUSH 0xea Emask 0x4
[63183.696772] ata1: EH complete
which suggests I should test whether moving the disk to a different channel than SATA1 changes things.
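To compare before and after such a change, the failed commands per ATA device can be counted from the kernel log. A minimal sketch, assuming the messages keep the `ataN.NN: failed command: ...` form shown above; the function name is hypothetical.

```shell
# Count "failed command" kernel messages per ATA device from a log on stdin.
# Prints one "device count" pair per line, sorted by device name.
ata_failed_commands() {
    awk '/failed command:/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                dev = $i
                sub(/:$/, "", dev)   # drop the trailing colon
                count[dev]++
                break
            }
        }
    }
    END { for (dev in count) print dev, count[dev] }' | sort
}

# Usage:
#   dmesg | ata_failed_commands
```

If the counts follow the disk to the new port, the disk or its cable is the more likely suspect; if they stay with ata1, the port or controller is.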
