News items for tag linux - Koos van den Hout

2022-03-18 Using grafana for alerting too
I've been playing with grafana for about a year, since I started updating my statistics gathering, and I keep seeing new options and updates in grafana.

Grafana recently got some new options for alerting and I am trying a few of those. Alerts for things that are a real problem and can cause other problems are a good start. Based on some earlier problems I keep an eye on some filesystems that are over 90% full.
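
The condition behind such an alert can be sketched outside Grafana too. This is not the Grafana rule itself, only a minimal shell version of the same check; the function name fs_over_threshold is made up for this example, and it assumes mount points without spaces.

```shell
# Print mount points whose usage is above a threshold (default 90%).
# Reads `df -P` style output on stdin, so it can be tried on saved output.
fs_over_threshold() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the df header line
            use = $5
            sub(/%$/, "", use)        # "95%" -> "95"
            if (use + 0 > limit + 0) print $6, use "%"
        }'
}
```

Typical use would be `df -P | fs_over_threshold 90`, which prints nothing as long as every filesystem is below the limit.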

Today I read "Three DDoS attacks on my personal website" (found via r/homelab on reddit), and this made me wonder about overloads on my webserver. The easiest way I could think of to detect problems with web serving is to look at the queue size in haproxy, which is already monitored in influxdb/grafana for nice graphs of website traffic.
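
As an illustration of what "queue size" means here: haproxy reports the current queue per proxy and server as the third column (qcur) of its CSV statistics. A minimal sketch, assuming the CSV comes from the haproxy stats socket; the function name and the socket path in the usage line are assumptions, not taken from this setup.

```shell
# Print "proxy/server qcur" for every entry whose current queue is above a limit.
# Reads haproxy CSV statistics ("show stat" output) on stdin.
haproxy_queue_over() {
    limit="${1:-0}"
    awk -F, -v limit="$limit" '
        /^#/ || NF < 3 { next }       # skip the header line and empty lines
        $3 != "" && $3 + 0 > limit + 0 { print $1 "/" $2, $3 }'
}
```

With socat installed and a stats socket configured, something like `echo "show stat" | socat stdio /run/haproxy/admin.sock | haproxy_queue_over 10` would list the backends that are queueing.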

I did have one period with overly long queues for the backend webservers. But that was when the backend server was completely broken due to a filesystem problem, so there was a logical reason.

It would be nice if I could iterate alerts, like 'for the root filesystem of every monitored system'. Or at least copy them changing only the system name in the rules and alerts.

2022-03-10 Dear linux kernel, I know what I want with nomodeset
Just noticed on bootup of a virtual machine:
Mar 10 19:42:14 turing kernel: [    0.181861] You have booted with nomodeset. This means your GPU drivers are DISABLED
Mar 10 19:42:14 turing kernel: [    0.181862] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
Mar 10 19:42:14 turing kernel: [    0.181862] Unless you actually understand what nomodeset does, you should reboot without enabling it
It's a virtual machine which does server tasks. Anything more than 80x25 VGA text mode is pure overkill. The video card is currently the qemu default (Cirrus CLGD 5446 PCI VGA card); I could try the virtio VGA card to see if that saves on memory/cpu.

2022-02-23 Filtering logs to only get relevant reports
I want to know if something goes wrong but with the number of (virtual) servers here at home it is not possible to check all logs constantly. So the main machines use logcheck to find the interesting error messages and the rest gets filtered out.

Ideally that leaves no messages, but I do want to know about patterns that indicate attacks, so I constantly get messages about ssh attack attempts, weird nameserver requests and misconfigured nameserver responses.

Recently I've been carefully checking the resulting reports again and noticed some more patterns that could be filtered. I also found two misconfigurations, which I solved. Normally those misconfigurations would drown in the noise of the log, only to be found if I was looking for something else. Now they started to stand out after filtering out a lot of messages that are to be expected.
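
logcheck ignore rules are extended regular expressions, one per line, so a candidate rule can be tried against saved log lines with grep before it goes into a rules directory. A minimal sketch; the function name and the cron rule in the comment are made-up examples, not rules from this setup.

```shell
# Show which log lines would still be reported after applying ignore rules.
# $1: file with one extended regular expression per line; log lines on stdin.
# Example rule (hypothetical), written the way logcheck rules usually are:
#   ^[A-Z][a-z]{2} [ 0-9]{2} [0-9:]{8} [a-z0-9.-]+ cron\[[0-9]+\]: session (opened|closed)$
remaining_after_rules() {
    grep -E -v -f "$1"
}
```

Anchoring every rule with ^ and $ keeps it from accidentally hiding unrelated messages that merely contain the same words.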

2021-12-28 I tried to upgrade my laptop to an SSD.. and failed
After fixing the server hardware I had some time during the Christmas holidays to look at my laptop, a Dell. It's getting a bit aged (originally from January 2016) and especially the disk is getting slow. Due to the upgrade of SSD storage in the homeserver I still have two 240 gigabyte solid state drives, so I tried to migrate the laptop to one of those. This was interesting in a number of ways: there are two operating systems to migrate (Linux and Windows 10) and the harddisk is 500 gigabyte, so moving to 240 gigabyte would need a fair amount of cleanup before everything could be moved.

I thought the harddisk was 320 gigabyte, so the downgrade from 500 to 240 gigabyte was worse than I expected.

I did some reading on migrating Windows 10 to an SSD and found out I needed a cloning tool. Navigating between subscriptions and expensive versions I found Macrium Reflect, which according to "How to Copy Your Windows Installation to an SSD" (PCMag) should be able to do this.

I have an external USB to IDE/SATA interface which is great for this kind of work. So the SSD started in that slot.

First, Windows didn't want to delete the EFI partition from the GPT partition table on the SSD. Since the original disk has an msdos partition table and the laptop doesn't have UEFI firmware, I booted Linux and created the partitions as I wanted them, with the right types.

After that I created the Linux swapspace and filesystem and copied all Linux data to the filesystem.

After that the Macrium Reflect tool would not copy Windows 10 partitions to existing partitions, so I had to delete the two Windows 10 partitions on the target disk. I have no idea why, but this laptop has a Dell partition, a Windows partition named RECOVERY and a Windows partition named OS. Deleting the two Windows partitions on the target disk also made the Linux swap and root filesystem disappear, without any question whether that was a good idea.

Copying the Windows filesystems then took several hours. When that was done I used the Windows disk and partition manager to resize the big partition to leave space for the Linux installation.

I booted Linux again, created the swap partitions and root filesystem again and copied the data again. At least rsync with the right options is faster than Macrium Reflect.
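
The post does not say which rsync options were used. A common set for copying a root filesystem looks like the sketch below; the option choice, the function names and the APPLY switch are assumptions for this example. By default it only prints the command.

```shell
# Print the command by default; really run it only when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Copy one filesystem tree to another location, keeping metadata.
# -a archive mode, -H hard links, -A ACLs, -X extended attributes,
# -x stay on one filesystem (skips /proc, /sys and other mounts),
# --numeric-ids keep uid/gid numbers instead of mapping them by name.
copy_root() {
    run rsync -aHAXx --numeric-ids "$1" "$2"
}
```

For example `copy_root / /mnt/newinstall/` shows what would be run, and `APPLY=1 copy_root / /mnt/newinstall/` does the copy.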

After that I tried to install grub on the new disk with the right options and did the first test boot of the new disk. Open laptop underside, take out disk carrier, swap disk, put the disk carrier back in and close the laptop again.

No dice: grub stopped really early. I did more searching and found I needed to use grub-install /dev/sdb --skip-fs-probe --boot-directory=/mnt/newinstall/boot, so it was time to remove the new drive again, revert to the old one, rerun grub-install with those options, remove the old drive, insert the new drive and try again. This time the menu I wanted showed up, but I got an error about accessing the disk by UUID.

After that I also tried Windows on the SSD, but that gave an error saying it needed the Windows recovery boot.

So it was back to the old disk again, looking at options for creating a recovery boot USB stick. The 'Create recovery disk' program was busy with disk i/o for about 15 minutes and then reported that the USB stick for recovery has to be at least 16 gigabytes, which I didn't have available.

At this point I gave up. This process took most of the afternoon and it started to feel frustrating.

2021-12-27 Raid-1 on the homeserver rebuilt
After seeing read errors on one disk in the raid-1 of the homeserver I ordered a replacement SSD of a different brand and exactly the same size. It arrived today, and I did the work to replace the suspect disk.

First I marked the old disk as failed and removed it from the array. I also noted the complete serial number on a piece of paper to make sure I would remove the faulty disk.

After that the server was shut down, disconnected from a lot of cables and dragged from the homerack in the attic so I could work on it. It took a while to open the side with the SSDs (below the mainboard), and with two identical SSDs it was a 50% chance which one to remove. After removing the disk tray and unscrewing the SSD from the disk tray I was able to read the physical label on the underside: I had guessed right.

After that the new disk was installed, the case closed again and dragged back to its place, and the cables connected again. After boot everything came up fine.

After bootup I partitioned the new disk, added it to the raid-1 again and set up the EFI and Linux boot partitions on the disk.

The last step was to set up the boot menu with efibootmgr so that both disks are bootable.
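
The software side of such a disk swap can be sketched as follows. This is not a record of the commands actually used: device names, partition numbers, the boot entry label and the loader path are placeholders, and copying the contents of the EFI and boot partitions is left out. Everything is printed rather than executed unless APPLY=1 is set.

```shell
# Print the command by default; really run it only when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Before shutdown: mark a member as failed and take it out of the array.
# The serial number can be read with e.g. `smartctl -i /dev/sdb`.
drop_raid_member() {            # $1 array, $2 member partition
    run mdadm "$1" --fail "$2" --remove "$2"
}

# After installing the new disk: copy the partition table from the healthy
# disk, give the copy its own GUIDs and add the new partition to the array.
add_raid_member() {             # $1 array, $2 healthy disk, $3 new disk, $4 partition number
    array=$1; good=$2; new=$3; part=$4
    run sgdisk -R "$new" "$good"
    run sgdisk -G "$new"
    run mdadm "$array" --add "${new}${part}"
}

# Register a UEFI boot entry for the EFI system partition on a disk.
add_boot_entry() {              # $1 disk, $2 ESP partition number, $3 label, $4 loader path
    run efibootmgr --create --disk "$1" --part "$2" --label "$3" --loader "$4"
}
```

Printing first is deliberate: with two near-identical disks the easiest mistake is pointing sgdisk at the wrong one, and the dry run makes the source and target visible before anything is written.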

2021-12-21 New ssd for the homeserver ordered
I noticed syslog messages I don't like:
[17200683.290921] md: data-check of RAID array md127
[17200683.291277] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[17200683.291619] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[17200683.291935] md: using 128k window, over a total of 937253184k.
[17201245.784689] ata2.00: exception Emask 0x0 SAct 0x1fe00000 SErr 0x0 action 0x0
[17201245.785175] ata2.00: irq_stat 0x40000008
[17201245.785465] ata2.00: failed command: READ FPDMA QUEUED
[17201245.785766] ata2.00: cmd 60/80:a8:00:52:51/00:00:0c:00:00/40 tag 21 ncq dma 65536 in
                           res 41/40:20:60:52:51/00:00:0c:00:00/00 Emask 0x409 (media error) <F>
[17201245.786402] ata2.00: status: { DRDY ERR }
[17201245.786737] ata2.00: error: { UNC }
[17201245.787281] ata2.00: configured for UDMA/133
[17201245.787619] sd 1:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[17201245.787966] sd 1:0:0:0: [sdb] tag#21 Sense Key : Medium Error [current] 
[17201245.788317] sd 1:0:0:0: [sdb] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
[17201245.788689] sd 1:0:0:0: [sdb] tag#21 CDB: Read(10) 28 00 0c 51 52 00 00 00 80 00
[17201245.789123] blk_update_request: I/O error, dev sdb, sector 206656096
[17201245.789530] ata2: EH complete
And a number of other errors on sdb. Time to replace it! I ordered a new ssd, this time of a different brand. The current configuration has 2 Kingston drives with very close serial numbers, so maybe the other drive will give similar issues soon.

The check of the raid1 mirror was also showing differences. I'm waiting for the replacement ssd to show up, and at that moment I will remove the suspect ssd from the array and replace it.
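
Messages like the ones above are easy to miss in a long log. A minimal sketch that counts block-layer I/O errors per device from kernel log lines on stdin; the function name is made up for this example.

```shell
# Count "I/O error, dev X" kernel messages per device.
# Reads kernel log lines (dmesg or syslog) on stdin, prints "device count".
count_io_errors() {
    awk '/I\/O error, dev / {
             for (i = 1; i < NF; i++)
                 if ($i == "dev") {
                     d = $(i + 1)
                     sub(/,$/, "", d)     # "sdb," -> "sdb"
                     n[d]++
                 }
         }
         END { for (d in n) print d, n[d] }'
}
```

Typical use would be `dmesg | count_io_errors`. The differences found by a data-check of the mirror are reported separately by the kernel in /sys/block/md127/md/mismatch_cnt.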

Update 2021-12-24: Writing about the order helped speed things up: I just received notification that the replacement ssd is being sent. It will not show up until after Christmas. I also noticed the problematic Kingston still has warranty, so maybe I can get a replacement for that one too. The Kingston drives came in about 1.5 years ago when I upgraded the storage on the homeserver.

2021-11-22 Resizing a filesystem through several layers
For work I use a supplied laptop with Windows 10. For some of my work I want to have a Linux environment available, so I have VirtualBox with a Linux virtual machine running. Because of some of the work I do on that Linux virtual machine I use full-disk encryption, and the installation was done with the encrypted lvm setting.

Resizing the filesystem because it was getting full turned out to take a lot of steps! After stopping the virtual machine I wanted to resize the disk from the VirtualBox media manager, but that gave an error. After that I tried the command line, which gave about the same error:
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium rotterdam.vdi --resize 32768
Progress state: VBOX_E_NOT_SUPPORTED
VBoxManage.exe: error: Failed to resize medium
VBoxManage.exe: error: Resizing to new size 34359738368 is not yet supported for medium 'C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam.vdi'
VBoxManage.exe: error: Details: code VBOX_E_NOT_SUPPORTED (0x80bb0009), component MediumWrap, interface IMedium
VBoxManage.exe: error: Context: "enum RTEXITCODE __cdecl handleModifyMedium(struct HandlerArg *)" at line 816 of file VBoxManageDisk.cpp
It turns out the .vdi is the wrong type for dynamic resizing: its format variant is fixed. Solution: clone it! The new .vdi automatically gets the dynamic variant, and there is now a "before" .vdi on disk to revert to if anything goes wrong.
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" showhdinfo rotterdam.vdi
UUID:           f832b0b4-8738-491d-bd9c-291d755a4af7
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam.vdi
Storage format: VDI
Format variant: fixed default
Capacity:       26067 MBytes
Size on disk:   26070 MBytes
Encryption:     disabled
Property:       AllocationBlockSize=1048576
In use by VMs:  rotterdam (UUID: 2454dadb-a82d-4d74-bbea-8dcf2b2d1bf1)
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" clonehd rotterdam.vdi rotterdam-2.vdi
Clone medium created in format 'VDI'. UUID: 835e2f75-c19d-4e98-865e-d7acf1359fc7
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" showhdinfo rotterdam-2.vdi
UUID:           835e2f75-c19d-4e98-865e-d7acf1359fc7
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\Users\hout0101\VirtualBox VMs\rotterdam\rotterdam-2.vdi
Storage format: VDI
Format variant: dynamic default
Capacity:       26067 MBytes
Size on disk:   26069 MBytes
Encryption:     disabled
Property:       AllocationBlockSize=1048576
> "\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium rotterdam-2.vdi --resize 32768
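
Whether a clone is needed can be read from the "Format variant" line in the showhdinfo output above. A minimal sketch for picking that value out of the output, assuming a POSIX shell is available (for example on a Linux host); the function name is made up for this example.

```shell
# Print the first word of the "Format variant" line (e.g. "fixed" or "dynamic").
# Reads `VBoxManage showhdinfo` output on stdin.
vdi_variant() {
    awk -F': *' '/^Format variant:/ { split($2, w, " "); print w[1] }'
}
```

Used as `VBoxManage showhdinfo rotterdam.vdi | vdi_variant`, this prints the variant word for the image.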
I moved the old .vdi out of the way and added the new .vdi to the virtual machine and started it again. This worked fine, but the root volume wasn't any bigger (yet). Next steps: enlarge the extended partition and the Linux partition in it on disk using parted. You really have to know what you are doing here, so I'm not just going to give a cut-and-paste sample.

Now I can resize the encrypted and mounted volume! With the right passphrase.
# cryptsetup resize /dev/mapper/sda5_crypt
And grow the 'physical' (ahem) volume:
# pvresize /dev/mapper/sda5_crypt
Resize the logical volume:
# lvextend /dev/rotterdam-vg/root -l +1674
And finally resize the mounted filesystem:
# resize2fs /dev/mapper/rotterdam--vg-root
And the filesystem has grown, and looks good in a fsck on the next boot.
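
The numbers in these steps can be cross-checked with a bit of arithmetic. The sketch below assumes the volume group uses the LVM default extent size of 4 MiB, which the post does not state; the function names are made up for this example.

```shell
# Size in MiB covered by a number of LVM extents ($2: extent size in MiB, default 4).
extents_to_mib() {
    echo $(( $1 * ${2:-4} ))
}

# Size in bytes of a VBoxManage --resize argument, which is given in megabytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}
```

`mib_to_bytes 32768` gives the 34359738368 from the error message, and `extents_to_mib 1674` gives 6696, just under the 6701 MB the disk grew (32768 minus 26067). When the goal is simply to use all free space, lvextend also accepts -l +100%FREE, and its -r option resizes the filesystem in the same step.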

So solid state disk → Windows filesystem → vdi file → VirtualBox → disk in Linux virtual machine → partition → lukscrypt → logical volume manager → volume → filesystem.

2021-11-20 Trying to get DKIM running
My recent issues with getting my e-mail delivered made me look at DKIM signing of outgoing e-mail messages. To avoid breaking things I started testing this with outgoing e-mail from a domain which normally publishes that it doesn't send mail at all, so the first step was to change that policy by changing the MX record and SPF record.

I started reading into configuring sendmail with dkim and found OpenDKIM which can work as a sendmail milter.

Based on "How to configure DKIM & SPF & DMARC on Sendmail for multiple domains on CentOS 7" I took the same steps for my Devuan installation.

In Devuan (and probably Debian/Ubuntu) there is an opendkim package for the service and an opendkim-tools package for the associated tools. I needed the second one to get the opendkim-genkey command. I can imagine keys being generated/managed on a different system than the actual signing server.

After configuring this for the test domain, including generating a keypair and publishing the public key via DNS, I started sending test messages but had no luck. It turned out the sending host has to be in the InternalHosts table of opendkim. I added the address ranges and after that things started to work.

After fixing that I got the results I wanted:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;;
        s=gosper; t=1637408594;
And a verification:
Authentication-Results:; spf=pass;
I was wondering about roaming users who authenticate to my mailserver and send messages that way. In a first test those messages get signed too. That means I can start signing mail from and other production domain names!
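
A verifier finds the public key by combining the selector (the s= value in the signature, gosper here) with the signing domain. A minimal sketch of that naming rule; example.org is a placeholder domain and the function name is made up for this example.

```shell
# Build the DNS name under which a DKIM public key is published.
# $1: selector, $2: signing domain
dkim_dns_name() {
    printf '%s._domainkey.%s\n' "$1" "$2"
}
```

With dig installed, `dig +short TXT "$(dkim_dns_name gosper example.org)"` shows the record a receiving mailserver would fetch, which is a quick way to check that the key really is published.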

2021-10-22 Naming interfaces used by libvirt virtual machines
The homeserver conway has an ever growing list of network interfaces, also due to adding a DMZ network.

This was starting to look a bit messy, with things like:
koos@conway:~$ /sbin/brctl show brwireless
bridge name     bridge id               STP enabled     interfaces
brwireless              8000.4ccc6a8efa4b       no              enp10s0.3
Solution: name the interfaces in the VM definitions, like:
    <interface type='bridge'>
      <source bridge='brdmz'/>
      <target dev='dmz-minsky'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
And now names are more logical:
koos@conway:~$ /sbin/brctl show brdmz
bridge name     bridge id               STP enabled     interfaces
brdmz           8000.4ccc6a8efa4b       no              dmz-minsky
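
One thing to watch with a network-hostname naming scheme: Linux limits interface names to 15 characters. A minimal sketch that builds such a name and refuses ones that are too long; the function name is made up for this example.

```shell
# Build an interface name "<network>-<host>" and reject it when it is
# longer than the 15 characters Linux allows for interface names.
tap_name() {
    name="$1-$2"
    if [ "${#name}" -gt 15 ]; then
        echo "interface name '$name' is longer than 15 characters" >&2
        return 1
    fi
    printf '%s\n' "$name"
}
```

`tap_name dmz minsky` prints dmz-minsky, the name used in the definition above.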

2021-09-28 Debugging a systemd issue .. without having to curse
Today I ran into an issue related to systemd and I decided to try to fix it without too much cursing. The result was a number of google searches ending up on but eventually I fixed the problem.

At work we use splunk for security monitoring and one of the indexers failed to start the splunk processes after a reboot. On browsing the systemd boot log with journalctl -b -l I discovered that the main issue was that creating files in /opt/splunk failed. This was due to an interesting race condition: splunk may start as soon as target has been reached, but mounting /opt over iscsi also needs to start. So the unit file has been updated to:
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start' opt.mount
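
The same ordering can also be expressed without editing the generated unit file, by using a systemd drop-in. This is an alternative sketch, not the change described above: RequiresMountsFor= makes the service require and start after the mount unit for the given path. The drop-in file name and the function name are made up for this example.

```shell
# Write a drop-in for Splunkd.service below the given systemd directory
# (normally /etc/systemd/system) that makes the service wait for /opt.
write_mount_dropin() {
    dir="$1/Splunkd.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/wait-for-opt.conf" <<'EOF'
[Unit]
# Adds Requires= and After= for the mount unit(s) needed for this path
RequiresMountsFor=/opt
EOF
}
```

After `write_mount_dropin /etc/systemd/system` a `systemctl daemon-reload` is needed before systemd picks up the change.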
The next problem was the systemctl start Splunkd.service failing in some intricate way. I had a look at the logging and saw that it was actually trying to restart the service and failed at killing one of the old processes. It turned out the /opt/splunk/var/run/splunk/ file had old contents and one of the PIDs in that file was now in use by a kernel thread. Those you can't kill, the restart failed and therefore the service did not start at all. Solution: remove the .pid file.
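
A stale pid file like this can be spotted by checking whether each PID in it still belongs to the expected program. A minimal sketch, assuming one PID per line in the file and a Linux /proc filesystem; the function name is made up for this example.

```shell
# Print the PIDs from a pid file that are NOT running the expected command.
# $1: pid file with one PID per line, $2: expected command name (e.g. splunkd)
stale_pids() {
    pidfile=$1
    expected=$2
    while read -r pid; do
        [ -n "$pid" ] || continue
        # /proc/<pid>/comm holds the command name; empty if the PID is gone
        comm=$(cat "/proc/$pid/comm" 2>/dev/null)
        [ "$comm" = "$expected" ] || printf '%s\n' "$pid"
    done < "$pidfile"
}
```

Any output means the file refers to processes that are gone or that are something else entirely, such as a kernel thread that happened to get a reused PID.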



My opinions are my own, what I write is protected by copyrights. Some publications contain an explicit license statement which allows sharing without asking permission.