2021-10-22 Naming interfaces used by libvirt virtual machines
The homeserver conway has an ever-growing list of network interfaces, partly due to adding a DMZ network. This was starting to look a bit messy, with things like:

```
koos@conway:~$ /sbin/brctl show brwireless
bridge name     bridge id               STP enabled     interfaces
brwireless      8000.4ccc6a8efa4b       no              enp10s0.3
                                                        vnet2
                                                        vnet9
```

Solution: name the interfaces in the VM definitions, like:

```
<interface type='bridge'>
  <source bridge='brdmz'/>
  <target dev='dmz-minsky'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
```

And now the names are more logical:

```
koos@conway:~$ /sbin/brctl show brdmz
bridge name     bridge id               STP enabled     interfaces
brdmz           8000.4ccc6a8efa4b       no              dmz-minsky
                                                        enp10s0.11
```
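As a small sketch (my own illustration, not part of the original setup), the snippet below builds such an `<interface>` element with Python's standard `xml.etree.ElementTree` and checks the chosen name first. It assumes two constraints: Linux interface names are limited to 15 characters (IFNAMSIZ is 16 including the terminating NUL), and, as I read the libvirt domain XML documentation, names starting with `vnet`, `vif`, `macvtap` or `macvlan` are reserved. The function name `interface_xml` is hypothetical, and the PCI `<address>` is left out so libvirt can assign one.

```python
import xml.etree.ElementTree as ET

IFNAMSIZ = 16  # Linux limit including the trailing NUL, so 15 usable characters
# Prefixes libvirt reserves for generated names (assumption based on the libvirt docs)
RESERVED_PREFIXES = ("vnet", "vif", "macvtap", "macvlan")


def interface_xml(bridge: str, dev: str, model: str = "virtio") -> str:
    """Return a libvirt <interface> element that attaches a named tap device to a bridge."""
    if not dev or len(dev) > IFNAMSIZ - 1:
        raise ValueError(f"interface name {dev!r} must be 1 to {IFNAMSIZ - 1} characters")
    if dev.startswith(RESERVED_PREFIXES):
        raise ValueError(f"interface name {dev!r} uses a prefix reserved by libvirt")
    iface = ET.Element("interface", type="bridge")
    ET.SubElement(iface, "source", bridge=bridge)
    ET.SubElement(iface, "target", dev=dev)
    ET.SubElement(iface, "model", type=model)
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    print(interface_xml("brdmz", "dmz-minsky"))
```

The printed element can be pasted into the domain definition with `virsh edit`; the tap device gets its new name the next time the VM is started.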
2021-10-18 Securing the home network: a separate DMZ network
I have a lot of control over the software that runs on systems at home, but there are limits to what I can fix and sometimes things are insecure. Things like the recent WordPress brute-force attacks show that random 'loud' attackers, who don't care about the chance of being noticed, will try anyway. I do sometimes worry about the silent, more targeted attackers. So I recently updated my home network, and it now has a DMZ network. At the moment it is a purely virtual network, as it doesn't leave the KVM server. Hosts in the DMZ have a default-deny firewall policy towards the other inside networks, and specific services on specific hosts have been enabled as exceptions. I first moved the development webserver, which allowed me to tune those firewall rules and fix some other errors. By now the other webservers and other servers offering services to the outside world have moved as well.
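The default-deny idea can be made concrete as a tiny first-match rule evaluator. This is only a model of the policy, not the actual firewall configuration: the host names, ports and the `evaluate` function are hypothetical, and a real setup expresses the same thing in the firewall's own rule language.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    dst_host: str          # inside host that DMZ hosts may reach
    proto: str             # "tcp" or "udp"
    port: int
    action: str = "accept"


# Hypothetical exceptions to the default-deny policy for DMZ -> inside traffic
DMZ_TO_INSIDE = (
    Rule("resolver.home.internal", "udp", 53),
    Rule("resolver.home.internal", "tcp", 53),
    Rule("db.home.internal", "tcp", 5432),
)


def evaluate(dst_host: str, proto: str, port: int,
             rules: Sequence[Rule] = DMZ_TO_INSIDE, default: str = "drop") -> str:
    """First matching rule wins; anything not explicitly allowed gets the default policy."""
    for rule in rules:
        if (rule.dst_host, rule.proto, rule.port) == (dst_host, proto, port):
            return rule.action
    return default


if __name__ == "__main__":
    print(evaluate("db.home.internal", "tcp", 5432))  # accept
    print(evaluate("db.home.internal", "tcp", 22))    # drop
```

The design point is that the rule list only ever names what is allowed; a forgotten rule results in a blocked service, not an open one.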
2021-07-12 Checking the rcu_sched messages finds repeated mention of cdrom scans
I was going through some rcu_sched messages and noticed that kernel routines related to the cdrom drive showed up a few times in the tasks that were 'behind':

```
[335894.319961] [<ffffffffc03d864a>] ? scsi_execute+0x12a/0x1d0 [scsi_mod]
[335894.320702] [<ffffffffc03da586>] ? scsi_execute_req_flags+0x96/0x100 [scsi_mod]
[335894.321820] [<ffffffffc04a7703>] ? sr_check_events+0xc3/0x2c0 [sr_mod]
[335894.322551] [<ffffffffb58224a5>] ? __switch_to_asm+0x35/0x70
[335894.323256] [<ffffffffb58224b1>] ? __switch_to_asm+0x41/0x70
[335894.323906] [<ffffffffc047d05a>] ? cdrom_check_events+0x1a/0x30 [cdrom]
[335894.324545] [<ffffffffc04a8289>] ? sr_block_check_events+0x89/0xe0 [sr_mod]
[335894.325186] [<ffffffffb551a9a9>] ? disk_check_events+0x69/0x150
```

Because the virtual machines don't use the virtual cdrom after the initial installation, I'm removing it from all virtual machines to see what that does to these messages.
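Spotting patterns like this by eye gets tedious across many rcu_sched reports. A minimal Python sketch that counts stack frames per kernel module in such a trace; the regular expression assumes the `[<address>] ? function+0xoff/0xlen [module]` layout shown above, and labelling frames without a module as `vmlinux` is my own convention.

```python
import re
from collections import Counter

# One stack frame: "[<address>] ? function+0xoffset/0xsize [module]"; the module is optional
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+\??\s*([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def module_counts(trace: str) -> Counter:
    """Count stack frames per kernel module; frames without a module count as 'vmlinux'."""
    counts = Counter()
    for match in FRAME.finditer(trace):
        counts[match.group(2) or "vmlinux"] += 1
    return counts


if __name__ == "__main__":
    import sys
    # e.g. dmesg | python3 count_modules.py  (hypothetical script name)
    print(module_counts(sys.stdin.read()).most_common())
```

For the trace above this yields two frames each for `scsi_mod` and `sr_mod` and one for `cdrom`, which is the cdrom-related pattern that stood out.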
2021-07-08 Another panic in a virtual machine
At the end of this morning I noticed that the root filesystem of the shell server on the homeserver had turned itself read-only, with another DRIVER_TIMEOUT error in the kernel messages. I didn't want to end up with half of the filesystem in lost+found like the previous time, so this time I took a different approach in the hope of getting back to a working system faster. It worked this time:
- echo s > /proc/sysrq-trigger to force a sync
- echo u > /proc/sysrq-trigger to force a remount of all filesystems read-only
- I killed the virtual machine with virsh destroy (the virtualization equivalent of pulling the plug)
- I created a snapshot of the virtual machine disk so I would have a filesystem state to return to in case of problems in the next steps
- I booted the virtual machine and it did indeed have filesystem issues
- So I rebooted into maintenance mode and ran a filesystem check
- After that it booted fine and the filesystem was fine, with nothing in lost+found

After things ran OK for a while I removed the snapshot. I also changed the configuration to use virtio disks instead of IDE emulation. IDE-emulated disks have a timeout (DRIVER_TIMEOUT) after which the I/O is given up. The fact that (emulated) I/O hangs for 30 seconds is bad in itself, but it may be related to the rcu_sched messages. Maybe it's time for some more updates.
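The recovery steps above can be kept as a checklist generator, so the commands are at hand the next time this happens. A minimal Python sketch that only prints the commands and executes nothing; it assumes the VM disk is an LVM logical volume on the host (like the lvcreate snapshot used for an earlier webserver upgrade), and the function name, domain name and volume names are hypothetical.

```python
from typing import List, Tuple


def recovery_plan(domain: str, vg: str, lv: str, snapshot: str,
                  snap_size: str = "10G") -> List[Tuple[str, str]]:
    """Return (where, command) pairs mirroring the recovery steps; nothing is executed."""
    return [
        ("guest", "echo s > /proc/sysrq-trigger"),   # emergency sync
        ("guest", "echo u > /proc/sysrq-trigger"),   # remount filesystems read-only
        ("host", f"virsh destroy {domain}"),         # hard power-off of the VM
        ("host", f"lvcreate -L {snap_size} -s -n {snapshot} /dev/{vg}/{lv}"),
        ("host", f"virsh start {domain}"),           # then fsck from maintenance mode in the guest
        ("host", f"lvremove /dev/{vg}/{snapshot}"),  # only after the VM has run fine for a while
    ]


if __name__ == "__main__":
    # Hypothetical names: the post does not give the VM or volume names
    for where, command in recovery_plan("shellserver", "conway_ssd", "shell_root", "shell_rescue"):
        print(f"[{where}] {command}")
```

Keeping the snapshot step before the first boot is the important ordering: the filesystem check then runs against a disk whose pre-fsck state can still be restored.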
2021-07-03 Trying a DNSSEC zone signing key (ZSK) rollover
Time to do a zone signing key (ZSK) rollover. That rollover is relatively easy because it doesn't need to be synchronized with the DS record in the parent zone. I generated a 'successor' key for camp-wireless.com and set a short-notice publication date. The old ZSK has keytag 02908 and the new one has keytag 25619. There is an overlap of a month in which both keys are published, because cached DNS answers can still contain signatures created with the old ZSK. Signing the zone after the publication date of the new ZSK shows both ZSKs in the signed zone. Old and new zone signing key:

```
; This is a zone-signing key, keyid 2908, for camp-wireless.com.
; Created: 20190704113915 (Thu Jul 4 13:39:15 2019)
; Publish: 20190704113915 (Thu Jul 4 13:39:15 2019)
; Activate: 20190704113915 (Thu Jul 4 13:39:15 2019)
; Inactive: 20210705000000 (Mon Jul 5 02:00:00 2021)
; Delete: 20210805000000 (Thu Aug 5 02:00:00 2021)
camp-wireless.com. IN DNSKEY 256 3 13 lXntnbvQqHy+OSG/2RpHEbcYzeUAB2tFE+d5Us9M07Ndw7TI2DF2TIDx vC3bPomCE2102FJSr8/DnzoRiMHreg==

; This is a zone-signing key, keyid 25619, for camp-wireless.com.
; Created: 20210702115321 (Fri Jul 2 13:53:21 2021)
; Publish: 20210703000000 (Sat Jul 3 02:00:00 2021)
; Activate: 20210705000000 (Mon Jul 5 02:00:00 2021)
camp-wireless.com. IN DNSKEY 256 3 13 kJpmrljuP7PncZij7G1Yn9xngKe1xUpuONG2XAx8AYXu//qXClAbgg3B bmzyeDpFAw2gDRhjQ7f5o20c1QK9OA==
```

So I generated the key on 2 July 2021, with a publication date of 3 July 2021. I shortened the prepublication period to avoid problems with other things happening in the near future, and today the new key became published. If I generate new signatures on 5 July 2021 or later, those will use the new key. DNSSEC involves lots of things to get your head around, and a key rollover is one of them.
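The timing metadata in these key files decides which key does what at a given moment. A small Python sketch that derives a key's state from its Publish/Activate/Inactive/Delete times, assuming the timestamps are UTC in the YYYYMMDDHHMMSS format shown above; the state labels and function names are my own, not BIND terminology.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_ts(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS key timing value as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def key_state(timings: Dict[str, str], now: datetime) -> str:
    """Derive a (self-named) key state from Publish/Activate/Inactive/Delete timings."""
    def reached(event: str) -> bool:
        return event in timings and now >= parse_ts(timings[event])

    if reached("Delete"):
        return "removed"
    if reached("Inactive"):
        return "published, no longer signing"
    if reached("Activate"):
        return "signing"
    if reached("Publish"):
        return "published"
    return "not yet published"


# Timings copied from the two ZSK files above
OLD_ZSK = {"Publish": "20190704113915", "Activate": "20190704113915",
           "Inactive": "20210705000000", "Delete": "20210805000000"}
NEW_ZSK = {"Publish": "20210703000000", "Activate": "20210705000000"}

if __name__ == "__main__":
    for day in ("20210704000000", "20210706000000", "20210806000000"):
        now = parse_ts(day)
        print(day[:8], "old:", key_state(OLD_ZSK, now), "| new:", key_state(NEW_ZSK, now))
```

On 4 July the old key still signs while the new one is only published; from 5 July the roles swap, and the old key stays published until 5 August to cover cached signatures.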
A key signing key (KSK) rollover is even harder, because uploading the public key to the registrar has to be kept synchronized with the information published in the zone. That is why I am testing all this on camp-wireless.com, where it is not a major problem if something fails.
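For a KSK, what the parent zone ends up publishing is a DS record: a digest over the owner name and the DNSKEY RDATA (RFC 4034 section 5.1.4), plus the key tag from RFC 4034 appendix B. A minimal Python sketch of both with SHA-256 (digest type 2); the function names are hypothetical, and in practice BIND's dnssec-dsfromkey does this for you. The sample input is the new ZSK from the listing, used only because it is at hand; a real DS is made from the KSK (flags 257).

```python
import base64
import hashlib
import struct


def owner_wire(name: str) -> bytes:
    """Owner name in canonical DNS wire format: lowercase, length-prefixed labels, root byte."""
    wire = b""
    for label in name.rstrip(".").lower().split("."):
        wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, pubkey_b64: str) -> bytes:
    """DNSKEY RDATA: 2-byte flags, 1-byte protocol, 1-byte algorithm, raw public key."""
    key = base64.b64decode("".join(pubkey_b64.split()))  # drop whitespace from zone-file text
    return struct.pack("!HBB", flags, protocol, algorithm) + key


def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034 appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_record(name: str, flags: int, protocol: int, algorithm: int, pubkey_b64: str) -> str:
    """DS record text with a SHA-256 digest (digest type 2) over owner name + DNSKEY RDATA."""
    rdata = dnskey_rdata(flags, protocol, algorithm, pubkey_b64)
    digest = hashlib.sha256(owner_wire(name) + rdata).hexdigest().upper()
    return f"{name} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"


# Public key of the new ZSK from the listing above (sample input only)
NEW_ZSK_KEY = ("kJpmrljuP7PncZij7G1Yn9xngKe1xUpuONG2XAx8AYXu//qXClAbgg3B "
               "bmzyeDpFAw2gDRhjQ7f5o20c1QK9OA==")

if __name__ == "__main__":
    print(key_tag(dnskey_rdata(256, 3, 13, NEW_ZSK_KEY)))
```

If the key text was copied intact, the printed key tag should match the key id 25619, which makes a convenient self-check before trusting the digest.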
2021-06-03 Outgoing mail via xs4all will soon no longer work without authentication
I run my own mail server (with sendmail in use for more than 25 years) and now I also received the letter about the changes to xs4all's SMTP service. It comes down to this: relaying based on IP address is going away. To avoid a helpdesk disaster, it is being switched off user by user. I received a letter saying that I sometimes use this route and need to change that. That's right: for some servers, the fact that I send them very little mail was a reason for them to block it. Or, at one point, a missing IPv6 reverse pointer; I had that last one set up properly at the time. The xs4all website does have an explanation, Veilig e-mailen 2020 - xs4all, but it says nothing about sendmail. Home servers that send mail apparently aren't their target audience anymore (my Cron Daemon is quite good at it, though!). I've started by emptying the list in the mailertable. We'll see which domains are now unreachable.
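Before emptying the mailertable it can help to list which domains were routed via the provider's relay, so it's clear what to watch for in the logs afterwards. A small Python sketch, assuming the usual mailertable source format of a domain key followed by a `mailer:host` value; the function names, relay host and domains are hypothetical.

```python
from typing import Dict, List


def parse_mailertable(text: str) -> Dict[str, str]:
    """Parse mailertable source lines ("key  mailer:host"), skipping blank lines and comments."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[0].lower()] = parts[1].strip()
    return table


def routed_via(table: Dict[str, str], relay_host: str) -> List[str]:
    """Domains whose entry points at relay_host ([brackets] mean: no MX lookup)."""
    hits = []
    for domain, value in table.items():
        host = value.split(":", 1)[-1].strip("[]")
        if host.lower() == relay_host.lower():
            hits.append(domain)
    return sorted(hits)


if __name__ == "__main__":
    with open("/etc/mail/mailertable") as f:  # assumed location of the source file
        print(routed_via(parse_mailertable(f.read()), "smtp.provider.example"))
```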
2021-03-17 Upgraded another system at home, now serving webpages with TLSv1.3
After the recent work on updating the TLS settings for the webservers at home, one element was missing: TLSv1.3 support. This needed an upgrade of openssl, and the 'easy' way to get there was a full upgrade of the server running the externally facing proxy. So I took that step yesterday evening. I made a snapshot first and started upgrading Devuan ascii to beowulf. After the upgrade a lot of things were broken: I had configured a non-standard location for bind9 logging and AppArmor disagreed. Without a working nameserver a lot of stuff breaks internally! So after getting onto the upgraded system via the console, I changed the AppArmor rules to allow it, and things started working again. A note for the next time I manage to break the resolving nameserver: avahi/multicast DNS works on most systems even when DNS resolving fails. I checked, and I can use .local names to get to the right equipment. After checking for about a day that everything was running fine, I threw out the old snapshot.
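To confirm the result of such an upgrade, Python's standard `ssl` module can report whether the linked OpenSSL supports TLSv1.3 and which version a server actually negotiates. A minimal sketch; the function names are hypothetical and the host in the usage line is a placeholder.

```python
import socket
import ssl


def local_tls13_support() -> bool:
    """True if the OpenSSL library linked into this Python supports TLSv1.3."""
    return ssl.HAS_TLSv1_3


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port, insisting on TLSv1.3, and return the negotiated version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # handshake fails if the server can't do 1.3
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION, "TLSv1.3:", local_tls13_support())
    # print(negotiated_version("www.example.org"))  # placeholder host; needs network access
```

Setting `minimum_version` makes the check strict: a server that only offers TLSv1.2 raises an `ssl.SSLError` instead of silently falling back.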
2021-03-06 Digging for more entropy
Looking at the newest grafana graphs of system statistics, I noticed that the available entropy was still getting dangerously low from time to time on the system that runs the home server. For some reason this system has no hardware random number generator available. Even after the earlier changes to add more sources of randomness, it sometimes dropped low, especially during DNSSEC signing operations. This means the encryption processes for TLS in the webservers may also get delayed, which is really not what I want. Time to update the settings of randomsound and haveged: I want a minimum of 2048 bits of available entropy. So far, this seems to have the desired effect.
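Checking the level from a script, as a complement to the grafana graphs, can be as simple as reading `/proc/sys/kernel/random/entropy_avail`. A minimal Python sketch using the 2048-bit minimum from this post as the alarm threshold; it does not configure randomsound or haveged, and the function names are hypothetical.

```python
from pathlib import Path

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")
MINIMUM_BITS = 2048  # the minimum aimed for in this post


def parse_entropy(text: str) -> int:
    """The proc file holds a single integer: the kernel's entropy estimate in bits."""
    return int(text.strip())


def entropy_low(bits: int, minimum: int = MINIMUM_BITS) -> bool:
    return bits < minimum


if __name__ == "__main__":  # Linux only
    bits = parse_entropy(ENTROPY_AVAIL.read_text())
    print(f"{bits} bits available", "(below minimum)" if entropy_low(bits) else "(ok)")
```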
2020-12-13 Makefile logic not working perfectly
I noticed the certificate for idefix.net had expired according to my web browser. I dug into the cause and found that the scripts that maintain the OCSP files had managed to confuse the Makefile that keeps the haproxy certificates updated. The OCSP responses are updated more often than the certificates, but a certificate update needs to be processed regardless. So I updated the Makefile in the previous post. The dependencies are now: certificate-stamp depends on the installed certificates, and the installed certificates depend on the copied certificates. Installing a certificate also updates its OCSP response.
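To reason about this kind of problem it helps to recall the rule make applies: a target is rebuilt when it is missing or older than one of its prerequisites, after those prerequisites have been brought up to date themselves. The Python sketch below is a toy model of that rule for the chain described here; the file names (`live.pem` for the certificate as delivered, `copied.pem`, `installed.pem`) are hypothetical, and integer timestamps stand in for file modification times.

```python
from typing import Dict, List


def run_make(goal: str, deps: Dict[str, List[str]], mtime: Dict[str, int]) -> List[str]:
    """Toy model of make: return the targets that would be rebuilt, in build order.

    deps maps each target to its prerequisites; names absent from deps are plain
    source files that must already exist in mtime."""
    mtime = dict(mtime)                # don't modify the caller's timestamps
    clock = max(mtime.values(), default=0)
    rebuilt: List[str] = []
    done = set()

    def build(target: str) -> None:
        nonlocal clock
        if target in done:
            return
        done.add(target)
        for dep in deps.get(target, []):
            build(dep)
        if target not in deps:
            return                     # source file: nothing to build
        if target not in mtime or any(mtime[d] > mtime[target] for d in deps[target]):
            clock += 1                 # "now": a rebuilt target becomes the newest file
            mtime[target] = clock
            rebuilt.append(target)

    build(goal)
    return rebuilt


# The dependency chain from the post, with hypothetical file names
CHAIN = {
    "certificate-stamp": ["installed.pem"],
    "installed.pem": ["copied.pem"],
    "copied.pem": ["live.pem"],
}
```

With a freshly renewed `live.pem`, the whole chain is rebuilt in order. When only `installed.pem` is newer than the stamp (say, because another script rewrote it), just `certificate-stamp` is rebuilt; whether that is the desired behaviour depends on what the stamp recipe does.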
2020-12-04 Using a snapshot for an upgrade so I can roll back
This evening I upgraded the production webserver from Devuan ascii to Devuan beowulf. To keep the option of rolling everything back, I created a snapshot and kept it until I was satisfied with the new configuration and everything worked. The steps were simple, found via Commit or revert a Linux LVM snapshot? - serverfault. Before starting the upgrade, create a snapshot:

```
# lvcreate -L 10G -s -n turing_upgrade /dev/conway_ssd/turing_root
```

Do all the upgrade stuff, reboot, make sure everything works again. The usage of the snapshot went up to 22.38 percent:

```
# lvs
  LV             VG         Attr       LSize  Pool Origin      Data%  Meta%  Move Log Cpy%Sync Convert
  turing_root    conway_ssd owi-aos--- 30.00g
  turing_upgrade conway_ssd swi-a-s--- 10.00g      turing_root 13.17
```

After everything worked, remove the snapshot:

```
# lvremove /dev/conway_ssd/turing_upgrade
```
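A classic (non-thin) LVM snapshot becomes unusable once its allocated space fills up, so during a long upgrade it is worth keeping an eye on the Data% column. A Python sketch that parses the output of `lvs --noheadings -o lv_name,origin,data_percent` and flags snapshots above a threshold; the field list, the 80% threshold and the function names are my own choices. Had the upgrade gone wrong, `lvconvert --merge` on the snapshot is the LVM way to roll the origin back.

```python
from typing import Dict, List, Tuple


def parse_snapshots(lvs_output: str) -> Dict[str, Tuple[str, float]]:
    """Parse `lvs --noheadings -o lv_name,origin,data_percent` output.

    Returns {snapshot: (origin, percent_used)}; volumes without an origin are skipped,
    because lvs leaves their origin and data_percent columns empty."""
    snapshots: Dict[str, Tuple[str, float]] = {}
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            name, origin, percent = fields
            snapshots[name] = (origin, float(percent))
    return snapshots


def nearly_full(snapshots: Dict[str, Tuple[str, float]], threshold: float = 80.0) -> List[str]:
    """Names of snapshots whose usage is at or above the threshold percentage."""
    return sorted(name for name, (_, pct) in snapshots.items() if pct >= threshold)


if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["lvs", "--noheadings", "-o", "lv_name,origin,data_percent"],
                         capture_output=True, text=True, check=True).stdout
    print(nearly_full(parse_snapshots(out)) or "no snapshot above threshold")
```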