2021-09-28 Debugging a systemd issue .. without having to curse
Today I ran into an issue related to systemd, and I decided to try to fix it without too much cursing. It took a number of Google searches, most of which ended up on unix.stackexchange.com, but eventually I fixed the problem.

At work we use Splunk for security monitoring, and one of the indexers failed to start the Splunk processes after a reboot. Browsing the systemd boot log with journalctl -b -l, I discovered that the main issue was that creating files in /opt/splunk failed. This was due to an interesting race condition: Splunk may start as soon as network.target has been reached, but /opt is mounted over iSCSI, and that mount also has to wait for network.target. So Splunk could start before /opt was mounted. The fix was to order the service after the mount unit as well, so the unit file has been updated to:

    [Unit]
    Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
    After=network.target opt.mount

The next problem was systemctl start Splunkd.service failing in some intricate way. I had a look at the logging and saw that it was actually trying to restart the service and failed at killing one of the old processes. It turned out the /opt/splunk/var/run/splunk/splunkd.pid file had old contents, and one of the PIDs in that file was now in use by a kernel thread. Kernel threads can't be killed, so the restart failed and therefore the service did not start at all. Solution: remove the .pid file.
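For next time, here is a rough sketch of how the stale .pid file check could be scripted instead of done by hand. It is not what I actually ran. It assumes the .pid file lists PIDs separated by whitespace (one per line), and it relies on kernel threads having an empty /proc/<pid>/cmdline on Linux. The function names and the proc_root parameter are made up for illustration.

```python
from pathlib import Path


def read_pids(pid_file):
    """Return the integer PIDs found in pid_file (assumed whitespace-separated)."""
    pids = []
    for token in Path(pid_file).read_text().split():
        if token.isdigit():
            pids.append(int(token))
    return pids


def belongs_to(pid, expected_name, proc_root="/proc"):
    """True if the PID exists and its command line mentions expected_name.

    Kernel threads have an empty cmdline file, so they never match.
    """
    cmdline_path = Path(proc_root) / str(pid) / "cmdline"
    try:
        raw = cmdline_path.read_bytes()
    except OSError:
        return False  # no such process, or not readable
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
    return expected_name in cmdline


def pid_file_is_stale(pid_file, expected_name, proc_root="/proc"):
    """True if none of the PIDs in pid_file belongs to expected_name."""
    return not any(
        belongs_to(pid, expected_name, proc_root) for pid in read_pids(pid_file)
    )


if __name__ == "__main__":
    pid_file = "/opt/splunk/var/run/splunk/splunkd.pid"
    if Path(pid_file).exists() and pid_file_is_stale(pid_file, "splunkd"):
        print(f"{pid_file} looks stale; remove it before starting the service")
```

The script only reports. Removing the file is left as a manual step, since a substring match on the command line is a heuristic and could misjudge a process that merely has "splunkd" somewhere in its arguments.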