Today I ran into an issue related to systemd and I decided to try to fix it
without too much cursing. The result was a number of Google searches ending
up on unix.stackexchange.com, but eventually I fixed the problem.

At work we use Splunk for security monitoring, and one of the indexers failed
to start the Splunk processes after a reboot. Browsing the systemd boot log
with journalctl -b -l, I discovered that the main issue was that creating
files in /opt/splunk
failed. This was due to an interesting race condition: Splunk may start as
soon as network.target has been reached, but mounting /opt over iSCSI can
only start at that point as well, so /opt may not be mounted yet when Splunk
starts. So the unit file has been updated to:
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
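
For this kind of race, systemd can be told that a service needs the file system behind a path before it starts. What follows is a minimal sketch of one way to do that with a drop-in, not necessarily the exact change described above. It assumes the iSCSI mount is listed in /etc/fstab (normally with the _netdev option); the helper name write_mount_dropin and the file name wait-for-mount.conf are made up for illustration.

```shell
#!/bin/sh
# Sketch: write a drop-in that makes a unit wait for the mount backing a path.
# write_mount_dropin and wait-for-mount.conf are illustrative names only.
write_mount_dropin() {
    dropin_dir="$1"   # e.g. /etc/systemd/system/Splunkd.service.d
    mount_path="$2"   # e.g. /opt/splunk
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/wait-for-mount.conf" <<EOF
[Unit]
# Require and order after the mount units needed to reach this path.
RequiresMountsFor=$mount_path
# Also order after remote file systems (network mounts from fstab).
After=network.target remote-fs.target
EOF
}

# Usage (as root), then reload so systemd picks up the drop-in:
#   write_mount_dropin /etc/systemd/system/Splunkd.service.d /opt/splunk
#   systemctl daemon-reload
```

A drop-in has the advantage that it survives regenerating the unit file with 'splunk enable boot-start', since the generated file itself is left untouched.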

The next problem was that systemctl start Splunkd.service failed in some
intricate way. I had a look at the logging and saw that it was actually
trying to restart the service and failed at killing one of the old processes.
It turned out the /opt/splunk/var/run/splunk/splunkd.pid file had old
contents, and one of the PIDs in that file was now in use by a kernel thread.
Those you can't kill, so the restart failed and therefore the service did not
start at all. Solution: remove the .pid file.
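
Before deleting a PID file by hand, it can be worth checking that it really is stale. The following is a rough sketch under a few assumptions: a Linux /proc file system, one PID per line in the file, and a process name (here splunkd) that shows up in the command line. The helper pidfile_is_stale is hypothetical and not part of Splunk or systemd.

```shell
#!/bin/sh
# Sketch: a PID file is treated as stale when none of its PIDs belongs to a
# process whose command line mentions the expected name.
# Assumes Linux /proc and one PID per line; pidfile_is_stale is a made-up helper.
pidfile_is_stale() {
    pidfile="$1"
    expected="$2"
    [ -f "$pidfile" ] || return 1      # no PID file, so nothing to clean up
    while read -r pid || [ -n "$pid" ]; do
        # Skip empty or non-numeric lines.
        case "$pid" in ''|*[!0-9]*) continue ;; esac
        # No such process any more: this entry is stale.
        [ -r "/proc/$pid/cmdline" ] || continue
        # Kernel threads have an empty cmdline, so they never match here.
        if tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q "$expected"; then
            return 1                   # a matching process is still running
        fi
    done < "$pidfile"
    return 0
}

# Usage:
#   pidfile_is_stale /opt/splunk/var/run/splunk/splunkd.pid splunkd \
#       && rm /opt/splunk/var/run/splunk/splunkd.pid
```

The check relies on the fact that kernel threads have no command line, which is exactly the case that caused the failed restart here: the PID exists, but it does not belong to Splunk.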