Today I ran into a systemd problem and decided to try to fix it without too
much cursing. It took a number of Google searches that ended up on
unix.stackexchange.com, but eventually I fixed the problem.

At work we use Splunk for security monitoring, and one of the indexers failed
to start the Splunk processes after a reboot. Browsing the systemd boot log
with journalctl -b -l, I discovered that the main issue was that creating
files in /opt/splunk failed. This was due to an interesting race condition:
Splunk may start as soon as network.target has been reached, but mounting
/opt over iSCSI also has to wait for network.target. Nothing ordered the two
against each other, so Splunk could start before /opt was mounted. So I
updated the unit file to:

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target opt.mount
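
As an alternative sketch, not what I actually did: systemd also has
RequiresMountsFor=, which adds both Requires= and After= on whatever mount
units are needed to reach a given path. The helper below is hypothetical; the
function name, the drop-in file name wait-for-opt.conf and the default
drop-in directory are my assumptions. It writes the setting as a drop-in so
the generated unit file itself stays untouched.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that ties the service to the mount
# that provides /opt/splunk. The directory and file name are assumptions.
write_mount_dropin() {
    dropin_dir="${1:-/etc/systemd/system/Splunkd.service.d}"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/wait-for-opt.conf" <<'EOF'
[Unit]
# Adds Requires= and After= on the mount unit(s) needed for this path.
RequiresMountsFor=/opt/splunk
EOF
}

# Usage (as root), followed by a reload so systemd picks up the drop-in:
#   write_mount_dropin && systemctl daemon-reload
```

Unlike a bare After=opt.mount, this also makes the service fail early if the
mount itself fails, instead of starting on an empty /opt.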

The next problem was systemctl start Splunkd.service failing in a more
intricate way. I had a look at the logging and saw that it was actually
trying to restart the service and failed to kill one of the old processes.
It turned out that /opt/splunk/var/run/splunk/splunkd.pid had stale contents,
and one of the PIDs in that file was now in use by a kernel thread. Kernel
threads can't be killed, so the restart failed and the service did not start
at all. Solution: remove the .pid file.
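
The check I did by hand can be sketched as a small shell function. This is a
minimal sketch under a few assumptions: the PID file holds one PID per line,
/proc is available (Linux), and a live Splunk daemon shows up with the
command name splunkd. The function name pidfile_is_stale is made up for this
example.

```shell
#!/bin/sh
# Returns 0 (stale) when no PID listed in the file belongs to a running
# process whose command name equals the expected one, 1 otherwise.
# Assumes one PID per line and a Linux /proc filesystem.
pidfile_is_stale() {
    pidfile="$1"
    expected="$2"
    # A missing or unreadable file records no live process at all.
    [ -r "$pidfile" ] || return 0
    while read -r pid _ || [ -n "$pid" ]; do
        case "$pid" in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        # A reused PID (for example a kernel thread) has a different name.
        if [ -r "/proc/$pid/comm" ] &&
           [ "$(cat "/proc/$pid/comm")" = "$expected" ]; then
            return 1
        fi
    done < "$pidfile"
    return 0
}

# Usage, before starting the service:
#   f=/opt/splunk/var/run/splunk/splunkd.pid
#   pidfile_is_stale "$f" splunkd && rm -f "$f"
```

Comparing the command name rather than only testing whether the PID exists is
the point here: the PID in my file did exist, it just no longer belonged to
Splunk.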