# This file overrides some of the systemd manager defaults.
#
# - Change DefaultRestartSec from 100ms to 1s.
# When a service fails, our debug collection service kicks in. When a core
# file is involved, generating 5 core files within ~500ms (5 restarts at the
# default 100ms interval) puts a heavy strain on the BMC. Also, if restarting
# a service can fix the failure at all, a longer delay before the restart
# gives it a better chance (e.g. device driver scenarios that need time
# before a retry succeeds).
#
# - Change DefaultStartLimitBurst from 5 to 2.
# A burst of 5 is excessive for OpenBMC services. In every failure scenario
# seen so far (other than with phosphor-hwmon), either a single restart fixes
# the problem, or all 5 restarts fail and the service hits the start limit
# anyway.
#
# - Change DefaultStartLimitIntervalSec from 10s to 240s.
# BMC CPU performance is already constrained, and even more so while a core
# file from a failing service is being generated and collected into a dump.
# Recent failures have shown the service not failing again until 15-20
# seconds after the initial failure. With the default interval of 10s, the
# repeated starts never fall within a single interval, so the start limit is
# never reached and the service is restarted indefinitely.
# A related issue is that DefaultTimeoutStartSec is 90s. A service that
# repeatedly hits this start timeout has the same problem. Because of this,
# DefaultStartLimitIntervalSec needs to be at least
#   DefaultStartLimitBurst * DefaultTimeoutStartSec
#     + DefaultStartLimitBurst * worst-case processing time (30s)
# which is currently 2 * 90s + 2 * 30s = 180s + 60s = 240s.
#
# See systemd-system.conf(5) for details on the conf files

[Manager]
DefaultRestartSec=1s
DefaultStartLimitBurst=2
DefaultStartLimitIntervalSec=240s