| # This file overrides some defaults for systemd |
| # |
| # - Change the RestartSec from 100ms to 1s. |
| # When a service hits a failure, our new debug collection service kicks |
| # in. When a core file is involved, it's been found that generating 5 core |
| # files within ~500ms puts a huge strain on the BMC. Also, if the bmc is |
| # going to get a fix on a restart of a service, the more time the better |
| # (think retries on device driver scenarios). |
| # |
| # - Change the StartLimitBurst to 2 |
| # Five just seems excessive for our services in openbmc. In all fail |
| # scenarios seen so far (other then with phosphor-hwmon), either |
| # restarting once does the job or restarting all 5 times does not help |
| # and we just end up hitting the 5 limit anyway. |
| # |
| # - Change the StartLimitIntervalSec to 240s |
| # The BMC CPU performance is already challenged. When a service is |
| # failing and a core dump is being generated and collected into a dump, |
| # it's even more challenged. Recent failures have shown situations where |
| # the service does not fail again until 15-20 seconds after the initial |
| # failure which means the default of 10s for this results in the service |
| # being restarted indefinitely. |
| # Another issue that has cropped up recently is that the DefaultTimeoutStartSec |
| # is 90s. If a service is hitting this timeout repeatedly then there |
| # is a similar issue as noted above. Because of this, the StartLimitIntervalSec |
| # needs to be StartLimitBurst*DefaultTimeoutStartSec + |
| # StartLimitBurst* worst case processing time (30s) |
| # which currently would be 2x90 + 2x30 |
| # |
| # See systemd-system.conf(5) for details on the conf files |
| |
| [Manager] |
| DefaultRestartSec=1s |
| DefaultStartLimitBurst=2 |
| DefaultStartLimitIntervalSec=240s |