# This file overrides some defaults for systemd
#
# - Change the RestartSec from 100ms to 1s.
# When a service hits a failure, our new debug collection service kicks
# in.  When a core file is involved, it's been found that generating 5 core
# files within ~500ms puts a huge strain on the BMC.  Also, if the BMC is
# going to recover on a restart of a service, the more time the better
# (think retries on device driver scenarios).
#
# - Change the StartLimitBurst to 2
# Five just seems excessive for our services in OpenBMC.  In all failure
# scenarios seen so far (other than with phosphor-hwmon), either
# restarting once does the job or restarting all 5 times does not help
# and we just end up hitting the limit of 5 anyway.
#
# - Change the StartLimitIntervalSec to 240s
# The BMC CPU performance is already challenged. When a service is
# failing and a core dump is being generated and collected into a dump,
# it's even more challenged. Recent failures have shown situations where
# the service does not fail again until 15-20 seconds after the initial
# failure, which means that with the default interval of 10s the start
# limit is never reached and the service is restarted indefinitely.
# Another issue that has cropped up recently is that the DefaultTimeoutStartSec
# is 90s. If a service is hitting this timeout repeatedly then there
# is a similar issue as noted above. Because of this, the StartLimitIntervalSec
# needs to be
#   StartLimitBurst * DefaultTimeoutStartSec +
#   StartLimitBurst * worst case processing time (30s)
# which currently would be 2x90 + 2x30 = 240s
#
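# Worked example of the formula above (a sketch; the 30s processing time is
# the worst case assumed in this file, not a measured constant):
#   StartLimitBurst        = 2
#   DefaultTimeoutStartSec = 90s
#   processing time        = 30s
#   2 * 90s + 2 * 30s = 180s + 60s = 240s
# If either input changes, the interval should be recomputed the same way.
# For instance, a hypothetical burst of 3 would give 3 * 90s + 3 * 30s = 360s.
#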
# See systemd-system.conf(5) for details on the conf files

[Manager]
DefaultRestartSec=1s
DefaultStartLimitBurst=2
DefaultStartLimitIntervalSec=240s
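
# Example (a sketch, not part of this configuration): the values above are
# manager-wide defaults, so a unit that needs different limits can set its
# own in its unit file or a drop-in.  The path, unit name, and values below
# are hypothetical and only illustrate which section each setting lives in.
#
#   # /etc/systemd/system/example.service.d/override.conf
#   [Unit]
#   StartLimitIntervalSec=30s
#   StartLimitBurst=5
#
#   [Service]
#   RestartSec=100ms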