fail-boot: allow host to gracefully shutdown
Use of this feature in IBM manufacturing brought to light a deficiency
in this design. When the host firmware is booting and logs an error that
triggers this function, it would be ideal to give the host firmware the
opportunity to gracefully shut itself down before the BMC stops it and
moves the host state to Quiesced.
This allows the host firmware to find and log any other relevant errors
(for example, while verifying all DIMMs in the system). It is much
better to find all of these in one boot than to need a separate boot to
find each bad DIMM.
This also gives the host firmware a chance to properly write out any
cached data and handle any other relevant shutdown operations.
Change-Id: I99dd4a6afa2bf943eff87ef8f2fe670ebd264052
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
diff --git a/designs/fail-boot-on-hw-error.md b/designs/fail-boot-on-hw-error.md
index 3ad7c65..468ebaa 100644
--- a/designs/fail-boot-on-hw-error.md
+++ b/designs/fail-boot-on-hw-error.md
@@ -7,6 +7,7 @@
Other contributors:
Created: Feb 20, 2020
+Updated: Apr 12, 2022
## Problem Description
Some groups, for example a manufacturing team, have a requirement for the BMC
@@ -43,6 +44,8 @@
- The halt must be obvious to the user when it occurs
- The log which causes the halt must be identifiable
- The halt must only stop the chassis/host instance that encountered the error
+ - The halt must allow the host firmware the opportunity to gracefully shut
+ itself down
- The halt must stop the host (run obmc-host-stop@X.target) associated with
the error and attempt to leave system in the fail state (i.e. chassis power
remains on if it is on)
@@ -89,9 +92,10 @@
See the phosphor-logging [callout][4] design for more information on callouts.
-The appropriate `obmc-host-stop@.target` instance will also be called when
-`obmc-bmc-quiesce.target` is started. This ensures the host is stopped as soon as
-the error is discovered.
+A new `obmc-host-graceful-quiesce@.target` systemd target will be started.
+This new target will ensure a graceful shutdown of the host is initiated
+and then start the `obmc-host-quiesce@.target` which will stop the host
+and move the host state to Quiesced.
obmcutil will be enhanced to look for these block interfaces and notify the
user via the `obmcutil state` command if a block is enabled and what log
@@ -117,10 +121,6 @@
one other then IBM sees value, we could roll this into the PEL-specific
portion of phosphor-logging.
-A systemd target could be created to do the host stop and quiesce (and any
-other system specific things people need) but at this point there doesn't
-seem to be a ton of value in it. Could always be added later if needed.
-
## Impacts
This will require some additional checking on reported logs but should have
minimal overhead.