bmc-reset: ensure recovery when host unresponsive
This new feature ensures that an error is logged and the host is put in
the Quiesce state when the host was booting before a BMC reboot and then
crashed while the BMC was rebooting. Moving to the Quiesce state triggers
whatever recovery has been defined for the system.
There are windows during host boot where the host requires the BMC to be
available; if the BMC is not available, the host crashes itself. A
recovery and clean reboot is much simpler than handling all of the corner
cases that can occur in this scenario.
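The decision the new service makes can be sketched roughly as follows.
This is a minimal, hypothetical illustration, not the actual
phosphor-state-manager code: HostRecoveryInputs, RecoveryAction,
decideRecovery and readInputs are invented names, BootProgress is modeled
as a plain string compared against "None" as in the design text, and
hostId stands in for the systemd instance %i.

```cpp
#include <filesystem>
#include <string>

// Hypothetical inputs for the recovery check (illustrative names only).
struct HostRecoveryInputs
{
    bool chassisPowerOn;      // /run/openbmc/chassis@<id>-on exists
    bool hostOn;              // /run/openbmc/host@<id>-on exists
    std::string bootProgress; // BootProgress restored after the BMC reboot
};

// Hypothetical result: either do nothing or log an error and quiesce.
enum class RecoveryAction
{
    None,
    LogErrorAndQuiesce
};

// Chassis on + host off + a restored BootProgress other than "None" is
// taken to mean the host was booting when the BMC rebooted and crashed.
RecoveryAction decideRecovery(const HostRecoveryInputs& in)
{
    if (in.chassisPowerOn && !in.hostOn && in.bootProgress != "None")
    {
        return RecoveryAction::LogErrorAndQuiesce;
    }
    return RecoveryAction::None;
}

// Reads the /run/openbmc flag files; hostId plays the role of %i.
// Restoring BootProgress from persistent storage is assumed to happen
// elsewhere, so the restored value is simply passed in.
HostRecoveryInputs readInputs(unsigned hostId,
                              const std::string& restoredBootProgress)
{
    namespace fs = std::filesystem;
    const std::string id = std::to_string(hostId);
    return {fs::exists("/run/openbmc/chassis@" + id + "-on"),
            fs::exists("/run/openbmc/host@" + id + "-on"),
            restoredBootProgress};
}
```

At BMC boot, a service along these lines would evaluate
decideRecovery(readInputs(0, restored)) and, on LogErrorAndQuiesce, log the
error and request the Quiesce transition; those D-Bus calls are omitted.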
Change-Id: Id1f2f326d08d4d77a38fffb0cfd2227fb91453e1
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
diff --git a/designs/bmc-reset-with-host-up.md b/designs/bmc-reset-with-host-up.md
index 3aab505..bbfa7b7 100644
--- a/designs/bmc-reset-with-host-up.md
+++ b/designs/bmc-reset-with-host-up.md
@@ -111,6 +111,19 @@
- obmc-chassis-powerreset@.target.require
- obmc-host-reset@.target.requires
+### Automated Recovery when host does not respond
+
+A separate service and application will be created within phosphor-state-manager
+to execute the following logic when chassis power is on but the host has
+failed to respond to any of the mechanisms used to communicate
+with it:
+- If chassis power is on (ConditionPathExists=/run/openbmc/chassis@%i-on)
+- And the host is off (ConditionPathExists=!/run/openbmc/host@%i-on)
+- And the BootProgress value restored after the BMC reboot is not None
+- Then (assume the host was booting before the BMC reboot)
+  - Log an error indicating this situation
+  - Move the host to Quiesce and allow the automated recovery to kick in
+
### Note on custom mechanism for IBM systems
IBM systems will utilize a processor CFAM register. The specific register is
**Mailbox scratch register 12**.