power-recovery: bmc and system recovery paths

Provide a mechanism for the power recovery software to not run in
situations where a user has manually intervened to try and recover the
system.

For example, if a system is unresponsive and a user manually resets
it in some way (for example via a pin hole reset), then do not run
the power recovery logic. This lets the user debug the system manually
without the firmware automatically changing its power state.

Change-Id: I3bead46df5d7ad4344d47affc066c7e36379e0db
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
diff --git a/designs/power-recovery.md b/designs/power-recovery.md
index 71119a5..7b9865a 100644
--- a/designs/power-recovery.md
+++ b/designs/power-recovery.md
@@ -17,6 +17,13 @@
 on? Should it leave it off? Or maybe the user would like the system to
 go to whichever state it was at before the power loss.
 
+There are also instances where the user may not want automatic power recovery
+to occur. For example, some systems have op-panels, and on these op-panels
+there can be a pin hole reset. This is a manual mechanism for the user to
+force a hard reset of the BMC in situations where it is hung or not responding.
+In these situations, the user may not want the system to power on
+automatically, because they want to debug the reason for the BMC error.
+
 The goal of this design document is to describe how OpenBMC firmware will
 deal with these questions.
 
@@ -37,6 +44,8 @@
 of errors that can occur in this area on systems.
 
 ## Requirements
+
+### Automated Power-On Recovery
 OpenBMC software must ensure it persists the state of power to the chassis so
 it can know what to restore it to if necessary
 
@@ -62,7 +71,21 @@
 recovery function for other areas like firmware update scenarios where a
 certain power on behavior is desired once an update has completed.
 
+### BMC and System Recovery Paths
+In situations where the BMC or the system has gotten into a bad state, and
+the user has initiated some form of manual reset which is detectable by the
+BMC as being user initiated, the BMC software must:
+- Set the appropriate `RebootCause` within the [BMC state interface][bmc-state]
+  - At a minimum, `PinholeReset` will be added. Others can be added as needed
+- Log an error indicating a user initiated forced reset has occurred
+- Not log an error indicating a blackout has occurred if chassis power was on
+  prior to the pin hole reset
+- Not implement any power recovery policy on the system
+- Re-enable power recovery once the BMC has a normal reboot
+
 ## Proposed Design
+
+### Automated Power-On Recovery
 An application will be run after the chassis and host states have been
 determined which will only run if the chassis power is not on.
 
@@ -75,6 +98,33 @@
 This function will be hosted in phosphor-state-manger and potentially
 x86-power-control.
 
+### BMC and System Recovery Paths
+The BMC state manager application currently reads a file in sysfs to
+determine the cause of a BMC reboot. It then puts this reason in the
+`RebootCause` property.
+
+One possible cause of a BMC reset is an external reset (EXTRST). There are
+a variety of reasons an external reset can occur. Some systems are adding
+GPIOs to provide additional detail on these types of resets.
+
+A new GPIO name will be added to [device-tree-gpio-naming.md][dev-tree]
+that reports whether a pin hole reset occurred on the previous reboot of
+the BMC. The BMC state manager application will extend its `RebootCause`
+support to look for this GPIO. If the GPIO is present, the application will
+read it and set `RebootCause` accordingly when it either cannot determine the
+reason for the reboot via sysfs or sysfs reports an EXTRST reason (in which
+case the GPIO is used to refine the reboot reason).
+
+If the power recovery software sees the `PinholeReset` reason within the
+`RebootCause` then it will not implement any of its policy. Future BMC
+reboots that are not caused by a pin hole reset will return `RebootCause`
+to a default value, so the power recovery policy will be re-enabled on that
+BMC boot.
+
+The phosphor-state-manager chassis software will not log a blackout error
+if it sees the `PinholeReset` reason (or any other reason that indicates a user
+initiated a reset of the system).
+
 ## Alternatives Considered
 None, this is a pretty basic feature that does not have a lot of alternatives
 (other then just not doing it).
@@ -96,5 +146,12 @@
 Validate that when multiple black outs occur, the firmware continues to try
 and power on the system when policy is `AlwaysOn` or `Restore`.
 
+On supported systems, perform a pin hole reset on a system whose policy is
+set to `AlwaysOn`. Verify the system does not automatically power on after
+the pin hole reset. Verify it does automatically power on after a normal
+reboot of the BMC.
+
 [pdi-restore]:https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Control/Power/RestorePolicy.interface.yaml
 [state-mgr]: https://github.com/openbmc/phosphor-state-manager
+[bmc-state]:https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/State/BMC.interface.yaml
+[dev-tree]:https://github.com/openbmc/docs/blob/master/designs/device-tree-gpio-naming.md
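
The decision flow described under "BMC and System Recovery Paths" can be
sketched as follows. This is a minimal illustration in Python, not the
phosphor-state-manager implementation. The enum members other than
`PinholeReset`, the sysfs reason strings (`POR`, `WDT`), and all function
names are assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class RebootCause(Enum):
    """Illustrative subset of reboot causes; only PinholeReset is from the design."""
    UNKNOWN = "Unknown"
    POR = "POR"
    WATCHDOG = "Watchdog"
    PINHOLE_RESET = "PinholeReset"


# Hypothetical mapping from sysfs reason strings to reboot causes.
# "EXTRST" is deliberately absent: it needs the GPIO to be refined.
SYSFS_REASONS = {"POR": RebootCause.POR, "WDT": RebootCause.WATCHDOG}


def determine_reboot_cause(sysfs_reason: Optional[str],
                           pinhole_gpio: Optional[bool]) -> RebootCause:
    """Pick a reboot cause from sysfs, consulting the GPIO when needed.

    pinhole_gpio is None when the platform does not define the GPIO.
    """
    if sysfs_reason in SYSFS_REASONS:
        return SYSFS_REASONS[sysfs_reason]
    # sysfs gave no usable reason, or reported EXTRST: consult the GPIO.
    if pinhole_gpio:
        return RebootCause.PINHOLE_RESET
    return RebootCause.UNKNOWN


def should_run_power_recovery(cause: RebootCause, chassis_power_on: bool) -> bool:
    """Power recovery runs only if chassis power is off and the reset
    was not a user initiated pin hole reset."""
    return not chassis_power_on and cause is not RebootCause.PINHOLE_RESET


def should_log_blackout(cause: RebootCause, chassis_was_on: bool) -> bool:
    """A blackout error is skipped when the user forced the reset."""
    return chassis_was_on and cause is not RebootCause.PINHOLE_RESET
```

Because the cause is recomputed on every BMC boot, a later normal reboot
yields a cause other than `PinholeReset`, and the recovery policy applies
again without any extra state to clear.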