bmc-boot-ready: ensure power on dependencies

IBM has seen a few occurrences where a power on and boot of a system is
requested before the needed BMC services have been started. The goal of
this design is to provide a flexible way to ensure that all needed BMC
services have started before allowing a power on and boot of a system.

Change-Id: Icbc1268903204f0417b2962c3b4c37c57eb4d208
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
diff --git a/designs/bmc-boot-ready.md b/designs/bmc-boot-ready.md
new file mode 100644
index 0000000..05ea5c8
--- /dev/null
+++ b/designs/bmc-boot-ready.md
@@ -0,0 +1,104 @@
+# BMC Boot Ready
+
+Author: Andrew Geissler (geissonator)
+
+Other contributors:
+
+Created: May 12, 2022
+
+## Problem Description
+There are services which run on the BMC which are required for the BIOS (host
+firmware) to power on and boot the system. The goal of this design is to
+define a mechanism to ensure these dependencies are met before a power on
+or boot is started.
+
+For example, on some systems, you cannot power on the chassis until the BMC
+has collected the VPD from the VRMs to determine their characteristics.
+On other systems, the BIOS service is needed so the host firmware can look
+for any overrides.
+
+Currently, OpenBMC has undefined behavior in this area. If a particular BMC
+has a large time gap between when the webserver is available and when all BMC
+services have completed running, there is a window in which a user could
+request a power on via the webserver before all needed services are running.
+
+## Background and References
+
+The mailing list discussion can be found [here][1]. The BMC currently has
+three major [state][2] management interfaces in a system: the BMC, Chassis,
+and Host. Within each state interface, the current state and requested state
+are tracked.
+
+The [BMC][3] state object is considered `Ready` once the systemd
+`multi-user.target` has successfully started all of its services.
+
+There are three options that have been discussed to solve this issue:
+1. D-Bus objects don't exist until the backend is prepared to handle them.
+2. If a user tries something that the system is not in the proper state to
+   handle, return an error code.
+3. If a user tries something that the system is not in the proper state to
+   handle, queue it up.
+
+Option 1 is challenging because D-Bus interfaces provided by OpenBMC state
+applications have a mix of read-only properties (like current state) and
+writeable properties that are used to request state changes. Hiding all of
+them until everything is available could have unknown consequences. This also
+has similar issues to option 2 in that applications and clients must have
+proper code to handle missing interfaces.
+
+Option 2 is challenging because Redfish clients and internal applications like
+the op-panel code would need to properly handle error codes like this. You can
+argue that they already should, but that is definitely not the case with a lot
+of OpenBMC applications and clients.
+
+Option 3 is the most user-friendly option. No client or OpenBMC application
+changes would be needed. One concern is that having a system somewhat randomly
+power on at some later point in time could be unexpected. The general consensus
+in this review, though, has been that this is the preferred option.
+
+[1]: https://lists.ozlabs.org/pipermail/openbmc/2022-April/030175.html
+[2]: https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/State
+[3]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/State/BMC.interface.yaml
+
+## Requirements
+
+- Queue up chassis and host requested state changes until the BMC is in the
+  proper state to allow the request
+  - What the "proper state" is will be implementation specific, but by default
+    phosphor-state-manager will queue all requests until the BMC state has
+    reached `Ready`
+
+## Proposed Design
+
+If a power on or boot request is made to the Chassis or Host state objects and
+the BMC is not at `Ready`, then the request will be queued and the state
+management code will begin monitoring for BMC `Ready`. Once `Ready` is reached,
+the requested operation will be executed automatically.
+
+## Alternatives Considered
+The "Background and References" section covered some alternative options
+and the complexity behind them.
+
+Another option is to code the dependencies directly into the services. For
+example, if the power on service requires the VRM VPD collection service,
+encode that dependency in the systemd files. This is easy to say but in practice
+has been challenging. Some OpenBMC services have built-in assumptions that
+the multi-user.target and all of its dependent services have completed prior
+to a power on being started. The general consensus within IBM was that it's
+much easier and safer to just have a global wait-for-bmc-ready function as
+proposed in this design.
+
+## Impacts
+
+Users will need to understand that their request to power on the system may
+be delayed by an undefined amount of time. In general, a BMC reaches the
+`Ready` state within a couple of minutes.
+
+### Organizational
+This function will be implemented within the existing phosphor-state-manager
+repository. x86-power-control, an alternative to phosphor-state-manager, could
+also implement this logic.
+
+## Testing
+- Ensure a power on request is properly queued and executed when it is made
+  prior to the BMC being `Ready`.