BMC Boot Ready

Author: Andrew Geissler (geissonator)

Other contributors:

Created: May 12, 2022

Problem Description

There are services which run on the BMC which are required for the BIOS (host firmware) to power on and boot the system. The goal of this design is to define a mechanism to ensure these dependencies are met before a power on or boot is started.

For example, on some system, you can not power on the chassis until the VPD has been collected from the VRM's by the BMC to determine their characteristics. On other systems, the BIOS service is needed so the host firmware can look for any overrides.

Currently, OpenBMC has an undefined behavior in this area. If a particular BMC has a large time gap between when the webserver is available and when all BMC services have completed running, there is a window there that a user could request a power on via the webserver when not all needed services are running.

Background and References

The mailing list discussion can be found here. The BMC currently has three major state management interfaces in a system. The BMC, Chassis, and Host. Within each state interface, the current state and requested state are tracked.

The BMC state object is considered Ready once the systemd multi-user.target has successfully started all if its services.

There are three options that have been discussed to solve this issue:

  1. D-Bus objects don't exist until the backend is prepared to handle them.
  2. If a user tries something that system is not in proper state to handle then return an error code.
  3. If a user tries something that system is not in proper state to handle then queue it up.

Option 1 is challenging because D-Bus interfaces provided by OpenBMC state applications have a mix of read-only properties (like current state) and writeable properties that are used to request state changes. Not showing any until everything is available could have unknown consequences. This also has similar issues to option 2 in that applications and clients must have proper code to handle missing interfaces.

Option 2 is challenging because Redfish clients and internal applications like the op-panel code now need to properly handle error codes like this. You can argue that they already should, but that is definitely not the case with a lot of OpenBMC applications and clients.

Option 3 is the most user friendly option. No client or OpenBMC application changes would be needed. One concern is that having a system somewhat randomly power on at some later point in time could be unexpected. The general consensus in this review though has been that this is the most preferred option.

Requirements

  • Queue up chassis and host requested state changes until the BMC is in the proper state to allow the request
    • What the "proper state" is will be implementation specific but by default phosphor-state-manager will queue all requests until the BMC state has reached Ready

Proposed Design

If a power on or boot request is made to the Chassis or Host state objects and the BMC is not at Ready then the request will be queued and the state management code will begin monitoring for BMC Ready. Once reached, the requested operation will be automatically executed.

Alternatives Considered

The "Background and References" section covered some alternative options and the complexity behind them.

Another option is to code the dependencies directly into the services. For example, if the power on service requires the vrm vpd collection service, encode that dependency in the systemd files. This is easy to say but in practice has been challenging. Some OpenBMC services have built in assumptions that the multi-user.target and all of it's dependent services have completed prior to a power on being started. The general consensus within IBM was that it's much easier and safer to just have a global wait-for-bmc-ready function as proposed in this design.

Impacts

Users will need to understand that their request to power on the system may be delayed by an undefined amount of time. In general, a BMC gets to Ready state within a couple of minutes.

Organizational

This function will be implemented within the existing phosphor-state-manager repository. x86-power-control, an alternative to phosphor-state-manager, could also implement this logic.

Testing

  • Ensure a power on request is properly queued and executed when it is made prior to the BMC being Ready.