Add support to mask and unmask host crash target

When the host crashes, the BMC detects it and, as part of executing
a target, causes a transition to the Quiesce state.
Doing so moves the system out of the fail state.
This patch set masks and unmasks that target on recoveryoff
and recoveryon respectively.

Tested:

:/tmp# ./obmcutil recoveryoff
Created symlink /etc/systemd/system/obmc-host-timeout@0.target → /dev/null.
Created symlink /etc/systemd/system/obmc-host-crash@0.target → /dev/null.

:/tmp# ./obmcutil recoveryon
Removed /etc/systemd/system/obmc-host-timeout@0.target.
Removed /etc/systemd/system/obmc-host-crash@0.target.
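
The effect shown in the output above can be reproduced in a scratch directory. The sketch below is illustrative only and is not the code from this patch: mask_targets and unmask_targets are hypothetical helpers, and the unit directory is passed in so that nothing under /etc is touched. It relies only on what the output shows, namely that masking creates a symlink to /dev/null and unmasking removes it.

```shell
#!/bin/sh
# Hypothetical helpers, not the obmcutil implementation.
# Target names are taken from the test output above.
TARGETS="obmc-host-timeout@0.target obmc-host-crash@0.target"

# mask_targets UNIT_DIR: link each target to /dev/null inside UNIT_DIR.
mask_targets() {
    for t in $TARGETS; do
        ln -sf /dev/null "$1/$t"
    done
}

# unmask_targets UNIT_DIR: remove the symlinks again.
unmask_targets() {
    for t in $TARGETS; do
        rm -f "$1/$t"
    done
}

# Demo in a throwaway directory instead of /etc/systemd/system.
demo_dir=$(mktemp -d)
mask_targets "$demo_dir"
ls -l "$demo_dir"
unmask_targets "$demo_dir"
rmdir "$demo_dir"
```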

Change-Id: I49dff827b78a6fc2a35d2dd0d79b5c05c20eb39b
Signed-off-by: Vishwanatha Subbanna <vishwa@linux.vnet.ibm.com>
1 file changed
tree: ddbd62f425aa6e7073db02c5985470d69fe34347
  1. test/
  2. .clang-format
  3. .gitignore
  4. bmc_state_manager.cpp
  5. bmc_state_manager.hpp
  6. bmc_state_manager_main.cpp
  7. bootstrap.sh
  8. chassis_state_manager.cpp
  9. chassis_state_manager.hpp
  10. chassis_state_manager_main.cpp
  11. configure.ac
  12. discover_system_state.cpp
  13. host_check_main.cpp
  14. host_state_manager.cpp
  15. host_state_manager.hpp
  16. host_state_manager_main.cpp
  17. LICENSE
  18. MAINTAINERS
  19. Makefile.am
  20. obmcutil
  21. phosphor-systemd-target-monitor.service.in
  22. phosphor-target-monitor-default.json
  23. README.md
  24. settings.cpp
  25. settings.hpp
  26. systemd_target_monitor.cpp
  27. systemd_target_parser.cpp
  28. systemd_target_parser.hpp
  29. systemd_target_signal.cpp
  30. systemd_target_signal.hpp
README.md

Phosphor State Manager Documentation

This repository contains the software responsible for tracking and controlling the state of different objects within OpenBMC. This currently includes the BMC, Chassis, and Host. The most critical feature of phosphor-state-manager is its support for user requests to power the system on and off.

This software also enforces any restore policy (for example, automatically powering on the system after a system power event or BMC reset) and ensures its states are updated correctly in situations where the BMC is rebooted and the chassis or host are in on/running states.

This repository also provides a command line tool, obmcutil, which offers basic support for querying and controlling the phosphor-state-manager applications running within an OpenBMC system. The tool itself runs within an OpenBMC system and uses D-Bus APIs. These D-Bus APIs are used for development and debug and are not intended for end users.

As with all OpenBMC applications, interfaces and properties within phosphor-state-manager are D-Bus interfaces. These interfaces are then used by external interface protocols, such as Redfish and IPMI, to report and control state to/by the end user.

State Tracking and Control

phosphor-state-manager makes extensive use of systemd. There is a writeup here with an overview of systemd and its use by OpenBMC.

phosphor-state-manager follows some basic design guidelines in its implementation and use of systemd:

  • Keep the different objects as independent as possible (host, chassis, bmc)
  • Use systemd targets for everything and keep the code within phosphor-state-manager minimal
  • Ensure it can support required external interfaces, but don't necessarily create 1:1 mappings; otherwise every external interface will end up with its own special chassis or host state request
  • If something like a hard power off can be done by just turning off the chassis, don't provide a command in the host to do the same thing

phosphor-state-manager implements states and state requests as defined in phosphor-dbus-interfaces for each object it supports.

  • bmc: The BMC has very minimal states. It is Ready once all services within the default.target have executed. The only state change request you can make of the BMC is for it to reboot itself.
    • CurrentBMCState: NotReady, Ready
    • RequestedBMCTransition: Reboot
  • chassis: The chassis represents the physical hardware in which the system is contained. It usually has the power supplies, fans, and other hardware associated with it. It can be either On or Off.
    • CurrentPowerState: On, Off
    • RequestedPowerTransition: On, Off
  • host: The host represents the software running on the system. In most cases this is an operating system of some sort. The host can be Off, Running, or Quiesced (error condition).
    • CurrentHostState: Off, Running, Quiesced
    • RequestedHostTransition: Off, On, Reboot
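
As a sketch of how these properties can be read, the helpers below build a `busctl get-property` command line for each object and trim a reply down to its short state name. The service names, the object paths (`/xyz/openbmc_project/state/bmc0` and so on), and the sample reply are assumptions based on common OpenBMC naming and do not come from this README; verify them on a target system with `busctl tree`. Both function names are hypothetical.

```shell
#!/bin/sh
# state_query_cmd OBJECT: print (not run) the busctl command for bmc|chassis|host.
# Service, path, and interface names are assumed, see the note above.
state_query_cmd() {
    case "$1" in
        bmc)     iface=BMC     prop=CurrentBMCState ;;
        chassis) iface=Chassis prop=CurrentPowerState ;;
        host)    iface=Host    prop=CurrentHostState ;;
        *) echo "unknown object: $1" >&2; return 1 ;;
    esac
    svc="xyz.openbmc_project.State.$iface"
    printf 'busctl get-property %s /xyz/openbmc_project/state/%s0 %s %s\n' \
        "$svc" "$1" "$svc" "$prop"
}

# short_state REPLY: reduce a reply such as
#   s "xyz.openbmc_project.State.Host.HostState.Running"
# to the text after the last dot (Running).
short_state() {
    reply=${1%\"}          # drop the trailing double quote
    echo "${reply##*.}"    # keep everything after the last dot
}

state_query_cmd host
short_state 's "xyz.openbmc_project.State.Host.HostState.Running"'
```

Printing the command instead of running it keeps the sketch usable on a development machine that has no D-Bus state objects.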

As noted above, phosphor-state-manager provides a command line tool, obmcutil, which takes a state parameter. With that parameter, the tool uses D-Bus commands to retrieve the above states and present them to the user. It also provides other commands that set the above properties over D-Bus to power the chassis and host on or off (see obmcutil --help within an OpenBMC system).

The above objects also implement other D-Bus objects like power on hours, boot progress, reboot attempts, and operating system status. These D-Bus objects are also defined in the phosphor-dbus-interfaces repository.

Restore Policy on Power Events

The RestorePolicy defines the behavior the user wants when the BMC is reset. If the chassis or host is on/running, the restore policy service will not run. If they are off, the RestorePolicy will be read and executed by phosphor-state-manager code.
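
The decision described above can be sketched as a small function. This is an illustration only, not the phosphor-state-manager code: `restore_action` and its output strings are hypothetical, and the policy names `AlwaysOn` and `AlwaysOff` are assumed, since this README does not list the policy values.

```shell
#!/bin/sh
# restore_action CHASSIS_STATE HOST_STATE POLICY
# Prints what a restore step would do after a BMC reset.
restore_action() {
    # If the chassis is on or the host is running, the policy is not applied.
    if [ "$1" = "On" ] || [ "$2" = "Running" ]; then
        echo "skip"
        return 0
    fi
    case "$3" in
        AlwaysOn)  echo "power-on" ;;          # assumed policy name
        AlwaysOff) echo "stay-off" ;;          # assumed policy name
        *)         echo "unhandled-policy" ;;  # anything this sketch does not model
    esac
}

restore_action On Running AlwaysOff   # prints "skip"
restore_action Off Off AlwaysOn       # prints "power-on"
```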

BMC Reset with Host and/or Chassis On

In situations where the BMC is reset and the chassis and host are on and running, it's critical that the BMC software do two things:

  • Never impact the state of the system (causing a power off of a running system is very bad)
  • Ensure the BMC, Chassis, and Host states accurately represent the state of the system.

Note that some of this logic is provided via service files in system-specific meta layers. That is because the logic to determine if the chassis is on or if the host is running can vary from system to system. The requirement to create the files defined below and ensure the common targets go active is a must for anyone wishing to enable this feature.

phosphor-state-manager discovers state rather than trying to cache and save states. This ensures it is always getting the most accurate state information. It discovers the chassis state by checking the pgood value from the power application. If it determines that power is on, it will do the following:

  • Create a file called /run/openbmc/chassis@0-on
    • The presence of this file tells the services to alter their behavior because the chassis is already powered on
  • Start the obmc-chassis-poweron@0.target
    • The majority of services in this target will "fake start" due to the file being present. They will report to systemd that they started and ran successfully but they actually do nothing. This is what you would want in this case. Power is already on so you don't want to run the services to turn power on. You do want to get the obmc-chassis-poweron@0.target in the Active state though so that the chassis object within phosphor-state-manager will correctly report that the chassis is On
  • Start a service to check if the host is on

The chassis@0-on file is removed when power is removed from the chassis.
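
The first two steps of the chassis discovery flow can be sketched in shell. This is not the phosphor-state-manager implementation: `discover_chassis` is a hypothetical helper, the encoding of pgood as 1 for "power on" is an assumption, and the `systemctl start` command is printed rather than executed so the sketch can run on any machine. The host-check step is left out because this README does not name that service.

```shell
#!/bin/sh
# discover_chassis PGOOD [RUN_DIR]
# PGOOD is 1 when the power application reports power good (assumed encoding).
discover_chassis() {
    run_dir=${2:-/run/openbmc}
    if [ "$1" != "1" ]; then
        return 0    # power is off: no marker file, no target
    fi
    mkdir -p "$run_dir"
    : > "$run_dir/chassis@0-on"    # marker that tells services power is already on
    # Printed rather than executed, see the note above.
    echo "systemctl start obmc-chassis-poweron@0.target"
}

# Demo in a throwaway directory instead of /run/openbmc.
demo_dir=$(mktemp -d)
discover_chassis 1 "$demo_dir"
ls "$demo_dir"
rm -rf "$demo_dir"
```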

The logic to check if the host is on sends a command to the host. If a response is received, steps similar to the chassis case are taken:

  • Create a file called /run/openbmc/host@0-on
  • Start the obmc-host-start@0.target
    • Similar to above, most services will not run due to the file being created and their service files implementing a "ConditionPathExists=!/run/openbmc/host@0-on"

The host@0-on file is removed when the host is stopped.
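
The "fake start" behavior comes from systemd's condition check: with a `ConditionPathExists=!<path>` setting, a unit whose path exists is skipped, yet the start request is still treated as successful. The shell sketch below imitates that rule so it can be tried without systemd; `run_unless_marker` is a hypothetical helper, and the marker stands for whatever file the service's condition names.

```shell
#!/bin/sh
# run_unless_marker MARKER COMMAND...
# Runs COMMAND only when MARKER does not exist; succeeds either way when skipped.
run_unless_marker() {
    marker=$1
    shift
    if [ -e "$marker" ]; then
        echo "skipped: $marker exists"
        return 0    # report success without doing the work
    fi
    "$@"
}

marker=$(mktemp)    # mktemp creates the file, so the marker is present
run_unless_marker "$marker" echo "powering on"   # prints the "skipped" line
rm -f "$marker"
run_unless_marker "$marker" echo "powering on"   # prints "powering on"
```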

Building the Code

./bootstrap.sh
./configure ${CONFIGURE_FLAGS} && make