commit | 1ff926e0f055dde8f068d671d4df524ce54d7725 | |
---|---|---|
author | Andrew Geissler <geissonator@yahoo.com> | Thu Jan 26 08:14:22 2023 -0700 |
committer | Andrew Geissler <geissonator@yahoo.com> | Thu Jan 26 08:14:22 2023 -0700 |
tree | 5afa74651033e500c568b727c9c09ed3818b747b | |
parent | 9051685138dd5708d7c4ccb104a3ea9c69b146bf | |
power down when host fail detected in power off path

Some code was introduced recently within hw-diags to ensure it was still running while the host was in the process of powering off. Prior to this change, hw-diags was only ever running while either hostboot or PHYP was running.

Now that hw-diags can be running in the power off path, a change in logic is needed on which systemd target to call on detection of a host error.

Both the quiesce and crash targets can only be called from a host running state. If the host is in the process of powering off when an error is detected then an appropriate error should be logged, and then the obmc-chassis-hard-poweroff@.target should be called. This will ensure the services waiting for the host to indicate it has shutdown are properly stopped and the system is powered off.

If more of these types of situation arise, it may be pertinent to revisit service directly calling systemd targets. An alternative to putting the responsibility on the calling service is to have a central authority that services call instead of the systemd target directly. This would be a large change requiring extensive changes and testing.

Tested:
- Injected PHYP TI while graceful power off was in process
- Verified hw-diags generate error with TI data
- Verified hard-poweroff target was called and pldm soft power off service was stopped and system properly powered off
- Injected error with dumps disabled and system at runtime, verified system went to Quiesced and auto rebooted
- Injected error with dumps enabled and system at runtime, verified MPIPL was done and SYSDUMP generated

Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
Change-Id: I13e2bc45a948930c31ede2728f9d78b9b8bff5b1
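The target selection described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code from this change: the `HostState` enum and the `selectTarget` function are hypothetical, the instance number `0` is an assumption, and the quiesce and crash unit names are assumed from common OpenBMC naming. Only `obmc-chassis-hard-poweroff@.target` is named in the commit message itself. The mapping of dumps enabled to the crash target and dumps disabled to the quiesce target is inferred from the "Tested" notes.

```cpp
#include <string_view>

// Hypothetical host states, for illustration only.
enum class HostState
{
    Running,     // hostboot or PHYP is running
    PoweringOff  // a power off is already in progress
};

// Pick the systemd target to start after a host error has been logged.
// The quiesce and crash targets are only valid from a running host, so a
// host that is already powering off goes straight to the hard power off.
// Unit names and the "0" instance are assumptions, not taken from the source.
constexpr std::string_view selectTarget(HostState state, bool dumpsEnabled)
{
    if (state == HostState::PoweringOff)
    {
        return "obmc-chassis-hard-poweroff@0.target";
    }
    // Host is running: crash target when dumps are enabled (MPIPL path),
    // otherwise quiesce.
    return dumpsEnabled ? "obmc-host-crash@0.target"
                        : "obmc-host-quiesce@0.target";
}
```

Keeping the decision in one small pure function makes the power off case easy to check in isolation, separate from whatever D-Bus or systemd call actually starts the unit.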
In the event of a system fatal error reported by the internal system hardware (processor chips, memory chips, I/O chips, system memory, etc.), POWER Systems have the ability to diagnose the root cause of the failure and perform any service action needed to avoid repeated system failures.
Additional details TBD.
For a standard OpenBMC release build, you want something like:
    meson -Dtests=disabled <build_dir>
    ninja -C <build_dir>
    ninja -C <build_dir> install
For a test / debug build, a typical configuration is:
    meson -Dtests=enabled <build_dir>
    ninja -C <build_dir> test