commit | f0295f52536d9e508305a0e522184157966ee2f5 | |
---|---|---|
author | Chris Cain <cjcain@us.ibm.com> | Thu Sep 12 15:41:14 2024 -0500 |
committer | Chris Cain <cjcain@us.ibm.com> | Thu Oct 03 14:44:41 2024 -0500 |
tree | 0e59a484f870205e0d173712ec482160f59e47b5 | |
parent | 9a8fe27557808f1f48d2a9a040290a52a7998e76 | |
Improve BMC error handling for OCC comm failures

- Delay starting OCC reset until all OCCs have been detected (or timeout). This prevents multiple resets from being triggered and helps detect when a reset has completed (the active sensor is set after the reset completes).
- Wait for PLDM responses to OCC reset and HRESET requests, and retry if they fail.
- If HRESET returns NOT_READY, collect SBE FFDC and try an OCC reset. A persistent failure will put the system in safe state.
- Prevent overwriting the dvfs over-temp filename for p10 and beyond, since the old file is only present in old kernels.
- Prevent asserts when opening sysfs files (added a catch that creates an OCC Comm failure PEL, which forces an OCC reset).
- Check the return code after reading sysfs files to confirm success. If a read fails, try a reset to recover.
- Update traces to include which processor/OCC encountered issues.
- Improve recovery to close windows that were leaving the system in a partial good state.

JIRA: PFES-66
Change-Id: I0b087d0e05bd8562682062e1c662f9e18164a720
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
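
The commit message above describes two defensive patterns: reading sysfs files without asserting, and retrying a request until it succeeds or a limit is reached. The following is only a minimal sketch of those two ideas, not the code from this commit. The names readFirstLine and retryRequest, the attempt limit, and the use of std::optional are assumptions made for illustration; the real service also creates a PEL and drives the OCC reset, which is omitted here.

```cpp
#include <cassert>
#include <exception>
#include <fstream>
#include <functional>
#include <optional>
#include <string>

// Hypothetical helper: read the first line of a sysfs-style file.
// Returns std::nullopt on any failure instead of throwing or asserting,
// so the caller can decide how to recover (for example, request a reset).
std::optional<std::string> readFirstLine(const std::string& path)
{
    try
    {
        std::ifstream file(path);
        if (!file.is_open())
        {
            return std::nullopt; // file missing or not readable
        }
        std::string line;
        if (!std::getline(file, line))
        {
            return std::nullopt; // open succeeded but the read failed
        }
        return line;
    }
    catch (const std::exception&)
    {
        return std::nullopt; // never let an exception escape to the caller
    }
}

// Hypothetical helper: call `request` until it reports success or
// `maxAttempts` calls have been made. Returns true on success.
bool retryRequest(const std::function<bool()>& request, int maxAttempts)
{
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt)
    {
        if (request())
        {
            return true;
        }
    }
    return false; // persistent failure: the caller must escalate
}
```

A caller might wrap a reset request in retryRequest and treat a false result as the persistent-failure case, and treat an empty result from readFirstLine as a communication failure rather than a fatal error.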
This service handles communication with the On-Chip Controller (OCC) on Power processors. The OCC provides processor and memory temperatures, power readings, power cap support, system power mode support, and idle power saver support. OCC Control interfaces with the OCC to collect temperatures and power readings, update the system power mode, and set power caps and idle power saver parameters.
The service is started automatically when the BMC is started.
This project can be built with Meson. The typical Meson workflow is: meson setup builddir && ninja -C builddir.
The service starts automatically after the BMC is powered on.
Server status: systemctl status org.open_power.OCC.Control.service
To restart the service: systemctl restart org.open_power.OCC.Control.service
Service files are located in the service_files subdirectory.
IBM EnergyScale for Power10 Processor-Based Systems whitepaper: https://www.ibm.com/downloads/cas/E7RL9N4E
OCC Firmware Interface Spec for Power10: https://github.com/open-power/docs/blob/P10/occ/OCC_P10_FW_Interfaces_v1_17.pdf
OCC Firmware: https://github.com/open-power/occ/tree/master-p10
IBM EnergyScale for POWER9 Processor-Based Systems: https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=49019149USEN&
OCC Firmware Interface Spec for POWER9: https://github.com/open-power/docs/blob/P9/occ/OCC_P9_FW_Interfaces.pdf