Error handling for power Hardware Abstraction Layer (pHAL)

Author: Devender Rao devenrao@in.ibm.com

Other contributors: None

Created: 14/01/2020

Problem Description

This proposal provides a mechanism to convert the failure data captured during power Hardware Abstraction Layer (pHAL) library calls to the Platform Event Log (PEL) format.

Background and References

OpenBMC applications use the pHAL layer for hardware access and hardware initialization. Any software or hardware error returned by the pHAL layer needs to be converted to PEL format before the error entry is logged. PEL helps to improve firmware and platform serviceability during product development, manufacturing, and in customer environments.

Error data includes register data and the targets to guard and call out. Guard refers to the action of "guarding" faulty hardware so that it does not impact future system operation. A callout points to a specific piece of hardware within the server that relates to the identified error.

The phosphor-logging Create interface is used for creating PELs.

The pHAL layer consists of the libraries below, and these libraries return different return codes.

  1. libipl used for initial program load
  2. libfdt for device tree access
  3. libekb for hardware procedure execution
  4. libpdbg for hardware access

The proposal is to structure the return data into a standard return code format so that the caller only has to handle a single return code format for conversion to PEL.
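A minimal sketch of what such a common format could look like follows. Every name in it (`PhalSource`, `PhalError`, `makeError`, `describe`) is hypothetical and chosen for illustration only, and it assumes that zero means success in each library.

```cpp
#include <cassert>
#include <string>
#include <utility>

// All names here are hypothetical; none come from the pHAL libraries.
enum class PhalSource { Ipl, Fdt, Ekb, Pdbg };

// One error shape for every pHAL library, so that callers convert a
// single format to PEL instead of four library-specific return codes.
struct PhalError
{
    PhalSource source;   // which library reported the failure
    int nativeCode;      // return code exactly as the library produced it
    std::string message; // human-readable description
};

inline const char* sourceName(PhalSource source)
{
    switch (source)
    {
        case PhalSource::Ipl:  return "libipl";
        case PhalSource::Fdt:  return "libfdt";
        case PhalSource::Ekb:  return "libekb";
        case PhalSource::Pdbg: return "libpdbg";
    }
    return "unknown";
}

// Wrap a library-specific failure code in the common format.
inline PhalError makeError(PhalSource source, int nativeCode,
                           std::string message)
{
    // Assumption: zero means success, so it is never wrapped as an error.
    assert(nativeCode != 0);
    return PhalError{source, nativeCode, std::move(message)};
}

// Render the error as a single line, for example for a PEL message field.
inline std::string describe(const PhalError& error)
{
    return std::string(sourceName(error.source)) + " rc=" +
           std::to_string(error.nativeCode) + ": " + error.message;
}
```

Keeping the native code alongside the source library means no information is lost in the conversion, while the PEL-creating caller only ever deals with one type.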

Glossary

pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running on the BMC. These libraries are used by OpenPOWER-specific applications for hardware complex interactions, hostboot and Self Boot Engine initialization, diagnostics, and debugging.

libfdt: Library that pHAL uses to construct the in-memory tree structure of all targets. Reference

libpdbg: Library that allows debugging of the host POWER processors from the BMC. Reference

MRW: Machine readable workbook. An XML description of a machine as specified by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware team that initializes host processor and memory subsystems in a platform-independent fashion.

Device Tree: A device tree is a data structure describing the hardware components of a particular computer so that the operating system's kernel can use and manage those components, including the CPU or CPUs, the memory, the buses and the peripherals. Reference

EKB: EKB library contains all the hardware procedures (HWP) for the specific platform and corresponding error XML files for each hardware procedure.

PEL: Platform Event Log

Requirement

libekb

The EKB library contains hardware procedures for the specific platform and the corresponding error XML files for each hardware procedure. The error XML specifies attribute data, targets to call out, targets to guard, and targets to deconfigure for a specific error. Parsers in the EKB library parse the error XML file and generate a C++ header file, which the hardware procedure uses to capture the failure data.

Add a parser in libekb to parse the error XML file, and provide methods that parse the failure data returned by the hardware procedure methods and return it as key-value pairs that can be used when creating the PEL.

libipl

The initial program load library is used for booting the system. The library internally calls hardware procedures (HWP) of the EKB library. The hardware procedure execution status needs to be returned to the caller so that the caller can create a PEL when a hardware procedure fails.

libpdbg

The libpdbg library is used for hardware access. Any hardware access errors need to be captured as part of the PEL.

Message Registry Entries

For errors to be raised in pHAL, corresponding error entries need to be created in the message registry.

Proposed design

Hardware procedure failure

Add a parser in libekb to parse the error XML file, and provide methods that parse the failure data returned by the hardware procedure methods and return it as key-value pairs that can be passed to the Create interface when creating the PEL.

Inventory strings for the targets to Callout/Guard/Deconfig need to be added to the additional data section of the Create interface.
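As a rough illustration of how parsed failure data and inventory strings could be flattened into the key-value form described above, consider the sketch below. The `HwpFailure` structure, the helper names, and the key names (`RC_NAME`, `CALLOUT_0`, `GUARD_0`, `DECONFIG_0`) are assumptions made for this example and are not defined by libekb or phosphor-logging.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of the failure data parsed out of a hardware
// procedure failure; the real libekb types may differ.
struct HwpFailure
{
    std::string rcName; // error name taken from the error XML
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<std::string> callouts;  // inventory strings to call out
    std::vector<std::string> guards;    // inventory strings to guard
    std::vector<std::string> deconfigs; // inventory strings to deconfigure
};

// Key-value form handed to the Create interface as additional data.
using AdditionalData = std::map<std::string, std::string>;

// Store each value under "<prefix><index>", e.g. CALLOUT_0, CALLOUT_1.
inline void addIndexed(AdditionalData& data, const std::string& prefix,
                       const std::vector<std::string>& values)
{
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        data[prefix + std::to_string(i)] = values[i];
    }
}

// Flatten the failure data into key-value pairs.
inline AdditionalData toAdditionalData(const HwpFailure& failure)
{
    AdditionalData data;
    data["RC_NAME"] = failure.rcName;
    for (const auto& attribute : failure.attributes)
    {
        data[attribute.first] = attribute.second;
    }
    addIndexed(data, "CALLOUT_", failure.callouts);
    addIndexed(data, "GUARD_", failure.guards);
    addIndexed(data, "DECONFIG_", failure.deconfigs);
    return data;
}
```

Indexed keys are one simple way to carry several targets of the same kind in a flat string map; the actual key naming would have to follow whatever the PEL message registry expects.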

Applications need to register callback methods with the libekb library to receive the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

libipl internal failure

Applications need to register callback methods with the libipl library to receive the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

libpdbg internal failure

Applications need to register callback methods with the libpdbg library to receive its debug traces.

Debug traces returned through the callback method will be added to the PEL.
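The callback registration described for libekb, libipl, and libpdbg could follow a pattern like the one sketched here. The `fakelib` namespace is a hypothetical stand-in for a pHAL library, not the real API of any of them; it assumes a C-style hook that receives a log level, a printf-style format string, and a `va_list`.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Assumed callback signature: level, printf-style format, arguments.
using LogFunc = void (*)(int level, const char* fmt, va_list ap);

// Hypothetical stand-in for the library side of the hook.
namespace fakelib
{
inline LogFunc& hook()
{
    static LogFunc fn = nullptr;
    return fn;
}

// Called by the application to register its trace callback.
inline void setLogFunc(LogFunc fn)
{
    hook() = fn;
}

// Called inside the library whenever it has a trace to report.
inline void log(int level, const char* fmt, ...)
{
    if (hook() == nullptr)
    {
        return; // no callback registered, drop the trace
    }
    va_list ap;
    va_start(ap, fmt);
    hook()(level, fmt, ap);
    va_end(ap);
}
} // namespace fakelib

// Application side: traces collected here are later added to the PEL.
inline std::vector<std::string>& traces()
{
    static std::vector<std::string> collected;
    return collected;
}

// Callback that formats each trace and stores it with its level.
inline void collectTrace(int level, const char* fmt, va_list ap)
{
    char buffer[256];
    std::vsnprintf(buffer, sizeof(buffer), fmt, ap); // truncates long lines
    traces().push_back("[" + std::to_string(level) + "] " + buffer);
}
```

An application would call `fakelib::setLogFunc(collectTrace)` once at startup and, when a failure is reported, copy the contents of `traces()` into the PEL.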

Sequence diagrams

Register for debug traces and boot errors

[Image: sequence diagram]

Process debug traces

[Image: sequence diagram]

Process boot failures

[Image: sequence diagram]

Alternatives Considered

None

Impacts

None

Future changes

At present, data is passed to the Create interface in std::map format. This will be changed to JSON format once the Create interface supports being passed JSON files.
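To give a feel for that migration, the sketch below serializes a std::map of strings into a flat JSON object. It is only an illustration: the function names are hypothetical, it produces a string rather than a file, and a real implementation would more likely rely on an established JSON library.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Escape the characters that JSON does not allow unescaped in a string.
inline std::string escapeJson(const std::string& text)
{
    std::string out;
    for (unsigned char c : text)
    {
        if (c == '"')
        {
            out += "\\\"";
        }
        else if (c == '\\')
        {
            out += "\\\\";
        }
        else if (c < 0x20)
        {
            // Control characters are written as \u00XX.
            char buffer[7];
            std::snprintf(buffer, sizeof(buffer), "\\u%04x",
                          static_cast<unsigned>(c));
            out += buffer;
        }
        else
        {
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Serialize the map as one JSON object; std::map keeps keys sorted.
inline std::string toJson(const std::map<std::string, std::string>& data)
{
    std::string out = "{";
    bool first = true;
    for (const auto& entry : data)
    {
        if (!first)
        {
            out += ",";
        }
        first = false;
        out += "\"" + escapeJson(entry.first) + "\":\"" +
               escapeJson(entry.second) + "\"";
    }
    return out + "}";
}
```

Because the existing data is already a flat string-to-string map, the conversion is mechanical; richer JSON (nested callout objects, arrays) would be a separate design decision.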

Testing

  1. Simulate a hardware procedure failure and check that a PEL is created.