Author: James Feist !jfei
Other contributors: None
Created: 2019-04-17
Redfish Status has 3 main properties: Health, HealthRollup, and State. We need a way to determine, at a high level, the health of contained components, as well as the health of individual components.
HealthRollup contains the combined health for all components below it. Main use cases are:
The more difficult examples that we need to cover are:
https:///redfish/v1/Systems/system/Memory/dimm0, where we need to roll the health of the DIMM up multiple levels to the system health.
https:///redfish/v1/Managers/bmc, where we need to get the health of the BMC, which has no direct sensors.
https:///redfish/v1/Chassis/1Ux16_Riser_1/, where multiple sensor types roll up to the chassis health.
https:///redfish/v1/Managers/bmc/EthernetInterfaces/eth1, where a failure of the Ethernet interface needs to contribute to the BMC health.
Some examples of health with different use cases are:
Currently, OperationalStatus only has pass/fail options, whereas Redfish Health is tri-state (OK, Warning, Critical): https://github.com/openbmc/bmcweb/blob/master/static/redfish/v1/schema/Resource_v1.xml#L197
Threshold interface (Currently unused by some vendors): https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Threshold.errors.yaml
Associations In Sensors: https://lists.ozlabs.org/pipermail/openbmc/2019-February/015188.html
Association Definition Interface: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Association/Definitions.interface.yaml
Operational Status Interface: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/State/Decorator/OperationalStatus.interface.yaml
Redfish schema guide: https://www.dmtf.org/sites/default/files/standards/documents/DSP2046_2018.1_0.pdf
BMC inventory interface: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Inventory/Item/Bmc.interface.yaml
System inventory interface: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Inventory/Item/System.interface.yaml
Item interface: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Inventory/Item.interface.yaml
Map crossed critical thresholds to Health Critical, and crossed warning thresholds to Health Warning. If no thresholds exist, or none of them indicate a problem, map a failed OperationalStatus (Functional is false) to Critical.
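A minimal sketch of this per-component mapping, assuming each component's threshold state has already been reduced to booleans (for example from the CriticalAlarmHigh/CriticalAlarmLow and WarningAlarmHigh/WarningAlarmLow properties of the threshold interfaces) and that `functional` mirrors the OperationalStatus Functional property. `Health` and `component_health` are hypothetical names for illustration, not an existing OpenBMC API. The sketch follows the text literally: OperationalStatus is only consulted when thresholds are absent or silent.

```python
from enum import IntEnum
from typing import Optional


class Health(IntEnum):
    """Redfish Health values, ordered so that a larger value is worse."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def component_health(critical_alarm: Optional[bool],
                     warning_alarm: Optional[bool],
                     functional: bool) -> Health:
    """Map one component's threshold alarms and OperationalStatus to Health.

    critical_alarm / warning_alarm are None when the component has no
    threshold of that kind (hypothetical convention for this sketch).
    """
    # Crossed thresholds take precedence.
    if critical_alarm:
        return Health.CRITICAL
    if warning_alarm:
        return Health.WARNING
    # Thresholds are absent or not indicating a problem: fall back to
    # OperationalStatus, where a failed component maps to Critical.
    if not functional:
        return Health.CRITICAL
    return Health.OK
```

For example, a sensor with no thresholds whose OperationalStatus reports a failure, `component_health(None, None, False)`, maps to `Health.CRITICAL`.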
Each chassis has individual sensors. Cross-reference these sensors with the global health associations: if any of the chassis's sensors are in the global health warning association, the chassis rollup is Warning. Likewise, if any inventory item in the chassis is in the global health critical association, the chassis rollup is Critical. The global inventory item will be xyz.openbmc_project.Inventory.Item.Global.
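The cross-reference above can be sketched as a set intersection, assuming the endpoints of the global item's warning and critical associations have already been fetched (for example through the object mapper) as sets of D-Bus object paths. `chassis_rollup` and the example paths are hypothetical; critical is checked first so that it wins when an item appears in both sets.

```python
from enum import IntEnum
from typing import Iterable, Set


class Health(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def chassis_rollup(chassis_members: Iterable[str],
                   global_warning: Set[str],
                   global_critical: Set[str]) -> Health:
    """Roll up a chassis from the global health associations.

    chassis_members: object paths of the sensors/inventory in the chassis.
    global_warning / global_critical: endpoints of the global item's
    warning and critical associations (assumed already resolved).
    """
    members = set(chassis_members)
    if members & global_critical:
        return Health.CRITICAL
    if members & global_warning:
        return Health.WARNING
    return Health.OK


# Hypothetical sensor paths for a riser chassis.
members = [
    "/xyz/openbmc_project/sensors/fan_tach/Fan_1",
    "/xyz/openbmc_project/sensors/temperature/Riser_Temp",
]
```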
All other associations ending in "critical" or "warning" are combined and searched for inventory. The worst health among the inventory items in the chassis is the rollup for the chassis. The System is treated the same way.
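A sketch of this combine-then-take-the-worst step, assuming the associations are available as a mapping from association name to a collection of endpoint paths. The association names and paths in the usage lines are hypothetical; only the "critical"/"warning" suffix rule comes from the design. Ordering `Health` as an `IntEnum` lets `max` pick the worst value.

```python
from enum import IntEnum
from typing import Dict, Iterable, Set, Tuple


class Health(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def combine_associations(associations: Dict[str, Iterable[str]]
                         ) -> Tuple[Set[str], Set[str]]:
    """Merge every association whose name ends in "critical" or "warning"."""
    critical: Set[str] = set()
    warning: Set[str] = set()
    for name, endpoints in associations.items():
        if name.endswith("critical"):
            critical.update(endpoints)
        elif name.endswith("warning"):
            warning.update(endpoints)
    return critical, warning


def worst_health(items: Iterable[str], critical: Set[str],
                 warning: Set[str]) -> Health:
    """Return the worst health of any inventory item in a chassis or system."""
    def health_of(item: str) -> Health:
        if item in critical:
            return Health.CRITICAL
        if item in warning:
            return Health.WARNING
        return Health.OK

    # An empty chassis (no inventory) is reported as OK.
    return max((health_of(i) for i in items), default=Health.OK)


# Hypothetical association names; "chassis" matches neither suffix.
assocs = {
    "cpu_critical": ["/inventory/cpu0"],
    "psu_warning": ["/inventory/psu1"],
    "chassis": ["/inventory/fan0"],
}
crit, warn = combine_associations(assocs)
```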
A fan will be marked Critical if a critical threshold has been crossed or its OperationalStatus is marked false. This fan may then be placed in the global health warning association if the system determines that this failure should contribute as a warning. The chassis containing the fan will then be marked Warning, because inventory it contains is Warning at the global level.
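The fan scenario can be traced step by step to show that a component's own Health and its chassis HealthRollup are decoupled. This is an illustrative sketch: the fan path is hypothetical, and the policy that demotes the fan to the global warning association is assumed rather than computed.

```python
from enum import IntEnum


class Health(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


FAN = "/xyz/openbmc_project/inventory/system/chassis/fan0"  # hypothetical

# Step 1: the fan itself is Critical (critical threshold crossed, or
# OperationalStatus marked false).
fan_health = Health.CRITICAL

# Step 2: an assumed system policy decides this failure should only count
# as a Warning globally, so the fan goes into the global warning association.
global_warning = {FAN}
global_critical: set = set()

# Step 3: the chassis containing the fan rolls up from the global sets.
chassis_inventory = {FAN}
if chassis_inventory & global_critical:
    chassis_health = Health.CRITICAL
elif chassis_inventory & global_warning:
    chassis_health = Health.WARNING
else:
    chassis_health = Health.OK

print(fan_health.name, chassis_health.name)  # CRITICAL WARNING
```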
A new daemon to track global health state. However, such a daemon would be difficult to reuse for tracking individual component health.
Testing will be performed using sensor values and the status LED.