Author: Josh Lehan[1]
Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu
Created: March 19, 2021
In OpenBMC, the dbus-sensors[2] package contains a suite of sensor daemons. Each daemon monitors a particular type of sensor. This document provides rationale and motivation for adding ExternalSensor, another sensor daemon, and gives some example usages of it.
There are 10 existing sensor daemons in dbus-sensors. Why add another sensor daemon?
Most of the existing sensor daemons are tied to one particular physical quantity they are measuring, such as temperature, and are hardcoded as such. An externally-updated sensor has no such limitation, and should be flexible enough to measure any physical quantity currently supported by OpenBMC.
Essentially all of the existing sensor daemons obtain the sensor values they publish to D-Bus by reading from local hardware (typically by reading from virtual files provided by the hwmon[3] subsystem of the Linux kernel). None of the daemons are currently designed with the intention of accepting values pushed in from an external source. Although there is some debugging functionality to add this feature to other sensor daemons[25], it is not the primary purpose for which they were designed.
Even if the debugging functionality of an existing daemon were to be used, the daemon would still need a valid configuration tied to recognized hardware, as detected by entity-manager[4], in order for the daemon to properly initialize itself and participate in the OpenBMC software stack.
For the same reason it is desirable for existing sensor daemons to detect and properly indicate failures of their underlying hardware, it is desirable for ExternalSensor to detect and properly indicate loss of timely sensor updates from its external source. This is a new feature that does not fit cleanly into the architecture of any existing sensor daemon, so a new daemon is the correct choice for this behavior.
For these reasons, ExternalSensor has been added[5] as the eleventh sensor daemon in dbus-sensors.
After some discussion, a proof-of-concept HostSensor[6] was published. This was a stub, but it revealed the minimal implementation that would still be capable of fully initializing and participating in the OpenBMC software stack. ExternalSensor was formed by using this example HostSensor, and also one of the simplest existing sensor daemons, HwmonTempSensor[7], as references to build upon.
As written, after validating parameters during initialization, there is essentially no work for ExternalSensor to do. The main loop is mostly idle, remaining blocked in the Boost ASIO[8] library, handling D-Bus requests as they come in. This utilizes the functionality in the underlying Sensor[9] class, which already contains the D-Bus hooks necessary to receive values from the external source.
An example external source is the IPMI service[10], receiving values from the host via the IPMI "Set Sensor Reading" command[11]. ExternalSensor is intended to be source-agnostic, so it does not matter whether the source is IPMI, Redfish[12], or something else in the future, as long as the values are received similarly over D-Bus.
The timeout feature is the primary feature which distinguishes ExternalSensor from other sensor daemons. Once an external source starts providing updates, the external source is expected to continue to provide timely updates. Each update will be properly published onto D-Bus, in the usual way done by all sensor daemons, as a floating-point value.
A timer tracks how long it has been since the last known good external update. It uses the same Boost ASIO[13] timer mechanism that other sensor daemons use to poll their hardware. When the timer expires, the sensor value is deemed stale and is replaced with a floating-point quiet NaN[14].
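The staleness rule described above can be sketched in a few lines. This is a simplified model, not the daemon's code: the real daemon is built on a Boost ASIO timer, whereas this sketch checks elapsed time lazily when the value is read. The class name `StalenessWatchdog`, its methods, and the choice to start at NaN before the first update are all assumptions made for illustration.

```python
import math
import time
from typing import Callable, Optional


class StalenessWatchdog:
    """Hypothetical model of a sensor value that goes stale without updates."""

    def __init__(self, timeout_s: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s   # None means the timeout is disabled
        self.clock = clock           # injectable so the logic can be tested
        self.value = math.nan        # assumption: no reading before first update
        self.last_update: Optional[float] = None

    def update(self, value: float) -> None:
        """Record a reading pushed in by the external source."""
        self.value = value
        self.last_update = self.clock()

    def read(self) -> float:
        """Return the current value, or NaN once the last update is too old."""
        if (self.timeout_s is not None
                and self.last_update is not None
                and self.clock() - self.last_update > self.timeout_s):
            self.value = math.nan
        return self.value


# Usage with a fake clock, so no real waiting is needed.
fake_now = [0.0]
sensor = StalenessWatchdog(timeout_s=4.0, clock=lambda: fake_now[0])
sensor.update(42.5)
fake_now[0] = 10.0               # well past the 4-second timeout
print(math.isnan(sensor.read()))
```

Injecting the clock keeps the timeout logic deterministic, which is convenient when reasoning about edge cases such as an update arriving just before expiry.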
The advantage of floating-point NaN is that it is a drop-in replacement for the valid floating-point value of the sensor. A subtle consequence of the earlier OpenBMC sensor "Value" schema change, from integer to floating-point, is that the field is now essentially nullable. Instead of having to choose an arbitrary integer value to indicate "not valid", such as -1 or 9999, floating-point explicitly provides NaN for this purpose. There is no possibility that it will be mistaken for a valid sensor value, because NaN is literally not a number and thus cannot be misparsed as a valid sensor reading. It also saves having to add a second field to reliably indicate validity, which would break the existing schema[15].
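To illustrate the "nullable" behavior from a consumer's point of view, here is a minimal sketch. The helper names `is_valid` and `average_valid` are hypothetical and do not come from any OpenBMC component. Note that NaN never compares equal to anything, including itself, so a consumer should test for it with an explicit NaN check rather than with `==`.

```python
import math
from typing import Optional, Sequence

STALE = math.nan  # stand-in for the value published when a sensor is stale


def is_valid(reading: float) -> bool:
    """A reading is usable only if it is not NaN."""
    return not math.isnan(reading)


def average_valid(readings: Sequence[float]) -> Optional[float]:
    """Average the usable readings, ignoring stale ones."""
    valid = [r for r in readings if is_valid(r)]
    return sum(valid) / len(valid) if valid else None


# -1.0 is a legitimate temperature, so it cannot double as "not valid".
print(is_valid(-1.0), is_valid(STALE))
# Comparing with == does not detect NaN: NaN is unequal even to itself.
print(STALE == STALE)
print(average_valid([20.0, STALE, 22.0]))
```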
An alternative to using NaN for staleness indication would have been to use a timestamp, which would introduce the complication of having to parse and compare timestamps within OpenBMC, and all the subtle difficulties thereof[16]. What's more, adding a second field might require a second D-Bus message to update, and D-Bus messages are computationally expensive[17] and should be used sparingly. Periodic publishers like sensors, which send out regular updates, could easily generate frequent D-Bus traffic, so their messages should be kept as minimal as practical. Finally, changing the Value schema would have a large blast radius, both in design and in code, necessitating a large refactoring effort well beyond the scope of what is needed for ExternalSensor.
Configuring a sensor for use with ExternalSensor should be done in the usual way[18] that is done for use with other sensor daemons, namely, a JSON dictionary that is an element of the "Exposes" array within a JSON configuration file to be read by entity-manager. In that JSON dictionary, the valid field names are listed below. All of these are mandatory parameters, unless mentioned as optional. Fields listed as "Numeric" below accept either an integer or a valid floating-point value.
"Name": String. The sensor name, which this sensor will be known as. A mandatory component of the entity-manager configuration, and the resulting D-Bus object path.
"Units": String. This parameter is unique to ExternalSensor. As ExternalSensor is not tied to any particular physical hardware, it can measure any physical quantity supported by OpenBMC. This string will be translated to another string via a lookup table19, and forms another mandatory component of the D-Bus object path.
"MinValue": Numeric. The minimum valid value for this sensor. Although not used by ExternalSensor directly, it is a valuable hint for services such as IPMI, which need to know the minimum and maximum valid sensor values in order to scale their reporting range accurately. As ExternalSensor is not tied to one particular physical quantity, there is no suitable default value for minimum and maximum. Thus, unlike other sensor daemons where this parameter is optional, in ExternalSensor it is mandatory.
"MaxValue": Numeric. The maximum valid value for this sensor. It is treated similarly to "MinValue".
"Timeout": Numeric. This parameter is unique to ExternalSensor. It is the timeout value, in seconds. If this amount of time elapses with no new updates received over D-Bus from the external source, this sensor will be deemed stale. The value of this sensor will be replaced with floating-point NaN, as described above. This field is optional. If not given, the timeout feature will be disabled for this sensor (so it will never be deemed stale).
"Type": String. Must be exactly "ExternalSensor". This string is used by ExternalSensor to obtain configuration information from entity-manager during initialization. This string is what differentiates JSON stanzas intended for ExternalSensor versus JSON stanzas intended for other dbus-sensors sensor daemons.
"Thresholds": JSON dictionary. This field is optional. It is passed through to the main Sensor class during initialization, similar to other sensor daemons. Other than that, it is not used by ExternalSensor.
"PowerState": String. This field is optional. Similarly to "Thresholds", it is passed through to the main Sensor class during initialization.
Here is an example. The sensor created by this stanza will form this object path: /xyz/openbmc_project/sensors/temperature/HostDevTemp
{
    "Name": "HostDevTemp",
    "Units": "DegreesC",
    "MinValue": -16.0,
    "MaxValue": 111.5,
    "Timeout": 4.0,
    "Type": "ExternalSensor"
}
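As a rough sketch of how the parameter rules above fit together, the following checks a stanza and derives its object path. It is not the daemon's actual parsing code. The function name `parse_external_sensor` and the table `UNITS_TO_PATH_SEGMENT` are hypothetical, and only the "DegreesC" to "temperature" mapping is taken from the example above; the real lookup table has more entries. The optional "Thresholds" and "PowerState" fields are ignored here.

```python
from numbers import Real
from typing import Any, Dict

# Assumption: only this one pair is confirmed by the example stanza.
UNITS_TO_PATH_SEGMENT = {"DegreesC": "temperature"}
SENSOR_ROOT = "/xyz/openbmc_project/sensors"


def _is_numeric(value: Any) -> bool:
    """Accept integers and floats, but not booleans."""
    return isinstance(value, Real) and not isinstance(value, bool)


def parse_external_sensor(stanza: Dict[str, Any]) -> Dict[str, Any]:
    """Validate one "Exposes" element and compute its D-Bus object path."""
    if stanza.get("Type") != "ExternalSensor":
        raise ValueError('"Type" must be exactly "ExternalSensor"')
    for key in ("Name", "Units"):
        if not isinstance(stanza.get(key), str):
            raise ValueError(f'"{key}" is mandatory and must be a string')
    for key in ("MinValue", "MaxValue"):
        if not _is_numeric(stanza.get(key)):
            raise ValueError(f'"{key}" is mandatory and must be numeric')
    timeout = stanza.get("Timeout")  # optional: absent disables the timeout
    if timeout is not None and not _is_numeric(timeout):
        raise ValueError('"Timeout" must be numeric when given')
    segment = UNITS_TO_PATH_SEGMENT.get(stanza["Units"])
    if segment is None:
        raise ValueError(f'unsupported "Units": {stanza["Units"]}')
    return {
        "path": f"{SENSOR_ROOT}/{segment}/{stanza['Name']}",
        "min": float(stanza["MinValue"]),
        "max": float(stanza["MaxValue"]),
        "timeout": None if timeout is None else float(timeout),
    }


example = {"Name": "HostDevTemp", "Units": "DegreesC", "MinValue": -16.0,
           "MaxValue": 111.5, "Timeout": 4.0, "Type": "ExternalSensor"}
print(parse_external_sensor(example)["path"])
```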
There can be multiple ExternalSensor sensors in the configuration. There is no set limit on the number of sensors, except what is supported by a service such as IPMI.
As it stands now, ExternalSensor is up and running[20]. However, the timeout feature was originally implemented at the IPMI layer. Upon further investigation, it was found that IPMI was the wrong place for this feature, and that it should be moved within ExternalSensor itself[21]. It was originally thought that the timeout feature would be a useful enhancement available to all IPMI sensors. However, the expected usage of almost all external sensor updates is a one-shot adjustment (for example, somebody wishes to change a voltage regulator setting or a fan speed setting). In this case, the timeout feature would be not only unnecessary but also an obstacle, requiring additional coding[22] to compensate for the unexpected NaN value. Only sensors intended for use with ExternalSensor are expected to receive continuous periodic updates from an external source, so it makes sense to move this timeout feature into ExternalSensor. This change also has the advantage of making ExternalSensor not dependent on IPMI as the only source of external updates.
A challenge of generalizing the timeout feature into ExternalSensor, however, was that the existing Sensor base class did not allow its D-Bus setter hook to be customized. This feature was straightforward to add[23]. One limitation was that the existing Sensor class, by design, dropped updates that duplicated the existing sensor value. For use with ExternalSensor, all received updates must be recognized, even duplicates, because each one pets the watchdog and so avoids inadvertently triggering the timeout feature. However, it is still important to avoid needlessly sending the D-Bus PropertiesChanged event for duplicate readings.
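The two requirements on the setter can be sketched as follows. This is a model of the behavior only, not the Sensor class's real interface: `ExternalValue`, `pet_watchdog`, and `on_change` are hypothetical names, with `on_change` standing in for whatever emits PropertiesChanged.

```python
import math
from typing import Callable, List


class ExternalValue:
    """Hypothetical setter hook: every write is seen, only changes are signaled."""

    def __init__(self, pet_watchdog: Callable[[], None],
                 on_change: Callable[[float], None]) -> None:
        self.value = math.nan              # assumption: starts with no reading
        self._pet_watchdog = pet_watchdog
        self._on_change = on_change

    def set(self, new_value: float) -> None:
        # Every received update proves the source is alive, duplicates included.
        self._pet_watchdog()
        if self._same(self.value, new_value):
            return                         # duplicate: no change signal
        self.value = new_value
        self._on_change(new_value)

    @staticmethod
    def _same(a: float, b: float) -> bool:
        # NaN != NaN, so two NaN values need an explicit check to count as equal.
        return a == b or (math.isnan(a) and math.isnan(b))


pets: List[int] = []
signals: List[float] = []
sensor = ExternalValue(lambda: pets.append(1), signals.append)
for reading in (50.0, 50.0, 51.0):
    sensor.set(reading)
print(len(pets), signals)
```

Separating the watchdog reset from the change notification is what lets a source that repeatedly reports the same value stay "alive" without generating extra D-Bus traffic.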
The timeout value was originally a compiled-in constant. If ExternalSensor is to succeed as a general-purpose tool, this must be configurable. It was straightforward to add another configurable parameter[24] to accept this timeout value, as shown in "Parameters" above.
The hardest task of all, however, was getting it accepted upstream. If you are reading this, then most likely, it was successful!