NVMeBasicContext: Properly cleanup resources, allowing destruction

nvmesensor was terminating with an uncaught exception, e.g.:

    Jun 03 04:52:09 bmc nvmesensor[507]: terminate called after throwing an instance of 'sdbusplus::exception::SdBusError'
    Jun 03 04:52:09 bmc nvmesensor[507]:   what():  sd_bus_add_object_vtable: org.freedesktop.DBus.Error.FileExists: File exists

This would occur whenever entity-manager published a configuration for a
new drive in the system. The implementation of nvmesensor isn't the
smartest, and it tries to just scrap all NVMeContexts it knows of, along
with their NVMeSensor instances, and reconstruct them all from newly
captured entity-manager configuration data.

The problem lies in the fact that the NVMeContexts were not getting
destructed due to the embedding of std::shared_ptrs obtained via
shared_from_this() into async callback lambda captures whose context was
owned by the NVMeContext implementation.

Switch to capturing `this` via weak_from_this() in callback lambdas to
prevent the circular references. By doing so we are able to successfully
destruct the NVMeContext derivative instances via the clear() method on
the context map.
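
As an illustration of the reference cycle described above, here is a minimal
sketch that is not taken from the change itself. `Context`, `armStrong()`,
`armWeak()`, `disarm()` and `destroyedAfterReset()` are hypothetical names, and
a plain std::function member stands in for the async callback state owned by
the context.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for an NVMeContext-like class, not the real type.
class Context : public std::enable_shared_from_this<Context>
{
  public:
    // Strong capture: the object owns the callback and the callback owns
    // the object, so the reference count can never reach zero.
    void armStrong()
    {
        callback = [self = shared_from_this()]() { self->polls++; };
    }

    // Weak capture: the callback promotes the pointer only while it runs.
    void armWeak()
    {
        callback = [weak = weak_from_this()]() {
            if (auto self = weak.lock())
            {
                self->polls++;
            }
        };
    }

    // Drops the stored callback. Call only while holding a strong reference.
    void disarm()
    {
        callback = nullptr;
    }

    int polls = 0;

  private:
    std::function<void()> callback;
};

// Returns true if dropping the last external reference destroyed the object.
inline bool destroyedAfterReset(bool useWeak)
{
    auto ctx = std::make_shared<Context>();
    std::weak_ptr<Context> observer = ctx;
    if (useWeak)
    {
        ctx->armWeak();
    }
    else
    {
        ctx->armStrong();
    }
    ctx.reset(); // analogous to clear() on the context map
    const bool destroyed = observer.expired();
    if (auto leaked = observer.lock())
    {
        leaked->disarm(); // break the cycle so the sketch itself does not leak
    }
    return destroyed;
}

// Usage: only the weak capture lets the object be destroyed.
inline void demo()
{
    assert(!destroyedAfterReset(false));
    assert(destroyedAfterReset(true));
}
```

With the strong capture the object outlives the reset, which corresponds to a
context surviving clear() on the map. With the weak capture it is destroyed
immediately.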

However, there's more to the story, as the NVMeSensors owned by a given
context are asynchronously iterated for polling purposes. To make sure
we don't have races from unbounded latency on asynchronous destruction
of NVMeSensors, we order the std::jthread and async stream member
variables of NVMeBasicContext such that destruction of the streams exits
the IO thread due to read() or write() syscall failures on the pipes.
The use of std::jthread ensures we join() to complete the cleanup prior
to returning from the NVMeBasicContext destructor.

Preventing active polling of NVMeSensor instances ensures their
destruction when clearing the NVMe context map, opening the path for
successful (re-)construction of NVMeSensor DBus objects for the
published sensor configurations.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Change-Id: I63d0e7eb3318c209b08551072a3cb7279da21269
2 files changed
tree: 365a14cd5b4c9367a7df80bd87d2bf573ed08968
  1. include/
  2. service_files/
  3. src/
  4. subprojects/
  5. tests/
  6. .clang-format
  7. .clang-ignore
  8. .clang-tidy
  9. .gitignore
  10. Jenkinsfile
  11. LICENSE
  12. MAINTAINERS
  13. meson.build
  14. meson_options.txt
  15. OWNERS
  16. README.md
README.md

dbus-sensors

dbus-sensors is a collection of sensor applications that provide the xyz.openbmc_project.Sensor collection of interfaces. They read sensor values from hwmon, d-bus, or direct driver access to provide readings. Some advanced non-sensor features such as fan presence, pwm control, and automatic cpu detection (x86) are also supported.

key features

  • runtime re-configurable from d-bus (entity-manager or the like)

  • isolated: each sensor type is isolated into its own daemon, so a bug in one sensor is unlikely to affect another, and single sensor modifications are possible

  • async single-threaded: uses sdbusplus/asio bindings

  • multiple data inputs: hwmon, d-bus, direct driver access

dbus interfaces

A typical dbus-sensors object supports the following dbus interfaces:

Path        /xyz/openbmc_project/sensors/<type>/<sensor_name>

Interfaces  xyz.openbmc_project.Sensor.Value
            xyz.openbmc_project.Sensor.Threshold.Critical
            xyz.openbmc_project.Sensor.Threshold.Warning
            xyz.openbmc_project.State.Decorator.Availability
            xyz.openbmc_project.State.Decorator.OperationalStatus
            xyz.openbmc_project.Association.Definitions

The sensor interfaces collection is described here.

Example consumers of these interfaces include Redfish, Phosphor-Pid-Control, and IPMI SDR.

Reactor

dbus-sensor daemons are reactors that dynamically create and update sensor configuration when the system configuration is updated.

Using asio timers and async calls, dbus-sensor daemons read sensor values and check thresholds periodically. PropertiesChanged signals are broadcast for other services to consume when a value or threshold status changes. OperationalStatus is set to false if the sensor is determined to be faulty.

A simple sensor example can be found here.

configuration

Sensor devices are described using Exposes records in the configuration file. The Name and Type fields are required. Different sensor types have different fields. Refer to the entity manager schema for a complete list.

sensor documentation