| commit | 3cbd5a14c4fd9564da33890075400495c9ffccc6 | |
|---|---|---|
| author | Andrew Jeffery <andrew@aj.id.au> | Mon Jul 18 16:32:11 2022 +0930 |
| committer | Andrew Jeffery <andrew@aj.id.au> | Wed Jul 20 22:04:25 2022 +0000 |
| tree | 365a14cd5b4c9367a7df80bd87d2bf573ed08968 | |
| parent | a4d2768c83c04b8f9a640858b7343c44e7deea6d | |
NVMeBasicContext: Properly cleanup resources, allowing destruction
nvmesensor was terminating with an uncaught exception, e.g.:
Jun 03 04:52:09 bmc nvmesensor[507]: terminate called after throwing an instance of 'sdbusplus::exception::SdBusError'
Jun 03 04:52:09 bmc nvmesensor[507]: what(): sd_bus_add_object_vtable: org.freedesktop.DBus.Error.FileExists: File exists
This would occur whenever entity-manager published a configuration for a
new drive in the system. The implementation of nvmesensor isn't the
smartest, and it tries to just scrap all NVMeContexts it knows of, along
with their NVMeSensor instances, and reconstruct them all from newly
captured entity-manager configuration data.
The problem lies in the fact that the NVMeContexts were not getting
destructed due to the embedding of std::shared_ptrs obtained via
shared_from_this() into async callback lambda captures whose context was
owned by the NVMeContext implementation.
Switch to capturing `this` via weak_from_this() in callback lambdas to
prevent the circular references. By doing so we are able to successfully
destruct the NVMeContext derivative instances via the clear() method on
the context map.
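The ownership cycle described above can be reproduced in a few lines. This is a minimal sketch, not the nvmesensor code: `Context`, `armStrong()`, `armWeak()`, `disarm()` and `leaksAfterRelease()` are hypothetical names, and a plain `std::function` member stands in for the async handler state owned by the context.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for an NVMeContext derivative that owns the
// callback it registers (in nvmesensor this would be an async handler).
class Context : public std::enable_shared_from_this<Context>
{
  public:
    // Problematic: the stored callback holds a shared_ptr to its own owner,
    // so the reference count can never reach zero.
    void armStrong()
    {
        callback = [self = shared_from_this()] { self->polls++; };
    }

    // Fixed: the callback holds only a weak_ptr and promotes it when invoked.
    void armWeak()
    {
        callback = [weak = weak_from_this()] {
            if (auto self = weak.lock())
            {
                self->polls++;
            }
        };
    }

    void fire()
    {
        if (callback)
        {
            callback();
        }
    }

    // Drops the callback, and with it any shared_ptr the callback captured.
    void disarm()
    {
        callback = nullptr;
    }

  private:
    int polls = 0;
    std::function<void()> callback;
};

// Returns true if the Context outlives its last external shared_ptr.
bool leaksAfterRelease(bool useStrongCapture)
{
    std::weak_ptr<Context> watch;
    {
        auto ctx = std::make_shared<Context>();
        if (useStrongCapture)
        {
            ctx->armStrong();
        }
        else
        {
            ctx->armWeak();
        }
        ctx->fire();
        watch = ctx;
    } // the last external owner goes away here

    const bool leaked = !watch.expired();
    if (auto ctx = watch.lock())
    {
        ctx->disarm(); // break the cycle so the sketch itself does not leak
    }
    return leaked;
}
```

With the strong capture the object survives the release of its last external owner, which is the situation in which clear() on a map of such contexts would not destroy them. With the weak capture it is destroyed immediately.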
However, there's more to the story, as the NVMeSensors owned by a given
context are asynchronously iterated for polling purposes. To make sure
we don't have races from unbounded latency on asynchronous destruction
of NVMeSensors, we order the std::jthread and async stream member
variables of NVMeBasicContext such that destruction of the streams exits
the IO thread due to read() or write() syscall failures on the pipes.
The use of std::jthread ensures we join() to complete the cleanup prior
to returning from the NVMeBasicContext destructor.
Preventing active polling of NVMeSensor instances ensures their
destruction when clearing the NVMe context map, opening the path for
successful (re-)construction of NVMeSensor DBus objects for the
published sensor configurations.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Change-Id: I63d0e7eb3318c209b08551072a3cb7279da21269
dbus-sensors is a collection of sensor applications that provide the xyz.openbmc_project.Sensor collection of interfaces. They read sensor values from hwmon, d-bus, or direct driver access to provide readings. Some advanced non-sensor features such as fan presence, pwm control, and automatic cpu detection (x86) are also supported.
runtime re-configurable from d-bus (entity-manager or the like)
isolated: each sensor type is isolated into its own daemon, so a bug in one sensor is unlikely to affect another, and single sensor modifications are possible
async single-threaded: uses sdbusplus/asio bindings
multiple data inputs: hwmon, d-bus, direct driver access
A typical dbus-sensors object supports the following dbus interfaces:
Path /xyz/openbmc_project/sensors/<type>/<sensor_name>
Interfaces xyz.openbmc_project.Sensor.Value
xyz.openbmc_project.Sensor.Threshold.Critical
xyz.openbmc_project.Sensor.Threshold.Warning
xyz.openbmc_project.State.Decorator.Availability
xyz.openbmc_project.State.Decorator.OperationalStatus
xyz.openbmc_project.Association.Definitions
The sensor interfaces collection is described here.
Consumer examples of these interfaces are Redfish, Phosphor-Pid-Control, IPMI SDR.
dbus-sensor daemons are reactors that dynamically create and update sensor configuration when the system configuration is updated.
Using asio timers and async calls, dbus-sensor daemons read sensor values and check thresholds periodically. PropertiesChanged signals are broadcast for other services to consume when a value or threshold status changes. OperationalStatus is set to false if the sensor is determined to be faulty.
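One polling cycle of the kind described above might look roughly like the following. This is a simplified sketch under stated assumptions, not the dbus-sensors implementation: `SensorSketch`, `Threshold` and the `Notify` callback are hypothetical, the callback stands in for emitting a PropertiesChanged signal, the property names are illustrative, and details such as hysteresis and the asio timer that drives the cycle are omitted.

```cpp
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

enum class Level { Warning, Critical };
enum class Direction { High, Low };

// One configured limit plus whether its alarm is currently asserted.
struct Threshold
{
    Level level;
    Direction direction;
    double value;
    bool asserted = false;
};

class SensorSketch
{
  public:
    // Called with the name of each property whose value changed.
    using Notify = std::function<void(const std::string&)>;

    SensorSketch(std::vector<Threshold> thresholdsIn, Notify notifyIn) :
        thresholds(std::move(thresholdsIn)), notify(std::move(notifyIn))
    {}

    // Called once per polling cycle with a fresh reading. Notifications are
    // sent only for properties whose state actually changed.
    void update(double reading)
    {
        if (reading != value)
        {
            value = reading;
            notify("Value");
        }
        for (Threshold& t : thresholds)
        {
            const bool crossed = (t.direction == Direction::High)
                                     ? reading >= t.value
                                     : reading <= t.value;
            if (crossed != t.asserted)
            {
                t.asserted = crossed;
                notify(alarmName(t));
            }
        }
    }

  private:
    // Builds an illustrative alarm property name such as "WarningAlarmHigh".
    static std::string alarmName(const Threshold& t)
    {
        std::string name =
            (t.level == Level::Critical) ? "Critical" : "Warning";
        name += "Alarm";
        name += (t.direction == Direction::High) ? "High" : "Low";
        return name;
    }

    // NaN until the first reading, so the first update always notifies.
    double value = std::numeric_limits<double>::quiet_NaN();
    std::vector<Threshold> thresholds;
    Notify notify;
};
```

Notifying only on change keeps bus traffic proportional to actual state transitions rather than to the polling rate.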
A simple sensor example can be found here.
Sensor devices are described using Exposes records in the configuration file. The Name and Type fields are required. Different sensor types have different fields. Refer to the entity manager schema for the complete list.