nvidia-gpu: Fix up buffering in MctpRequester

This change does a lot, for better or worse
1. Change MctpRequester to hold both buffers for send and receive
2. This requires changing the callback structure, so the reach is far
3. Changes error reporting to be through std::error_code
4. Collapses the QueuingRequester and Requester to be MctpRequester
5. Doing 4 gets rid of a level of indirection and an extra
   unordered_map
6. Adds proper iid support, which is made significantly easier by 4/5
7. Fixes issues around expiry timers, where we would cancel the timer
   for a given request whenever a new packet came in to be sent.
   This could cause a lockup if a packet truly did time out and an
   interleaved packet finished sending. Each queue now has its own
   timer.

This fixes an issue where we accepted buffers from clients and bound
them to receive calls without ensuring that they belonged to the
incoming message. When a receive completed, it wrote into whichever
buffer was last bound to async_receive_from. This caused a number of
issues, ranging from incorrect device discovery results and incorrect
sensor readings to core dumps.

This change moves the receive and send buffers to be owned by
the MctpRequester, and a non-owning view is provided via
callback to the client. All existing clients just decode in place
given that buffer.
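
The ownership model described above can be sketched as follows. This is a minimal illustration, not the code from this change: `OwningRequester`, `BufferView`, `completeReceive`, and the 64-byte buffer size are all hypothetical names and values chosen for the example, and the real MctpRequester is driven by asynchronous socket calls rather than a direct function call. In C++20, `std::span<const uint8_t>` would be the natural type for the view.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <system_error>

// Hypothetical non-owning view of received bytes (assumption for this sketch).
struct BufferView
{
    const uint8_t* data = nullptr;
    std::size_t size = 0;
};

// Hypothetical stand-in for a requester that owns its receive buffer and
// lends the client a read-only view for the duration of the callback.
class OwningRequester
{
  public:
    using Callback = std::function<void(const std::error_code&, BufferView)>;

    // Simulates a completed receive: copy the payload into the owned buffer,
    // then hand the client a view of exactly the received bytes. Errors are
    // reported through std::error_code instead of a custom status value.
    void completeReceive(BufferView payload, const Callback& cb)
    {
        if (payload.size > rxBuffer.size())
        {
            cb(std::make_error_code(std::errc::message_size), BufferView{});
            return;
        }
        std::copy(payload.data, payload.data + payload.size,
                  rxBuffer.begin());
        cb(std::error_code{}, BufferView{rxBuffer.data(), payload.size});
    }

  private:
    std::array<uint8_t, 64> rxBuffer{}; // size is arbitrary for the sketch
};
```

Because the client never supplies its own buffer, a response can only ever land in storage the requester controls, and the client decodes in place from the view it is handed.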

Tested: loaded onto nvl32-obmc. Correct number of sensors showed up
and the readings were nominal

Change-Id: I67c843691ca79e9fcccfa16df6d611918f25f6ca
Signed-off-by: Marc Olberding <molberding@nvidia.com>
19 files changed
README.md

dbus-sensors

dbus-sensors is a collection of sensor applications that provide the xyz.openbmc_project.Sensor collection of interfaces. They read sensor values from hwmon, d-bus, or direct driver access to provide readings. Some advanced non-sensor features such as fan presence, pwm control, and automatic cpu detection (x86) are also supported.

key features

  • runtime re-configurable from d-bus (entity-manager or the like)

  • isolated: each sensor type is isolated into its own daemon, so a bug in one sensor is unlikely to affect another, and single sensor modifications are possible

  • async single-threaded: uses sdbusplus/asio bindings

  • multiple data inputs: hwmon, d-bus, direct driver access

dbus interfaces

A typical dbus-sensors object supports the following dbus interfaces:

Path        /xyz/openbmc_project/sensors/<type>/<sensor_name>

Interfaces  xyz.openbmc_project.Sensor.Value
            xyz.openbmc_project.Sensor.Threshold.Critical
            xyz.openbmc_project.Sensor.Threshold.Warning
            xyz.openbmc_project.State.Decorator.Availability
            xyz.openbmc_project.State.Decorator.OperationalStatus
            xyz.openbmc_project.Association.Definitions

The sensor interface collection is described in phosphor-dbus-interfaces.

Example consumers of these interfaces are Redfish, Phosphor-Pid-Control, and IPMI SDR.

Reactor

dbus-sensor daemons are reactors that dynamically create and update sensor configuration when the system configuration is updated.

Using asio timers and async calls, dbus-sensor daemons read sensor values and check thresholds periodically. PropertiesChanged signals are broadcast for other services to consume when a value or threshold status changes. OperationalStatus is set to false if the sensor is determined to be faulty.
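
The threshold check performed on each periodic read can be sketched as follows. This is a simplified illustration rather than the daemons' actual code: `UpperThreshold` and `updateThreshold` are hypothetical names, only an upper limit is modelled, and the asio timer and d-bus signalling are left out. The function returns true only when the asserted state flips, which is the point at which a daemon would have something new to announce.

```cpp
// Hypothetical upper threshold with its current asserted state.
struct UpperThreshold
{
    double limit;
    bool asserted = false;
};

// Re-evaluates the threshold against a fresh reading.
// Returns true if the asserted state changed, false otherwise.
bool updateThreshold(UpperThreshold& threshold, double reading)
{
    const bool nowAsserted = reading >= threshold.limit;
    const bool changed = nowAsserted != threshold.asserted;
    threshold.asserted = nowAsserted;
    return changed;
}
```

Reporting only state changes keeps repeated readings above the limit from producing a signal on every poll.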

A simple sensor example can be found in entity-manager examples.

configuration

Sensor devices are described using Exposes records in a configuration file. The Name and Type fields are required. Different sensor types have different fields. Refer to the entity-manager schema for the complete list.

sensor documentation

Sensor Type Documentation

ADC Sensors

ADC sensors are sensors based on an Analog to Digital Converter. They are read via the Linux kernel Industrial I/O subsystem (IIO).

One of the more common use cases within OpenBMC is for reading these sensors from the ADC on the Aspeed ASTXX cards.

To use the ADC sensors feature within OpenBMC, you must first define and enable it in the kernel device tree.

When using a common OpenBMC device like the AST2600, you will find an "adc0" and an "adc1" section in the aspeed-g6.dtsi file. These are disabled by default, so in your system-specific dts you would enable and configure what you want with something like this:

iio-hwmon {
    compatible = "iio-hwmon";
    io-channels = <&adc0 0>;
    ...
};

&adc0 {
    status = "okay";
    ...
};

&adc1 {
    status = "okay";
    ...
};

Note that this is not an exhaustive guide to the nuances of configuring a device tree; it is only meant to point users in the general direction.

You will then create an entity-manager configuration file that is of type "ADC". A very simple example would look like this:

            "Index": 0,
            "Name": "P12V",
            "PowerState": "Always",
            "ScaleFactor": 1.0,
            "Type": "ADC"

When your system is booted, an "in0_input" file will be created within the hwmon subsystem (/sys/class/hwmon/hwmonX). The adcsensor application will scan d-bus for any ADC entity-manager objects, look up their "Index" value, and try to match that with the hwmon inY_input files. When it finds a match, it will create a d-bus sensor under the xyz.openbmc_project.ADCSensor service. The sensor will be periodically updated based on readings from the hwmon file.
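
The matching step can be sketched as follows. This is a minimal illustration, not the adcsensor implementation: `parseInputIndex` and `matchesIndex` are hypothetical helpers, and the sketch assumes the configured "Index" is compared directly against the Y in inY_input as the paragraph above describes. The real application may apply an offset between the two, so check its source before relying on this mapping.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Extracts Y from a file name of the form "inY_input".
// Returns std::nullopt for any other name (e.g. "temp1_input").
std::optional<unsigned long> parseInputIndex(const std::string& fileName)
{
    const std::string prefix = "in";
    const std::string suffix = "_input";
    if (fileName.size() <= prefix.size() + suffix.size())
    {
        return std::nullopt;
    }
    if (fileName.compare(0, prefix.size(), prefix) != 0 ||
        fileName.compare(fileName.size() - suffix.size(), suffix.size(),
                         suffix) != 0)
    {
        return std::nullopt;
    }
    const std::string digits = fileName.substr(
        prefix.size(), fileName.size() - prefix.size() - suffix.size());
    // Reject non-numeric or implausibly long channel numbers.
    if (digits.size() > 9)
    {
        return std::nullopt;
    }
    for (const char c : digits)
    {
        if (std::isdigit(static_cast<unsigned char>(c)) == 0)
        {
            return std::nullopt;
        }
    }
    return std::stoul(digits);
}

// True if the hwmon file corresponds to the configured "Index"
// (direct comparison is an assumption of this sketch).
bool matchesIndex(const std::string& fileName, unsigned long configIndex)
{
    const std::optional<unsigned long> parsed = parseInputIndex(fileName);
    return parsed.has_value() && *parsed == configIndex;
}
```

With the example configuration above, `matchesIndex("in0_input", 0)` holds under this sketch's direct-match assumption, while files for other hwmon attributes are ignored.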