gpu : add thresholds support to TLimit

This patch adds support for fetching TLimit thresholds from the GPU.

Tested.

The TEMP_0 update was disabled while testing this patch because it requires
MCTP request queueing: the OCP MCTP VDM specification allows at most one
outstanding request to the device. MCTP request queueing is being
introduced in this patch:
https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/80023

Build an image for the gb200nvl-obmc machine with the following patches
cherry-picked. These patches are needed to enable the MCTP stack.

https://gerrit.openbmc.org/c/openbmc/openbmc/+/79312
https://gerrit.openbmc.org/c/openbmc/openbmc/+/79410
https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422

Copy the configuration file to the gb200nvl-obmc machine and restart the
entity-manager service.
```
root@gb200nvl-obmc:~# rm -rf /var/configuration/
root@gb200nvl-obmc:~# systemctl restart xyz.openbmc_project.EntityManager.service
```

Copy the gpusensor app and run it.
```
root@gb200nvl-obmc:~# ./gpusensor
```

```
$ curl -k -u 'root:0penBmc' https://10.137.203.137/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_TEMP_1
{
  "@odata.id": "/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_TEMP_1",
  "@odata.type": "#Sensor.v1_2_0.Sensor",
  "Id": "temperature_NVIDIA_GB200_GPU_TEMP_1",
  "Name": "NVIDIA GB200 GPU TEMP 1",
  "Reading": 49.0,
  "ReadingRangeMax": 127.0,
  "ReadingRangeMin": -128.0,
  "ReadingType": "Temperature",
  "ReadingUnits": "Cel",
  "Status": {
    "Health": "OK",
    "State": "Enabled"
  },
  "Thresholds": {
    "LowerCaution": {
      "Reading": 0.0
    },
    "LowerCritical": {
      "Reading": 0.0
    }
  }
}

root@gb200nvl-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/sensors/temperature/NVIDIA_GB200_GPU_TEMP_1
NAME                                                  TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable                   interface -         -                                        -
.Introspect                                           method    -         s                                        -
org.freedesktop.DBus.Peer                             interface -         -                                        -
.GetMachineId                                         method    -         s                                        -
.Ping                                                 method    -         -                                        -
org.freedesktop.DBus.Properties                       interface -         -                                        -
.Get                                                  method    ss        v                                        -
.GetAll                                               method    s         a{sv}                                    -
.Set                                                  method    ssv       -                                        -
.PropertiesChanged                                    signal    sa{sv}as  -                                        -
xyz.openbmc_project.Association.Definitions           interface -         -                                        -
.Associations                                         property  a(sss)    1 "chassis" "all_sensors" "/xyz/openbmc… emits-change
xyz.openbmc_project.Inventory.Item                    interface -         -                                        -
.PrettyName                                           property  s         "Thermal Limit(TLIMIT) Temperature is t… emits-change
xyz.openbmc_project.Sensor.Threshold.Critical         interface -         -                                        -
.CriticalAlarmHigh                                    property  b         false                                    emits-change
.CriticalAlarmLow                                     property  b         false                                    emits-change
.CriticalHigh                                         property  d         nan                                      emits-change writable
.CriticalLow                                          property  d         0                                        emits-change writable
xyz.openbmc_project.Sensor.Threshold.HardShutdown     interface -         -                                        -
.HardShutdownAlarmHigh                                property  b         false                                    emits-change
.HardShutdownAlarmLow                                 property  b         false                                    emits-change
.HardShutdownHigh                                     property  d         nan                                      emits-change writable
.HardShutdownLow                                      property  d         0                                        emits-change writable
xyz.openbmc_project.Sensor.Threshold.Warning          interface -         -                                        -
.WarningAlarmHigh                                     property  b         false                                    emits-change
.WarningAlarmLow                                      property  b         false                                    emits-change
.WarningHigh                                          property  d         nan                                      emits-change writable
.WarningLow                                           property  d         0                                        emits-change writable
xyz.openbmc_project.Sensor.Value                      interface -         -                                        -
.MaxValue                                             property  d         127                                      emits-change
.MinValue                                             property  d         -128                                     emits-change
.Unit                                                 property  s         "xyz.openbmc_project.Sensor.Value.Unit.… emits-change
.Value                                                property  d         48.9688                                  emits-change writable
xyz.openbmc_project.Sensor.ValueMutability            interface -         -                                        -
.Mutable                                              property  b         true                                     emits-change
xyz.openbmc_project.State.Decorator.Availability      interface -         -                                        -
.Available                                            property  b         true                                     emits-change writable
xyz.openbmc_project.State.Decorator.OperationalStatus interface -         -                                        -
.Functional                                           property  b         true                                     emits-change
```

Change-Id: I6f2ff2652ce9246287f9bd63c4297d9ad3229963
Signed-off-by: Harshit Aghera <haghera@nvidia.com>
6 files changed
tree: 154e4f3e374c9b6eb4d8cf87ee328339215e08b9
README.md

dbus-sensors

dbus-sensors is a collection of sensor applications that provide the xyz.openbmc_project.Sensor collection of interfaces. They read sensor values from hwmon, d-bus, or direct driver access to provide readings. Some advanced non-sensor features such as fan presence, pwm control, and automatic cpu detection (x86) are also supported.

key features

  • runtime re-configurable from d-bus (entity-manager or the like)

  • isolated: each sensor type is isolated into its own daemon, so a bug in one sensor is unlikely to affect another, and single sensor modifications are possible

  • async single-threaded: uses sdbusplus/asio bindings

  • multiple data inputs: hwmon, d-bus, direct driver access

dbus interfaces

A typical dbus-sensors object supports the following dbus interfaces:

Path        /xyz/openbmc_project/sensors/<type>/<sensor_name>

Interfaces  xyz.openbmc_project.Sensor.Value
            xyz.openbmc_project.Sensor.Threshold.Critical
            xyz.openbmc_project.Sensor.Threshold.Warning
            xyz.openbmc_project.State.Decorator.Availability
            xyz.openbmc_project.State.Decorator.OperationalStatus
            xyz.openbmc_project.Association.Definitions

The sensor interface collection is described here.

Example consumers of these interfaces include Redfish, Phosphor-Pid-Control, and IPMI SDR.

Reactor

dbus-sensor daemons are reactors that dynamically create and update sensor configuration when the system configuration is updated.

Using asio timers and async calls, dbus-sensor daemons read sensor values and check thresholds periodically. PropertiesChanged signals are broadcast for other services to consume when a value or threshold status changes. The Functional property of OperationalStatus is set to false if the sensor is determined to be faulty.
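
The periodic threshold check described above can be sketched as follows. This is a minimal illustration, not the code used by the daemons: the names `ThresholdPair`, `AlarmState`, `evaluate`, and `shouldSignal` are hypothetical, the inclusive comparisons are an assumption, and hysteresis is left out. An unset bound is modelled as NaN, matching the `nan` values that `busctl introspect` reports for unconfigured thresholds.

```cpp
#include <cmath>
#include <limits>

// Hypothetical types for illustration; not taken from the dbus-sensors sources.
struct ThresholdPair
{
    // NaN means "this bound is not configured".
    double high = std::numeric_limits<double>::quiet_NaN();
    double low = std::numeric_limits<double>::quiet_NaN();
};

struct AlarmState
{
    bool alarmHigh = false;
    bool alarmLow = false;

    bool operator==(const AlarmState& other) const
    {
        return alarmHigh == other.alarmHigh && alarmLow == other.alarmLow;
    }
};

// Evaluate one reading against one threshold pair.
// Any comparison with NaN is false, so an unset bound never raises an alarm.
// Assumption: a reading equal to the bound counts as crossing it.
inline AlarmState evaluate(double reading, const ThresholdPair& limits)
{
    AlarmState state;
    state.alarmHigh = reading >= limits.high;
    state.alarmLow = reading <= limits.low;
    return state;
}

// A change signal is only worth emitting when the alarm state differs
// from the one computed on the previous poll.
inline bool shouldSignal(const AlarmState& previous, const AlarmState& current)
{
    return !(previous == current);
}
```

A polling loop would call `evaluate` on each new reading, compare the result with the previous state using `shouldSignal`, and update the alarm properties only when it returns true.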

A simple sensor example can be found here.

configuration

Sensor devices are described using Exposes records in the configuration file. The Name and Type fields are required. Different sensor types have different fields. Refer to the entity-manager schema for the complete list.

sensor documentation

Sensor Type Documentation

ADC Sensors

ADC sensors are sensors based on an Analog to Digital Converter. They are read via the Linux kernel Industrial I/O subsystem (IIO).

One of the more common use cases within OpenBMC is for reading these sensors from the ADC on the Aspeed ASTXX cards.

To use the ADC sensor feature within OpenBMC, you must first define and enable it in the kernel device tree.

When using a common OpenBMC device like the AST2600, you will find an "adc0" and an "adc1" section in the aspeed-g6.dtsi file. These are disabled by default, so in your system-specific dts you would enable and configure what you want with something like this:

iio-hwmon {
    compatible = "iio-hwmon";
    io-channels = <&adc0 0>;
    ...
};

&adc0 {
    status = "okay";
    ...
};

&adc1 {
    status = "okay";
    ...
};

Note that this is not an exhaustive guide to the nuances of configuring a device tree; it is only meant to point users in the right direction.

You will then create an entity-manager configuration file that is of type "ADC". A very simple example would look like this:

            "Index": 0,
            "Name": "P12V",
            "PowerState": "Always",
            "ScaleFactor": 1.0,
            "Type": "ADC"

When your system is booted, an "in0_input" file will be created within the hwmon subsystem (/sys/class/hwmon/hwmonX). The adcsensor application will scan d-bus for any ADC entity-manager objects, look up their "Index" value, and try to match that with the hwmon inY_input files. When it finds a match, it will create a d-bus sensor under the xyz.openbmc_project.ADCSensor service. The sensor will be periodically updated based on readings from the hwmon file.
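
The matching step can be illustrated with a small sketch. It assumes the direct mapping described above, where the configured "Index" equals the Y in inY_input. The helper names `hwmonInputIndex` and `matchesConfig` are hypothetical and are not part of adcsensor, which may handle the mapping differently.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns Y for a file name of the form "in<Y>_input",
// or std::nullopt when the name does not follow that pattern.
inline std::optional<unsigned> hwmonInputIndex(std::string_view fileName)
{
    constexpr std::string_view prefix = "in";
    constexpr std::string_view suffix = "_input";

    // The name must hold the prefix, at least one digit, and the suffix.
    if (fileName.size() <= prefix.size() + suffix.size() ||
        fileName.substr(0, prefix.size()) != prefix ||
        fileName.substr(fileName.size() - suffix.size()) != suffix)
    {
        return std::nullopt;
    }

    std::string_view digits = fileName.substr(
        prefix.size(), fileName.size() - prefix.size() - suffix.size());

    unsigned index = 0;
    const char* first = digits.data();
    const char* last = digits.data() + digits.size();
    auto [ptr, ec] = std::from_chars(first, last, index);

    // Reject names where the middle part is not made up of digits only.
    if (ec != std::errc{} || ptr != last)
    {
        return std::nullopt;
    }
    return index;
}

// Hypothetical helper: true when the hwmon file corresponds to the
// "Index" value from the entity-manager configuration record.
inline bool matchesConfig(std::string_view fileName, unsigned configIndex)
{
    auto index = hwmonInputIndex(fileName);
    return index.has_value() && *index == configIndex;
}
```

With the example record above ("Index": 0), `matchesConfig("in0_input", 0)` is true, while files such as "temp1_input" or "in1_input" are skipped.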