nvidia-gpu: introduce notion of a device
Perform device discovery tasks only once per device to prepare for
introducing additional gpu sensors.
In the current implementation, sensor updates and device discovery via
MCTP are managed within a single class for simplicity. However, since a
GPU device typically includes multiple sensors, performing device
discovery for each individual sensor is inefficient. Instead, it would
be more effective to execute device discovery once per device.
Tested: Build an image for gb200nvl-obmc machine with the following
patches cherry picked. This patches are needed to enable the mctp stack.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422
```
$ curl -k -u 'root:0penBmc' https://10.137.203.137/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_0_TEMP_0
{
"@odata.id": "/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_0_TEMP_0",
"@odata.type": "#Sensor.v1_2_0.Sensor",
"Id": "temperature_NVIDIA_GB200_GPU_0_TEMP_0",
"Name": "NVIDIA GB200 GPU 0 TEMP 0",
"Reading": 37.6875,
"ReadingRangeMax": 127.0,
"ReadingRangeMin": -128.0,
"ReadingType": "Temperature",
"ReadingUnits": "Cel",
"Status": {
"Health": "OK",
"State": "Enabled"
}
}%
```
Change-Id: Ie3dcd43caa031b4aaa61d8be3f5d71aefd53bc9a
Signed-off-by: Harshit Aghera <haghera@nvidia.com>
diff --git a/src/nvidia-gpu/NvidiaDeviceDiscovery.hpp b/src/nvidia-gpu/NvidiaDeviceDiscovery.hpp
new file mode 100644
index 0000000..86211a9
--- /dev/null
+++ b/src/nvidia-gpu/NvidiaDeviceDiscovery.hpp
@@ -0,0 +1,42 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION &
+ * AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#pragma once
+
+#include "MctpRequester.hpp"
+
+#include <boost/asio/io_context.hpp>
+#include <boost/container/flat_map.hpp>
+#include <sdbusplus/asio/connection.hpp>
+#include <sdbusplus/asio/object_server.hpp>
+#include <sdbusplus/message.hpp>
+
+#include <cstdint>
+#include <memory>
+#include <string>
+
+constexpr const char* sensorPathPrefix = "/xyz/openbmc_project/sensors/";
+constexpr const char* deviceType = "NvidiaMctpVdm";
+
+struct SensorConfigs
+{
+ std::string name;
+ uint64_t pollRate{};
+};
+
+class GpuDevice;
+
+void createSensors(
+ boost::asio::io_context& io, sdbusplus::asio::object_server& objectServer,
+ boost::container::flat_map<std::string, std::shared_ptr<GpuDevice>>&
+ gpuDevices,
+ const std::shared_ptr<sdbusplus::asio::connection>& dbusConnection,
+ mctp::MctpRequester& mctpRequester);
+
+void interfaceRemoved(
+ sdbusplus::message_t& message,
+ boost::container::flat_map<std::string, std::shared_ptr<GpuDevice>>&
+ gpuDevices);