commit | 6e149e9643925345fb67f465c1dee6df317bc728 | |
---|---|---|
author | Potin Lai <potin.lai@quantatw.com> | Wed Nov 23 10:04:05 2022 +0800 |
committer | Potin Lai <potin.lai@quantatw.com> | Tue Dec 06 09:29:48 2022 +0800 |
tree | 56406cb4550a3b9448c4990304aa7a2ccb81fcc4 | |
parent | 89a24e10571f4333b393566e884498531bfbe82a | |
nvme_manager: add support of configurable smbus error retry

NVMe drives are sometimes too busy to respond to SMBus commands. This triggers fan failsafe because the sensor is reported as failed (SMBus error). Add support for configurable SMBus error retries so that a single SMBus error does not cause a sensor failure.

readNvmeData() may return and leave the NvmeSSD object uncreated when an SMBus error occurs at service startup. Add an extra NvmeSSD object check before setSensorAvailability() to avoid service crashes.

The default is 0 retries if maxSmbusErrorRetry does not exist in the config. Example of setting the maximum SMBus error retry to 3 times: { "config": [ ... ], "threshold": [ ... ], "maxSmbusErrorRetry": 3 }

Tested on Bletchley:
```
root@bletchley:~# journalctl _PID=4790 | grep -v SendSmbusRWCmdRAW
Nov 22 18:57:33 bletchley nvme_main[4790]: Send command code 0 fail!
Nov 22 18:57:33 bletchley nvme_main[4790]: getNVMeInfobyBusID failed, retry...
Nov 22 18:57:36 bletchley nvme_main[4790]: getNVMeInfobyBusID failed, retry...
Nov 22 18:57:39 bletchley nvme_main[4790]: getNVMeInfobyBusID failed, retry...
Nov 22 18:57:42 bletchley nvme_main[4790]: SSD plug.
Nov 22 18:57:42 bletchley nvme_main[4790]: Drive status is good but can not get data.
```

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Change-Id: Ibc95efc53a212e55dcd5c5cfa7a654839a13342d
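To make the effect of maxSmbusErrorRetry concrete, here is a minimal Python sketch of a consecutive-error counter. It assumes that a successful read resets the count and that the sensor is only marked failed once the count exceeds the configured maximum. The class SmbusErrorFilter is hypothetical and is not the C++ code in phosphor-nvme.

```python
from dataclasses import dataclass


@dataclass
class SmbusErrorFilter:
    """Hypothetical helper: tolerate up to max_retry consecutive SMBus errors.

    Assumption: each failed read increments a counter, a successful read
    resets it, and the sensor is reported as failed only once the counter
    exceeds max_retry. The real service logic may differ.
    """

    max_retry: int = 0  # mirrors the documented default of 0 retries
    consecutive_errors: int = 0

    def record(self, read_ok: bool) -> bool:
        """Record one read result; return True if the sensor should be failed."""
        if read_ok:
            self.consecutive_errors = 0
            return False
        self.consecutive_errors += 1
        return self.consecutive_errors > self.max_retry


if __name__ == "__main__":
    f = SmbusErrorFilter(max_retry=3)
    print([f.record(ok) for ok in [False, False, False, False]])
```

With max_retry=3 the first three failed reads are absorbed and the fourth marks the sensor failed, printing [False, False, False, True]; with the default of 0, the very first error does.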
phosphor-nvme is the NVMe manager service that maintains NVMe drive information and processes related notifications. The service updates information on the interfaces defined in xyz/openbmc_project/Nvme/Status.interface.yaml and xyz/openbmc_project/Sensor/Value.interface.yaml, and on other interfaces hosted by xyz.openbmc_project.Inventory.Manager.
The service xyz.openbmc_project.nvme.manager provides an object on D-Bus:
where the object implements the interface xyz.openbmc_project.Sensor.Value.
Each NVMe drive is exported as a sensor, and the sensor value is the drive temperature. The sensor value of a drive can be read with the ipmitool command sdr elist, provided the corresponding settings in the sensor map are configured correctly. For example, to get the sensor value:

### With ipmi command on BMC
ipmitool sdr elist
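If the sdr elist output is consumed by a script, each line can be split into columns. The sketch below assumes ipmitool's usual five-column layout (name, sensor ID, status, entity, reading); the helper parse_sdr_elist_line is hypothetical, and the sensor name used in the usage note is illustrative only, since real names depend on the sensor map.

```python
def parse_sdr_elist_line(line: str) -> dict:
    """Split one `ipmitool sdr elist` line into its fields.

    Assumes five pipe-separated columns:
    name | sensor id | status | entity | reading
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 5:
        raise ValueError(f"unexpected sdr elist line: {line!r}")
    name, sensor_id, status, entity, reading = fields
    value = None
    parts = reading.split()
    # Readings such as "35 degrees C" start with a number; "No Reading" does not.
    if parts:
        try:
            value = float(parts[0])
        except ValueError:
            value = None
    return {
        "name": name,
        "id": sensor_id,
        "status": status,
        "entity": entity,
        "value": value,
    }
```

For a hypothetical line "NVMe_0_Temp | 10h | ok | 7.1 | 35 degrees C", the returned "value" is 35.0; a line whose reading column says "No Reading" yields None.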
The service also publishes other NVMe drive information on D-Bus through xyz.openbmc_project.Inventory.Manager. The service xyz.openbmc_project.Inventory.Manager provides an object on D-Bus:
where the object implements the following interfaces:
Interface xyz.openbmc_project.Nvme.Status
with the following properties:
Property | Type | Description |
---|---|---|
SmartWarnings | string | Indicates SMART warnings for the drive state |
StatusFlags | string | Indicates the status of the drive |
DriveLifeUsed | string | A vendor-specific estimate of the percentage of drive life used |
TemperatureFault | bool | Whether a temperature warning has occurred |
BackupdrivesFault | bool | Whether a backup drive warning has occurred |
CapacityFault | bool | Whether a capacity warning has occurred |
DegradesFault | bool | Whether a degradation warning has occurred |
MediaFault | bool | Whether a media warning has occurred |
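The five fault flags appear to correspond to the five bits of the NVMe SMART/Health Critical Warning byte. The Python sketch below assumes that mapping (capacity, temperature, degradation, media read-only, and volatile memory backup for bits 0 to 4) and that a set bit means the warning is active. Neither the mapping nor the polarity of the value read over SMBus is stated in this document, and the caller is assumed to have already converted SmartWarnings to an integer. The names FAULT_BITS and decode_smart_warnings are hypothetical.

```python
# Assumed mapping of Nvme.Status fault properties to bits of the NVMe
# SMART/Health Critical Warning byte (bit set = warning active).
FAULT_BITS = {
    "CapacityFault": 0,  # available spare below threshold
    "TemperatureFault": 1,  # temperature outside threshold
    "DegradesFault": 2,  # NVM subsystem reliability degraded
    "MediaFault": 3,  # media placed in read-only mode
    "BackupdrivesFault": 4,  # volatile memory backup device failed
}


def decode_smart_warnings(critical_warning: int) -> dict:
    """Map a Critical Warning byte to the Nvme.Status fault booleans."""
    return {
        name: bool((critical_warning >> bit) & 1)
        for name, bit in FAULT_BITS.items()
    }
```

Under these assumptions, decode_smart_warnings(0b00010) reports only TemperatureFault as True.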
Interface xyz.openbmc_project.Inventory.Item
with the following properties:
Property | Type | Description |
---|---|---|
Present | bool | Whether or not the item is present |
Interface xyz.openbmc_project.Inventory.Decorator.Asset
with the following properties:
Property | Type | Description |
---|---|---|
SerialNumber | string | The item serial number |
Manufacturer | string | The item manufacturer |
Each property in the inventory manager can be read with the busctl get-property command. For example, to get the Present property:

### With busctl on BMC
busctl get-property xyz.openbmc_project.Inventory.Manager /xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0 xyz.openbmc_project.Inventory.Item Present
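busctl prints the property's D-Bus type signature followed by its value, for example b true for a boolean. Here is a minimal Python sketch that converts such output for a few basic types. The helper parse_busctl_value is hypothetical, and string values are assumed to contain no escaped characters.

```python
def parse_busctl_value(output: str):
    """Convert `busctl get-property` output such as 'b true' into a Python value."""
    signature, _, raw = output.strip().partition(" ")
    if signature == "b":
        return raw == "true"
    if signature == "s":
        # busctl wraps strings in double quotes; strip them (no escapes assumed).
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            return raw[1:-1]
        return raw
    if signature in ("y", "n", "q", "i", "u", "x", "t"):
        return int(raw)  # D-Bus integer types
    if signature == "d":
        return float(raw)  # D-Bus double
    raise ValueError(f"unsupported signature {signature!r}")
```

On the BMC, the text could come from subprocess.run([...], capture_output=True, text=True).stdout with the busctl arguments shown above; parse_busctl_value("b true") then returns True.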
There is a JSON configuration file, nvme_config.json, that specifies each drive's index, bus ID, LED group and controller object paths, LED controller bus name, and present and power-good pins, along with the sensor thresholds. For example:

    {
        "config": [
            {
                "NVMeDriveIndex": 0,
                "NVMeDriveBusID": 16,
                "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
                "NVMeDriveLocateLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_locate",
                "NVMeDriveLocateLEDControllerBusName": "xyz.openbmc_project.LED.Controller.led_u2_0_locate",
                "NVMeDriveLocateLEDControllerPath": "/xyz/openbmc_project/led/physical/led_u2_0_locate",
                "NVMeDrivePresentPin": 148,
                "NVMeDrivePwrGoodPin": 161
            },
            {
                "NVMeDriveIndex": 1,
                "NVMeDriveBusID": 17,
                "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_1_fault",
                "NVMeDriveLocateLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_1_locate",
                "NVMeDriveLocateLEDControllerBusName": "xyz.openbmc_project.LED.Controller.led_u2_1_locate",
                "NVMeDriveLocateLEDControllerPath": "/xyz/openbmc_project/led/physical/led_u2_1_locate",
                "NVMeDrivePresentPin": 149,
                "NVMeDrivePwrGoodPin": 162
            }
        ],
        "threshold": [
            {
                "criticalHigh": 70,
                "criticalLow": 0,
                "maxValue": 70,
                "minValue": 0
            }
        ]
    }
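For tooling or tests outside the service, the file can be read with Python's json module. This is a minimal sketch: the helper load_nvme_config and the shape of its return value are hypothetical, while the maxSmbusErrorRetry default of 0 follows the commit message at the top of this page.

```python
import json


def load_nvme_config(text: str):
    """Parse nvme_config.json text into (drives, thresholds, max_smbus_error_retry)."""
    cfg = json.loads(text)
    # Index the per-drive entries by NVMeDriveIndex.
    drives = {
        d["NVMeDriveIndex"]: {
            "bus_id": d["NVMeDriveBusID"],
            "present_pin": d.get("NVMeDrivePresentPin"),
            "pwr_good_pin": d.get("NVMeDrivePwrGoodPin"),
        }
        for d in cfg.get("config", [])
    }
    # Use the first threshold entry; fall back to an empty dict if none is given.
    thresholds = (cfg.get("threshold") or [{}])[0]
    # Documented default: 0 retries when the key is absent.
    max_retry = cfg.get("maxSmbusErrorRetry", 0)
    return drives, thresholds, max_retry
```

Applied to the example above, drive 0 maps to bus 16, criticalHigh is 70, and the retry count falls back to 0 because the example has no maxSmbusErrorRetry key.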
xyz.openbmc_project.nvme.manager
description above. This service runs automatically and polls the NVMe drives every second.
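The once-per-second lookup can be pictured as the loop below. It is a hypothetical Python sketch, not the service's C++ implementation: read_temp stands in for whatever SMBus read the service performs (returning None for a failed read), and sleep is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def poll_drives(
    bus_ids: Iterable[int],
    read_temp: Callable[[int], Optional[float]],
    cycles: int,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, List[Optional[float]]]:
    """Read every drive once per cycle, sleeping interval_s between cycles."""
    bus_ids = list(bus_ids)
    history: Dict[int, List[Optional[float]]] = {bus: [] for bus in bus_ids}
    for cycle in range(cycles):
        for bus in bus_ids:
            history[bus].append(read_temp(bus))  # None stands for a failed read
        if cycle + 1 < cycles:
            sleep(interval_s)  # wait before the next polling cycle
    return history
```

A bounded cycles argument keeps the sketch testable; a real daemon would loop indefinitely, typically on an event-loop timer rather than a blocking sleep.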