| commit | 6e149e9643925345fb67f465c1dee6df317bc728 | |
|---|---|---|
| author | Potin Lai <potin.lai@quantatw.com> | Wed Nov 23 10:04:05 2022 +0800 |
| committer | Potin Lai <potin.lai@quantatw.com> | Tue Dec 06 09:29:48 2022 +0800 |
| tree | 56406cb4550a3b9448c4990304aa7a2ccb81fcc4 | |
| parent | 89a24e10571f4333b393566e884498531bfbe82a | |
nvme_manager: add support of configurable smbus error retry
NVMe drives are sometimes too busy to respond to SMBus commands. The
resulting SMBus error marks the sensor as failed, which triggers fan
failsafe.
Add support for a configurable number of SMBus error retries so that a
single SMBus error does not cause a sensor failure.
If an SMBus error occurs at service startup, readNvmeData() may return
without creating the NvmeSSD object. Add an extra check that the NvmeSSD
object exists before calling setSensorAvailability() to avoid crashing
the service.
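A minimal Python sketch of that guard, for illustration only (the service itself is C++). The `Drive` class, the `drives` dictionary and `update_availability()` are hypothetical stand-ins for the NvmeSSD objects and setSensorAvailability(); the point is simply to skip the call when the object was never created instead of dereferencing it.

```python
from typing import Dict, Optional


class Drive:
    """Hypothetical stand-in for an NvmeSSD object."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.available = True

    def set_sensor_availability(self, available: bool) -> None:
        self.available = available


def update_availability(drives: Dict[int, Drive], index: int, available: bool) -> bool:
    """Set availability only if the drive object exists.

    Returns True if the object was updated, False if it was missing
    (for example, because the startup read failed before it was created).
    """
    drive: Optional[Drive] = drives.get(index)
    if drive is None:
        # Object was never created: skip instead of crashing.
        return False
    drive.set_sensor_availability(available)
    return True


drives = {0: Drive(0)}  # drive 1 was never created
print(update_availability(drives, 0, False))  # True
print(update_availability(drives, 1, False))  # False
```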
If maxSmbusErrorRetry is not present in the config, the default is 0 retries.
Example of setting the maximum SMBus error retry count to 3:
```
{
    "config": [
        ...
    ],
    "threshold": [
        ...
    ],
    "maxSmbusErrorRetry": 3
}
```
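A minimal Python sketch of the configurable retry, for illustration only (the service itself is C++). It follows the stated default of 0 retries when `maxSmbusErrorRetry` is absent and assumes each read is attempted at most `1 + maxSmbusErrorRetry` times. `SmbusError`, `max_smbus_error_retry()` and `read_with_retry()` are hypothetical names, and whether the real service retries immediately or on later polling cycles is not specified here.

```python
import json
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SmbusError(Exception):
    """Hypothetical exception representing a failed SMBus transaction."""


def max_smbus_error_retry(config: dict) -> int:
    """Read maxSmbusErrorRetry, defaulting to 0 retries when absent."""
    value = config.get("maxSmbusErrorRetry", 0)
    if not isinstance(value, int) or isinstance(value, bool) or value < 0:
        raise ValueError(f"invalid maxSmbusErrorRetry: {value!r}")
    return value


def read_with_retry(read: Callable[[], T], max_retries: int) -> Optional[T]:
    """Call read() up to 1 + max_retries times.

    Returns the first successful result, or None if every attempt raised
    SmbusError (the caller would then report the sensor as failed).
    """
    for _attempt in range(1 + max_retries):
        try:
            return read()
        except SmbusError:
            continue
    return None


# Usage: three SMBus errors followed by a good reading, with 3 retries allowed.
config = json.loads('{"config": [], "threshold": [], "maxSmbusErrorRetry": 3}')
outcomes = iter([SmbusError(), SmbusError(), SmbusError(), 42])


def flaky_read() -> int:
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


print(read_with_retry(flaky_read, max_smbus_error_retry(config)))  # 42
```

With the default of 0 retries, the first SMBus error would already make `read_with_retry()` return None.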
Tested on Bletchley:
```
root@bletchley:~# journalctl _PID=4790 | grep -v SendSmbusRWCmdRAW
Nov 22 18:57:33 bletchley nvme_main[4790]: Send command code 0 fail!
Nov 22 18:57:33 bletchley nvme_main[4790]: getNVMeInfobyBusID failed, retry...
Nov 22 18:57:36 bletchley nvme_main[4790]: getNVMeInfobyBusID failed, retry...
Nov 22 18:57:39 bletchley nvme_main[4790]: getNVMeInfobyBusID failed, retry...
Nov 22 18:57:42 bletchley nvme_main[4790]: SSD plug.
Nov 22 18:57:42 bletchley nvme_main[4790]: Drive status is good but can not get data.
```
Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Change-Id: Ibc95efc53a212e55dcd5c5cfa7a654839a13342d
phosphor-nvme is the NVMe manager service. It maintains NVMe drive information and handles related notifications. The service publishes information on the interfaces defined in xyz/openbmc_project/Nvme/Status.interface.yaml and xyz/openbmc_project/Sensor/Value.interface.yaml, as well as other interfaces in xyz.openbmc_project.Inventory.Manager.
The service xyz.openbmc_project.nvme.manager provides an object on D-Bus:
where the object implements the interface xyz.openbmc_project.Sensor.Value.
Each NVMe drive is exported as a sensor whose value is the drive temperature. If the corresponding settings in the sensor map are configured correctly, the sensor value can be read with the ipmitool command sdr elist. For example:
To get sensor value:
```
### With ipmi command on BMC
ipmitool sdr elist
```
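`ipmitool sdr elist` prints one pipe-separated line per sensor (name, sensor ID, status, entity ID, reading). The sketch below extracts one sensor's reading from that text. The sensor names and sample lines are hypothetical; the actual names depend on the platform's sensor map.

```python
from typing import Optional


def sdr_reading(elist_output: str, sensor_name: str) -> Optional[str]:
    """Return the reading column for sensor_name from `ipmitool sdr elist` text."""
    for line in elist_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Expect: name | sensor ID | status | entity ID | reading
        if len(fields) == 5 and fields[0] == sensor_name:
            return fields[4]
    return None


# Hypothetical sample output; real names, IDs and values will differ.
sample = """\
nvme0            | 41h | ok  |  7.1 | 35 degrees C
nvme1            | 42h | ns  |  7.2 | No Reading
"""
print(sdr_reading(sample, "nvme0"))  # 35 degrees C
```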
The service also publishes other NVMe drive information to the D-Bus service xyz.openbmc_project.Inventory.Manager, which provides an object on D-Bus:
where the object implements the following interfaces:
Interface xyz.openbmc_project.Nvme.Status with the following properties:
| Property | Type | Description |
|---|---|---|
| SmartWarnings | string | SMART warnings reported for the drive state |
| StatusFlags | string | Status flags of the drive |
| DriveLifeUsed | string | A vendor-specific estimate of the percentage of drive life used |
| TemperatureFault | bool | Whether a temperature warning has occurred |
| BackupdrivesFault | bool | Whether a backup drive warning has occurred |
| CapacityFault | bool | Whether a capacity warning has occurred |
| DegradesFault | bool | Whether a degradation warning has occurred |
| MediaFault | bool | Whether a media warning has occurred |
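One plausible way to derive the boolean fault properties from a single SMART warning byte is sketched below. The bit positions follow the Critical Warning field of the NVMe Base Specification (bit 0 spare capacity, bit 1 temperature, bit 2 reliability degraded, bit 3 read-only media, bit 4 volatile memory backup). Mapping those bits onto these property names is an assumption, and the byte read over SMBus may use inverted (active-low) polarity, which the hypothetical `active_low` parameter models.

```python
from typing import Dict

# Assumed bit positions, taken from the NVMe Critical Warning field.
FAULT_BITS: Dict[str, int] = {
    "CapacityFault": 0,      # available spare below threshold
    "TemperatureFault": 1,   # temperature threshold crossed
    "DegradesFault": 2,      # reliability degraded
    "MediaFault": 3,         # media placed in read-only mode
    "BackupdrivesFault": 4,  # volatile memory backup device failed
}


def decode_faults(smart_warnings: int, active_low: bool = False) -> Dict[str, bool]:
    """Map a SMART warning byte onto the boolean *Fault properties."""
    if active_low:
        # A cleared bit signals the warning, so invert the low 8 bits.
        smart_warnings = ~smart_warnings & 0xFF
    return {name: bool((smart_warnings >> bit) & 1) for name, bit in FAULT_BITS.items()}


print(decode_faults(0b00000010)["TemperatureFault"])  # True
```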
Interface xyz.openbmc_project.Inventory.Item with the following properties:
| Property | Type | Description |
|---|---|---|
| Present | bool | Whether or not the item is present |
Interface xyz.openbmc_project.Inventory.Decorator.Asset with the following properties:
| Property | Type | Description |
|---|---|---|
| SerialNumber | string | The item serial number |
| Manufacturer | string | The item manufacturer |
Each property in the inventory manager can be obtained via the busctl get-property command. For example:
To get property Present:
```
### With busctl on BMC
busctl get-property xyz.openbmc_project.Inventory.Manager /xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0 xyz.openbmc_project.Inventory.Item Present
```
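`busctl get-property` prints the D-Bus type signature followed by the value, for example `b true` for a boolean such as Present, or a quoted string after `s` for properties such as SerialNumber. The sketch below converts just those two simple forms into Python values; `parse_busctl_value()` is a hypothetical helper, not a general D-Bus parser (no containers, and escape handling may differ from busctl's).

```python
import shlex
from typing import Union


def parse_busctl_value(output: str) -> Union[bool, str]:
    """Parse single-value `busctl get-property` output of type b or s."""
    signature, _, rest = output.strip().partition(" ")
    if signature == "b":
        if rest not in ("true", "false"):
            raise ValueError(f"unexpected boolean value: {rest!r}")
        return rest == "true"
    if signature == "s":
        # busctl wraps strings in double quotes; shlex strips them.
        parts = shlex.split(rest)
        return parts[0] if parts else ""
    raise ValueError(f"unsupported signature: {signature!r}")


print(parse_busctl_value("b true"))  # True
```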
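The JSON configuration file nvme_config.json specifies, for each drive, the drive index, bus ID, fault and locate LED group object paths, locate LED controller bus name and object path, and present and power-good pins. It also holds the sensor thresholds. For example: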
```
{
    "config": [
        {
            "NVMeDriveIndex": 0,
            "NVMeDriveBusID": 16,
            "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
            "NVMeDriveLocateLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_locate",
            "NVMeDriveLocateLEDControllerBusName": "xyz.openbmc_project.LED.Controller.led_u2_0_locate",
            "NVMeDriveLocateLEDControllerPath": "/xyz/openbmc_project/led/physical/led_u2_0_locate",
            "NVMeDrivePresentPin": 148,
            "NVMeDrivePwrGoodPin": 161
        },
        {
            "NVMeDriveIndex": 1,
            "NVMeDriveBusID": 17,
            "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_1_fault",
            "NVMeDriveLocateLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_1_locate",
            "NVMeDriveLocateLEDControllerBusName": "xyz.openbmc_project.LED.Controller.led_u2_1_locate",
            "NVMeDriveLocateLEDControllerPath": "/xyz/openbmc_project/led/physical/led_u2_1_locate",
            "NVMeDrivePresentPin": 149,
            "NVMeDrivePwrGoodPin": 162
        }
    ],
    "threshold": [
        {
            "criticalHigh": 70,
            "criticalLow": 0,
            "maxValue": 70,
            "minValue": 0
        }
    ]
}
```
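A minimal sketch of reading the per-drive entries of such a file into typed records. The key names come from the example above; `DriveConfig` and `load_drives()` are hypothetical, and this version simply raises `KeyError` when a required key is missing, which may not match how the real service handles incomplete entries.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class DriveConfig:
    index: int
    bus_id: int
    fault_led_group_path: str
    locate_led_group_path: str
    locate_led_controller_bus_name: str
    locate_led_controller_path: str
    present_pin: int
    pwr_good_pin: int


def load_drives(text: str) -> List[DriveConfig]:
    """Parse the "config" array of nvme_config.json into DriveConfig records."""
    data = json.loads(text)
    drives = []
    for entry in data.get("config", []):
        drives.append(DriveConfig(
            index=entry["NVMeDriveIndex"],
            bus_id=entry["NVMeDriveBusID"],
            fault_led_group_path=entry["NVMeDriveFaultLEDGroupPath"],
            locate_led_group_path=entry["NVMeDriveLocateLEDGroupPath"],
            locate_led_controller_bus_name=entry["NVMeDriveLocateLEDControllerBusName"],
            locate_led_controller_path=entry["NVMeDriveLocateLEDControllerPath"],
            present_pin=entry["NVMeDrivePresentPin"],
            pwr_good_pin=entry["NVMeDrivePwrGoodPin"],
        ))
    return drives
```

Applied to the example above, this would yield two records with bus IDs 16 and 17.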
The xyz.openbmc_project.nvme.manager service described above runs automatically and checks the NVMe drives every second.
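A simplified sketch of that once-per-second polling loop, for illustration only. `poll_drives()` and the `read_drive` callback are hypothetical stand-ins for the per-drive read and D-Bus update; the real C++ service may schedule its timer differently. The `cycles` and `sleep` parameters exist only so the loop can be run for a bounded number of iterations in tests.

```python
import time
from typing import Callable, Iterable, Optional


def poll_drives(
    drive_indices: Iterable[int],
    read_drive: Callable[[int], None],
    interval_s: float = 1.0,
    cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call read_drive() for every drive, then wait interval_s; repeat.

    Runs forever when cycles is None; otherwise returns the number of
    completed cycles.
    """
    indices = list(drive_indices)
    done = 0
    while cycles is None or done < cycles:
        for index in indices:
            read_drive(index)
        done += 1
        sleep(interval_s)
    return done
```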