Author: Tony Lee tony.lee@quantatw.com
Created: 3-8-2019
Currently, OpenBMC does not support reading NVMe drive information. The NVMe-MI specification defines a command that reads NVMe drive information directly over SMBus, which a drive can use to report its information and status, such as its vendor ID and temperature. The aim of this proposal is to let users monitor NVMe drives so that appropriate action can be taken.
The NVMe-MI specification defines a command called the NVM Express Basic Management Command that reads NVMe drive information directly over SMBus [1]. This command uses the SMBus Block Read protocol defined in the SMBus specification [2].
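A minimal Python sketch of decoding the reply to the Basic Management Command at command code 0 follows. It is not the proposed implementation: the names `BasicStatus`, `decode_status`, and `decode_temperature` are hypothetical, the raw SMBus transfer is left out, and the byte layout (byte count, Status Flags, SMART Warnings, Composite Temperature, Percentage Drive Life Used) and the special temperature codes reflect our reading of Appendix A of [1] and should be checked against it.

```python
from dataclasses import dataclass
from typing import Optional

# Special Composite Temperature codes (our reading of NVMe-MI 1.0a Appendix A).
TEMP_NO_DATA = 0x80                # no data, or data older than 5 seconds
TEMP_SENSOR_FAILURE = 0x81         # temperature sensor failure
TEMP_RESERVED = range(0x82, 0xC4)  # 0x82-0xC3 are reserved


@dataclass
class BasicStatus:
    status_flags: int
    smart_warnings: int
    temperature_c: Optional[int]  # None when no valid reading is reported
    drive_life_used: int          # percentage, vendor-specific estimate


def decode_temperature(raw: int) -> Optional[int]:
    """Convert the one-byte Composite Temperature to degrees Celsius."""
    if raw in (TEMP_NO_DATA, TEMP_SENSOR_FAILURE) or raw in TEMP_RESERVED:
        return None
    # Two's complement byte: 0x7F means 127 C or above, 0xC4 means -60 C or below.
    return raw - 0x100 if raw & 0x80 else raw


def decode_status(block: bytes) -> BasicStatus:
    """Decode a Block Read reply for command code 0.

    block[0] is the SMBus byte count ("Length of Status"); the PEC byte,
    if present, is ignored here.
    """
    if len(block) < 5 or block[0] < 4:
        raise ValueError("short Basic Management Command reply")
    return BasicStatus(
        status_flags=block[1],
        smart_warnings=block[2],
        temperature_c=decode_temperature(block[3]),
        drive_life_used=block[4],
    )
```

For example, the reply bytes `06 BF FF 23 03 00 00` decode to a temperature of 35, SMART Warnings 0xFF (no warnings, since those bits are active-low), and 3% drive life used.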
Since our goal is to retrieve NVMe drive information, the service communicates with NVMe drives using the NVM Express Basic Management Command described in the NVMe-MI specification. The temperature sensor, present status, LED, and power sequence will be customized per platform.
[1] NVM Express Management Interface, Revision 1.0a, April 8, 2017, Appendix A. (https://nvmexpress.org/wp-content/uploads/NVM_Express_Management_Interface_1_0a_2017.04.08_-_gold.pdf)

[2] System Management Bus (SMBus) Specification, Version 3.0, 20 December 2014. (http://smbus.org/specs/SMBus_3_0_20141220.pdf)
The implementation should create a D-bus service, "xyz.openbmc_project.nvme.manager", with an object path for each NVMe sensor: "/xyz/openbmc_project/sensors/temperature/nvme0", "/xyz/openbmc_project/sensors/temperature/nvme1", and so on. A JSON configuration file provides, for each drive, its index, its SMBus bus ID, the object path of its fault LED group, and the pin numbers of its present and power-good signals. For example:
```json
[
    {
        "NvmeDriveIndex": 0,
        "NVMeDriveBusID": 16,
        "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
        "NVMeDrivePresentPin": 148,
        "NVMeDrivePwrGoodPin": 161
    },
    {
        "NvmeDriveIndex": 1,
        "NVMeDriveBusID": 17,
        "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
        "NVMeDrivePresentPin": 149,
        "NVMeDrivePwrGoodPin": 162
    }
]
```
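The configuration can be loaded with a few lines of Python. This sketch assumes the drive entries are wrapped in a top-level JSON array, uses the key names exactly as written in the example, and introduces the hypothetical names `DriveConfig` and `load_drive_configs`; the sensor object path is derived from NvmeDriveIndex to match the nvme0, nvme1 naming.

```python
import json
from dataclasses import dataclass
from typing import List

SENSOR_ROOT = "/xyz/openbmc_project/sensors/temperature"


@dataclass
class DriveConfig:
    index: int
    bus_id: int             # SMBus bus number the drive sits on
    fault_led_group: str    # LED group object path to assert on faults
    present_pin: int
    pwr_good_pin: int

    @property
    def sensor_path(self) -> str:
        # nvme0, nvme1, ... as in the object paths listed above
        return f"{SENSOR_ROOT}/nvme{self.index}"


def load_drive_configs(text: str) -> List[DriveConfig]:
    """Parse the drive list; a missing key raises KeyError."""
    return [
        DriveConfig(
            index=entry["NvmeDriveIndex"],
            bus_id=entry["NVMeDriveBusID"],
            fault_led_group=entry["NVMeDriveFaultLEDGroupPath"],
            present_pin=entry["NVMeDrivePresentPin"],
            pwr_good_pin=entry["NVMeDrivePwrGoodPin"],
        )
        for entry in json.loads(text)
    ]
```

Failing on a missing key, rather than guessing a default, keeps a misconfigured drive from being silently monitored on the wrong bus.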
The resulting D-bus structure is as follows.

Under the D-bus service "xyz.openbmc_project.nvme.manager":

```
/xyz/openbmc_project
└─/xyz/openbmc_project/sensors
  └─/xyz/openbmc_project/sensors/temperature/nvme0
```

/xyz/openbmc_project/sensors/temperature/nvme0 implements the xyz.openbmc_project.Sensor.Value interface.

Under the D-bus service "xyz.openbmc_project.Inventory.Manager":

```
/xyz/openbmc_project
└─/xyz/openbmc_project/inventory
  └─/xyz/openbmc_project/inventory/system
    └─/xyz/openbmc_project/inventory/system/chassis
      └─/xyz/openbmc_project/inventory/system/chassis/motherboard
        └─/xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0
```

/xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0 implements the xyz.openbmc_project.Nvme.Status, xyz.openbmc_project.Inventory.Item, and xyz.openbmc_project.Inventory.Decorator.Asset interfaces.
Interface xyz.openbmc_project.Sensor.Value exposes the drive temperature in the same way hwmon temperature sensors do and has the following properties:
Property | Type | Description |
---|---|---|
MaxValue | int64 | Sensor maximum value |
MinValue | int64 | Sensor minimum value |
Scale | int64 | Sensor value scale |
Unit | string | Sensor unit |
Value | int64 | Sensor value |
Interface xyz.openbmc_project.Nvme.Status has the following properties:
Property | Type | Description |
---|---|---|
SmartWarnings | string | SMART warnings reported by the drive |
StatusFlags | string | Status flags reported by the drive |
DriveLifeUsed | string | A vendor-specific estimate of the percentage of drive life used |
TemperatureFault | bool | True if the temperature is above or below a threshold |
BackupdrivesFault | bool | True if the volatile memory backup device has failed |
CapacityFault | bool | True if the available spare capacity is below its threshold |
DegradesFault | bool | True if the drive reliability has degraded |
MediaFault | bool | True if the media has been placed in read-only mode |
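The five fault properties can be derived from the raw SMART Warnings byte. In this sketch (the function name `smart_warnings_to_faults` is hypothetical), the bit assignments follow the Critical Warning field of the NVMe SMART / Health Information log, which NVMe-MI reports with each bit inverted, so a bit cleared to 0 signals the warning; please verify this against [1]. The function takes the byte as an integer, whereas the SmartWarnings property above is a string, so a conversion such as `int(value, 16)` would be needed if the property holds a hex string (an assumption).

```python
from typing import Dict

# Bit positions in the SMART Warnings byte. The bits are active-low:
# a bit cleared to 0 means the corresponding warning is present.
FAULT_BITS = {
    "CapacityFault": 0,      # available spare below threshold
    "TemperatureFault": 1,   # temperature outside a threshold
    "DegradesFault": 2,      # reliability degraded
    "MediaFault": 3,         # media placed in read-only mode
    "BackupdrivesFault": 4,  # volatile memory backup device failed
}


def smart_warnings_to_faults(smart_warnings: int) -> Dict[str, bool]:
    """Map the raw SMART Warnings byte to the Nvme.Status fault properties."""
    return {
        name: ((smart_warnings >> bit) & 1) == 0
        for name, bit in FAULT_BITS.items()
    }
```

With this encoding, 0xFF means no warnings, and 0xFD (bit 1 cleared) sets only TemperatureFault.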
Interface xyz.openbmc_project.Inventory.Item has the following properties:
Property | Type | Description |
---|---|---|
PrettyName | string | The human readable name of the item |
Present | bool | Whether or not the item is present |
Interface xyz.openbmc_project.Inventory.Decorator.Asset has the following properties:
Property | Type | Description |
---|---|---|
PartNumber | string | The item part number, typically a stocking number |
SerialNumber | string | The item serial number |
Manufacturer | string | The item manufacturer |
BuildDate | string | The date of item manufacture in YYYYMMDD format |
Model | string | The model of the item |
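Some Asset properties could be filled from a second Block Read at command code 8, which returns the drive's vendor ID and serial number. This is a sketch under assumptions: the offsets (byte count, 2-byte vendor ID sent MSB first, 20-byte ASCII serial number padded with spaces) are our reading of Appendix A of [1], mapping the vendor ID to Manufacturer and the serial number to SerialNumber is our assumption, and `DriveIdentity` and `decode_identity` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DriveIdentity:
    vendor_id: int       # PCI-SIG vendor ID, e.g. 0x8086
    serial_number: str


def decode_identity(block: bytes) -> DriveIdentity:
    """Decode a Block Read reply for command code 8.

    Assumed layout: block[0] byte count, block[1:3] vendor ID (MSB first),
    block[3:23] serial number as 20 ASCII characters padded with spaces.
    """
    if len(block) < 23:
        raise ValueError("short identification reply")
    vendor_id = (block[1] << 8) | block[2]
    serial = block[3:23].decode("ascii", errors="replace").strip()
    return DriveIdentity(vendor_id=vendor_id, serial_number=serial)
```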
This service works in the following steps:

1. Register the D-bus name xyz.openbmc_project.nvme.manager and create the D-bus objects described above.
2. Run automatically and poll the NVMe drives every second.
3. When the value obtained from the command corresponds to one of the warning types, turn on the fault LED of the corresponding drive and issue events.
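These steps can be sketched as a polling loop. Everything here is illustrative: the callbacks `read_faults`, `set_fault_led`, and `emit_event` are hypothetical stand-ins for the SMBus read, the LED group update, and event logging. Two behaviours are our assumptions rather than part of the proposal: the fault LED is turned off again once no fault remains, and an event is issued only when a fault changes from false to true, so that a persistent fault is not logged every second.

```python
import time
from typing import Callable, Dict, List, Optional

Faults = Dict[str, bool]  # e.g. {"MediaFault": False, "TemperatureFault": True}


def poll_once(
    drives: List[int],
    read_faults: Callable[[int], Faults],
    set_fault_led: Callable[[int, bool], None],
    emit_event: Callable[[int, str], None],
    previous: Dict[int, Faults],
) -> None:
    """Run one monitoring pass over all drives, updating `previous` in place."""
    for drive in drives:
        faults = read_faults(drive)
        # Fault LED on while any fault is active (assumption: off otherwise).
        set_fault_led(drive, any(faults.values()))
        before = previous.get(drive, {})
        for name, active in faults.items():
            # Issue an event only on a false -> true transition (assumption).
            if active and not before.get(name, False):
                emit_event(drive, name)
        previous[drive] = faults


def monitor(
    drives: List[int],
    read_faults: Callable[[int], Faults],
    set_fault_led: Callable[[int, bool], None],
    emit_event: Callable[[int, str], None],
    interval_s: float = 1.0,
    cycles: Optional[int] = None,
) -> None:
    """Poll every `interval_s` seconds; `cycles=None` runs forever."""
    previous: Dict[int, Faults] = {}
    done = 0
    while cycles is None or done < cycles:
        poll_once(drives, read_faults, set_fault_led, emit_event, previous)
        done += 1
        time.sleep(interval_s)
```

Passing the callbacks in keeps the loop testable without real drives.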
The events TemperatureFault, BackupdrivesFault, CapacityFault, DegradesFault, and MediaFault will be generated for NVMe errors:
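- TemperatureFault: generated when the TemperatureFault property is set to true
- BackupdrivesFault: generated when the BackupdrivesFault property is set to true
- CapacityFault: generated when the CapacityFault property is set to true
- DegradesFault: generated when the DegradesFault property is set to true
- MediaFault: generated when the MediaFault property is set to true

The NVMe-MI specification also defines multiple commands that communicate with NVMe drives over the MCTP protocol, and NVMe-MI over MCTP offers a broader set of management capabilities than the Basic Management Command. For monitoring NVMe drives, however, using the NVM Express Basic Management Command directly over SMBus is much simpler than NVMe-MI over MCTP.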
This application monitors NVMe drives over SMBus and sets the values on D-bus. Its impact on the rest of the system should be small.
This implementation uses the NVMe-MI Basic Management Command over SMBus and then sets the response data on D-bus. Testing will send SMBus commands to the drives to get their information and compare it with the properties on D-bus to make sure they match. Testing can be performed on NVMe drives from different manufacturers, for example Intel P4500/P4600 and Micron 9200 Max/Pro.
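The comparison can be partly scripted. This sketch builds a `busctl get-property` command for one property, parses busctl's `<signature> <value>` output for simple integer, boolean, and string types, and lists the properties whose D-bus values differ from the values read over SMBus. The helper names are hypothetical, and parsing strings with `shlex` is a simplification that does not handle every escape sequence busctl can emit.

```python
import shlex
from typing import Dict, List, Union

Value = Union[int, bool, str]


def busctl_get_cmd(service: str, path: str, interface: str, prop: str) -> List[str]:
    """Build a busctl command that reads one D-Bus property."""
    return ["busctl", "get-property", service, path, interface, prop]


def parse_busctl_value(output: str) -> Value:
    """Parse busctl's '<signature> <value>' output for simple types."""
    signature, _, raw = output.strip().partition(" ")
    if signature in ("x", "t", "i", "u", "n", "q", "y"):
        return int(raw)
    if signature == "b":
        return raw == "true"
    if signature == "s":
        return shlex.split(raw)[0]  # strip the surrounding double quotes
    raise ValueError(f"unsupported signature {signature!r}")


def mismatches(expected: Dict[str, Value], actual: Dict[str, Value]) -> List[str]:
    """Names of properties whose D-Bus value differs from the SMBus reading."""
    return sorted(name for name, value in expected.items() if actual.get(name) != value)
```

On the BMC, each command could be run with `subprocess.run(cmd, capture_output=True, text=True, check=True)` and its stdout passed to `parse_busctl_value`; reading Value on /xyz/openbmc_project/sensors/temperature/nvme0 would print a line such as `x 35`.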
Unit tests will test the service function by function.