Design: NVMe-MI over SMBus

This commit proposes a design for NVMe-MI over SMBus.

Change-Id: I9fdba3cf00bc89ccaf82aa5f1f6b71dc47ddac1b
Signed-off-by: tony lee <tony.lee@quantatw.com>
diff --git a/designs/nvmemi_over_SMbus.md b/designs/nvmemi_over_SMbus.md
new file mode 100644
index 0000000..9841397
--- /dev/null
+++ b/designs/nvmemi_over_SMbus.md
@@ -0,0 +1,226 @@
+### NVMe-MI over SMBus
+
+Author:
+  Tony Lee <tony.lee@quantatw.com>
+Primary assignee:
+  Tony Lee <tony.lee@quantatw.com>
+Created:
+  3-8-2019
+
+#### Problem Description
+
+Currently, OpenBMC does not expose NVMe drive information. The NVMe-MI
+specification defines a command that reads NVMe drive information directly
+over SMBus. An NVMe drive can report information and status such as vendor
+ID and temperature. The aim of this proposal is to let users monitor NVMe
+drives so that appropriate action can be taken.
+
+#### Background and References
+
+The NVMe-MI specification defines a command called
+`NVM Express Basic Management Command` that reads NVMe drive information
+directly over SMBus [1]. This command uses the SMBus Block Read protocol
+defined by the SMBus specification [2].
+
+Because the goal is to retrieve NVMe drive information, this design uses the
+NVM Express Basic Management Command described in the NVMe-MI specification
+to communicate with NVMe drives. The temperature sensor, presence status, LED,
+and power sequence handling will be customized for each platform.
+
+[1] NVM Express Management Interface Revision 1.0a April 8, 2017 in Appendix A.
+(https://nvmexpress.org/wp-content/uploads/NVM_Express_Management_Interface_1_0a_2017.04.08_-_gold.pdf)
+[2] System Management Bus (SMBus) Specification Version 3.0 20 Dec 2014
+(http://smbus.org/specs/SMBus_3_0_20141220.pdf)
+
+#### Requirements
+
+The implementation should:
+
+- Provide a daemon to monitor NVMe drives. Parameters to be monitored are
+  Status Flags, SMART Warnings, Temperature, Percentage Drive Life Used, Vendor
+  ID, and Serial Number.
+- Provide a D-bus interface to allow other services to access data.
+- Communicate with NVMe drives over the I2C hardware channel.
+- Turn the fault LED on or off for each drive according to SmartWarnings, if
+  the object path of the fault LED is defined in the configuration file.
+
+#### Proposed Design
+
+Create a D-bus service "xyz.openbmc_project.nvme.manager" with object paths for
+each NVMe sensor: "/xyz/openbmc_project/sensors/temperature/nvme0",
+"/xyz/openbmc_project/sensors/temperature/nvme1", etc.
+A JSON configuration file specifies the drive index, bus ID, fault LED object
+path, GPIO present pin, and power good pin for each drive.
+For example,
+
+```json
+{
+  "NvmeDriveIndex": 0,
+  "NVMeDriveBusID": 16,
+  "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
+  "NVMeDrivePresentPin": 148,
+  "NVMeDrivePwrGoodPin": 161
+},
+{
+  "NvmeDriveIndex": 1,
+  "NVMeDriveBusID": 17,
+  "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
+  "NVMeDrivePresentPin": 149,
+  "NVMeDrivePwrGoodPin": 162
+}
+```
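+
+A minimal Python sketch of how a daemon might load this file. It assumes the
+per-drive entries are wrapped in a top-level JSON array, which the example
+above does not show, and that the fault LED path may be omitted. The
+`DriveConfig` class and `parse_drive_configs` function are hypothetical names.
+
+```python
+import json
+from dataclasses import dataclass
+from typing import List, Optional
+
+
+@dataclass
+class DriveConfig:
+    index: int
+    bus_id: int
+    fault_led_group_path: Optional[str]
+    present_pin: int
+    pwr_good_pin: int
+
+
+def parse_drive_configs(text: str) -> List[DriveConfig]:
+    """Parse a JSON array of per-drive entries (the array wrapper is assumed)."""
+    drives = []
+    for entry in json.loads(text):
+        drives.append(
+            DriveConfig(
+                index=entry["NvmeDriveIndex"],
+                bus_id=entry["NVMeDriveBusID"],
+                # Assumed optional: without a path the fault LED is left alone.
+                fault_led_group_path=entry.get("NVMeDriveFaultLEDGroupPath"),
+                present_pin=entry["NVMeDrivePresentPin"],
+                pwr_good_pin=entry["NVMeDrivePwrGoodPin"],
+            )
+        )
+    return drives
+
+
+# Sample input: the first entry from the example, plus one without an LED path.
+SAMPLE = """
+[
+  {
+    "NvmeDriveIndex": 0,
+    "NVMeDriveBusID": 16,
+    "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
+    "NVMeDrivePresentPin": 148,
+    "NVMeDrivePwrGoodPin": 161
+  },
+  {
+    "NvmeDriveIndex": 1,
+    "NVMeDriveBusID": 17,
+    "NVMeDrivePresentPin": 149,
+    "NVMeDrivePwrGoodPin": 162
+  }
+]
+"""
+
+if __name__ == "__main__":
+    for drive in parse_drive_configs(SAMPLE):
+        print(drive)
+```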
+
+The resulting object structure is as follows.
+
+Under the D-bus service named "xyz.openbmc_project.nvme.manager":
+
+```
+/xyz/openbmc_project
+    └─/xyz/openbmc_project/sensors
+      └─/xyz/openbmc_project/sensors/temperature/nvme0
+```
+
+The object `/xyz/openbmc_project/sensors/temperature/nvme0`
+implements:
+
+- xyz.openbmc_project.Sensor.Value
+- xyz.openbmc_project.Sensor.Threshold.Warning
+- xyz.openbmc_project.Sensor.Threshold.Critical
+
+Under the D-bus service named "xyz.openbmc_project.Inventory.Manager":
+
+```
+/xyz/openbmc_project
+    └─/xyz/openbmc_project/inventory
+      └─/xyz/openbmc_project/inventory/system
+        └─/xyz/openbmc_project/inventory/system/chassis
+          └─/xyz/openbmc_project/inventory/system/chassis/motherboard
+            └─/xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0
+```
+
+The object `/xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0`
+implements:
+
+- xyz.openbmc_project.Inventory.Item
+- xyz.openbmc_project.Inventory.Decorator.Asset
+- xyz.openbmc_project.Nvme.Status
+
+Interface `xyz.openbmc_project.Sensor.Value` is for hwmon to monitor
+temperature and has the following properties:
+
+| Property | Type | Description |
+| -------- | ---- | ----------- |
+| MaxValue | int64 | Sensor maximum value |
+| MinValue | int64 | Sensor minimum value |
+| Scale | int64 | Sensor value scale |
+| Unit | string | Sensor unit |
+| Value | int64 | Sensor value |
+
+Interface `xyz.openbmc_project.Nvme.Status` with the following properties:
+
+| Property | Type | Description |
+| -------- | ---- | ----------- |
+| SmartWarnings | string | The SMART warnings reported by the drive |
+| StatusFlags | string | The status flags reported by the drive |
+| DriveLifeUsed | string | A vendor-specific estimate of the percentage of drive life used |
+| TemperatureFault | bool | True if a temperature warning has occurred |
+| BackupdrivesFault | bool | True if a backup device warning has occurred |
+| CapacityFault | bool | True if a capacity warning has occurred |
+| DegradesFault | bool | True if a degraded reliability warning has occurred |
+| MediaFault | bool | True if a media warning has occurred |
+
+Interface `xyz.openbmc_project.Inventory.Item` with the following properties:
+
+| Property | Type | Description |
+| -------- | ---- | ----------- |
+| PrettyName | string | The human readable name of the item |
+| Present | bool | Whether or not the item is present |
+
+Interface `xyz.openbmc_project.Inventory.Decorator.Asset` with the following
+properties:
+
+| Property | Type | Description |
+| -------- | ---- | ----------- |
+| PartNumber | string | The item part number, typically a stocking number |
+| SerialNumber | string | The item serial number |
+| Manufacturer | string | The item manufacturer |
+| BuildDate | string | The date of item manufacture in YYYYMMDD format |
+| Model | string | The model of the item |
+
+##### xyz.openbmc_project.nvme.manager.service
+
+This service performs the following steps:
+
+1. Register the D-bus service `xyz.openbmc_project.nvme.manager` described
+   above.
+2. Obtain the drive index, bus ID, GPIO present pin, power good pin and fault
+   LED object path from the JSON file mentioned above.
+3. In each cycle, perform the following steps:
+   1. Check whether the present pin of the target drive is true. If so, the
+      drive exists; go to the next step. If not, the drive does not exist;
+      remove its object path from D-bus by drive index.
+   2. Check whether the power good pin of the target drive is true. If so, the
+      drive is ready; create the object path by drive index and go on. If not,
+      drive power is abnormal; turn on the fault LED and log to the journal.
+   3. Send an NVMe-MI command via the SMBus Block Read protocol on the bus ID of
+      the target drive to get data. The data read from the drive are "Status
+      Flags", "SMART Warnings", "Temperature", "Percentage Drive Life Used",
+      "Vendor ID", and "Serial Number".
+   4. Set the data to the corresponding properties on D-bus.
+
+This service runs automatically and polls the NVMe drives every second.
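+
+The decoding part of step 3 can be sketched in Python as follows. The byte
+layout (length, Status Flags, SMART Warnings, Composite Temperature, Percentage
+Drive Life Used) and the temperature encoding are assumptions taken from a
+reading of Appendix A of [1] and should be checked against the specification.
+The buffer is assumed to start with the block length byte. `BasicStatus`,
+`decode_temperature`, and `decode_basic_status` are hypothetical names.
+
+```python
+from typing import NamedTuple, Optional
+
+
+class BasicStatus(NamedTuple):
+    status_flags: int
+    smart_warnings: int
+    temperature_c: Optional[int]  # None when no usable reading is reported
+    drive_life_used: int
+
+
+def decode_temperature(raw: int) -> Optional[int]:
+    """Convert the composite temperature byte to degrees Celsius (assumed encoding)."""
+    if raw <= 0x7F:
+        return raw  # 0..127 C; 0x7F is assumed to mean "127 C or higher"
+    if raw >= 0xC4:
+        return raw - 256  # two's complement; 0xC4 is assumed to mean "-60 C or lower"
+    return None  # assumed: 0x80 no data, 0x81 sensor failure, 0x82-0xC3 reserved
+
+
+def decode_basic_status(block: bytes) -> BasicStatus:
+    """Decode a response whose byte 0 is the block length (assumed layout)."""
+    if len(block) < 5:
+        raise ValueError("response too short")
+    return BasicStatus(
+        status_flags=block[1],
+        smart_warnings=block[2],
+        temperature_c=decode_temperature(block[3]),
+        drive_life_used=block[4],
+    )
+
+
+if __name__ == "__main__":
+    # Made-up sample bytes, not captured from a real drive.
+    print(decode_basic_status(bytes([0x06, 0xBF, 0xFF, 0x1E, 0x01, 0x00, 0x00])))
+```
+
+Returning `None` for the temperature lets the caller skip the sensor update
+instead of publishing a misleading value.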
+
+##### Fault LED
+
+When the value obtained from the command corresponds to one of the warning
+types, the service turns on the fault LED of the corresponding device and
+issues events.
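+
+A minimal Python sketch of mapping the SMART Warnings byte to the fault
+properties of `xyz.openbmc_project.Nvme.Status`. The bit positions, and the
+convention that a cleared bit (0) signals the warning, are assumptions based on
+a reading of Appendix A of [1] and should be verified. The function names are
+hypothetical.
+
+```python
+from typing import Dict
+
+# Assumed bit masks within the SMART Warnings byte.
+SMART_WARNING_BITS = {
+    "CapacityFault": 0x01,
+    "TemperatureFault": 0x02,
+    "DegradesFault": 0x04,
+    "MediaFault": 0x08,
+    "BackupdrivesFault": 0x10,
+}
+
+
+def decode_smart_warnings(raw: int) -> Dict[str, bool]:
+    """Return each fault property; True when its bit is cleared (assumed)."""
+    return {name: (raw & mask) == 0 for name, mask in SMART_WARNING_BITS.items()}
+
+
+def fault_led_should_be_on(raw: int) -> bool:
+    """The fault LED is asserted when any warning is active."""
+    return any(decode_smart_warnings(raw).values())
+
+
+if __name__ == "__main__":
+    print(decode_smart_warnings(0xFD))
+    print(fault_led_should_be_on(0xFD))
+```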
+
+##### Add SEL related to NVMe
+
+The events `TemperatureFault`, `BackupdrivesFault`,
+`CapacityFault`, `DegradesFault` and `MediaFault` will be generated for the
+NVMe errors.
+
+- Temperature Fault log: when the property `TemperatureFault` is set to true
+- Backupdrives Fault log: when the property `BackupdrivesFault` is set to true
+- Capacity Fault log: when the property `CapacityFault` is set to true
+- Degrades Fault log: when the property `DegradesFault` is set to true
+- Media Fault log: when the property `MediaFault` is set to true
+
+#### Alternatives Considered
+
+The NVMe-MI specification defines multiple commands that can communicate with
+NVMe drives over the MCTP protocol. NVMe-MI over MCTP has the following key
+capabilities:
+
+- Discover drives that are present and learn the capabilities of each drive.
+- Store data about the host environment enabling a Management Controller to
+  query the data later.
+- A standard format for VPD and defined mechanisms to read/write VPD contents.
+- Inventorying, configuring and monitoring.
+
+For monitoring NVMe drives, using the NVM Express Basic Management Command
+directly over SMBus is much simpler than using NVMe-MI over MCTP.
+
+#### Impacts
+
+This application monitors NVMe drives via SMBus and sets values on D-bus.
+The impact on the system should be small.
+
+#### Testing
+
+This implementation uses the NVMe-MI Basic Management Command over SMBus and
+sets the response data on D-bus.
+Testing will send the SMBus command to the drives to get the information and
+compare it with the properties on D-bus to make sure they are the same.
+The testing can be performed on different NVMe drives from different
+manufacturers.
+For example: Intel P4500/P4600 and Micron 9200 Max/Pro.
+
+Unit tests will cover each function:
+
+- The length of the response data returned by the function that gets NVMe
+  information matches the design.
+- The function that sets values on D-bus behaves as designed.
+- The function that turns the corresponding LED on or off behaves as designed
+  for different SmartWarnings values.
\ No newline at end of file