regulators: Document phase fault detection
Three new JSON objects are being added to support phase fault detection:
* phase_fault_detection
* i2c_capture_bytes
* log_phase_fault
Add new markdown files to document these new objects. Also update
existing documentation to include phase fault detection.
Signed-off-by: Shawn McCarney <shawnmm@us.ibm.com>
Change-Id: I1b3342970640e5942bd5dc5249dae5fb26115324
diff --git a/phosphor-regulators/docs/config_file/README.md b/phosphor-regulators/docs/config_file/README.md
index 4ebebc2..1ccc955 100644
--- a/phosphor-regulators/docs/config_file/README.md
+++ b/phosphor-regulators/docs/config_file/README.md
@@ -20,6 +20,7 @@
* Modify regulator configuration, such as output voltage or overcurrent
settings
* Read sensor values
+* Detect redundant phase faults (if necessary)
The config file does not control how voltage regulators are enabled or how to
monitor their Power Good (pgood) status. Those operations are typically
@@ -90,6 +91,7 @@
* Array of [rules](rule.md)
* Rules defining how to modify configuration of regulators
* Rules defining how to read sensors
+ * Rules defining how to detect redundant phase faults (if necessary)
* Array of [chassis](chassis.md) in the system
* Array of regulator [devices](device.md) in the chassis
* Array of voltage [rails](rail.md) produced by the regulator
@@ -112,6 +114,7 @@
* [config_file](config_file.md)
* [configuration](configuration.md)
* [device](device.md)
+* [i2c_capture_bytes](i2c_capture_bytes.md)
* [i2c_compare_bit](i2c_compare_bit.md)
* [i2c_compare_byte](i2c_compare_byte.md)
* [i2c_compare_bytes](i2c_compare_bytes.md)
@@ -120,8 +123,10 @@
* [i2c_write_byte](i2c_write_byte.md)
* [i2c_write_bytes](i2c_write_bytes.md)
* [if](if.md)
+* [log_phase_fault](log_phase_fault.md)
* [not](not.md)
* [or](or.md)
+* [phase_fault_detection](phase_fault_detection.md)
* [pmbus_read_sensor](pmbus_read_sensor.md)
* [pmbus_write_vout_command](pmbus_write_vout_command.md)
* [presence_detection](presence_detection.md)
diff --git a/phosphor-regulators/docs/config_file/action.md b/phosphor-regulators/docs/config_file/action.md
index f60e8a4..d32332f 100644
--- a/phosphor-regulators/docs/config_file/action.md
+++ b/phosphor-regulators/docs/config_file/action.md
@@ -7,6 +7,7 @@
* [presence_detection](presence_detection.md)
* [configuration](configuration.md)
* [sensor_monitoring](sensor_monitoring.md)
+* [phase_fault_detection](phase_fault_detection.md)
Many actions read from or write to a hardware device. Initially this is the
[device](device.md) that contains the regulator operation. However, the device
@@ -19,6 +20,8 @@
| and | see [notes](#notes) | array of actions | Action type [and](and.md). |
| compare_presence | see [notes](#notes) | [compare_presence](compare_presence.md) | Action type [compare_presence](compare_presence.md). |
| compare_vpd | see [notes](#notes) | [compare_vpd](compare_vpd.md) | Action type [compare_vpd](compare_vpd.md). |
+| detect_phase_fault | see [notes](#notes) | [detect_phase_fault](#detect_phase_fault) | See [detect_phase_fault](#detect_phase_fault). |
+| i2c_capture_bytes | see [notes](#notes) | [i2c_capture_bytes](i2c_capture_bytes.md) | Action type [i2c_capture_bytes](i2c_capture_bytes.md). |
| i2c_compare_bit | see [notes](#notes) | [i2c_compare_bit](i2c_compare_bit.md) | Action type [i2c_compare_bit](i2c_compare_bit.md). |
| i2c_compare_byte | see [notes](#notes) | [i2c_compare_byte](i2c_compare_byte.md) | Action type [i2c_compare_byte](i2c_compare_byte.md). |
| i2c_compare_bytes | see [notes](#notes) | [i2c_compare_bytes](i2c_compare_bytes.md) | Action type [i2c_compare_bytes](i2c_compare_bytes.md). |
@@ -26,6 +29,7 @@
| i2c_write_byte | see [notes](#notes) | [i2c_write_byte](i2c_write_byte.md) | Action type [i2c_write_byte](i2c_write_byte.md). |
| i2c_write_bytes | see [notes](#notes) | [i2c_write_bytes](i2c_write_bytes.md) | Action type [i2c_write_bytes](i2c_write_bytes.md). |
| if | see [notes](#notes) | [if](if.md) | Action type [if](if.md). |
+| log_phase_fault | see [notes](#notes) | [log_phase_fault](log_phase_fault.md) | Action type [log_phase_fault](log_phase_fault.md). |
| not | see [notes](#notes) | action | Action type [not](not.md). |
| or | see [notes](#notes) | array of actions | Action type [or](or.md). |
| pmbus_read_sensor | see [notes](#notes) | [pmbus_read_sensor](pmbus_read_sensor.md) | Action type [pmbus_read_sensor](pmbus_read_sensor.md). |
diff --git a/phosphor-regulators/docs/config_file/device.md b/phosphor-regulators/docs/config_file/device.md
index 28180d0..7f0d642 100644
--- a/phosphor-regulators/docs/config_file/device.md
+++ b/phosphor-regulators/docs/config_file/device.md
@@ -18,7 +18,8 @@
| i2c_interface | yes | [i2c_interface](i2c_interface.md) | I2C interface to this device. |
| presence_detection | no | [presence_detection](presence_detection.md) | Specifies how to detect whether this device is present. If this property is not specified, the device is assumed to always be present. |
| configuration | no | [configuration](configuration.md) | Specifies configuration changes that should be applied to this device. These changes usually override hardware default settings. The configuration changes are applied during the boot before regulators are enabled. |
-| rails | no | array of [rails](rail.md) | One or more voltage rails produced by this device. This property can only be specified if the "is_regulator" property is true. |
+| phase_fault_detection | no | [phase_fault_detection](phase_fault_detection.md) | Specifies how to detect and log redundant phase faults in this voltage regulator. Can only be specified if the "is_regulator" property is true. |
+| rails | no | array of [rails](rail.md) | One or more voltage rails produced by this device. Can only be specified if the "is_regulator" property is true. |
## Example
```
diff --git a/phosphor-regulators/docs/config_file/i2c_capture_bytes.md b/phosphor-regulators/docs/config_file/i2c_capture_bytes.md
new file mode 100644
index 0000000..be15df5
--- /dev/null
+++ b/phosphor-regulators/docs/config_file/i2c_capture_bytes.md
@@ -0,0 +1,42 @@
+# i2c_capture_bytes
+
+## Description
+Captures device register bytes to be stored in an error log.
+
+Reads the specified device register and temporarily stores the value. If a
+subsequent action (such as [log_phase_fault](log_phase_fault.md)) creates an
+error log, the captured bytes will be stored in the error log.
+
+This action allows you to capture additional data about a hardware error. The
+action can be used multiple times if you wish to capture data from multiple
+registers or devices before logging the error.
+
+Communicates with the device directly using the [I2C interface](i2c_interface.md).
+All of the bytes will be read in a single I2C operation.
+
+The bytes will be stored in the error log in the same order as they are
+received from the device. For example, a PMBus device transmits byte values in
+little-endian order (least significant byte first).
+
+Note: This action should only be used after a hardware error has been detected
+to avoid unnecessary I2C operations and memory usage.
+
+## Properties
+| Name | Required | Type | Description |
+| :--- | :------: | :--- | :---------- |
+| register | yes | string | Device register address expressed in hexadecimal. Must be prefixed with 0x and surrounded by double quotes. This is the location of the first byte. |
+| count | yes | number | Number of bytes to read from the device register. |
+
+## Return Value
+true
+
+## Example
+```
+{
+ "comments": [ "Capture 2 bytes from register 0xA0 to store in error log" ],
+ "i2c_capture_bytes": {
+ "register": "0xA0",
+ "count": 2
+ }
+}
+```
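+
+As a hedged illustration of the byte order described above, the sketch below
+captures the two-byte PMBus STATUS_WORD register (command code 0x79). The
+returned values are hypothetical: if the device transmitted 0x42 followed by
+0x08, the error log would contain 0x42 0x08, which represents the 16-bit value
+0x0842.
+
+```
+{
+  "comments": [ "Capture PMBus STATUS_WORD (2 bytes, least significant first)" ],
+  "i2c_capture_bytes": {
+    "register": "0x79",
+    "count": 2
+  }
+}
+```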
diff --git a/phosphor-regulators/docs/config_file/log_phase_fault.md b/phosphor-regulators/docs/config_file/log_phase_fault.md
new file mode 100644
index 0000000..7af0483
--- /dev/null
+++ b/phosphor-regulators/docs/config_file/log_phase_fault.md
@@ -0,0 +1,42 @@
+# log_phase_fault
+
+## Description
+Logs a redundant phase fault error for a voltage regulator. This action should
+be executed if a fault is detected during
+[phase_fault_detection](phase_fault_detection.md) for the regulator.
+
+A regulator may contain one or more redundant phases:
+* An "N+2" regulator has two redundant phases
+* An "N+1" regulator has one redundant phase
+
+A phase fault occurs when a phase stops functioning properly, which reduces the
+redundancy level of the regulator.
+
+The phase fault type indicates the level of redundancy remaining **after** the
+fault has occurred:
+
+| Type | Description |
+| :--- | :---------- |
+| n+1 | An "N+2" regulator has lost one redundant phase. The regulator is now at redundancy level "N+1". |
+| n | The regulator has lost all redundant phases. The regulator is now at redundancy level "N". |
+
+If additional data about the fault was previously captured using
+[i2c_capture_bytes](i2c_capture_bytes.md), that data will be stored in the
+error log.
+
+## Properties
+| Name | Required | Type | Description |
+| :--- | :------: | :--- | :---------- |
+| type | yes | string | Phase fault type. Specify one of the following: "n+1", "n". |
+
+## Return Value
+true
+
+## Example
+```
+{
+ "log_phase_fault": {
+ "type": "n+1"
+ }
+}
+```
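+
+The redundancy level remaining after the fault determines the "type" value. As
+a minimal sketch, an "N+2" regulator might check two status bits and log the
+matching fault type. The register address and bit positions below are
+hypothetical and are not taken from real hardware.
+
+```
+[
+  {
+    "comments": [ "Hypothetical: bit 0 ON means one redundant phase was lost" ],
+    "if": {
+      "condition": {
+        "i2c_compare_bit": { "register": "0x02", "position": 0, "value": 1 }
+      },
+      "then": [
+        { "log_phase_fault": { "type": "n+1" } }
+      ]
+    }
+  },
+  {
+    "comments": [ "Hypothetical: bit 1 ON means all redundant phases were lost" ],
+    "if": {
+      "condition": {
+        "i2c_compare_bit": { "register": "0x02", "position": 1, "value": 1 }
+      },
+      "then": [
+        { "log_phase_fault": { "type": "n" } }
+      ]
+    }
+  }
+]
+```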
diff --git a/phosphor-regulators/docs/config_file/phase_fault_detection.md b/phosphor-regulators/docs/config_file/phase_fault_detection.md
new file mode 100644
index 0000000..7e0b2f8
--- /dev/null
+++ b/phosphor-regulators/docs/config_file/phase_fault_detection.md
@@ -0,0 +1,84 @@
+# phase_fault_detection
+
+## Description
+Specifies how to detect and log redundant phase faults in a voltage regulator.
+
+A voltage regulator is sometimes called a "phase controller" because it
+controls one or more phases that perform the actual voltage regulation.
+
+A regulator may have redundant phases. If a redundant phase fails, the
+regulator will continue to provide the desired output voltage. However, a
+phase fault error should be logged to warn the user that the regulator has lost
+redundancy.
+
+The technique used to detect a phase fault varies depending on the regulator
+hardware. Often a bit is checked in a status register. The status register
+could exist in the regulator or in a related I/O expander.
+
+Phase fault detection is performed every 15 seconds. A phase fault must be
+detected two consecutive times (15 seconds apart) before an error is logged.
+This provides "de-glitching" to ignore transient hardware problems.
+
+Phase faults are detected and logged by executing actions:
+* Use the [if](if.md) action to implement the high level behavior "if a fault
+ is detected, then log an error".
+* Detecting the fault
+ * Use a comparison action like [i2c_compare_bit](i2c_compare_bit.md) to
+ detect the fault. For example, you may need to check a bit in a status
+ register.
+* Logging the error
+ * Use the [i2c_capture_bytes](i2c_capture_bytes.md) action to capture
+ additional data about the fault if necessary.
+ * Use the [log_phase_fault](log_phase_fault.md) action to log a phase fault
+ error. The error log will include any data previously captured using
+ i2c_capture_bytes.
+
+The actions can be specified in two ways:
+* Use the "rule_id" property to specify a standard rule to run.
+* Use the "actions" property to specify an array of actions that are unique to
+ this regulator.
+
+The default device for the actions is the voltage regulator. You can specify a
+different device using the "device_id" property. If you need to access
+multiple devices, use the [set_device](set_device.md) action.
+
+## Properties
+| Name | Required | Type | Description |
+| :--- | :------: | :--- | :---------- |
+| comments | no | array of strings | One or more comment lines describing the phase fault detection. |
+| device_id | no | string | Unique ID of the [device](device.md) to access. If not specified, the default device is the voltage regulator. |
+| rule_id | see [notes](#notes) | string | Unique ID of the [rule](rule.md) to execute. |
+| actions | see [notes](#notes) | array of [actions](action.md) | One or more actions to execute. |
+
+### Notes
+* You must specify either "rule_id" or "actions".
+
+## Examples
+```
+{
+ "comments": [ "Detect phase fault using I/O expander" ],
+ "device_id": "io_expander",
+ "rule_id": "detect_phase_fault_rule"
+}
+
+{
+ "comments": [ "Detect N phase fault using I/O expander.",
+ "A fault occurred if bit 3 is ON in register 0x02.",
+ "Capture value of registers 0x02 and 0x04 in error log." ],
+ "device_id": "io_expander",
+ "actions": [
+ {
+ "if": {
+ "condition": {
+ "i2c_compare_bit": { "register": "0x02", "position": 3, "value": 1 }
+ },
+ "then": [
+ { "i2c_capture_bytes": { "register": "0x02", "count": 1 } },
+ { "i2c_capture_bytes": { "register": "0x04", "count": 1 } },
+ { "log_phase_fault": { "type": "n" } }
+ ]
+ }
+ }
+ ]
+}
+```
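+
+When more than one device must be checked, the [set_device](set_device.md)
+action can switch devices between comparisons. The following is only a sketch:
+the device IDs, register, and bit position are hypothetical, and it assumes
+that set_device returns true so that it can be combined with other actions
+inside [and](and.md).
+
+```
+{
+  "comments": [ "Hypothetical: a fault occurred if bit 3 is ON in",
+                "register 0x02 of both I/O expanders." ],
+  "actions": [
+    {
+      "if": {
+        "condition": {
+          "and": [
+            { "set_device": "io_expander_1" },
+            { "i2c_compare_bit": { "register": "0x02", "position": 3, "value": 1 } },
+            { "set_device": "io_expander_2" },
+            { "i2c_compare_bit": { "register": "0x02", "position": 3, "value": 1 } }
+          ]
+        },
+        "then": [
+          { "log_phase_fault": { "type": "n" } }
+        ]
+      }
+    }
+  ]
+}
+```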
diff --git a/phosphor-regulators/docs/config_file/presence_detection.md b/phosphor-regulators/docs/config_file/presence_detection.md
index 138ff9c..6397dff 100644
--- a/phosphor-regulators/docs/config_file/presence_detection.md
+++ b/phosphor-regulators/docs/config_file/presence_detection.md
@@ -12,9 +12,10 @@
Device presence is detected by executing actions, such as
[compare_presence](compare_presence.md) and [compare_vpd](compare_vpd.md).
-Device operations like [configuration](configuration.md) and [sensor
-monitoring](sensor_monitoring.md) will only be performed if the actions
-indicate the device is present.
+Device operations like [configuration](configuration.md),
+[sensor monitoring](sensor_monitoring.md), and
+[phase fault detection](phase_fault_detection.md) will only be performed if the
+actions indicate the device is present.
The actions can be specified in two ways:
* Use the "rule_id" property to specify a standard rule to run.
diff --git a/phosphor-regulators/docs/config_file/rule.md b/phosphor-regulators/docs/config_file/rule.md
index 1526432..1349774 100644
--- a/phosphor-regulators/docs/config_file/rule.md
+++ b/phosphor-regulators/docs/config_file/rule.md
@@ -9,6 +9,7 @@
* Actions that set the output voltage of a regulator rail
* Actions that read all the sensors of a regulator rail
* Actions that detect down-level hardware using version registers
+* Actions that detect phase faults
## Properties
| Name | Required | Type | Description |
diff --git a/phosphor-regulators/docs/config_file/set_device.md b/phosphor-regulators/docs/config_file/set_device.md
index 1522a64..984e162 100644
--- a/phosphor-regulators/docs/config_file/set_device.md
+++ b/phosphor-regulators/docs/config_file/set_device.md
@@ -9,8 +9,8 @@
[sensor_monitoring](sensor_monitoring.md).
Use "set_device" if you need to change the hardware device used by actions.
-For example, you need to check a bit in an I/O expander before setting the
-output voltage of a regulator.
+For example, you might need to check a bit in two different I/O expanders to
+detect a phase fault.
## Property Value
String containing the unique ID of the [device](device.md).
diff --git a/phosphor-regulators/docs/design.md b/phosphor-regulators/docs/design.md
index a9ed1c2..9f6d38e 100644
--- a/phosphor-regulators/docs/design.md
+++ b/phosphor-regulators/docs/design.md
@@ -149,3 +149,22 @@
* The Value property will be set to NaN.
* The Available property will be set to false.
+### Phase Fault Monitoring
+
+When regulator monitoring is enabled, phase fault detection is performed every
+15 seconds. The timer in the Manager object calls the `detectPhaseFaults()`
+method on all the objects representing the system (System, Chassis, Device).
+
+A phase fault must be detected two consecutive times (15 seconds apart) before
+an error is logged. This provides "de-glitching" to ignore transient hardware
+problems.
+
+A phase fault error will only be logged for a regulator once per system boot.
+
+If a different error occurs while detecting phase faults in a regulator:
+* The error will be logged. If the same error occurs repeatedly on a regulator,
+ it will only be logged once per system boot.
+* Any remaining actions for the regulator will be skipped.
+* Phase fault detection will continue with the next regulator.
+* Phase fault detection will be attempted again for this regulator during the
+ next monitoring cycle.