regulators: Document phase fault detection

Three new JSON objects are being added to support phase fault detection:
* phase_fault_detection
* i2c_capture_bytes
* log_phase_fault

Add new markdown files to document these new objects.  Also update
existing documentation to include phase fault detection.
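
As a rough illustration of how the three objects fit together, a regulator
device entry might look like the sketch below.  It is assembled from the
examples in the new markdown files.  The device ID, FRU path, and I2C
bus/address values are hypothetical placeholders, and the "id", "fru", "bus",
and "address" property names are assumed from the existing device
documentation rather than taken from this change.

```
{
  "comments": [ "Hypothetical regulator with N+1 redundant phases" ],
  "id": "vdd_regulator",
  "is_regulator": true,
  "fru": "system/chassis/motherboard/vdd_regulator",
  "i2c_interface": { "bus": 1, "address": "0x70" },
  "phase_fault_detection": {
    "comments": [ "A fault occurred if bit 3 is ON in register 0x02" ],
    "actions": [
      {
        "if": {
          "condition": {
            "i2c_compare_bit": { "register": "0x02", "position": 3, "value": 1 }
          },
          "then": [
            { "i2c_capture_bytes": { "register": "0x02", "count": 1 } },
            { "log_phase_fault": { "type": "n" } }
          ]
        }
      }
    ]
  }
}
```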

Signed-off-by: Shawn McCarney <shawnmm@us.ibm.com>
Change-Id: I1b3342970640e5942bd5dc5249dae5fb26115324
diff --git a/phosphor-regulators/docs/config_file/README.md b/phosphor-regulators/docs/config_file/README.md
index 4ebebc2..1ccc955 100644
--- a/phosphor-regulators/docs/config_file/README.md
+++ b/phosphor-regulators/docs/config_file/README.md
@@ -20,6 +20,7 @@
 * Modify regulator configuration, such as output voltage or overcurrent
   settings
 * Read sensor values
+* Detect redundant phase faults (if necessary)
 
 The config file does not control how voltage regulators are enabled or how to
 monitor their Power Good (pgood) status.  Those operations are typically
@@ -90,6 +91,7 @@
 * Array of [rules](rule.md)
   * Rules defining how to modify configuration of regulators
   * Rules defining how to read sensors
+  * Rules defining how to detect redundant phase faults (if necessary)
 * Array of [chassis](chassis.md) in the system
   * Array of regulator [devices](device.md) in the chassis
     * Array of voltage [rails](rail.md) produced by the regulator
@@ -112,6 +114,7 @@
 * [config_file](config_file.md)
 * [configuration](configuration.md)
 * [device](device.md)
+* [i2c_capture_bytes](i2c_capture_bytes.md)
 * [i2c_compare_bit](i2c_compare_bit.md)
 * [i2c_compare_byte](i2c_compare_byte.md)
 * [i2c_compare_bytes](i2c_compare_bytes.md)
@@ -120,8 +123,10 @@
 * [i2c_write_byte](i2c_write_byte.md)
 * [i2c_write_bytes](i2c_write_bytes.md)
 * [if](if.md)
+* [log_phase_fault](log_phase_fault.md)
 * [not](not.md)
 * [or](or.md)
+* [phase_fault_detection](phase_fault_detection.md)
 * [pmbus_read_sensor](pmbus_read_sensor.md)
 * [pmbus_write_vout_command](pmbus_write_vout_command.md)
 * [presence_detection](presence_detection.md)
diff --git a/phosphor-regulators/docs/config_file/action.md b/phosphor-regulators/docs/config_file/action.md
index f60e8a4..d32332f 100644
--- a/phosphor-regulators/docs/config_file/action.md
+++ b/phosphor-regulators/docs/config_file/action.md
@@ -7,6 +7,7 @@
 * [presence_detection](presence_detection.md)
 * [configuration](configuration.md)
 * [sensor_monitoring](sensor_monitoring.md)
+* [phase_fault_detection](phase_fault_detection.md)
 
 Many actions read from or write to a hardware device.  Initially this is the
 [device](device.md) that contains the regulator operation.  However, the device
@@ -19,6 +20,8 @@
 | and | see [notes](#notes) | array of actions | Action type [and](and.md). |
 | compare_presence | see [notes](#notes) | [compare_presence](compare_presence.md) | Action type [compare_presence](compare_presence.md). |
 | compare_vpd | see [notes](#notes) | [compare_vpd](compare_vpd.md) | Action type [compare_vpd](compare_vpd.md). |
+| detect_phase_fault | see [notes](#notes) | [detect_phase_fault](#detect_phase_fault) | See [detect_phase_fault](#detect_phase_fault). |
+| i2c_capture_bytes | see [notes](#notes) | [i2c_capture_bytes](i2c_capture_bytes.md) | Action type [i2c_capture_bytes](i2c_capture_bytes.md). |
 | i2c_compare_bit | see [notes](#notes) | [i2c_compare_bit](i2c_compare_bit.md) | Action type [i2c_compare_bit](i2c_compare_bit.md). |
 | i2c_compare_byte | see [notes](#notes) | [i2c_compare_byte](i2c_compare_byte.md) | Action type [i2c_compare_byte](i2c_compare_byte.md). |
 | i2c_compare_bytes | see [notes](#notes) | [i2c_compare_bytes](i2c_compare_bytes.md) | Action type [i2c_compare_bytes](i2c_compare_bytes.md). |
@@ -26,6 +29,7 @@
 | i2c_write_byte | see [notes](#notes) | [i2c_write_byte](i2c_write_byte.md) | Action type [i2c_write_byte](i2c_write_byte.md). |
 | i2c_write_bytes | see [notes](#notes) | [i2c_write_bytes](i2c_write_bytes.md) | Action type [i2c_write_bytes](i2c_write_bytes.md). |
 | if | see [notes](#notes) | [if](if.md) | Action type [if](if.md). |
+| log_phase_fault | see [notes](#notes) | [log_phase_fault](log_phase_fault.md) | Action type [log_phase_fault](log_phase_fault.md). |
 | not | see [notes](#notes) | action | Action type [not](not.md). |
 | or | see [notes](#notes) | array of actions | Action type [or](or.md). |
 | pmbus_read_sensor | see [notes](#notes) | [pmbus_read_sensor](pmbus_read_sensor.md) | Action type [pmbus_read_sensor](pmbus_read_sensor.md). |
diff --git a/phosphor-regulators/docs/config_file/device.md b/phosphor-regulators/docs/config_file/device.md
index 28180d0..7f0d642 100644
--- a/phosphor-regulators/docs/config_file/device.md
+++ b/phosphor-regulators/docs/config_file/device.md
@@ -18,7 +18,8 @@
 | i2c_interface | yes | [i2c_interface](i2c_interface.md) | I2C interface to this device. |
 | presence_detection | no | [presence_detection](presence_detection.md) | Specifies how to detect whether this device is present.  If this property is not specified, the device is assumed to always be present. |
 | configuration | no | [configuration](configuration.md) | Specifies configuration changes that should be applied to this device.  These changes usually override hardware default settings.  The configuration changes are applied during the boot before regulators are enabled. |
-| rails | no | array of [rails](rail.md) | One or more voltage rails produced by this device.  This property can only be specified if the "is_regulator" property is true. |
+| phase_fault_detection | no | [phase_fault_detection](phase_fault_detection.md) | Specifies how to detect and log redundant phase faults in this voltage regulator.  Can only be specified if the "is_regulator" property is true. |
+| rails | no | array of [rails](rail.md) | One or more voltage rails produced by this device.  Can only be specified if the "is_regulator" property is true. |
 
 ## Example
 ```
diff --git a/phosphor-regulators/docs/config_file/i2c_capture_bytes.md b/phosphor-regulators/docs/config_file/i2c_capture_bytes.md
new file mode 100644
index 0000000..be15df5
--- /dev/null
+++ b/phosphor-regulators/docs/config_file/i2c_capture_bytes.md
@@ -0,0 +1,42 @@
+# i2c_capture_bytes
+
+## Description
+Captures device register bytes to be stored in an error log.
+
+Reads the specified device register and temporarily stores the value.  If a
+subsequent action (such as [log_phase_fault](log_phase_fault.md)) creates an
+error log, the captured bytes will be stored in the error log.
+
+This action allows you to capture additional data about a hardware error.  The
+action can be used multiple times if you wish to capture data from multiple
+registers or devices before logging the error.
+
+Communicates with the device directly using the [I2C interface](i2c_interface.md).
+All of the bytes will be read in a single I2C operation.
+
+The bytes will be stored in the error log in the same order as they are
+received from the device.  For example, a PMBus device transmits byte values in
+little-endian order (least significant byte first).
+
+Note: This action should only be used after a hardware error has been detected
+to avoid unnecessary I2C operations and memory usage.
+
+## Properties
+| Name | Required | Type | Description |
+| :--- | :------: | :--- | :---------- |
+| register | yes | string | Device register address expressed in hexadecimal.  Must be prefixed with 0x and surrounded by double quotes.  This is the location of the first byte. |
+| count | yes | number | Number of bytes to read from the device register. |
+
+## Return Value
+true
+
+## Example
+```
+{
+  "comments": [ "Capture 2 bytes from register 0xA0 to store in error log" ],
+  "i2c_capture_bytes": {
+    "register": "0xA0",
+    "count": 2
+  }
+}
+```
diff --git a/phosphor-regulators/docs/config_file/log_phase_fault.md b/phosphor-regulators/docs/config_file/log_phase_fault.md
new file mode 100644
index 0000000..7af0483
--- /dev/null
+++ b/phosphor-regulators/docs/config_file/log_phase_fault.md
@@ -0,0 +1,42 @@
+# log_phase_fault
+
+## Description
+Logs a redundant phase fault error for a voltage regulator.  This action should
+be executed if a fault is detected during
+[phase_fault_detection](phase_fault_detection.md) for the regulator.
+
+A regulator may contain one or more redundant phases:
+* An "N+2" regulator has two redundant phases
+* An "N+1" regulator has one redundant phase
+
+A phase fault occurs when a phase stops functioning properly.  As a result, the
+redundancy level of the regulator is reduced.
+
+The phase fault type indicates the level of redundancy remaining **after** the
+fault has occurred:
+
+| Type | Description |
+| :--- | :---------- |
+| n+1 | An "N+2" regulator has lost one redundant phase.  The regulator is now at redundancy level "N+1". |
+| n | Regulator has lost all redundant phases.  The regulator is now at redundancy level "N". |
+
+If additional data about the fault was previously captured using
+[i2c_capture_bytes](i2c_capture_bytes.md), that data will be stored in the
+error log.
+
+## Properties
+| Name | Required | Type | Description |
+| :--- | :------: | :--- | :---------- |
+| type | yes | string | Phase fault type.  Specify one of the following: "n+1", "n". |
+
+## Return Value
+true
+
+## Example
+```
+{
+  "log_phase_fault": {
+    "type": "n+1"
+  }
+}
+```
diff --git a/phosphor-regulators/docs/config_file/phase_fault_detection.md b/phosphor-regulators/docs/config_file/phase_fault_detection.md
new file mode 100644
index 0000000..7e0b2f8
--- /dev/null
+++ b/phosphor-regulators/docs/config_file/phase_fault_detection.md
@@ -0,0 +1,84 @@
+# phase_fault_detection
+
+## Description
+Specifies how to detect and log redundant phase faults in a voltage regulator.
+
+A voltage regulator is sometimes called a "phase controller" because it
+controls one or more phases that perform the actual voltage regulation.
+
+A regulator may have redundant phases.  If a redundant phase fails, the
+regulator will continue to provide the desired output voltage.  However, a
+phase fault error should be logged warning the user that the regulator has lost
+redundancy.
+
+The technique used to detect a phase fault varies depending on the regulator
+hardware.  Often a bit is checked in a status register.  The status register
+could exist in the regulator or in a related I/O expander.
+
+Phase fault detection is performed every 15 seconds.  A phase fault must be
+detected two consecutive times (15 seconds apart) before an error is logged.
+This provides "de-glitching" to ignore transient hardware problems.
+
+Phase faults are detected and logged by executing actions:
+* Use the [if](if.md) action to implement the high level behavior "if a fault
+  is detected, then log an error".
+* Detecting the fault
+  * Use a comparison action like [i2c_compare_bit](i2c_compare_bit.md) to
+    detect the fault.  For example, you may need to check a bit in a status
+    register.
+* Logging the error
+  * Use the [i2c_capture_bytes](i2c_capture_bytes.md) action to capture
+    additional data about the fault if necessary.
+  * Use the [log_phase_fault](log_phase_fault.md) action to log a phase fault
+    error.  The error log will include any data previously captured using
+    i2c_capture_bytes.
+
+The actions can be specified in two ways:
+* Use the "rule_id" property to specify a standard rule to run.
+* Use the "actions" property to specify an array of actions that are unique to
+  this regulator.
+
+The default device for the actions is the voltage regulator.  You can specify a
+different device using the "device_id" property.  If you need to access
+multiple devices, use the [set_device](set_device.md) action.
+
+## Properties
+| Name | Required | Type | Description |
+| :--- | :------: | :--- | :---------- |
+| comments | no | array of strings | One or more comment lines describing the phase fault detection. |
+| device_id | no | string | Unique ID of the [device](device.md) to access.  If not specified, the default device is the voltage regulator. |
+| rule_id | see [notes](#notes) | string | Unique ID of the [rule](rule.md) to execute. |
+| actions | see [notes](#notes) | array of [actions](action.md) | One or more actions to execute. |
+
+### Notes
+* You must specify either "rule_id" or "actions".
+
+## Examples
+```
+{
+  "comments": [ "Detect phase fault using I/O expander" ],
+  "device_id": "io_expander",
+  "rule_id": "detect_phase_fault_rule"
+}
+
+{
+  "comments": [ "Detect N phase fault using I/O expander.",
+                "A fault occurred if bit 3 is ON in register 0x02.",
+                "Capture value of registers 0x02 and 0x04 in error log." ],
+  "device_id": "io_expander",
+  "actions": [
+    {
+      "if": {
+        "condition": {
+          "i2c_compare_bit": { "register": "0x02", "position": 3, "value": 1 }
+        },
+        "then": [
+          { "i2c_capture_bytes": { "register": "0x02", "count": 1 } },
+          { "i2c_capture_bytes": { "register": "0x04", "count": 1 } },
+          { "log_phase_fault": { "type": "n" } }
+        ]
+      }
+    }
+  ]
+}
+```
diff --git a/phosphor-regulators/docs/config_file/presence_detection.md b/phosphor-regulators/docs/config_file/presence_detection.md
index 138ff9c..6397dff 100644
--- a/phosphor-regulators/docs/config_file/presence_detection.md
+++ b/phosphor-regulators/docs/config_file/presence_detection.md
@@ -12,9 +12,10 @@
 Device presence is detected by executing actions, such as
 [compare_presence](compare_presence.md) and [compare_vpd](compare_vpd.md).
 
-Device operations like [configuration](configuration.md) and [sensor
-monitoring](sensor_monitoring.md) will only be performed if the actions
-indicate the device is present.
+Device operations like [configuration](configuration.md),
+[sensor monitoring](sensor_monitoring.md), and
+[phase fault detection](phase_fault_detection.md) will only be performed if the
+actions indicate the device is present.
 
 The actions can be specified in two ways:
 * Use the "rule_id" property to specify a standard rule to run.
diff --git a/phosphor-regulators/docs/config_file/rule.md b/phosphor-regulators/docs/config_file/rule.md
index 1526432..1349774 100644
--- a/phosphor-regulators/docs/config_file/rule.md
+++ b/phosphor-regulators/docs/config_file/rule.md
@@ -9,6 +9,7 @@
 * Actions that set the output voltage of a regulator rail
 * Actions that read all the sensors of a regulator rail
 * Actions that detect down-level hardware using version registers
+* Actions that detect phase faults
 
 ## Properties
 | Name | Required | Type | Description |
diff --git a/phosphor-regulators/docs/config_file/set_device.md b/phosphor-regulators/docs/config_file/set_device.md
index 1522a64..984e162 100644
--- a/phosphor-regulators/docs/config_file/set_device.md
+++ b/phosphor-regulators/docs/config_file/set_device.md
@@ -9,8 +9,8 @@
 [sensor_monitoring](sensor_monitoring.md).
 
 Use "set_device" if you need to change the hardware device used by actions.
-For example, you need to check a bit in an I/O expander before setting the
-output voltage of a regulator.
+For example, you may need to check a bit in two different I/O expanders to
+detect a phase fault.
 
 ## Property Value
 String containing the unique ID of the [device](device.md).
diff --git a/phosphor-regulators/docs/design.md b/phosphor-regulators/docs/design.md
index a9ed1c2..9f6d38e 100644
--- a/phosphor-regulators/docs/design.md
+++ b/phosphor-regulators/docs/design.md
@@ -149,3 +149,22 @@
 * The Value property will be set to NaN.
 * The Available property will be set to false.
 
+### Phase Fault Monitoring
+
+When regulator monitoring is enabled, phase fault detection is performed every
+15 seconds.  The timer in the Manager object calls the `detectPhaseFaults()`
+method on all the objects representing the system (System, Chassis, Device).
+
+A phase fault must be detected two consecutive times (15 seconds apart) before
+an error is logged.  This provides "de-glitching" to ignore transient hardware
+problems.
+
+A phase fault error will only be logged for a regulator once per system boot.
+
+If a different error occurs while detecting phase faults in a regulator:
+* The error will be logged.  If the same error occurs repeatedly on a regulator,
+  it will only be logged once per system boot.
+* Any remaining actions for the regulator will be skipped.
+* Phase fault detection will continue with the next regulator.
+* Phase fault detection will be attempted again for this regulator during the
+  next monitoring cycle.