Improve BMC error handling for OCC comm failures

- Delay starting the OCC reset until all OCCs have been detected (or a
timeout occurs). This prevents multiple resets from being triggered and
helps detect when the reset has completed (the active sensor is set
after the reset is complete).
- Wait for the PLDM responses to OCC reset and HRESET requests, and
retry if they fail.
- If HRESET returns NOT_READY, collect SBE FFDC and try an OCC reset. A
persistent failure will put the system in safe state.
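
The recovery flow in the bullets above could look roughly like the
sketch below. This is an illustration only, not the code in this
change: HresetResult, Recovery, Hooks, recover() and maxAttempts are
hypothetical names, and falling back to an OCC reset after repeated
plain HRESET failures is an assumption.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of an HRESET request once its PLDM response arrives.
enum class HresetResult { Success, NotReady, Failed };

// Hypothetical outcome of the whole recovery attempt.
enum class Recovery { Recovered, SafeState };

// Hypothetical hooks standing in for the real PLDM and FFDC calls.
// Each request hook is assumed to block until the response is received.
struct Hooks
{
    std::function<HresetResult()> sendHreset;
    std::function<bool()> sendOccReset;
    std::function<void()> collectSbeFfdc;
};

inline Recovery recover(const Hooks& hooks, unsigned maxAttempts)
{
    assert(maxAttempts > 0);

    // Try HRESET first, retrying on plain failures.
    for (unsigned i = 0; i < maxAttempts; ++i)
    {
        const HresetResult rc = hooks.sendHreset();
        if (rc == HresetResult::Success)
        {
            return Recovery::Recovered;
        }
        if (rc == HresetResult::NotReady)
        {
            // NOT_READY: collect SBE FFDC, then stop retrying HRESET.
            hooks.collectSbeFfdc();
            break;
        }
    }

    // Fall back to an OCC reset, retrying if the request fails.
    for (unsigned i = 0; i < maxAttempts; ++i)
    {
        if (hooks.sendOccReset())
        {
            return Recovery::Recovered;
        }
    }

    // Persistent failure: the caller would put the system in safe state.
    return Recovery::SafeState;
}
```

Passing the requests in as callables keeps the sketch independent of
the actual PLDM interface.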

- Prevent overwriting the dvfs over-temp filename for P10 and beyond,
since the old file is only present in older kernels.
- Prevent an assert when opening sysfs files (added a catch that
creates an OCC comm failure PEL, which forces an OCC reset).
- Check the return code after reading sysfs files to confirm success. If
a read fails, try a reset to recover.

- Update traces to include which processor/OCC encountered issues.
- Improve recovery to close windows that were leaving the system in a
partial good state.

JIRA: PFES-66
Change-Id: I0b087d0e05bd8562682062e1c662f9e18164a720
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
diff --git a/occ_status.hpp b/occ_status.hpp
index a07c272..6493040 100644
--- a/occ_status.hpp
+++ b/occ_status.hpp
@@ -132,10 +132,7 @@
         sdpEvent(sdeventplus::Event::get_default()),
         safeStateDelayTimer(
             sdeventplus::utility::Timer<sdeventplus::ClockId::Monotonic>(
-                sdpEvent, std::bind(&Status::safeStateDelayExpired, this))),
-        occReadStateFailTimer(
-            sdeventplus::utility::Timer<sdeventplus::ClockId::Monotonic>(
-                sdpEvent, std::bind(&Status::occReadStateNow, this)))
+                sdpEvent, std::bind(&Status::safeStateDelayExpired, this)))
 #endif
 
 #ifdef PLDM
@@ -278,6 +275,9 @@
     /** @brief The last state read from the OCC */
     unsigned int lastState = 0;
 
+    /** @brief The last OCC read status (0 = no error) */
+    int lastOccReadStatus = 0;
+
     /** @brief Number of retry attempts to open file and update state. */
     const unsigned int occReadRetries = 1;
 
@@ -353,14 +353,8 @@
      * safe mode. Called to verify and then disable and reset the OCCs.
      */
     void safeStateDelayExpired();
-
-    /**
-     * @brief Timer that is started when OCC read Valid state failed.
-     */
-    sdeventplus::utility::Timer<sdeventplus::ClockId::Monotonic>
-        occReadStateFailTimer;
-
 #endif // POWER10
+
     /** @brief Callback for timer that is started when OCC state
      * was not able to be read. Called to attempt another read when needed.
      */