Prevent multiple PM complex resets from being queued
- Clear any prior reset request when notified that OCCs are active
- If the OCC state is safe/not valid, do not request a reset immediately.
Start a safe state delay timer, and request a reset only if the OCC has
not recovered when the timer expires.
- If unable to read the OCC state after a retry, then request a reset.
(no change to this behavior)
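
For illustration only, the behavior listed above can be sketched as a small
standalone model. This is not the code in occ_manager.cpp: the class
ResetGate, its method names, and the use of caller-supplied timestamps in
place of a real timer are all assumptions made for the sketch.

```cpp
#include <chrono>
#include <optional>

// Hypothetical model of the reset-request rules; names are illustrative
// and do not match the identifiers used in occ_manager.cpp.
class ResetGate
{
  public:
    using Seconds = std::chrono::seconds;

    explicit ResetGate(Seconds delay) : safeDelay(delay) {}

    // An OCC reported active: drop any stale reset request and any
    // pending safe state delay.
    void occWentActive()
    {
        resetRequired = false;
        safeDeadline.reset();
    }

    // OCC state read as safe/not valid at time `now`: arm the delay once
    // instead of requesting a reset immediately.
    void stateNotValid(Seconds now)
    {
        if (!safeDeadline)
        {
            safeDeadline = now + safeDelay;
        }
    }

    // OCC state read as valid again: cancel the pending delay.
    void stateValid()
    {
        safeDeadline.reset();
    }

    // OCC state could not be read even after a retry: request a reset.
    void stateReadFailed()
    {
        resetRequired = true;
    }

    // Periodic check: request a reset only if the delay expired without
    // the OCC recovering.
    void poll(Seconds now)
    {
        if (safeDeadline && now >= *safeDeadline)
        {
            resetRequired = true;
            safeDeadline.reset();
        }
    }

    bool isResetRequired() const
    {
        return resetRequired;
    }

  private:
    Seconds safeDelay;
    std::optional<Seconds> safeDeadline;
    bool resetRequired = false;
};
```

In this model a caller would invoke stateNotValid() when a state read
returns safe/not valid, call poll() periodically, and call occWentActive()
on the active notification. The delay length is left to the caller because
the value used by the real timer is not stated here.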
Problem: On a system where the OCC went to safe state, the BMC requested
a reset even though HTMT had already requested one, so the PM complex was
reset multiple times unnecessarily.
Tested on Rainier/Fuji
Change-Id: Id40b00e6d3708358478271bb6d5acef804715d4a
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
diff --git a/occ_manager.cpp b/occ_manager.cpp
index 699f4a8..3ae11f6 100644
--- a/occ_manager.cpp
+++ b/occ_manager.cpp
@@ -437,6 +437,16 @@
lg2::error(
"initiateOccRequest: Initiating PM Complex reset due to OCC{INST}",
"INST", instance);
+
+ // Make sure ALL OCC comm stops to all OCCs before the reset
+ for (auto& obj : statusObjects)
+ {
+ if (obj->occActive())
+ {
+ obj->occActive(false);
+ }
+ }
+
#ifdef PLDM
pldmHandle->resetOCC(instance);
#endif
@@ -507,9 +517,20 @@
#endif
}
- // Start poll timer if not already started
+ // Start poll timer if not already started (since at least one OCC is
+ // running)
if (!_pollTimer->isEnabled())
{
+ // An OCC just went active, PM Complex is just coming online so
+ // clear any outstanding reset requests
+ if (resetRequired)
+ {
+ resetRequired = false;
+ lg2::error(
+ "statusCallBack: clearing resetRequired (since OCC{INST} went active, resetInProgress={RIP})",
+ "INST", instance, "RIP", resetInProgress);
+ }
+
lg2::info("Manager: OCCs will be polled every {TIME} seconds",
"TIME", pollInterval);