Fix CPU utilization calculation
Currently, CPU utilization is calculated as (activeTimeDiff /
(activeTimeDiff + idleTimeDiff)), where idleTimeDiff is defined as "idle
time + IO wait time". The idleTimeDiff term is incorrect -- it
should be "everything else", so that the denominator covers all CPU
time. As it stands, "kernel utilization", "userspace utilization" and
"overall utilization" can all reach 100% simultaneously, which does not
make sense.
This change calculates CPU usage as follows:
* Kernel-space usage is "kernel time delta" / "total time delta".
* Userspace usage is "userspace delta" / "total time delta".
* Overall usage is "(kernel time delta + userspace delta)" /
"total time delta".
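The formulas above can be sketched as a small standalone example. This
is an illustration, not the code in healthMonitor.cpp: the names
`Sample`, `totalOf`, `percent`, `userPercent`, `kernelPercent`,
`overallPercent` and `oldUserPercent` are hypothetical, and it assumes
that "kernel time" is the `system` field and "userspace time" is the
`user` field of the aggregate `cpu` line in /proc/stat. The counters are
assumed to be monotonically increasing between two samples.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Field order of the aggregate "cpu" line in /proc/stat (10 counters).
enum CpuField : std::size_t
{
    USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ, STEAL, GUEST, GUEST_NICE,
    NUM_FIELDS
};

// One snapshot of the counters, in jiffies (hypothetical helper type).
using Sample = std::array<uint64_t, NUM_FIELDS>;

// Sum of all counters; the 64-bit initial value keeps the sum 64-bit.
uint64_t totalOf(const Sample& s)
{
    return std::accumulate(s.begin(), s.end(), uint64_t{0});
}

// Share of the elapsed jiffies, in percent; 0 if no time has elapsed.
double percent(uint64_t activeDiff, uint64_t totalDiff)
{
    return totalDiff == 0 ? 0.0 : (100.0 * activeDiff) / totalDiff;
}

// New formulas: every type is divided by the same "total time delta".
double userPercent(const Sample& prev, const Sample& cur)
{
    return percent(cur[USER] - prev[USER], totalOf(cur) - totalOf(prev));
}

double kernelPercent(const Sample& prev, const Sample& cur)
{
    return percent(cur[SYSTEM] - prev[SYSTEM], totalOf(cur) - totalOf(prev));
}

double overallPercent(const Sample& prev, const Sample& cur)
{
    uint64_t active = (cur[USER] - prev[USER]) + (cur[SYSTEM] - prev[SYSTEM]);
    return percent(active, totalOf(cur) - totalOf(prev));
}

// Old formula, for comparison: active / (active + idle + iowait).
double oldUserPercent(const Sample& prev, const Sample& cur)
{
    uint64_t active = cur[USER] - prev[USER];
    uint64_t idle = (cur[IDLE] - prev[IDLE]) + (cur[IOWAIT] - prev[IOWAIT]);
    return percent(active, active + idle);
}
```

With a sample delta of 90 user jiffies, 10 system jiffies and no idle
time, the old formula reports 100% for userspace, because the
denominator ignores the kernel time, whereas the new formulas report
90% userspace, 10% kernel and 100% overall.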
Tested:
Compile the "Explicit sampling version of the SmallPT path tracer"
(https://www.kevinbeason.com/smallpt/explicit.cpp) as a workload,
and run two copies of it on the BMC to fully stress the CPU cores.
(Alternatively, any benchmark program can fulfill this purpose, but for
this one, I understand what it does and know it's compute-bound.)
`htop` shows that both CPU cores are almost 100% occupied. Most
(around 90%) of the CPU time is spent in user-space. The remainder of
the CPU usage is attributable to other tasks and background processing.
When one checks the
`/xyz/openbmc_project/sensors/utilization/CPU_Kernel` and
`/xyz/openbmc_project/sensors/utilization/CPU_User` objects, the
CPU_User reading ramps up and reaches around 90%, while CPU_Kernel
stabilizes at around 10%. When `smallpt_explicit` is terminated, kernel
and userspace CPU usage return to their normal values.
Change-Id: I7c0e10e08bd2b6c8b3bd1c1a618fffb2739feecc
Signed-off-By: Sui Chen <suichen@google.com>
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
diff --git a/healthMonitor.cpp b/healthMonitor.cpp
index 1477323..2b3d5fe 100644
--- a/healthMonitor.cpp
+++ b/healthMonitor.cpp
@@ -103,6 +103,10 @@
NUM_CPU_STATES_TIME
};
+// # cat /proc/stat|grep 'cpu '
+// cpu 5750423 14827 1572788 9259794 1317 0 28879 0 0 0
+static_assert(NUM_CPU_STATES_TIME == 10);
+
enum CPUUtilizationType
{
USER = 0,
@@ -149,12 +153,15 @@
return -1;
}
- static std::unordered_map<enum CPUUtilizationType, double> preActiveTime,
- preIdleTime;
- double activeTime, activeTimeDiff, idleTime, idleTimeDiff, totalTime,
- activePercValue;
+ static std::unordered_map<enum CPUUtilizationType, uint64_t> preActiveTime,
+ preTotalTime;
- idleTime = timeData[IDLE_IDX] + timeData[IOWAIT_IDX];
+ // These are actually Jiffies. On the BMC, 1 jiffy usually corresponds to
+ // 0.01 second.
+ uint64_t activeTime = 0, activeTimeDiff = 0, totalTime = 0,
+ totalTimeDiff = 0;
+ double activePercValue = 0;
+
if (type == TOTAL)
{
activeTime = timeData[USER_IDX] + timeData[NICE_IDX] +
@@ -171,16 +178,16 @@
activeTime = timeData[USER_IDX];
}
- idleTimeDiff = idleTime - preIdleTime[type];
+ totalTime = std::accumulate(std::begin(timeData), std::end(timeData), uint64_t{0});
+
activeTimeDiff = activeTime - preActiveTime[type];
+ totalTimeDiff = totalTime - preTotalTime[type];
/* Store current idle and active time for next calculation */
- preIdleTime[type] = idleTime;
preActiveTime[type] = activeTime;
+ preTotalTime[type] = totalTime;
- totalTime = idleTimeDiff + activeTimeDiff;
-
- activePercValue = activeTimeDiff / totalTime * 100;
+ activePercValue = (100.0 * activeTimeDiff) / totalTimeDiff;
if (DEBUG)
std::cout << "CPU Utilization is " << activePercValue << "\n";