Find and call out faulted GPUs

Isolate the fault down to the GPU that caused the GPU
PGOOD or overtemp summary fault bit to be set. On
Witherspoon, this involves reading GPIOs on a pca9552
device to find the GPU signaling the fault.

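A minimal C++ sketch of the isolation step, assuming the
pca9552 pin states come from its two 8-bit input registers
(INPUT0 for pins 0-7, INPUT1 for pins 8-15). The GpuFaultPin
table, the pin numbers, and the active-low polarity are
hypothetical placeholders, not the actual Witherspoon wiring
or the implementation in this commit.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical wiring table: which pca9552 pin (0-15) carries
// each GPU's fault signal. Placeholder values only.
struct GpuFaultPin
{
    unsigned gpu; // GPU number to record in the error log metadata
    unsigned pin; // pca9552 pin: 0-7 in INPUT0, 8-15 in INPUT1
};

// Combine the two 8-bit input registers (INPUT0 = pins 0-7,
// INPUT1 = pins 8-15) into one 16-bit pin state word.
uint16_t combineInputs(uint8_t input0, uint8_t input1)
{
    return static_cast<uint16_t>((static_cast<uint16_t>(input1) << 8) |
                                 input0);
}

// Return every GPU whose fault pin is asserted.
// Assumption: the fault signals are active low, so a 0 bit
// means that GPU is signaling the fault.
std::vector<unsigned> findFaultedGpus(uint16_t pinStates,
                                      const std::vector<GpuFaultPin>& wiring)
{
    std::vector<unsigned> faulted;
    for (const auto& entry : wiring)
    {
        bool high = (pinStates >> entry.pin) & 1u;
        if (!high)
        {
            faulted.push_back(entry.gpu);
        }
    }
    return faulted;
}
```

With a table such as {{0, 8}, {1, 9}, {2, 10}}, an INPUT1
value of 0xFD (pin 9 low) identifies GPU 1 as the one
signaling the fault.
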
GPUs are not currently in the inventory, so the code
does not make the standard callout of adding a callout
metadata field to the error log. Instead, the number of
the failing GPU is added to the error log metadata, and
we will work with support to make sure that is
documented. This is consistent with the other power
fault callouts, which also don't use the standard
inventory callouts because they involve more than a
single FRU.

Note that, unlike other power faults, these faults do
not automatically power off the system, though a future
commit will power it off on a GPU overtemp.

Change-Id: If4053f32a06a335a6612a04a8164d34306530b22
Signed-off-by: Matt Spinler <spinler@us.ibm.com>