Find and call out faulted GPUs

Isolate the GPU that caused the GPU PGOOD or overtemp
summary fault bit to be set. On Witherspoon, this
involves reading GPIOs on a pca9552 device to find
the GPU signaling the fault.
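
As a rough illustration of the isolation step, the sketch below scans a
set of GPIO values and returns the GPUs whose lines are at the fault
level. It is not taken from this commit: the function name
findFaultedGPUs, the one-bit-per-GPU layout (bit N for GPU N), and the
faultLevel parameter are all assumptions made for the example. The real
pin mapping and polarity depend on the board wiring.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the repository.
// gpioValues: GPIO levels already read from the expander, one bit per
//             GPU fault line, with bit N assumed to belong to GPU N.
// numGPUs:    number of GPU fault lines to examine (at most 32 here).
// faultLevel: logic level that signals a fault (an assumption; the
//             actual polarity must come from the hardware design).
std::vector<unsigned> findFaultedGPUs(std::uint32_t gpioValues,
                                      unsigned numGPUs,
                                      bool faultLevel)
{
    assert(numGPUs <= 32);

    std::vector<unsigned> faulted;
    for (unsigned gpu = 0; gpu < numGPUs; ++gpu)
    {
        // Extract this GPU's line and compare it to the fault level.
        bool level = ((gpioValues >> gpu) & 1u) != 0;
        if (level == faultLevel)
        {
            faulted.push_back(gpu);
        }
    }
    return faulted;
}
```

Returning a list rather than a single number allows for the case where
more than one GPU signals a fault at the same time.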

GPUs are not currently in the inventory, so the code
does not make the standard callout, which is done by
adding a certain metadata field. Instead, the number of
the GPU that failed is just added to the error log
metadata, and work will be done with support to make
sure that is documented. The other power fault callouts
don't use the standard inventory callouts either, as they
are more complicated than just a single FRU, so this
method is consistent with them.

Note that, unlike other power faults, these faults do
not cause the system to power off automatically, though
a future commit will power off the system on a GPU overtemp.

Change-Id: If4053f32a06a335a6612a04a8164d34306530b22
Signed-off-by: Matt Spinler <spinler@us.ibm.com>
2 files changed
tree: 99ffee149664706d40a29f46449a166f78094a36
  1. power-sequencer/
  2. test/
  3. xyz/
  4. .gitignore
  5. argument.hpp
  6. bootstrap.sh
  7. configure.ac
  8. device.hpp
  9. device_monitor.hpp
  10. elog-errors.hpp
  11. event.hpp
  12. file.hpp
  13. LICENSE
  14. Makefile.am
  15. names_values.hpp
  16. pmbus.cpp
  17. pmbus.hpp
  18. README.md
  19. timer.cpp
  20. timer.hpp
  21. utility.cpp
  22. utility.hpp
README.md

Code for detecting and analyzing power faults on Witherspoon.

To Build

To build this package, do the following steps:

    1. ./bootstrap.sh
    2. ./configure ${CONFIGURE_FLAGS}
    3. make

To fully clean the repository again, run `./bootstrap.sh clean`.