attn: Make restart after fail more restrictive

Override the default service restart-after-fail behavior. The current
behavior allows a maximum of two failures within 30 seconds before the
attention handler is considered failed. The new behavior allows a maximum
of two failures within 10 minutes before the attention handler is
considered failed. This avoids a potential endless fail-and-restart loop:
an analysis can take longer than 15 seconds, and if the attention handler
crashes during each such analysis, its failures are spaced too far apart
to fall within the current 30-second window. The 10-minute value is an
arbitrarily large value because we expect the cause of an attention
handler crash either to go away immediately or not at all.

Signed-off-by: Ben Tyner <ben.tyner@ibm.com>
Change-Id: Ia3a6e9ee849733655273f3a82c4fbef46c808525
1 file changed
tree: e5cd9d80f625ce413b458b93693f45d0dc4176c9
  1. analyzer/
  2. attn/
  3. subprojects/
  4. test/
  5. util/
  6. .clang-format
  7. .eslintignore
  8. .gitignore
  9. buildinfo.hpp.in
  10. cli.cpp
  11. cli.hpp
  12. config.h.in
  13. LICENSE
  14. listener.cpp
  15. listener.hpp
  16. main.cpp
  17. main_nl.cpp
  18. MAINTAINERS
  19. meson.build
  20. meson_options.txt
  21. OWNERS
  22. README.md
README.md

Hardware Diagnostics for POWER Systems

In the event of a fatal system error reported by the internal system hardware (processor chips, memory chips, I/O chips, system memory, etc.), POWER Systems can diagnose the root cause of the failure and perform any service action needed to avoid repeated system failures.

Additional details TBD.

Building

For a standard OpenBMC release build, you want something like:

meson -Dtests=disabled <build_dir>
ninja -C <build_dir>
ninja -C <build_dir> install

For a test / debug build, a typical configuration is:

meson -Dtests=enabled <build_dir>
ninja -C <build_dir> test