watchdog: Trigger System Dump when host load fails

The xyz.openbmc_project.State.Host interface does not precisely reflect
the host's operational status during the transition from hostboot to
host. While the host is initializing and has not yet become fully
functional, its state is still reported as if hostboot were running.
If the host hits an early boot problem and the watchdog timer expires
because the boot did not finish within the configured time, the
watchdog handler misinterprets the failure as a hostboot problem.

To address this, hostboot updates a core scratch register just before
transferring control to the host. This change uses that register to
determine whether the transition to host has already occurred.

With this check in place, we can determine which booting subsystem
failed or stopped responding, and extract the dump from the right
subsystem.
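
The decision described above can be sketched roughly as follows. This is
an illustration only, not the code in this change: the mask value, the
DumpType enum, the ScratchReader callback and selectDumpType() are all
hypothetical names, and the fallback to a hostboot dump when the
register cannot be read is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical dump categories; the real dump type names may differ.
enum class DumpType
{
    Hostboot,
    System
};

// Assumed bit that hostboot sets in the core scratch register just
// before handing control to the host. The real register layout is not
// given in this commit message.
constexpr std::uint64_t hostRunningMask = 0x8000000000000000ull;

// Callback that reads the scratch register; returns std::nullopt when
// the register could not be read.
using ScratchReader = std::function<std::optional<std::uint64_t>()>;

// Choose the dump type on watchdog timeout.
// - Handoff bit set: the host had already taken over, so request a
//   system dump.
// - Handoff bit clear or read failure: treat it as a hostboot failure
//   (assumption made for this sketch).
DumpType selectDumpType(const ScratchReader& readScratch)
{
    const auto value = readScratch();
    if (value && ((*value & hostRunningMask) != 0))
    {
        return DumpType::System;
    }
    return DumpType::Hostboot;
}
```

Passing the register read in as a callback keeps the decision logic
testable without hardware access.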

Tested:
Oct 03 09:35:36 p10bmc watchdog_timeout[7099]: Host did not respond
within watchdog timeout interval
Oct 03 09:35:36 p10bmc watchdog_timeout[7099]: PHYP boot failure,
triggering system dump
Oct 03 09:35:37 p10bmc phosphor-log-manager[372]: Created PEL
0x50001924 (BMC ID 252) with SRC BD5EC101
Oct 03 09:35:37 p10bmc ibm-panel[1208]: Resolution is empty for
PEL = /xyz/openbmc_project/logging/entry/252
Oct 03 09:35:37 p10bmc phosphor-host-state-manager[795]: Received
signal that host has crashed, decrement reboot count
...
...
Oct 03 09:35:38 p10bmc systemd[1]: Starting Start memory preserving
reboot host0...
Oct 03 09:35:38 p10bmc pldmd[719]: BIOS:pvm_sys_dump_active, updated
to value: Enabled(16), by BMC: true
Oct 03 09:35:38 p10bmc phosphor-dump-manager[473]: OriginatorId is
not provided
Oct 03 09:35:38 p10bmc phosphor-dump-manager[473]: OriginatorType is
not provided. Replacing the string with the default value
Oct 03 09:35:38 p10bmc sh[7151]:
o "/xyz/openbmc_project/dump/bmc/entry/2"
Oct 03 09:35:38 p10bmc openpower-proc-control[7153]: Starting memory
preserving reboot

Change-Id: I312ad7201e9258d23f6e784fab504d0fb8f0f712
Signed-off-by: Deepa Karthikeyan <deepakala.karthikeyan@ibm.com>
3 files changed
tree: 6f77874b34257d53b6d2b2845d5b3ac92ec178b0
  1. dump/
  2. subprojects/
  3. watchdog/
  4. .clang-format
  5. .gitignore
  6. .shellcheck-ignore
  7. checkstop_app.cpp
  8. LICENSE
  9. meson.build
  10. meson.options
  11. OWNERS
  12. README.md
  13. watchdog_timeout.cpp
README.md

openpower-debug-collector

Building the Code

To build this package, do the following steps:

    1. meson setup build
    2. ninja -C build

To clean the repository, run `rm -rf build`.