op-apps: Add pdbg to the op-apps package group
This is justified by Nick Piggin below, with some rework of the original
email to abstract away unnecessary detail.
Hi,
We are having a continuing discussion about shipping host debug tools on
our standard OpenPower BMC image, and I promised Brad some justification
for the request. I'm including a wider cc list to keep people on the
same page.
The exact host debug tool can be debated, but the capability to send
system reset interrupts and read host registers is a baseline, so I have
"pdbg" in mind, as that's what I have used.
Justification:
- The most basic capability is the system reset, which is an existing
tool for pSeries (KVM and PowerVM) guests. The similar 'ipmi nmi' is
available on x86 BMCs. This is required functionality expected by
customers. An important hang at Pfizer was solved last year because
they were able to system reset the Linux lpar to get a crash dump.
- It's common to be pointed to a crashed system to debug. It is more
convenient to have a good baseline set of debug tools available than to
modify the BMC of a system that is not yours.
- Hardware and software partners similarly would like to have this
functionality. They could download and install tools, but it can turn
into an ongoing inconvenience. Many of them are not
openpower/openbmc experts, and may not have the ability or inclination
to find and install tools. Having everything just work out of the box
and not having to follow an ibm.com link is a big relief.
- Our experience when collaborating with customers to resolve bugs is
that we often don't have easy access to their P9 systems, and they are
often unaware of how to flash firmware, or they don't know if they have
permission to modify the BMC, etc.
- On customer sites, live debugging is not uncommon. A bug may not be
solvable with a single crash dump or system hang, so it may take some
iterations working with the customer. It is also common that the
customer may have redundant capacity or a test environment which means
they can leave a machine in crashed state. They may be bringing up a new
installation that is not yet online. This will certainly be the case
with large supercomputers.
- Customers may have policy or legislation that makes uploading code
difficult or impossible.
- Some consumers may customize everything on the BMC, but even so,
having reference host debugging tools would show what's available. In
some small-scale trials with P9 systems, the BMC has not had much host
debugging capability, making it very difficult to understand problems
like hard hangs of the host.
- A strong host debug capability on the BMC can be a differentiating
point. For example, very large sites often prefer to debug problems
themselves.
So I advocate for a reasonable host debug capability to be shipped with
standard OpenPOWER OpenBMC images, and for host firmware teams to have
responsibility and control of the low level tools and libraries that
access host registers.
Thanks,
Nick
Change-Id: I87baf40b6bd1004b234cdec139759de9e587d705
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2 files changed