pldmd: improve reaction to mctpd socket close
Previously, when mctpd closed the socket, pldmd only logged a "Socket
has been closed" message and kept running its event loop. On Tacoma
platforms the journal sometimes fills with a large number of these
messages. This is likely a bug in either pldmd or mctpd; the root cause
is yet to be determined.
Instead of flooding the journal, have pldmd exit the event loop and
return a failure code. This will cause systemd to restart pldmd, which
provides a chance for recovery.
Signed-off-by: Deepak Kodihalli <dkodihal@in.ibm.com>
Change-Id: Iada6ee50808758312690883109f0499a8396e99e
diff --git a/pldmd/pldmd.cpp b/pldmd/pldmd.cpp
index e4a865b..86f6210 100644
--- a/pldmd/pldmd.cpp
+++ b/pldmd/pldmd.cpp
@@ -241,7 +241,7 @@
dbus_api::Pdr dbusImplPdr(bus, "/xyz/openbmc_project/pldm", pdrRepo.get());
sdbusplus::xyz::openbmc_project::PLDM::server::Event dbusImplEvent(
bus, "/xyz/openbmc_project/pldm");
- auto callback = [verbose, &invoker, &dbusImplReq](IO& /*io*/, int fd,
+ auto callback = [verbose, &invoker, &dbusImplReq](IO& io, int fd,
uint32_t revents) {
if (!(revents & EPOLLIN))
{
@@ -260,7 +260,12 @@
ssize_t peekedLength = recv(fd, nullptr, 0, MSG_PEEK | MSG_TRUNC);
if (0 == peekedLength)
{
- std::cerr << "Socket has been closed \n";
+ // The MCTP daemon has closed the socket this daemon is connected to.
+ // This may or may not be an error scenario. In either case, the
+ // recovery mechanism for this daemon is to restart, so exit the
+ // event loop, which causes this daemon to exit with a failure
+ // code.
+ io.get_event().exit(0);
}
else if (peekedLength <= -1)
{