NVMeBasicContext: Properly cleanup resources, allowing destruction
nvmesensor was terminating with an uncaught exception, e.g:
Jun 03 04:52:09 bmc nvmesensor[507]: terminate called after throwing an instance of 'sdbusplus::exception::SdBusError'
Jun 03 04:52:09 bmc nvmesensor[507]: what(): sd_bus_add_object_vtable: org.freedesktop.DBus.Error.FileExists: File exists
This would occur whenever entity-manager published a configuration for a
new drive in the system. The implementation of nvmesensor isn't the
smartest, and it tries to just scrap all NVMeContexts it knows of, along
with their NVMeSensor instances, and reconstruct them all from newly
captured entity-manager configuration data.
The problem lies in the fact that the NVMeContexts were not getting
destructed due embedding of std::shared_ptrs obtained via
shared_from_this() into async callback lambda captures whose context was
owned by the NVMeContext implementation.
Switch to capturing `this` via weak_from_this() in callback lambdas to
prevent the circular references. By doing so we are able to successfully
destruct the NVMeContext derivative instances via the clear() method on
the context map.
However, there's more to the story, as the NVMeSensors owned by a given
context are asynchronously iterated for polling purposes. To make sure
we don't have races from unbounded latency on asynchronous destruction
of NVMeSensors, we order the std::jthread and async stream member
variables of NVMeBasicContext such that destruction of the streams exits
the IO thread due to read() or write() syscall failures on the pipes.
The use of std::jthread ensures we join() to complete the cleanup prior
to returning from the NVMeBasicContext destructor.
Preventing active polling of NVMeSensor instances ensures their
destruction when clearing the NVMe context map, opening the path for
successful (re-)construction of NVMeSensor DBus objects for the
published sensor configurations.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Change-Id: I63d0e7eb3318c209b08551072a3cb7279da21269
diff --git a/src/NVMeBasicContext.cpp b/src/NVMeBasicContext.cpp
index 09b58be..322c133 100644
--- a/src/NVMeBasicContext.cpp
+++ b/src/NVMeBasicContext.cpp
@@ -15,7 +15,6 @@
#include <cstdio>
#include <cstring>
#include <system_error>
-#include <thread>
extern "C"
{
@@ -214,8 +213,8 @@
FileHandle streamOut(responsePipe[1]);
respStream.assign(responsePipe[0]);
- std::thread thread([streamIn{std::move(streamIn)},
- streamOut{std::move(streamOut)}]() mutable {
+ thread = std::jthread([streamIn{std::move(streamIn)},
+ streamOut{std::move(streamOut)}]() mutable {
ssize_t rc = 0;
if ((rc = processBasicQueryStream(streamIn, streamOut)) < 0)
@@ -226,7 +225,6 @@
std::cerr << "Terminating basic query thread\n";
});
- thread.detach();
}
void NVMeBasicContext::readAndProcessNVMeSensor()
@@ -308,7 +306,7 @@
response->prepare(len);
return len;
},
- [self{shared_from_this()}, sensor, response](
+ [weakSelf{weak_from_this()}, sensor, response](
const boost::system::error_code& ec, std::size_t length) mutable {
if (ec)
{
@@ -322,17 +320,20 @@
return;
}
- /* Deserialise the response */
- response->consume(1); /* Drop the length byte */
- std::istream is(response.get());
- std::vector<char> data(response->size());
- is.read(data.data(), response->size());
+ if (auto self = weakSelf.lock())
+ {
+ /* Deserialise the response */
+ response->consume(1); /* Drop the length byte */
+ std::istream is(response.get());
+ std::vector<char> data(response->size());
+ is.read(data.data(), response->size());
- /* Update the sensor */
- self->processResponse(sensor, data.data(), data.size());
+ /* Update the sensor */
+ self->processResponse(sensor, data.data(), data.size());
- /* Enqueue processing of the next sensor */
- self->readAndProcessNVMeSensor();
+ /* Enqueue processing of the next sensor */
+ self->readAndProcessNVMeSensor();
+ }
});
}
@@ -341,8 +342,8 @@
pollCursor = sensors.begin();
scanTimer.expires_from_now(boost::posix_time::seconds(1));
- scanTimer.async_wait(
- [self{shared_from_this()}](const boost::system::error_code errorCode) {
+ scanTimer.async_wait([weakSelf{weak_from_this()}](
+ const boost::system::error_code errorCode) {
if (errorCode == boost::asio::error::operation_aborted)
{
return;
@@ -354,7 +355,10 @@
return;
}
- self->readAndProcessNVMeSensor();
+ if (auto self = weakSelf.lock())
+ {
+ self->readAndProcessNVMeSensor();
+ }
});
}