designs/mctp: Update for a multiplexer-daemon approach

This change incorporates a little more of the daemon design, but instead
of the in-application handlers, describes a separate multiplexer that
handlers can connect to.

Change-Id: Ief09092a87b912b183120bfc68dc83e8ab6b461b
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>

commit: bef44633c6424a786f57cc25eec651ac3b5becc3
author: Jeremy Kerr <jk@ozlabs.org> Tue Apr 30 19:14:29 2019 +0800
committer: Gunnar Mills <gmills@us.ibm.com> Tue May 14 14:23:22 2019 +0000
tree: c7150cdf3303174a8f4a36e2af3a4501eb196ebd
parent: 7f2469dc1051b352e252b035633b95a1fef8a419
diff --git a/designs/mctp.md b/designs/mctp.md
index 6985cef..dc8bc48 100644
--- a/designs/mctp.md
+++ b/designs/mctp.md

@@ -88,12 +88,14 @@
 addressing mechanisms. The actual transmit/receive of those packets is
 up to the hardware binding of the MCTP transport.
 
-For OpenBMC, we would introduce a "MCTP+applications" daemon, which
-implements the transport over a configurable hardware channel (eg.,
-Serial UART, I2C or PCI), and provides handlers for any incoming MCTP
-application requests. This daemon is responsible for the packetisation
-and routing of MCTP messages from external endpoints, and handling the
-application layer requests.
+For OpenBMC, we would introduce an MCTP daemon, which implements the transport
+over a configurable hardware channel (eg., Serial UART, I2C or PCIe), and
+provides a socket-based interface for other processes to send and
+receive complete MCTP messages. This daemon is responsible for the
+packetisation and routing of MCTP messages from external endpoints, and
+for forwarding these messages to and from individual handler
+applications. This includes handling local MCTP-stack configuration,
+like local EID assignments.
 
 This daemon has a few components:
 
@@ -102,11 +104,7 @@
  2) one or more binding implementations (eg, MCTP-over-serial), which
     interact with the hardware channel(s).
 
- 3) one or more MCTP message handlers (eg PLDM or NVME-MI), to handle incoming
-    MCTP messages of specific types
-
- 4) the core application, consisting of main loop, handler management and
-    MCTP binding management
+ 3) an interface to handler applications over a unix-domain socket.
 
 The proposed implementation here is to produce an MCTP "library" which
 provides the packetisation and routing functions, between:
@@ -142,26 +140,60 @@
 binding for use in low-level host firmware environments may interact
 directly with hardware registers to perform packet transfers.
 
-The application-specific handlers (listed as (3) above) implement the
-actual functionality provided over the MCTP channel. Each of these would
-register with the MCTP core library to receive MCTP messages of a
-certain type, and would transmit MCTP messages of that same type. While
-the handlers themselves are out of scope for this design, there are a
-few elements that are important here:
+The application-specific handlers implement the actual functionality
+provided over the MCTP channel, and connect to the central daemon over a
+UNIX domain socket. Each of these would register with the MCTP daemon to
+receive MCTP messages of a certain type, and would transmit MCTP
+messages of that same type.
 
- - Handlers are likely to perform IO to other components of the BMC
-   (such as sending and receiving dbus messages). To allow multiple
-   handlers to co-exist, this IO should be implemented using
-   non-blocking interfaces (eg, using poll()).
+The daemon's sockets to these handlers are configured for non-blocking
+IO, to allow the daemon to be decoupled from any blocking behaviour of
+handlers. The daemon would use a message queue to enable message
+reception/transmission to a blocked handler, but this would be of a
+limited size. Handlers whose pending messages exceed this queue limit
+would be disconnected from the daemon.
 
- - Handlers should be implemented as separate components from the main
-   daemon, so as not to require completely separate functionality (such
-   as PLDM and NVME-MI) existing in the same codebase. Having the core
-   daemon load handlers as shared objects would allow this.
+One design intention of the multiplexer daemon is to allow a future
+kernel-based MCTP implementation without requiring major structural
+changes to handler applications. The socket-based interface facilitates
+this, as the unix-domain socket interface could be fairly easily swapped
+out with a new kernel-based socket type.
 
 MCTP is intended to be an optional component of OpenBMC. Platforms using
 OpenBMC are free to adopt it as they see fit.
 
+### Demultiplexer daemon interface
+
+MCTP handlers (ie, clients of the demultiplexer) connect using a
+unix-domain socket, at the abstract socket address:
+
+  \0mctp-demux
+
+The socket type used should be `SOCK_SEQPACKET`.
+
+Once connected, the client sends a single-byte message, indicating what
+type of MCTP messages should be forwarded to the client. Types must be
+greater than zero.
+
+Subsequent messages sent over the socket are MCTP messages sent/received
+by the demultiplexer, that match the specified MCTP message type.
+Clients should use the send/recv syscalls to interact with the socket.
+
+Each message has a fixed small header:
+
+   `uint8_t eid`
+
+For messages coming from the demux daemon, this indicates the source EID
+of the received MCTP message. For messages going to the demux daemon,
+this indicates the destination EID.
+
+The rest of the message data is the complete MCTP message, including
+the MCTP message type field.
+
+The daemon does not provide a facility for clients to specify or
+retrieve values for the tag field in individual MCTP packets.
+
+
 ## Alternatives Considered
 
 There have been two main alternatives to this approach:
@@ -182,13 +214,22 @@
 format and no requirement on HTTP. However, this may involve a large
 amount of complexity in host firmware.
 
-In terms of an MCTP daemon implementation, an alternative is to have the
-core MCTP stack exist in a different process from the application
-handlers. For example, the MCTP core could be only responsible for
-proxying MCTP messages to and from a dbus interface, as is currently
-done for IPMI messages. However, the complexity, messaging overheads and
-state management involved here has indicated that the added separation
-has not been a clear advantage.
+In terms of an MCTP daemon structure, an alternative is to have the
+MCTP implementation contained within a single process, using the libmctp
+API directly for passing messages from the core code to
+application-level handlers. The drawback of this approach is that this
+single process needs to implement all possible functionality that is
+available over MCTP, which may be quite a disjoint set. This would
+likely lead to unnecessary restrictions on the implementation of those
+application-level handlers (programming language, frameworks used, etc).
+Also, this single-process approach would likely need more significant
+modifications if/when MCTP protocol support is moved to the kernel.
+
+The interface between the demultiplexer daemon and clients is currently
+defined as a socket-based interface. However, an alternative here would
+be to pass MCTP messages over dbus instead. The reason for the choice of
+sockets rather than dbus is that the former allows a direct transition
+to a kernel-based socket API when suitable.
 
 ## Impacts
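
The "Demultiplexer daemon interface" section added by this change describes a small wire protocol: connect a `SOCK_SEQPACKET` unix-domain socket to the abstract address `\0mctp-demux`, send one byte naming the MCTP message type, then exchange messages that each carry a one-byte EID header followed by the complete MCTP message. A minimal client-side sketch of that exchange might look like the following. The helper names (`connect_demux`, `register_type`, `send_message`, `recv_message`) and the 4096-byte receive buffer are assumptions made for illustration; they are not part of the design or of libmctp.

```python
import socket

# Abstract socket address from the design text: a leading NUL byte,
# followed by "mctp-demux". Abstract addresses are Linux-specific.
DEMUX_ADDR = b"\0mctp-demux"


def register_type(sock, msg_type):
    """Send the single-byte registration message for one MCTP type.

    The design requires the type to be greater than zero.
    """
    if not 0 < msg_type <= 0xFF:
        raise ValueError("MCTP message type must be in the range 1..255")
    sock.send(bytes([msg_type]))


def connect_demux(msg_type):
    """Connect to the demux daemon and register for msg_type (hypothetical helper)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        sock.connect(DEMUX_ADDR)
        register_type(sock, msg_type)
    except (OSError, ValueError):
        sock.close()
        raise
    return sock


def send_message(sock, dest_eid, message):
    """Send one MCTP message; the header byte is the destination EID.

    `message` is the complete MCTP message, including its type byte.
    """
    sock.send(bytes([dest_eid]) + message)


def recv_message(sock, bufsize=4096):
    """Receive one MCTP message; returns (source_eid, message).

    bufsize is an assumed upper bound; the design does not state a
    maximum message size.
    """
    data = sock.recv(bufsize)
    if not data:
        raise ConnectionError("demux daemon closed the connection")
    return data[0], data[1:]
```

A handler would call `connect_demux()` once with its message type, then loop on `recv_message()` and reply with `send_message()`. Because `SOCK_SEQPACKET` preserves message boundaries, each `send`/`recv` pair maps to exactly one MCTP message and no extra length framing is needed.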