designs/mctp: Move MCTP designs to a subdirectory, making way for kernel approach

We've had long-term plans for a kernel approach to MCTP, but have
documented the current userspace-based implementation here.

This change moves the MCTP design to a new subdirectory (mctp/), and
splits the design into a common overview (mctp.md) and the current
userspace-based design (mctp-userspace.md).

This allows us to introduce a kernel design as a future commit, and
share the overview amongst documents.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Change-Id: I834bce8bd3b6a3a89e00e0b3ff9cd77014d485a1
diff --git a/designs/mctp.md b/designs/mctp.md
deleted file mode 100644
index 2d794b8..0000000
--- a/designs/mctp.md
+++ /dev/null
@@ -1,257 +0,0 @@
-# OpenBMC platform communication channel: MCTP & PLDM
-
-Author: Jeremy Kerr <jk@ozlabs.org> <jk>
-
-## Problem Description
-
-Currently, we have a few different methods of communication between host
-and BMC. This is primarily IPMI-based, but also includes a few
-hardware-specific side-channels, like hiomap. On OpenPOWER hardware at
-least, we've definitely started to hit some of the limitations of IPMI
-(for example, we have need for >255 sensors), as well as the hardware
-channels that IPMI typically uses.
-
-This design aims to use the Management Component Transport Protocol
-(MCTP) to provide a common transport layer over the multiple channels
-that OpenBMC platforms provide. Then, on top of MCTP, we have the
-opportunity to move to newer host/BMC messaging protocols to overcome
-some of the limitations we've encountered with IPMI.
-
-## Background and References
-
-Separating the "transport" and "messaging protocol" parts of the current
-stack allows us to design these parts separately. Currently, IPMI
-defines both of these; we currently have BT and KCS (both defined as
-part of the IPMI 2.0 standard) as the transports, and IPMI itself as the
-messaging protocol.
-
-Some efforts of improving the hardware transport mechanism of IPMI have
-been attempted, but not in a cross-implementation manner so far. This
-does not address some of the limitations of the IPMI data model.
-
-MCTP defines a standard transport protocol, plus a number of separate
-physical layer bindings for the actual transport of MCTP packets. These
-are defined by the DMTF's Platform Management Working group; standards
-are available at:
-
-  https://www.dmtf.org/standards/pmci
-
-The following diagram shows how these standards map to the areas of
-functionality that we may want to implement for OpenBMC. The DSP numbers
-provided are references to DMTF standard documents.
-
-![](mctp-standards.svg)
-
-One of the key concepts here is that separation of transport protocol
-from the physical layer bindings; this means that an MCTP "stack" may be
-using either a I2C, PCI, Serial or custom hardware channel, without the
-higher layers of that stack needing to be aware of the hardware
-implementation.  These higher levels only need to be aware that they are
-communicating with a certain entity, defined by an Entity ID (MCTP EID).
-These entities may be any element of the platform that communicates
-over MCTP - for example, the host device, the BMC, or any other
-system peripheral - static or hot-pluggable.
-
-This document is focused on the "transport" part of the platform design.
-While this does enable new messaging protocols (mainly PLDM), those
-components are not covered in detail much; we will propose those parts
-in separate design efforts. For example, the PLDM design at
-[pldm-stack.md].
-
-As part of the design, the references to MCTP "messages" and "packets"
-are intentional, to match the definitions in the MCTP standard. MCTP
-messages are the higher-level data transferred between MCTP endpoints,
-which packets are typically smaller, and are what is sent over the
-hardware. Messages that are larger than the hardware Maximum Transmit
-Unit (MTU) are split into individual packets by the transmit
-implementation, and reassembled at the receive implementation.
-
-## Requirements
-
-Any channel between host and BMC should:
-
- - Have a simple serialisation and deserialisation format, to enable
-   implementations in host firmware, which have widely varying runtime
-   capabilities
-
- - Allow different hardware channels, as we have a wide variety of
-   target platforms for OpenBMC
-
- - Be usable over simple hardware implementations, but have a facility
-   for higher bandwidth messaging on platforms that require it.
-
- - Ideally, integrate with newer messaging protocols
-
-## Proposed Design
-
-The MCTP core specification just provides the packetisation, routing and
-addressing mechanisms. The actual transmit/receive of those packets is
-up to the hardware binding of the MCTP transport.
-
-For OpenBMC, we would introduce a MCTP daemon, which implements the transport
-over a configurable hardware channel (eg., Serial UART, I2C or PCIe), and
-provides a socket-based interface for other processes to send and
-receive complete MCTP messages. This daemon is responsible for the
-packetisation and routing of MCTP messages from external endpoints, and
-handling the forwarding these messages to and from individual handler
-applications. This includes handling local MCTP-stack configuration,
-like local EID assignments.
-
-This daemon has a few components:
-
- 1) the core MCTP stack
-
- 2) one or more binding implementations (eg, MCTP-over-serial), which
-    interact with the hardware channel(s).
-
- 3) an interface to handler applications over a unix-domain socket.
-
-The proposed implementation here is to produce an MCTP "library" which
-provides the packetisation and routing functions, between:
-
- - an "upper" messaging transmit/receive interface, for tx/rx of a full
-   message to a specific endpoint (ie, (1) above)
-
- - a "lower" hardware binding for transmit/receive of individual
-   packets, providing a method for the core to tx/rx each packet to
-   hardware, and defines the parameters of the common packetisation
-   code (ie. (2) above).
-
-The lower interface would be plugged in to one of a number of
-hardware-specific binding implementations. Most of these would be
-included in the library source tree, but others can be plugged-in too,
-perhaps where the physical layer implementation does not make sense to
-include in the platform-agnostic library.
-
-The reason for a library is to allow the same MCTP implementation to be
-used in both OpenBMC and host firmware; the library should be
-bidirectional. To allow this, the library would be written in portable C
-(structured in a way that can be compiled as "extern C" in C++
-codebases), and be able to be configured to suit those runtime
-environments (for example, POSIX IO may not be available on all
-platforms; we should be able to compile the library to suit). The
-licence for the library should also allow this re-use; a dual Apache &
-GPLv2+ licence may be best.
-
-These "lower" binding implementations may have very different methods of
-transferring packets to the physical layer. For example, a serial
-binding implementation for running on a Linux environment may be
-implemented through read()/write() syscalls to a PTY device. An I2C
-binding for use in low-level host firmware environments may interact
-directly with hardware registers to perform packet transfers.
-
-The application-specific handlers implement the actual functionality
-provided over the MCTP channel, and connect to the central daemon over a
-UNIX domain socket. Each of these would register with the MCTP daemon to
-receive MCTP messages of a certain type, and would transmit MCTP
-messages of that same type.
-
-The daemon's sockets to these handlers is configured for non-blocking
-IO, to allow the daemon to be decoupled from any blocking behaviour of
-handlers. The daemon would use a message queue to enable message
-reception/transmission to a blocked daemon, but this would be of a
-limited size. Handlers whose sockets exceed this queue would be
-disconnected from the daemon.
-
-One design intention of the multiplexer daemon is to allow a future
-kernel-based MCTP implementation without requiring major structural
-changes to handler applications. The socket-based interface facilitates
-this, as the unix-domain socket interface could be fairly easily swapped
-out with a new kernel-based socket type.
-
-MCTP is intended to be an optional component of OpenBMC. Platforms using
-OpenBMC are free to adopt it as they see fit.
-
-### Demultiplexer daemon interface
-
-MCTP handlers (ie, clients of the demultiplexer) connect using a
-unix-domain socket, at the abstract socket address:
-
-  \0mctp-demux
-
-The socket type used should be `SOCK_SEQPACKET`.
-
-Once connected, the client sends a single byte message, indicating what
-type of MCTP messages should be forwarded to the client. Types must be
-greater than zero.
-
-Subsequent messages sent over the socket are MCTP messages sent/received
-by the demultiplexer, that match the specified MCTP message type.
-Clients should use the send/recv syscalls to interact with the socket.
-
-Each message has a fixed small header:
-
-   `uint8_t eid`
-
-For messages coming from the demux daemon, this indicates the source EID
-of the outgoing MCTP message. For messages going to the demux daemon,
-this indicates the destination EID.
-
-The rest of the message data is the complete MCTP message, including
-MCTP message type field.
-
-The daemon does not provide a facility for clients to specify or
-retrieve values for the tag field in individual MCTP packets.
-
-
-## Alternatives Considered
-
-There have been two main alternatives to this approach:
-
-Continue using IPMI, but start making more use of OEM extensions to
-suit the requirements of new platforms. However, given that the IPMI
-standard is no longer under active development, we would likely end up
-with a large amount of platform-specific customisations. This also does
-not solve the hardware channel issues in a standard manner.
-
-Redfish between host and BMC. This would mean that host firmware needs a
-HTTP client, a TCP/IP stack, a JSON (de)serialiser, and support for
-Redfish schema. While this may be present in some environments (for
-example, UEFI-based firmware), this is may not be feasible for all host
-firmware implementations (for example, OpenPOWER). It's possible that we
-could run a simplified Redfish stack - indeed, MCTP has a proposal for a
-Redfish-over-MCTP channel (DSP0218), which uses simplified serialisation
-format and no requirement on HTTP. However, this may involve a large
-amount of complexity in host firmware.
-
-In terms of an MCTP daemon structure, an alternative is to have the
-MCTP implementation contained within a single process, using the libmctp
-API directly for passing messages from the core code to
-application-level handlers. The drawback of this approach is that this
-single process needs to implement all possible functionality that is
-available over MCTP, which may be quite a disjoint set. This would
-likely lead to unnecessary restrictions on the implementation of those
-application-level handlers (programming language, frameworks used, etc).
-Also, this single-process approach would likely need more significant
-modifications if/when MCTP protocol support is moved to the kernel.
-
-The interface between the demultiplexer daemon and clients is currently
-defined as a socket-based interface. However, an alternative here would
-be to pass MCTP messages over dbus instead. The reason for the choice of
-sockets rather than dbus is that the former allows a direct transition
-to a kernel-based socket API when suitable.
-
-## Impacts
-
-Development would be required to implement the MCTP transport, plus any
-new users of the MCTP messaging (eg, a PLDM implementation). These would
-somewhat duplicate the work we have in IPMI handlers.
-
-We'd want to keep IPMI running in parallel, so the "upgrade" path should
-be fairly straightforward.
-
-Design and development needs to involve potential host, management
-controllers and managed device implementations.
-
-## Testing
-
-For the core MCTP library, we are able to run tests there in complete
-isolation (I have already been able to run a prototype MCTP stack
-through the afl fuzzer) to ensure that the core transport protocol
-works.
-
-For MCTP hardware bindings, we would develop channel-specific tests that
-would be run in CI on both host and BMC.
-
-For the OpenBMC MCTP daemon implementation, testing models would depend
-on the structure we adopt in the design section.
diff --git a/designs/mctp-standards.svg b/designs/mctp/mctp-standards.svg
similarity index 100%
rename from designs/mctp-standards.svg
rename to designs/mctp/mctp-standards.svg
diff --git a/designs/mctp/mctp-userspace.md b/designs/mctp/mctp-userspace.md
new file mode 100644
index 0000000..3b051c5
--- /dev/null
+++ b/designs/mctp/mctp-userspace.md
@@ -0,0 +1,154 @@
+# OpenBMC platform communication channel: MCTP & PLDM in userspace
+
+Author: Jeremy Kerr <jk@ozlabs.org> <jk>
+
+Please refer to the [MCTP Overview](mctp.md) document for the general
+MCTP design description, background and requirements.
+
+This document describes a userspace implementation of MCTP
+infrastructure, allowing a straightforward mechanism for supporting MCTP
+messaging within an OpenBMC system.
+
+## Proposed Design
+
+The MCTP core specification just provides the packetisation, routing and
+addressing mechanisms. The actual transmit/receive of those packets is
+up to the hardware binding of the MCTP transport.
+
+For OpenBMC, we would introduce an MCTP daemon, which implements the
+transport over a configurable hardware channel (eg, serial UART, I2C or
+PCIe), and provides a socket-based interface for other processes to send
+and receive complete MCTP messages. This daemon is responsible for the
+packetisation and routing of MCTP messages from external endpoints, and
+for forwarding these messages to and from individual handler
+applications. This includes handling local MCTP-stack configuration,
+like local EID assignments.
+
+This daemon has a few components:
+
+ 1) the core MCTP stack
+
+ 2) one or more binding implementations (eg, MCTP-over-serial), which
+    interact with the hardware channel(s).
+
+ 3) an interface to handler applications over a unix-domain socket.
+
+The proposed implementation here is to produce an MCTP "library" which
+provides the packetisation and routing functions, between:
+
+ - an "upper" messaging transmit/receive interface, for tx/rx of a full
+   message to a specific endpoint (ie, (1) above)
+
+ - a "lower" hardware binding for transmit/receive of individual
+   packets, providing a method for the core to tx/rx each packet to
+   hardware, and defines the parameters of the common packetisation
+   code (ie. (2) above).
+
+The lower interface would be plugged in to one of a number of
+hardware-specific binding implementations. Most of these would be
+included in the library source tree, but others can be plugged in too,
+perhaps where the physical layer implementation does not make sense to
+include in the platform-agnostic library.
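+
+As a rough illustration of this split, the following sketch shows the
+shape such a library's interfaces could take. All names here are
+illustrative only (loosely following the style of the libmctp
+prototype), not a defined API:
+
+```c
+#include <stdint.h>
+#include <stddef.h>
+
+typedef uint8_t mctp_eid_t;
+
+struct mctp; /* core stack context */
+
+/* "upper" interface: full-message tx/rx against remote EIDs */
+typedef void (*mctp_rx_fn)(mctp_eid_t src, void *data, void *msg,
+                           size_t len);
+
+struct mctp *mctp_init(void);
+int mctp_set_rx_all(struct mctp *mctp, mctp_rx_fn fn, void *data);
+int mctp_message_tx(struct mctp *mctp, mctp_eid_t dest, void *msg,
+                    size_t len);
+
+/* "lower" interface: a binding gives the core a per-packet tx function
+ * and the packetisation parameters for its physical channel */
+struct mctp_binding {
+	const char *name;
+	size_t pkt_size; /* per-packet MTU for this channel */
+	int (*tx)(struct mctp_binding *binding, void *pkt, size_t len);
+};
+
+int mctp_register_bus(struct mctp *mctp, struct mctp_binding *binding,
+                      mctp_eid_t local_eid);
+```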
+
+The reason for a library is to allow the same MCTP implementation to be
+used in both OpenBMC and host firmware; the library should be
+bidirectional. To allow this, the library would be written in portable C
+(structured in a way that can be compiled as "extern C" in C++
+codebases), and be able to be configured to suit those runtime
+environments (for example, POSIX IO may not be available on all
+platforms; we should be able to compile the library to suit). The
+licence for the library should also allow this re-use; a dual Apache &
+GPLv2+ licence may be best.
+
+These "lower" binding implementations may have very different methods of
+transferring packets to the physical layer. For example, a serial
+binding implementation for running on a Linux environment may be
+implemented through read()/write() syscalls to a PTY device. An I2C
+binding for use in low-level host firmware environments may interact
+directly with hardware registers to perform packet transfers.
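+
+For instance, the transmit side of a Linux serial binding could look
+something like this sketch; the structure and function names are
+illustrative, and the serial framing itself (sync bytes, checksums, as
+defined by DSP0253) is elided:
+
+```c
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+
+struct serial_binding {
+	int fd; /* open PTY device */
+};
+
+static int serial_tx(struct serial_binding *serial, const void *pkt,
+		     size_t len)
+{
+	const uint8_t *buf = pkt;
+
+	/* push the framed packet out through the PTY, handling short
+	 * writes and interrupted syscalls */
+	while (len) {
+		ssize_t rc = write(serial->fd, buf, len);
+
+		if (rc < 0) {
+			if (errno == EINTR)
+				continue;
+			return -errno;
+		}
+		buf += rc;
+		len -= rc;
+	}
+	return 0;
+}
+```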
+
+The application-specific handlers implement the actual functionality
+provided over the MCTP channel, and connect to the central daemon over a
+UNIX domain socket. Each of these would register with the MCTP daemon to
+receive MCTP messages of a certain type, and would transmit MCTP
+messages of that same type.
+
+The daemon's sockets to these handlers are configured for non-blocking
+IO, to allow the daemon to be decoupled from any blocking behaviour of
+handlers. The daemon would use a message queue to enable message
+reception/transmission to a blocked handler, but this would be of a
+limited size. Handlers whose sockets exceed this queue would be
+disconnected from the daemon.
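+
+A minimal sketch of that send path follows; the handler structure,
+queue limit and queue-append helper are all hypothetical:
+
+```c
+#include <errno.h>
+#include <stddef.h>
+#include <sys/socket.h>
+
+#define HANDLER_QUEUE_MAX 32 /* illustrative limit, not prescribed */
+
+struct handler {
+	int fd;              /* SOCK_SEQPACKET socket to this handler */
+	unsigned int queued; /* messages currently queued */
+};
+
+/* hypothetical helper: append one message to the handler's queue */
+int handler_queue_append(struct handler *h, const void *msg, size_t len);
+
+static int handler_send(struct handler *h, const void *msg, size_t len)
+{
+	ssize_t rc = send(h->fd, msg, len, MSG_DONTWAIT);
+
+	if (rc >= 0)
+		return 0;
+	if (errno != EAGAIN && errno != EWOULDBLOCK)
+		return -1; /* transport error: disconnect this handler */
+
+	/* the handler is blocked: queue the message, within the limit */
+	if (h->queued >= HANDLER_QUEUE_MAX)
+		return -1; /* queue exceeded: disconnect this handler */
+
+	return handler_queue_append(h, msg, len);
+}
+```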
+
+One design intention of the multiplexer daemon is to allow a future
+kernel-based MCTP implementation without requiring major structural
+changes to handler applications. The socket-based interface facilitates
+this, as the unix-domain socket interface could be fairly easily swapped
+out with a new kernel-based socket type.
+
+MCTP is intended to be an optional component of OpenBMC. Platforms using
+OpenBMC are free to adopt it as they see fit.
+
+### Demultiplexer daemon interface
+
+MCTP handlers (ie, clients of the demultiplexer) connect using a
+unix-domain socket, at the abstract socket address:
+
+  \0mctp-demux
+
+The socket type used should be `SOCK_SEQPACKET`.
+
+Once connected, the client sends a single-byte message, indicating which
+type of MCTP messages should be forwarded to the client. Types must be
+greater than zero.
+
+Subsequent messages sent over the socket are MCTP messages sent/received
+by the demultiplexer that match the specified MCTP message type.
+Clients should use the send/recv syscalls to interact with the socket.
+
+Each message has a fixed small header:
+
+   `uint8_t eid`
+
+For messages coming from the demux daemon, this indicates the source EID
+of the incoming MCTP message. For messages going to the demux daemon,
+this indicates the destination EID.
+
+The rest of the message data is the complete MCTP message, including the
+MCTP message type field.
+
+The daemon does not provide a facility for clients to specify or
+retrieve values for the tag field in individual MCTP packets.
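+
+Putting the interface together, a client could look like the sketch
+below (error handling trimmed; the message type value is just an
+example):
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <unistd.h>
+
+int main(void)
+{
+	static const char path[] = "\0mctp-demux";
+	struct sockaddr_un addr = { .sun_family = AF_UNIX };
+	socklen_t addrlen = sizeof(addr.sun_family) + sizeof(path) - 1;
+	uint8_t type = 1; /* example type: 1 is PLDM */
+	uint8_t buf[1024];
+	ssize_t len;
+	int sd;
+
+	sd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+
+	/* abstract socket address: sun_path starts with a \0 byte */
+	memcpy(addr.sun_path, path, sizeof(path) - 1);
+	if (connect(sd, (struct sockaddr *)&addr, addrlen))
+		return 1;
+
+	/* register: one byte giving the MCTP message type to receive */
+	send(sd, &type, 1, 0);
+
+	/* on receive, buf[0] is the source EID; the rest is the full
+	 * MCTP message, message type field included */
+	len = recv(sd, buf, sizeof(buf), 0);
+	if (len > 1)
+		printf("%zd-byte message from EID %d\n", len - 1, buf[0]);
+
+	/* transmit uses the same framing, with buf[0] holding the
+	 * destination EID instead */
+	close(sd);
+	return 0;
+}
+```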
+
+
+## Alternatives Considered
+
+In terms of an MCTP daemon structure, an alternative is to have the
+MCTP implementation contained within a single process, using the libmctp
+API directly for passing messages from the core code to
+application-level handlers. The drawback of this approach is that this
+single process needs to implement all possible functionality that is
+available over MCTP, which may be quite a disjoint set. This would
+likely lead to unnecessary restrictions on the implementation of those
+application-level handlers (programming language, frameworks used, etc).
+Also, this single-process approach would likely need more significant
+modifications if/when MCTP protocol support is moved to the kernel.
+
+The interface between the demultiplexer daemon and clients is currently
+defined as a socket-based interface. However, an alternative here would
+be to pass MCTP messages over dbus instead. The reason for the choice of
+sockets rather than dbus is that the former allows a direct transition
+to a kernel-based socket API when suitable.
+
+## Testing
+
+For the core MCTP library, we are able to run tests in complete
+isolation (I have already been able to run a prototype MCTP stack
+through the afl fuzzer) to ensure that the core transport protocol
+works.
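+
+A harness for that kind of isolated fuzzing can be very small, as in
+the sketch below; the packet-receive entry point named here is
+hypothetical:
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+
+/* hypothetical core-library entry points (see the library sketch
+ * earlier in this document) */
+struct mctp;
+struct mctp *mctp_init(void);
+void mctp_packet_rx(struct mctp *mctp, const void *pkt, size_t len);
+
+int main(void)
+{
+	static uint8_t buf[4096];
+	struct mctp *mctp = mctp_init();
+	size_t len = fread(buf, 1, sizeof(buf), stdin);
+
+	/* feed the fuzzer-generated bytes in as one raw packet; afl
+	 * then explores the parsing and reassembly paths */
+	mctp_packet_rx(mctp, buf, len);
+	return 0;
+}
+```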
+
+For MCTP hardware bindings, we would develop channel-specific tests that
+would be run in CI on both host and BMC.
+
+For the OpenBMC MCTP daemon implementation, testing models would depend
+on the structure we adopt in the design section.
diff --git a/designs/mctp/mctp.md b/designs/mctp/mctp.md
new file mode 100644
index 0000000..d958e3f
--- /dev/null
+++ b/designs/mctp/mctp.md
@@ -0,0 +1,132 @@
+# OpenBMC platform communication channel: MCTP & PLDM
+
+Author: Jeremy Kerr <jk@ozlabs.org> <jk>
+
+## Problem Description
+
+Currently, we have a few different methods of communication between host
+and BMC. This is primarily IPMI-based, but also includes a few
+hardware-specific side-channels, like hiomap. On OpenPOWER hardware at
+least, we've definitely started to hit some of the limitations of IPMI
+(for example, we have need for >255 sensors), as well as the hardware
+channels that IPMI typically uses.
+
+This design aims to use the Management Component Transport Protocol
+(MCTP) to provide a common transport layer over the multiple channels
+that OpenBMC platforms provide. Then, on top of MCTP, we have the
+opportunity to move to newer host/BMC messaging protocols to overcome
+some of the limitations we've encountered with IPMI.
+
+## Background and References
+
+Separating the "transport" and "messaging protocol" parts of the current
+stack allows us to design these parts separately. Currently, IPMI
+defines both of these; we currently have BT and KCS (both defined as
+part of the IPMI 2.0 standard) as the transports, and IPMI itself as the
+messaging protocol.
+
+Some efforts at improving the hardware transport mechanism of IPMI have
+been attempted, but not in a cross-implementation manner so far. These
+also do not address some of the limitations of the IPMI data model.
+
+MCTP defines a standard transport protocol, plus a number of separate
+physical layer bindings for the actual transport of MCTP packets. These
+are defined by the DMTF's Platform Management Working Group; standards
+are available at:
+
+  https://www.dmtf.org/standards/pmci
+
+The following diagram shows how these standards map to the areas of
+functionality that we may want to implement for OpenBMC. The DSP numbers
+provided are references to DMTF standard documents.
+
+![](mctp-standards.svg)
+
+One of the key concepts here is the separation of the transport protocol
+from the physical layer bindings; this means that an MCTP "stack" may be
+using an I2C, PCIe, serial or custom hardware channel, without the
+higher layers of that stack needing to be aware of the hardware
+implementation. These higher levels only need to be aware that they are
+communicating with a certain entity, defined by an Entity ID (MCTP EID).
+These entities may be any element of the platform that communicates
+over MCTP - for example, the host device, the BMC, or any other
+system peripheral - static or hot-pluggable.
+
+This document is focused on the "transport" part of the platform design.
+While this does enable new messaging protocols (mainly PLDM), those
+components are not covered in much detail here; we will propose those
+parts in separate design efforts - for example, the PLDM design at
+[pldm-stack.md](pldm-stack.md).
+
+As part of the design, the references to MCTP "messages" and "packets"
+are intentional, to match the definitions in the MCTP standard. MCTP
+messages are the higher-level data transferred between MCTP endpoints,
+while packets are typically smaller, and are what is sent over the
+hardware. Messages that are larger than the hardware Maximum
+Transmission Unit (MTU) are split into individual packets by the
+transmit implementation, and reassembled at the receive implementation.
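+
+As a worked example of that split, assuming the MCTP baseline
+transmission unit of 64 bytes:
+
+```c
+#include <stdio.h>
+
+int main(void)
+{
+	unsigned int mtu = 64;   /* baseline; bindings may negotiate more */
+	unsigned int len = 1000; /* example message payload size */
+	unsigned int pkts = (len + mtu - 1) / mtu;
+
+	/* a 1000-byte message with a 64-byte MTU becomes 16 packets,
+	 * the last carrying the remaining 40 bytes */
+	printf("%u packets\n", pkts);
+	return 0;
+}
+```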
+
+## Requirements
+
+Any channel between host and BMC should:
+
+ - Have a simple serialisation and deserialisation format, to enable
+   implementations in host firmware, which have widely varying runtime
+   capabilities
+
+ - Allow different hardware channels, as we have a wide variety of
+   target platforms for OpenBMC
+
+ - Be usable over simple hardware implementations, but have a facility
+   for higher bandwidth messaging on platforms that require it.
+
+ - Ideally, integrate with newer messaging protocols
+
+## Proposed Designs
+
+The MCTP infrastructure in OpenBMC is implemented using two approaches:
+
+ - A userspace-based approach, using a core library, plus a
+   demultiplexing daemon. This is the current implementation, and is
+   described in [MCTP Userspace](mctp-userspace.md).
+
+ - A kernel-based approach, using a sockets API for client and server
+   applications. This approach is in a design stage.
+
+Design details for both approaches are covered in their respective
+documents; both share the Problem Description, Background, Requirements,
+Alternatives and Impacts sections defined in this document.
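+
+Purely as an illustration of the second approach - no kernel API is
+defined by this design yet - a sockets-based interface might let
+handlers look like ordinary datagram clients. Every name below is
+hypothetical:
+
+```c
+#include <stdint.h>
+#include <sys/socket.h>
+
+/* hypothetical; a real kernel would allocate an AF_ number */
+#define AF_MCTP_EXAMPLE 45
+
+/* hypothetical address structure for a kernel MCTP socket */
+struct sockaddr_mctp_example {
+	sa_family_t smctp_family; /* AF_MCTP_EXAMPLE */
+	uint8_t smctp_eid;        /* remote endpoint ID */
+	uint8_t smctp_type;       /* MCTP message type */
+};
+
+/* a handler would then be an ordinary datagram client:
+ *
+ *   int sd = socket(AF_MCTP_EXAMPLE, SOCK_DGRAM, 0);
+ *   sendto(sd, msg, len, 0, (struct sockaddr *)&addr, sizeof(addr));
+ *   recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&addr, &alen);
+ */
+```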
+
+## Alternatives Considered
+
+There have been two main alternatives to an MCTP implementation in
+OpenBMC:
+
+Continue using IPMI, but start making more use of OEM extensions to
+suit the requirements of new platforms. However, given that the IPMI
+standard is no longer under active development, we would likely end up
+with a large amount of platform-specific customisations. This also does
+not solve the hardware channel issues in a standard manner.
+
+Redfish between host and BMC. This would mean that host firmware needs
+an HTTP client, a TCP/IP stack, a JSON (de)serialiser, and support for
+the Redfish schema. While this may be present in some environments (for
+example, UEFI-based firmware), it may not be feasible for all host
+firmware implementations (for example, OpenPOWER). It's possible that we
+could run a simplified Redfish stack - indeed, there is a proposal for a
+Redfish-over-MCTP channel (DSP0218), which uses a simplified
+serialisation format and has no requirement on HTTP. However, this may
+involve a large amount of complexity in host firmware.
+
+## Impacts
+
+Development would be required to implement the MCTP transport, plus any
+new users of the MCTP messaging (eg, a PLDM implementation). These would
+somewhat duplicate the work we have in IPMI handlers.
+
+We'd want to keep IPMI running in parallel, so the "upgrade" path should
+be fairly straightforward.
+
+Design and development needs to involve potential host, management
+controller and managed device implementations.