Design Doc for io_uring

Added design doc detailing the usage of new io_uring library in
phosphor-hwmon repo.

Signed-off-by: Jerry Zhu <jzhu379@gmail.com>
Change-Id: Ib401c8ec067ccb47aaae6fe248fce0a02a6aac12
diff --git a/designs/phosphor-hwmon-io-uring.md b/designs/phosphor-hwmon-io-uring.md
new file mode 100644
index 0000000..3652294
--- /dev/null
+++ b/designs/phosphor-hwmon-io-uring.md
@@ -0,0 +1,172 @@
+# Using io_uring in BMCs for asynchronous sensor reads
+
+Author: Jerry Zhu ([jerryzhu@google.com](mailto:jerryzhu@google.com))
+
+Primary assignee: Brandon Kim
+([brandonkim@google.com](mailto:brandonkim@google.com), brandonk)
+
+Other contributors: William A. Kennington III
+([wak@google.com](mailto:wak@google.com), wak)
+
+Created: June 9, 2021
+
+## Problem Description
+
+Currently, OpenBMC has code that performs I2C reads for sensors that may take
+longer than desired. These IO operations are currently synchronous, and
+therefore may block other functions such as IPMI. This project will involve
+going through OpenBMC repositories (specifically
+[phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)) that may have this
+drawback currently, and adding an asynchronous interface using the new io_uring
+library.
+
+## Background and References
+
+io_uring is a new asynchronous framework for Linux I/O interface (added to 5.1
+Linux kernel, 5.10 is preferred). It is an upgrade from the previous
+asynchronous IO called AIO, which had its limitations in context of its usage
+in sensor reads for OpenBMC.
+
+[brandonkim@google.com](mailto:brandonkim@google.com) has previously created a
+method for preventing sensors from blocking all other sensor reads and D-Bus if
+they do not report failures quickly enough in the phosphor-hwmon repository
+([link to change](https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-hwmon/+/24337)).
+Internal Google BMC efforts have also focused on introducing the io_uring
+library to its code.
+
+## Requirements
+
+By using io_uring, the asynchronous sensor reads will need to maintain the same
+accuracy as the current, synchronous reads in each of the daemons. Potential
+OpenBMC repositories that will benefit from this library include:
+
+*   [phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)
+*   [phosphor-nvme](https://github.com/openbmc/phosphor-nvme)
+*   [dbus-sensors](https://github.com/openbmc/dbus-sensors)
+*   any other appropriate repository
+
+The focus of this project is to add asynchronous sensor reads to the
+phosphor-hwmon repository, which is easier to implement than adding
+asynchronous sensor reads into dbus-sensors.
+
+Users will need the ability to choose whether they want to utilize this new
+asynchronous method of reading sensors, or remain with the traditional,
+synchronous method. In addition, the performance improvement from using the new
+io_uring library will need to be calculated for each daemon.
+
+## Proposed Design
+
+In the phosphor-hwmon repository, the primary files that will require
+modification are
+[sensor.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.hpp)
+and
+[mainloop.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.hpp),
+as well the addition of a caching layer for the results from the sensor reads.
+
+In mainloop.cpp currently, the `read()` function, which reads hwmon sysfs
+entries, iterates through all sensors and calls `_ioAccess->read(...)` for each
+one; this operation is potentially blocking.
+
+The refactor will maintain this loop over all sensors, but instead make the
+read operation non-blocking by using an io_uring wrapper. A caching layer will
+be used to store the read results, which will be the main access point for
+obtaining sensor reads in mainloop.cpp.
+
+```
+               Interface Layer
++--------------------------------------------+
+|                                            |
+|   +------------+         +-------------+   |
+|   |            |         |             |   |
+|   |   Redfish  |         |     IPMI    |   |
+|   |            |         |             |   |
+|   +-----+------+         +-------+-----+   |
+|         ^                        ^         |
++---------|------------------------|---------+
+          |                        |
+          v                        v
++---------+------------------------+---------+
+|                                            |
+|                  DBus                      |
+|                                            |
++---------^------------------------^---------+
+          |                        |
+  +-------v-------+       +--------v-------+
+  |               |       |                |
+  |phosphor-hwmon |       |  dbus-sensors  |
+  |               |       |                |
+  +-------^-------+       +--------^-------+
+          | <--------------------- | <------- caching layer at this level
+ +--------v------------------------v--------+
+ |                                          |
+ |               Linux kernel               |
+ |                                          |
+ +----------^---------------------^---------+
+            |                     |
+       +----v-----+         +-----v----+
+       |          |         |          |
+       |i2c sensor|         |i2c sensor|
+       |          |         |          |
+       +----------+         +----------+
+
+```
+
+Using a flag variable (most likely to be placed in the .conf files of each hwmon
+sensor), users will be able to determine whether or not to utilize this new
+io_uring implementation for compatibility reasons.
+
+## Detailed Design
+
+The read cache is implemented using an `unordered_map` of {sensor hwmon path:
+read result}. The read result is a struct that keeps track of any necessary
+information for processing the read values and handling errors. Such
+information includes open file descriptor from the `open()` system call,
+number of retries remaining for reading this sensor when errors occur, etc.
+
+Each call to access the read value of a particaular sesnor in the read cache
+will not only return the cached value but will also submit a SQE (submission
+queue event) to io_uring for that sensor; this SQE acts as a read request
+that will be sent to the kernel. The implementation maintains a set of sensors
+that keeps track of any pre-existing submissions so that multiple SQEs for the
+same sensor do not get submitted and overlap; the set entries will be cleared
+upon successful return of the read result using the CQE (completion queue
+event). The CQE will then be processed, and its information will update the
+cache map.
+
+The asynchronous nature of this implementation comes from sending all possible
+SQE requests, a non-blocking operation, at once instead of being blocked by
+slow sensor reads in the synchronous implementation. The kernel will process
+these requests, and before the next iteration of sensor reads the cache will
+attempt to process any returned CQEs, a non-blocking operation as well.
+
+Simply put, an access to some "Sensor A" in the read cache will create an
+underlying read request that makes a best effort to update the value of
+"Sensor A" before the next time the sensor read loop (currently 1 s by default)
+gets the value of "Sensor A" through the cache.
+
+## Alternatives Considered
+
+Linux does have a native asynchronous IO interface, simply dubbed AIO; however,
+there are a number of limitations. The biggest limitation of AIO is that it
+only supports true asynchronous IO for un-buffered reads. Furthermore, there
+are a number of ways that the IO submission can end up blocking - for example,
+if metadata is required to perform IO. Additionally, the memory costs of AIO
+are more expensive than those of io_uring.
+
+For these primary reasons, the native AIO library will not be considered for
+this implementation of asynchronous reads.
+
+## Impacts
+
+This project would impact all OpenBMC developers of openbmc/phosphor-hwmon
+initially. It has improved the latency performance of phosphor-hwmon;
+throughput has also been shown to increase (note that throughput profiling
+was more arbitrary than latency profiling). These performance changes will
+have to be calculated in further detail across different machines.
+
+There will be no security impact.
+
+## Testing
+
+The change will utilize the gTest framework to ensure the original functionality
+of the code in the repository modified stays the same.