commit | dac2dfc3a64063b062fabd9c4d6e2857c1185eaf | |
---|---|---|
author | Vu Pham <vuhuong@nvidia.com> | Tue Nov 05 12:20:42 2024 -0600 |
committer | Vu Pham <vuhuong@nvidia.com> | Thu Jan 09 22:01:37 2025 +0000 |
tree | 36f00f66ea770071c30759525b0688d607ce53f6 | |
parent | 3f1d7b3fae5bb0ed6388bea703fc065b6fa291fb | |
Detect how many address bytes needed for a given EEPROM memory

Introduce different modes to detect how many address byte(s) are needed for a given EEPROM device.

MODE_1:
-------
The existing upstream function isDevice16Bit() is based on sending a 1-byte write operation (with a STOP condition) followed by 8 subsequent 1-byte read operations with a SINGLE byte address.

1. MODE_1 expects the following logic:
   - If the device requires 1 address byte, it EXPECTS that the data will be read from a single location, so all 8 bytes read will be the same.
   - If the device requires 2 address bytes, it EXPECTS that the data will be read from 8 DIFFERENT LOCATIONS and that at least one byte read differs from the 7 other reads.

2. Issue and potential issue with MODE_1:
   a. If any "2 address bytes" EEPROM from any vendor holds the same data in all memory locations (0-7) that the existing upstream function reads, the device will be identified as a "1 address byte" device.
   b. ONSEMI EEPROM (a 2 address bytes device) returns the same data for every read issued with the same single-byte address, so the existing function wrongly identifies it as a 1-byte address device.

MODE_2:
-------
The proposed MODE_2 change to isDevice16Bit() sends 8 instructions, each a 2-byte write operation (WITHOUT a STOP condition, i.e. STOP is prohibited) followed by a 1-byte read operation. The proposed solution fully complies with the IIC standard and should be applicable to any IIC EEPROM manufacturer.

| Start | SlaveAddr + W | 0x00 | 0x00 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
|-------|---------------|------|------|----------------------|-------|---------------|-----------|------|
| Start | SlaveAddr + W | 0x00 | 0x01 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
| Start | SlaveAddr + W | 0x00 | 0x02 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
| Start | SlaveAddr + W | 0x00 | 0x03 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
| Start | SlaveAddr + W | 0x00 | 0x04 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
| Start | SlaveAddr + W | 0x00 | 0x05 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
| Start | SlaveAddr + W | 0x00 | 0x06 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |
| Start | SlaveAddr + W | 0x00 | 0x07 | STOP PROHIBITED HERE | Start | SlaveAddr + R | data byte | Stop |

1. If the device requires a single address byte, then it will always load address 0x00, so the subsequent read byte will be the same for all 8 instructions. The second byte of the write is interpreted as a data byte and therefore does not modify the address pointer.
2. If two address bytes are required, then the device will interpret both bytes as addresses, thus reading from a different address every time, similar to what the existing function relies on now.

Notes & reasons:
-----------------
There is no STOP condition after the second (potential) address byte. A START condition must be sent after the second byte. If a STOP condition were sent, 1-byte address devices would start an internal write cycle and alter the EEPROM content, which must be avoided.

MODE_2 suffers from the same 1st issue (#2a) as before: the detection fails if the EEPROM holds the same data at all of those addresses.
However, MODE_2 does address the 2nd issue (#2b), where the existing MODE_1 upstream function EXPECTS that the data will be read from 8 DIFFERENT LOCATIONS if the device requires 2 address bytes. This expectation rests on an ambiguity in the IIC spec (https://www.nxp.com/docs/en/user-guide/UM10204.pdf), that is, on behavior the standard does not define:

1. Section 3.1.10, Note 2 -> "All decisions on auto-increment or decrement of previously accessed memory locations, etc., are taken by the designer of the device."
   Based on this note, the designer of every EEPROM is free to use whatever architecture they consider appropriate to process each of the two address bytes. There are no restrictions on this. Consequently, the auto-increment seen on other (non-ONSEMI) EEPROMs, observed when one address byte is sent instead of two, is manufacturer-specific behavior and is not defined by the standard.

2. Section 3.1.10, Note 1 -> "Combined formats can be used, for example, to control a serial memory. The internal memory location must be written during the first data byte. After the START condition and slave address is repeated, data can be transferred."
   MODE_2 implements this note. The memory location referred to here is the address pointer, carried as the first data byte of the I2C transfer. Based on this note, the EEPROM must update this pointer immediately after the first address byte.

Tested:
--------
1. With ONSEMI I2C EEPROM memory on Nvidia Bluefield-3 HCA

   a. Without this patch, or with this patch in MODE_1

        root@dpu-bmc:~# ipmitool fru
        FRU Device Description : Builtin FRU Device (ID 0)
        Device not present (Requested sensor, data, or record not found)

        FRU Device Description : Nvidia-BMCMezz (ID 169)
        Board Mfg Date : Thu May 11 13:00:00 2023 UTC
        Board Mfg : Nvidia
        Board Product : Nvidia-BMCMezz
        Board Serial : MT2319XZ04K6
        Board Part Number : 900-9D3B6-00CV-AA0

   b. With this patch in MODE_2

        root@dpu-bmc:~# ipmitool fru
        FRU Device Description : Builtin FRU Device (ID 0)
        Chassis Type : Main Server Chassis
        Chassis Part Number : 900-9D3B6-00CV-AA0
        Chassis Serial : MT2319XZ04K6
        Chassis Extra : N/A
        Chassis Extra : N/A
        Chassis Extra : N/A
        Chassis Area Checksum : OK
        Board Mfg Date : Thu May 11 13:00:00 2023 UTC
        Board Mfg : N/A
        Board Product : N/A
        Board Serial : MT2319XZ04K6
        Board Part Number : 900-9D3B6-00CV-AA0
        Board Extra : N/A
        Board Area Checksum : OK
        Product Manufacturer : N/A
        Product Name : N/A
        Product Part Number : 900-9D3B6-00CV-AA0
        Product Serial : MT2319XZ04K6
        Product Asset Tag : N/A
        Product Extra : N/A
        Product Area Checksum : OK

        FRU Device Description : Nvidia-BMCMezz (ID 169)
        Board Mfg Date : Thu May 11 13:00:00 2023 UTC
        Board Mfg : Nvidia
        Board Product : Nvidia-BMCMezz
        Board Serial : MT2319XZ04K6
        Board Part Number : 900-9D3B6-00CV-AA0
        Board Area Checksum : OK

2. With other I2C EEPROM memory on Nvidia Bluefield-3 HCA, either without this patch or with this patch in MODE_1 or MODE_2

        root@dpu-bmc:~# ipmitool fru
        FRU Device Description : Builtin FRU Device (ID 0)
        Chassis Type : Main Server Chassis
        Chassis Part Number : 900-9D3B4-00EN-EAA
        Chassis Serial : MT2315XZ0599
        Chassis Extra : N/A
        Chassis Extra : N/A
        Chassis Extra : N/A
        Chassis Area Checksum : OK
        Board Mfg Date : Tue Apr 18 10:25:00 2023 UTC
        Board Mfg : N/A
        Board Product : N/A
        Board Serial : MT2315XZ0599
        Board Part Number : 900-9D3B4-00EN-EAA
        Board Extra : N/A
        Board Area Checksum : OK
        Product Manufacturer : N/A
        Product Name : N/A
        Product Part Number : 900-9D3B4-00EN-EAA
        Product Version : N/A
        Product Serial : MT2315XZ0599
        Product Asset Tag : N/A
        Product Extra : N/A
        Product Area Checksum : OK

        FRU Device Description : Nvidia-BMCMezz (ID 169)
        Board Mfg Date : Tue Apr 18 10:25:00 2023 UTC
        Board Mfg : Nvidia
        Board Product : Nvidia-BMCMezz
        Board Serial : MT2315XZ0599
        Board Part Number : 900-9D3B4-00EN-EAA
        Board Area Checksum : OK

Change-Id: I296c22334c919f4248fb3a7f19e384ce802cba17
Signed-off-by: Vu Pham <vuhuong@nvidia.com>
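
The MODE_2 probe sequence described in the commit message above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakeEeprom` and `looks_like_two_byte_device` are hypothetical names invented here, the device model is a simplification of real EEPROM behavior, and none of this is the actual isDevice16Bit() code from the patch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeEeprom:
    """Hypothetical in-memory EEPROM model (not an entity-manager class)."""

    addr_bytes: int  # number of address bytes the device expects (1 or 2)
    mem: bytes       # simulated memory contents

    def write_then_read_byte(self, written: List[int]) -> int:
        """Model: START, addr+W, <written>, repeated START, addr+R, 1 byte, STOP.

        Only the first `addr_bytes` bytes set the address pointer. Any extra
        byte is treated as pending data and dropped, because no STOP follows
        the write phase, so no internal write cycle begins.
        """
        pointer = 0
        for value in written[: self.addr_bytes]:
            pointer = (pointer << 8) | value
        return self.mem[pointer % len(self.mem)]


def looks_like_two_byte_device(dev: FakeEeprom) -> bool:
    """MODE_2-style probe: write 0x00, i (no STOP), then read one byte."""
    reads = [dev.write_then_read_byte([0x00, i]) for i in range(8)]
    # Any differing byte means the second written byte moved the pointer.
    return any(b != reads[0] for b in reads)


distinct = bytes(range(256))
one_byte = FakeEeprom(addr_bytes=1, mem=distinct)
two_byte = FakeEeprom(addr_bytes=2, mem=distinct)
uniform_two_byte = FakeEeprom(addr_bytes=2, mem=bytes([0xFF] * 256))

print(looks_like_two_byte_device(one_byte))          # 1-byte device
print(looks_like_two_byte_device(two_byte))          # 2-byte device
print(looks_like_two_byte_device(uniform_two_byte))  # issue #2a case
```

The third device shows the limitation the commit message labels #2a: a 2-byte device whose first eight locations hold identical data is still reported as a 1-byte device.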
Entity manager is a design for managing physical system components, and mapping them to software resources within the BMC. Said resources are designed to allow the flexible adjustment of the system at runtime, as well as the reduction in the number of independent system configurations one needs to create.
Entity: A server component that is physically separate, detectable through some means, and can be added or removed from a given OpenBMC system. Said component can, and likely does, contain multiple sub-components, but the component itself as a whole is referred to as an entity.
Note, this term is needed because most other terms that could've been used (Component, Field Replaceable Unit, or Assembly) are already overloaded in the industry, and have a distinct definition already, which is a subset of what an entity encompasses.
Exposes: A particular feature of an Entity. An Entity generally will have multiple Exposes records for the various features that component supports. Some examples of features include LM75 sensors, PID control parameters, or CPU information.
Probe: A set of rules for detecting a given entity. Said rules generally take the form of a D-Bus interface definition.
Entity manager has the following goals (unless you can think of better ones):
A full BMC setup using Entity Manager consists of a few parts:
A detection daemon: This is something that can be used to detect components at runtime. The most common of these, fru-device, is included in the Entity-Manager repo, and scans all available I2C buses for IPMI FRU EEPROM devices. Other examples of detection daemons include:
- peci-pcie: A daemon that utilizes the CPU bus to read in a list of PCIe devices from the processor.
- smbios-mdr: A daemon that utilizes the x86 SMBIOS table specification to detect the available system dependencies from BIOS.
In many cases, the existing detection daemons are sufficient for a single system, but in cases where there is a superseding inventory control system in place (such as in a large datacenter) they can be replaced with application specific daemons that speak the protocol information of their controller, and expose the inventory information, such that failing devices can be detected more readily, and system configurations can be "verified" rather than detected.
An entity manager configuration file: Entity manager configuration files are located in the ./configurations directory in the entity manager repository, and include one file per device supported. Entities are detected based on the "Probe" key in the JSON file. The intention is that this folder contains all hardware configurations that OpenBMC supports, to allow an easy answer to "Is X device supported?". An EM configuration contains a number of Exposes records that specify the features that this Entity supports. Once a component is detected, entity manager will publish these Exposes records to D-Bus.
A reactor: Reactors take the entity manager configurations and use them to execute and enable the features that they describe. One example of this is dbus-sensors, which contains a suite of applications that take in the Exposes records for sensor devices, then connect to the filesystem to create the sensors and the scan loops that poll those devices. Other examples of reactors could include CPU management daemons, hot swap backplane management daemons, or drive daemons.
note: In some cases, a given daemon could be both a detection daemon and a reactor when architectures are multi-tiered. An example of this might include a hot swap backplane daemon, which both reacts to the hot swap being detected, and also creates detection records of what drives are present.
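
The three parts above (detection, configuration, reactor) can be sketched as a small pipeline. This is a rough illustration under assumptions, not Entity Manager's code. The `Probe` is modelled as a plain dictionary, whereas real configuration files express it differently. Every function name, board name and property value here is invented for the example.

```python
from typing import Dict, List

# Properties reported by a detection daemon, keyed by D-Bus interface name.
# The values are invented for illustration.
detected: Dict[str, dict] = {
    "xyz.openbmc_project.FruDevice": {
        "BOARD_PRODUCT_NAME": "Example-Board",
        "BUS": 1,
    }
}

# Simplified stand-in for one configuration file (hypothetical content).
config = {
    "Name": "Example Board",
    "Probe": {
        "Interface": "xyz.openbmc_project.FruDevice",
        "Match": {"BOARD_PRODUCT_NAME": "Example-Board"},
    },
    "Exposes": [
        {"Name": "Inlet Temp", "Type": "TMP75", "Bus": 1, "Address": "0x48"},
        {"Name": "Fan Control", "Type": "Pid"},
    ],
}


def probe_matches(probe: dict, detected: Dict[str, dict]) -> bool:
    """True if the detected interface carries every required property value."""
    props = detected.get(probe["Interface"])
    if props is None:
        return False
    return all(props.get(key) == value for key, value in probe["Match"].items())


def published_records(config: dict, detected: Dict[str, dict]) -> List[dict]:
    """Exposes records that would be published once the entity is detected."""
    if probe_matches(config["Probe"], detected):
        return list(config["Exposes"])
    return []


def temperature_reactor(records: List[dict]) -> List[str]:
    """A toy reactor: pick only the record types it knows how to handle."""
    handled_types = {"TMP75"}
    return [r["Name"] for r in records if r["Type"] in handled_types]


records = published_records(config, detected)
print(temperature_reactor(records))
```

The reactor never looks at the Probe. It consumes only the published Exposes records, which is the separation of roles the list above describes.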
Entity Manager will automatically create associations between its entities in certain cases. For details see here.
Entity manager shall support the dynamic discovery of hardware at runtime, using inventory interfaces. The types of devices include, but are not limited to hard drives, hot swap backplanes, baseboards, power supplies, CPUs, and PCIe Add-in-cards.
Entity manager shall support the ability to add or remove support for particular devices in a given binary image. By default, entity manager will support all available and known working devices for all platforms.
Entity manager shall provide data to D-Bus about a particular device such that other daemons can create instances of the features being exposed.
Entity manager shall support multiple detection runs, and shall do the minimal number of changes necessary when new components are detected or no longer detected. Some examples of re-detection events might include host power on, drive plug/unplug, PSU plug/unplug.
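
One way to picture the "minimal number of changes" requirement is to compare two detection runs and act only on the difference. The sketch below is an assumption about how such a comparison could look, not a description of Entity Manager's internals. `plan_changes` and the entity names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_changes(
    previous: Dict[str, dict], current: Dict[str, dict]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) entity names between two runs."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return added, removed, changed


# Hypothetical runs: PSU2 was unplugged and Drive1 was plugged in.
before = {
    "Baseboard": {"Exposes": 3},
    "PSU1": {"Exposes": 2},
    "PSU2": {"Exposes": 2},
}
after = {
    "Baseboard": {"Exposes": 3},
    "PSU1": {"Exposes": 2},
    "Drive1": {"Exposes": 1},
}

added, removed, changed = plan_changes(before, after)
print(added, removed, changed)
```

Entities present and identical in both runs (Baseboard and PSU1 here) appear in none of the three lists, so nothing would be republished for them.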
Entity manager shall have exactly one configuration file per supported device model. In some cases this will cause duplicated information between files, but the ability to list and see all supported device models in a single place, as well as maintenance when devices do differ in the future is determined to be more important than duplication of configuration files.
- bmcweb: A webserver implementation that uses the inventory information from entity-manager to produce a Redfish compliant REST API.
- intel-ipmi-oem: An implementation of the IPMI SDR, FRU, and Storage commands that utilizes Entity Manager as the source of information.