commit | b300575e74c7dea5d29a5815b0f2f2a0748f125e | |
---|---|---|
author | Josh Lehan <krellan@google.com> | Tue Feb 22 20:48:07 2022 -0800 |
committer | Josh Lehan <krellan@google.com> | Thu Nov 10 18:38:58 2022 -0800 |
tree | b469d5318638a56d2e489dd9834edf95646af14f | |
parent | b1225b26d4bd90728265384e1b6f504092cfb13c | |
pid/zone: Adding unscaled to cache and logging

The "ReadReturn" structure, and the cache within DbusPidZone, have been widened, to hold both the scaled and the original unscaled values at the same time. This allows logging to show both at once, and also clears up confusion/bugs resulting from storing one or the other and losing track of which was which.

Compatibility setValue() and getCachedValue() functions still retained, so this will not break other sensors. These functions still only take a single argument/return, which will be used for both value and unscaled, indicating scaling is unknown or irrelevant to this sensor.

Also, the PWM output of the PID loop appears in the log file, conveniently right alongside the RPM input of the PID loop.

An output cache has been added to the zone interface, and, unlike the input cache, use of it is optional. It is only to help populate the logging, so subclasses are free to ignore it if they want.

Tested: In the logging files, I can see both PWM and RPM, and they are consistent, showing how the PID loop is trying to update the PWM to target the desired RPM.

Example: Here's /tmp/zone_0.log on my system

    epoch_ms,setpt,fan0_tach,fan0_tach_raw,fan0_tach_pwm,fan0_tach_pwm_raw,bmcmargin_zone0,bmcmargin_zone0_raw,thermal_zone0,thermal_zone0_raw,failsafe
    3097918,3818.42,0.748267,11224,0,0,0.724753,56.812,0.745098,62,0
    3098022,3818.42,0.748267,11224,0.266666,67,0.724753,56.812,0.745098,62,0
    3098132,3818.42,0.748267,11224,0.266666,67,0.724753,56.812,0.745098,62,0

Here's what we can now learn:

The desired setpoint is 3818 RPM.
The fan is at 74.8% of scale, which is 11224 RPM.
The written PWM, after the first PID loop pass, is a raw value of 67, which is 26.6% of scale.
The first margin temperature is 56.8 degrees of margin, which is 72.4% of scale.
The second margin temperature is 62 degrees of margin, which is 74.5% of scale.
This zone is not in failsafe mode.

As you can see, this will be rather useful for PID loop tuning.

Signed-off-by: Josh Lehan <krellan@google.com>
Change-Id: I972a4e4a3b787255f0dcafa10d4498ee58b682f0
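The widened cache described in the commit message can be pictured with a small sketch. The names `CachedValue`, `ValueCache`, and `scaleReading` are hypothetical and are not taken from the repository. The 0 to 15000 RPM range in the usage note is an assumption, chosen only because it reproduces the `fan0_tach` figures in the log example.

```cpp
#include <map>
#include <string>

// Hypothetical type: one cache entry holding both representations.
struct CachedValue
{
    double scaled;   // reading as a fraction of the sensor's range
    double unscaled; // raw reading in native units (RPM, degrees, ...)
};

// Linear scaling of a raw reading onto the sensor's range; assumes max > min.
inline double scaleReading(double raw, double min, double max)
{
    return (raw - min) / (max - min);
}

// Hypothetical cache keyed by sensor name.
class ValueCache
{
  public:
    // Two-argument form: store scaled and unscaled side by side.
    void setValue(const std::string& name, double scaled, double unscaled)
    {
        cache[name] = CachedValue{scaled, unscaled};
    }

    // Compatibility form: a single value is used for both fields,
    // meaning scaling is unknown or irrelevant for this sensor.
    void setValue(const std::string& name, double value)
    {
        setValue(name, value, value);
    }

    // Throws std::out_of_range if the sensor was never cached.
    const CachedValue& get(const std::string& name) const
    {
        return cache.at(name);
    }

  private:
    std::map<std::string, CachedValue> cache;
};
```

With the assumed range, `scaleReading(11224.0, 0.0, 15000.0)` comes out at roughly 0.748267, which matches the `fan0_tach` and `fan0_tach_raw` columns in the log.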
This is a daemon running within the OpenBMC environment. It uses a well-defined configuration file to control the temperature of the tray components to keep them within operating conditions. It may require coordination with host-side tooling and OpenBMC.
The BMC will run a daemon that controls the fans by pre-defined zones. The application will use thermal control, such that each defined zone is kept within a range and adjusted based on thermal information provided from locally readable sensors as well as host-provided information over an IPMI OEM command.
A system (or tray) will be broken out into one or more zones, specified via configuration files or dbus. Each zone will contain at least one fan, at least one temperature sensor, and some device margins. The sensor data can be provided via sysfs, dbus, or IPMI. In any case, default margins should be provided in case of failure or another unknown situation.
The system will run a control loop for each zone that attempts to keep the temperature in that zone within the margins of the specified devices.
How to configure phosphor-pid-control
The software will run as a multi-threaded daemon that runs a control loop for each zone, and has a master thread which listens for dbus messages. Each zone will require at least one fan that it exclusively controls; zones can, however, share temperature sensors.
In this figure the communications channels between swampd and ipmid and phosphor-hwmon are laid out.
A configuration file will need to exist for each board.
Each zone must have at least one fan that it exclusively controls. Each zone must have at least one temperature sensor, but they may be shared.
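The zone rules above can be expressed as a small validation pass. This is only a sketch: `ZoneConfig` and `zonesAreValid` are hypothetical names, and the daemon's real configuration types are not shown here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one zone's configuration.
struct ZoneConfig
{
    int id;
    std::vector<std::string> fans;
    std::vector<std::string> temperatureSensors;
};

// Returns true when every zone has at least one fan and at least one
// temperature sensor, and no fan is claimed more than once.
// Temperature sensors may appear in several zones.
inline bool zonesAreValid(const std::vector<ZoneConfig>& zones)
{
    std::set<std::string> claimedFans;
    for (const auto& zone : zones)
    {
        if (zone.fans.empty() || zone.temperatureSensors.empty())
        {
            return false;
        }
        for (const auto& fan : zone.fans)
        {
            // insert().second is false when the fan was already claimed.
            if (!claimedFans.insert(fan).second)
            {
                return false;
            }
        }
    }
    return true;
}
```

Two zones that both list `temp0` pass this check, while two zones that both list `fan0` do not.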
The internal thermometers specified can be read via sysfs or dbus.
Due to data center requirements, the delta between the outgoing air temperature and the environmental air temperature must be no greater than 15C.
Tools need to update the thermal controller with information not necessarily available to the BMC. This will consist of a list of temperature (or margin?) sensors that are updated by the set sensor command. Because they don't represent real sensors in the system, the set sensor handler can simply broadcast the update as a properties update on dbus when it receives the command over IPMI.
A tool can override a specific fan's PWM when we implement the set sensor IPMI command pathway.
A tool can read fan_tach through the normal IPMI interface presently exported for sensors.
The plan is to listen for fan_tach updates for each fan in a background thread. This will receive an update from phosphor-hwmon each time it updates any sensor it cares about.
By default phosphor-hwmon reads each sensor in turn and then sleeps for 1 second. We'll be updating phosphor-hwmon to sleep for a shorter period -- how short though is still TBD. We'll also be updating phosphor-hwmon to support pwm as a target.
Each zone will require a control loop that monitors the associated thermals and controls the fan(s). The EC PID loop is designed to hit the fans 10 times per second to drive them to the desired value, and to read the sensors once per second. We'll be receiving sensor updates with such regularity; however, at present it takes ~0.13s to read all 8 fans. The fans therefore can't be read constantly without bringing the system to its knees, since all CPU cycles would be spent reading them. How frequently we'll read the fan sensors, and the impact this will have, is still TBD.
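The timing figures above can be turned into a quick budget check and a tick schedule. This is a sketch under assumptions: `readDutyCycle`, `TickPlan`, and `planTick` are hypothetical names, and the 100 ms tick is simply the 10-per-second fan rate from the text.

```cpp
// Fraction of each second spent reading sensors, given how long one sweep
// of all fans takes and how many sweeps are attempted per second.
constexpr double readDutyCycle(double secondsPerSweep, double sweepsPerSecond)
{
    return secondsPerSweep * sweepsPerSecond;
}

// What a zone's loop would do on a given 100 ms tick (hypothetical).
struct TickPlan
{
    bool runFanPid;     // drive the fans on every tick (10 per second)
    bool runThermalPid; // read thermals on every 10th tick (once per second)
};

constexpr TickPlan planTick(unsigned tick, unsigned fanTicksPerThermal = 10)
{
    return TickPlan{true, tick % fanTicksPerThermal == 0};
}
```

Here `readDutyCycle(0.13, 10.0)` is about 1.3, meaning a full sweep of all 8 fans on every tick would need more than one second of reading per second. That is why the read rate is still an open question.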
The main thread will manage the other threads, and process the initial configuration files. It will also register a dbus handler for the OEM message.
By default, swampd won't log information. To enable logging pass "-l" on the command line with a parameter that is the folder into which to write the logs.
The log files will be named {folderpath}/zone_{zoneid}.log.
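As an illustration of the naming pattern, the following sketch builds the path for one zone. `zoneLogPath` is a hypothetical helper, not a function from the daemon.

```cpp
#include <string>

// Builds {folderpath}/zone_{zoneid}.log for the given zone.
// Assumes folderPath has no trailing slash.
inline std::string zoneLogPath(const std::string& folderPath, int zoneId)
{
    return folderPath + "/zone_" + std::to_string(zoneId) + ".log";
}
```

For example, passing "-l /tmp" on the command line would place zone 0's log at /tmp/zone_0.log, which is what `zoneLogPath("/tmp", 0)` returns.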
To enable tuning, pass "-t" on the command line.
See Logging & Tuning for more information.
The code is broken out into modules as follows:
- dbus - Any read or write interface that uses dbus primarily.
- experiments - Small execution paths that allow for fan examination including how quickly fans respond to changes.
- ipmi - Manual control for any zone is handled by receiving an IPMI message. This holds the ipmid provider for receiving those messages and sending them onto swampd.
- notimpl - These are read-only and write-only interface implementations that can be dropped into a pluggable sensor to make it complete.
- pid - This contains all the PID associated code, including the zone definition, controller definition, and the PID computational code.
- scripts - This contains the scripts that convert YAML into C++.
- sensors - This contains a couple of sensor types including the pluggable sensor's definition. It also holds the sensor manager.
- sysfs - This contains code that reads from or writes to sysfs.
- threads - Most of swampd's threads run in this method where there's just a dbus bus that we manage.

A single zone system where multiple margin thermal sensors are fed into one PID that generates the output RPM for a set of fans controlled by one PID.

    margin sensors as input to thermal pid

    fleeting0+---->+-------+    +-------+     Thermal PID sampled
                   |  min()+--->+  PID  |     slower rate.
    fleeting1+---->+-------+    +---+---+
                                    |
                                    |
                                    | RPM setpoint
                        Current RPM v
                                 +--+-----+
     The Fan PID        fan0+--->         |  New PWM  +-->fan0
     samples at a                |        |           |
     faster rate        fan1+--->  PID    +---------->--->fan1
     speeding up the             |        |           |
     fans.              fan2+--->         |           +-->fan2
                          ^      +--------+               +
                          |                               |
                          +-------------------------------+
                                  RPM updated by PWM.
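The cascade in the figure, where the minimum margin feeds a thermal PID whose RPM setpoint then feeds a fan PID, might be sketched as follows. This is a simplified PI controller written for illustration. `Pid`, `thermalPass`, `fanPass`, and the gains and limits in the usage note are all hypothetical and are not the daemon's actual PID code.

```cpp
#include <algorithm>
#include <vector>

// Simplified PI controller with output clamping (hypothetical).
struct Pid
{
    double kp;     // proportional gain
    double ki;     // integral gain
    double outMin; // lower output limit
    double outMax; // upper output limit
    double integral = 0.0;

    double step(double setpoint, double input, double dt)
    {
        const double error = setpoint - input;
        integral += error * dt;
        const double out = kp * error + ki * integral;
        return std::max(outMin, std::min(out, outMax));
    }
};

// Slow pass: the smallest (worst) margin drives the RPM setpoint.
// Assumes margins is non-empty.
inline double thermalPass(Pid& thermalPid, const std::vector<double>& margins,
                          double marginSetpoint, double dt)
{
    const double worst = *std::min_element(margins.begin(), margins.end());
    return thermalPid.step(marginSetpoint, worst, dt);
}

// Fast pass: the RPM setpoint and the measured RPM produce a new PWM.
inline double fanPass(Pid& fanPid, double rpmSetpoint, double currentRpm,
                      double dt)
{
    return fanPid.step(rpmSetpoint, currentRpm, dt);
}
```

In this sketch, a thermal `Pid{1000.0, 0.0, 0.0, 10000.0}` given margins of 12 and 5 against a margin setpoint of 10 yields an RPM setpoint of 5000. A fan `Pid{0.01, 0.0, 0.0, 100.0}` reading 3000 RPM then yields a PWM of about 20. The thermal pass would run at the slower rate and the fan pass at the faster one, as the figure notes.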