commit | e6866cac1855ace301003166f997177706de9acf | |
---|---|---|
author | Andrew Jeffery <andrew@aj.id.au> | Tue Sep 14 14:18:19 2021 +0930 |
committer | Andrew Geissler <geissonator@yahoo.com> | Tue Sep 14 12:07:12 2021 +0000 |
tree | 6b59d98406c03a78d661e8b17f2eb867ebaa56c6 | |
parent | a63d7f98ea1bd2aa47ea056751a63f5e56c128e0 | |
Revert "Override pldm response time out value"

This reverts commit bcc5f6b0f24e8ad0b03b8217e88a19ff3002c084.

bcc5f6b0f24e ("Override pldm response time out value") talks about timeouts due to the endpoint taking some time to respond. However, the net effect of the change is that the response to a retried request races against the instance ID expiration interval, because the retry interval is effectively equal to the instance ID expiration interval once we account for some timer slack.

This is demonstrated by the following strace on pldmd, where we can see a retried request go out, followed by the report that the request failed, further followed by the response to the request coming in. Note the values are string-literal-escaped-octal, so the [ 0x80 0x00 0x03 ... ] byte encoding of the GetPLDMVersions request appears as "\200\0\3...":

```
...
11:56:25.046173 socket(AF_UNIX, SOCK_SEQPACKET, 0) = 3
...
11:56:25.183936 connect(3, {sa_family=AF_UNIX, sun_path=@"mctp-mux"}, 11) = 0
11:56:25.190994 write(3, "\1", 1) = 1
...
11:56:25.195272 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\t\1", iov_len=2}, {iov_base="\200\0\3\0\0\0\0\1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11
...
11:56:30.202298 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\t\1", iov_len=2}, {iov_base="\200\0\3\0\0\0\0\1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11
11:56:30.202820 gettid() = 1918
11:56:30.203029 timerfd_settime64(6, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=3848, tv_nsec=61124978641398328}}, NULL) = 0
11:56:30.203286 epoll_wait(4, [{EPOLLIN, {u32=14373240, u64=14373240}}], 14, 0) = 1
11:56:30.203509 clock_gettime64(CLOCK_REALTIME, {tv_sec=1629806190, tv_nsec=203587376}) = 0
11:56:30.203687 clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3843, tv_nsec=523046301}) = 0
11:56:30.203844 clock_gettime64(CLOCK_BOOTTIME, {tv_sec=3843, tv_nsec=523206110}) = 0
11:56:30.204049 write(2, "Response not received for the re"..., 59) = 59
11:56:30.204427 write(2, " EID = ", 7) = 7
11:56:30.204745 write(2, "9", 1) = 1
11:56:30.205047 write(2, " INSTANCE_ID = ", 15) = 15
11:56:30.205389 write(2, "0", 1) = 1
11:56:30.205719 write(2, " TYPE = ", 8) = 8
11:56:30.205997 write(2, "0", 1) = 1
11:56:30.206266 write(2, " COMMAND = ", 11) = 11
11:56:30.206576 write(2, "3", 1) = 1
11:56:30.206893 write(2, "\n", 1) = 1
11:56:30.209402 write(2, "Failed to receive response for ", 31) = 31
11:56:30.209814 write(2, "getPLDMVersion command, Host see"..., 46) = 46
11:56:30.210969 gettid() = 1918
11:56:30.211171 timerfd_settime64(6, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=8549172174085160960}, it_value={tv_sec=0, tv_nsec=8566510441663037440}}, NULL) = 0
11:56:30.211406 epoll_wait(4, [{EPOLLIN, {u32=14373240, u64=14373240}}], 14, 0) = 1
11:56:30.211640 clock_gettime64(CLOCK_REALTIME, {tv_sec=1629806190, tv_nsec=211720512}) = 0
11:56:30.211825 clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3843, tv_nsec=531188829}) = 0
11:56:30.211983 clock_gettime64(CLOCK_BOOTTIME, {tv_sec=3843, tv_nsec=531335706}) = 0
11:56:30.212143 recv(3, NULL, 0, MSG_PEEK|MSG_TRUNC) = 15
11:56:30.212366 recv(3, "\t\1\0\0\3\0\0\0\0\0\5\361\361\360\0", 15, 0) = 15
```

That is, at 11:56:30.202298 we send out the retry for the request initiated at 11:56:25.195272 and the reply arrives back at 11:56:30.212366, but in between we've already cancelled the request handler due to the instance ID interval timer expiring.

Resolve this by removing the explicit configuration of the response-time-out build parameter setting the per-request response time to 4.8 seconds, setting its value back to the default of two seconds.

Anecdotal testing with the following shell script produced no failures (by inspection of the journal as the iterations executed):

```
for i in `seq 1 30`;
do
    echo $i;
    (
        systemctl stop pldmd mctp-demux &&
        echo 1e78902c.kcs > /sys/bus/platform/drivers/ast-kcs-bmc/unbind &&
        sleep 1 &&
        echo 1e78902c.kcs > /sys/bus/platform/drivers/ast-kcs-bmc/bind &&
        systemctl start pldmd &&
        sleep 15
    ) || break;
done
```

Change-Id: Ide125d686e79376b412fca0105449c8bef722cfe
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
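The timings quoted in the commit message can be checked with a little arithmetic. The following is a minimal sketch, not part of pldm: the helper names `to_seconds`, `elapsed` and `decode_octal` are hypothetical, and the only inputs are the timestamps and the escaped request string copied from the strace output. It assumes a POSIX shell with `awk` and `od` available.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of pldm.

# Convert an strace wall-clock timestamp (HH:MM:SS.uuuuuu) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.6f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Print the time elapsed between two timestamps, in seconds.
elapsed() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.6f\n", b - a }'
}

# Print the bytes of an octal-escaped strace string as hex.
decode_octal() {
    # The argument is deliberately used as the printf format so that
    # its \ddd escapes are expanded; it must not contain '%'.
    set -- $(printf "$1" | od -An -tx1)
    echo "$*"
}

request=11:56:25.195272    # sendmsg() of the original request
retry=11:56:30.202298      # sendmsg() of the retry
failure=11:56:30.204049    # "Response not received" written to stderr
response=11:56:30.212366   # recv() of the response

echo "request -> retry:    $(elapsed "$request" "$retry") s"
echo "retry   -> failure:  $(elapsed "$retry" "$failure") s"
echo "retry   -> response: $(elapsed "$retry" "$response") s"
echo "request bytes:       $(decode_octal '\200\0\3\0\0\0\0\1\0')"
```

With the timestamps from the trace, the retry goes out about 5.007 seconds after the original request, the failure is reported under 2 milliseconds after the retry, and the response arrives about 10 milliseconds after the retry, which is the race the commit message describes.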
OpenBMC is a Linux distribution for management controllers used in devices such as servers, top of rack switches or RAID appliances. It uses Yocto, OpenEmbedded, systemd, and D-Bus to allow easy customization for your platform.
sudo apt-get install -y git build-essential libsdl1.2-dev texinfo gawk chrpath diffstat
sudo dnf install -y git patch diffstat texinfo chrpath SDL-devel bitbake \
    rpcgen perl-Thread-Queue perl-bignum perl-Crypt-OpenSSL-Bignum
sudo dnf groupinstall "C Development Tools and Libraries"
git clone git@github.com:openbmc/openbmc.git
cd openbmc
Any build requires an environment set up according to your hardware target. There is a special script in the root of this repository that can be used to configure the environment as needed. The script is called `setup` and takes the name of your hardware target as an argument.
The script needs to be sourced while in the top directory of the OpenBMC repository clone. If run without arguments, it displays the list of supported hardware targets, as in the following example:
$ . setup <machine> [build_dir]
Target machine must be specified. Use one of:

centriq2400-rep f0b fp5280g2 gsj hr630 hr855xg2 lanyang mihawk msn neptune nicole olympus olympus-nuvoton on5263m5 p10bmc palmetto qemuarm quanta-q71l romulus s2600wf stardragon4800-rep2 swift tiogapass vesnin witherspoon witherspoon-tacoma yosemitev2 zaius
Once you know the target (e.g. romulus), source the `setup` script as follows:
. setup romulus
For evb-ast2500, use the command below to specify the machine config, because the machine in the `meta-aspeed` layer is in a BSP layer and does not build the openbmc image.
TEMPLATECONF=meta-evb/meta-evb-aspeed/meta-evb-ast2500/conf . openbmc-env
bitbake obmc-phosphor-image
Additional details can be found in the docs repository.
The OpenBMC community maintains a set of tutorials that new users can go through to get up to speed on OpenBMC development.
Commits submitted by members of the OpenBMC GitHub community are compiled and tested via our Jenkins server. Commits are run through two levels of testing. At the repository level the makefile `make check` directive is run. At the system level, the commit is built into a firmware image and run with an arm-softmmu QEMU model against a barrage of CI tests.
Commits submitted by non-members do not automatically proceed through CI testing. After visual inspection of the commit, a CI run can be manually performed by the reviewer.
Automated testing is performed against the QEMU model as well as supported systems. The OpenBMC project uses the Robot Framework for all automation. Our complete test repository can be found here.
Support of additional hardware and software packages is always welcome. Please follow the contributing guidelines when making a submission. It is expected that contributions contain test cases.
Issues are managed on GitHub. It is recommended you search through the issues before opening a new one.
First, please do a search on the internet. There's a good chance your question has already been asked.
For general questions, please use the openbmc tag on Stack Overflow. Please review the discussion on Stack Overflow licensing before posting any code.
For technical discussions, please see contact info below for Discord and mailing list information. Please don't file an issue to ask a question. You'll get faster results by using the mailing list or Discord.
Feature List
Features In Progress
Features Requested but need help
Dive deeper into OpenBMC by opening the docs repository.
The Technical Steering Committee (TSC) guides the project. Members are: