tree 6b59d98406c03a78d661e8b17f2eb867ebaa56c6
parent a63d7f98ea1bd2aa47ea056751a63f5e56c128e0
author Andrew Jeffery <andrew@aj.id.au> 1631594899 +0930
committer Andrew Geissler <geissonator@yahoo.com> 1631621232 +0000

Revert "Override pldm response time out value"

This reverts commit bcc5f6b0f24e8ad0b03b8217e88a19ff3002c084.

bcc5f6b0f24e ("Override pldm response time out value") talks about
timeouts due to the endpoint taking some time to respond. However, the
net effect of the change is that the response to a retried request
races against the instance ID expiration interval, because the retry
interval is effectively equal to the instance ID expiration interval
once we account for some timer slack.

This is demonstrated by the following strace on pldmd, where we can see
a retried request go out, followed by the report that the request
failed, further followed by the response to the request coming in. Note
that strace prints buffers as string literals with octal escapes, so
the [ 0x80 0x00 0x03 ... ] byte encoding of the GetPLDMVersions request
appears as "\200\0\3...":

```
...
11:56:25.046173 socket(AF_UNIX, SOCK_SEQPACKET, 0) = 3
...
11:56:25.183936 connect(3, {sa_family=AF_UNIX, sun_path=@"mctp-mux"}, 11) = 0
11:56:25.190994 write(3, "\1", 1)       = 1
...
11:56:25.195272 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\t\1", iov_len=2}, {iov_base="\200\0\3\0\0\0\0\1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11
...
11:56:30.202298 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\t\1", iov_len=2}, {iov_base="\200\0\3\0\0\0\0\1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11
11:56:30.202820 gettid()                = 1918
11:56:30.203029 timerfd_settime64(6, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=3848, tv_nsec=61124978641398328}}, NULL) = 0
11:56:30.203286 epoll_wait(4, [{EPOLLIN, {u32=14373240, u64=14373240}}], 14, 0) = 1
11:56:30.203509 clock_gettime64(CLOCK_REALTIME, {tv_sec=1629806190, tv_nsec=203587376}) = 0
11:56:30.203687 clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3843, tv_nsec=523046301}) = 0
11:56:30.203844 clock_gettime64(CLOCK_BOOTTIME, {tv_sec=3843, tv_nsec=523206110}) = 0
11:56:30.204049 write(2, "Response not received for the re"..., 59) = 59
11:56:30.204427 write(2, " EID = ", 7)  = 7
11:56:30.204745 write(2, "9", 1)        = 1
11:56:30.205047 write(2, " INSTANCE_ID = ", 15) = 15
11:56:30.205389 write(2, "0", 1)        = 1
11:56:30.205719 write(2, " TYPE = ", 8) = 8
11:56:30.205997 write(2, "0", 1)        = 1
11:56:30.206266 write(2, " COMMAND = ", 11) = 11
11:56:30.206576 write(2, "3", 1)        = 1
11:56:30.206893 write(2, "\n", 1)       = 1
11:56:30.209402 write(2, "Failed to receive response for ", 31) = 31
11:56:30.209814 write(2, "getPLDMVersion command, Host see"..., 46) = 46
11:56:30.210969 gettid()                = 1918
11:56:30.211171 timerfd_settime64(6, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=8549172174085160960}, it_value={tv_sec=0, tv_nsec=8566510441663037440}}, NULL) = 0
11:56:30.211406 epoll_wait(4, [{EPOLLIN, {u32=14373240, u64=14373240}}], 14, 0) = 1
11:56:30.211640 clock_gettime64(CLOCK_REALTIME, {tv_sec=1629806190, tv_nsec=211720512}) = 0
11:56:30.211825 clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3843, tv_nsec=531188829}) = 0
11:56:30.211983 clock_gettime64(CLOCK_BOOTTIME, {tv_sec=3843, tv_nsec=531335706}) = 0
11:56:30.212143 recv(3, NULL, 0, MSG_PEEK|MSG_TRUNC) = 15
11:56:30.212366 recv(3, "\t\1\0\0\3\0\0\0\0\0\5\361\361\360\0", 15, 0) = 15
```

That is, at 11:56:30.202298 we send out the retry for the request
initiated at 11:56:25.195272 and the reply arrives back at
11:56:30.212366, but in between we've already cancelled the request
handler due to the instance ID interval timer expiring.
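
The gaps between those events can be checked with a rough sketch like
the one below. It only does arithmetic on the timestamps copied from
the strace above; the helper names to_secs and delta_ms are made up for
illustration and are not part of pldm or strace, and the timestamps are
assumed to fall on the same day:

```shell
# Convert an strace HH:MM:SS.micros timestamp to seconds since midnight.
to_secs() {
    echo "$1" | awk -F: '{ printf "%.6f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Print the gap between two same-day timestamps in milliseconds.
delta_ms() {
    awk -v a="$(to_secs "$1")" -v b="$(to_secs "$2")" \
        'BEGIN { printf "%.3f\n", (b - a) * 1000 }'
}

delta_ms 11:56:25.195272 11:56:30.202298   # initial request -> retry
delta_ms 11:56:30.202298 11:56:30.204049   # retry -> failure reported
delta_ms 11:56:30.202298 11:56:30.212366   # retry -> response read
```

This prints 5007.026, 1.751 and 10.068: the retry goes out about five
seconds after the initial request, the failure is reported under 2ms
later, and the response is read roughly 10ms after the retry.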

Resolve this by removing the explicit configuration of the
response-time-out build parameter, which set the per-request response
time to 4.8 seconds, returning its value to the default of two seconds.
Anecdotal testing with the following shell script produced no failures
(by inspection of the journal as the iterations executed):

```
for i in `seq 1 30`; do echo $i; ( systemctl stop pldmd mctp-demux && echo 1e78902c.kcs > /sys/bus/platform/drivers/ast-kcs-bmc/unbind && sleep 1 && echo 1e78902c.kcs > /sys/bus/platform/drivers/ast-kcs-bmc/bind && systemctl start pldmd && sleep 15 ) || break; done
```

Change-Id: Ide125d686e79376b412fca0105449c8bef722cfe
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
