Release Notes for OpenPower Firmware v2.3.1
===========================================

op-build v2.3.1 was released on July 9th, 2019 and contains several important
fixes for POWER8 and POWER9 systems.

For POWER8 and POWER9 systems there are updated skiboot, Linux, and buildroot.
There's also an updated hostboot for POWER8 systems.

skiboot
-------

Bug fixes included in this release are:

- npu2: Purge cache when resetting a GPU

  After putting all of a GPU's links in reset, do a cache purge in case we
  have CPU cache lines belonging to the now-inaccessible GPU memory.

- npu2: Reset NVLinks when resetting a GPU

  Resetting a V100 GPU brings its NVLinks down and if an NPU tries using
  those, an HMI occurs. We were lucky not to observe this because bare metal
  does not normally reset a GPU and, when devices are passed through, GPUs
  usually come before NPUs in the QEMU command line or Libvirt XML, so NPUs
  are naturally reset first. However, a simple change of the device order
  brings HMIs.

  This defines a bus control filter for a PCI slot with a GPU with NVLinks
  so that when the host system issues a secondary bus reset to the slot, it
  resets the associated NVLinks.

- hw/phb4: Assert Link Disable bit after ETU init

  The cursed RAID card in ozrom1 has a bug where it ignores PERST being
  asserted. The PCIe Base spec is a little vague about what happens
  while PERST is asserted, but it does clearly specify that when
  PERST is de-asserted the Link Training and Status State Machine
  (LTSSM) of a device should return to the initial state (Detect)
  defined in the spec and the link training process should restart.

  This bug was worked around in 9078f8268922 ("phb4: Delay training till
  after PERST is deasserted") by setting the link disable bit at the
  start of the FRESET process and clearing it after PERST was
  de-asserted. Although this fixed the bug, the patch offered no
  explanation of why the fix worked.

  In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable
  workaround was moved into phb4_assert_perst(). This is always called
  in the CRESET case, but a following patch resulted in
  assert_perst() not being called if phb4_freset() was entered following a
  CRESET since p->skip_perst was set in the CRESET handler. This is bad
  since a side-effect of the CRESET is that the Link Disable bit is
  cleared.

  This, combined with the RAID card ignoring PERST, results in the PCIe
  link being trained by the PHB while we're waiting out the 100ms
  ETU reset time. If we hack skiboot to print a DLP trace after returning
  from phb4_init_hw() we get: ::

    PHB#0001[0:1]: Initialization complete
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
    PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
    PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
    PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
    PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
    PHB#0001[0:1]: CRESET: wait_time = 100
    PHB#0001[0:1]: FRESET: Starts
    PHB#0001[0:1]: FRESET: Prepare for link down
    PHB#0001[0:1]: FRESET: Assert skipped
    PHB#0001[0:1]: FRESET: Deassert
    PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
    PHB#0001[0:1]: TRACE: Reached target state
    PHB#0001[0:1]: LINK: Start polling
    PHB#0001[0:1]: LINK: Electrical link detected
    PHB#0001[0:1]: LINK: Link is up
    PHB#0001[0:1]: LINK: Went down waiting for stabilty
    PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
    PHB#0001[0:1]: CRESET: Starts

  What has happened here is that the link is trained to 8x Gen3 33ms after
  we return from phb4_init_hw(), and before we've waited out the 100ms
  that we normally wait after re-initialising the ETU. When we "deassert"
  PERST later on in the FRESET handler the link is in L0 (normal) state. At
  this point we try to read from the Vendor/Device ID register to verify
  that the link is stable and immediately get a PHB fence due to a PCIe
  Completion Timeout. Skiboot attempts to recover by doing another CRESET,
  but this will encounter the same issue.

  This patch fixes the problem by setting the Link Disable bit (by calling
  phb4_assert_perst()) immediately after we return from phb4_init_hw().
  This prevents the link from being trained while PERST is asserted which
  seems to avoid the Completion Timeout. With the patch applied we get: ::

    PHB#0001[0:1]: Initialization complete
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
    PHB#0001[0:1]: CRESET: wait_time = 100
    PHB#0001[0:1]: FRESET: Starts
    PHB#0001[0:1]: FRESET: Prepare for link down
    PHB#0001[0:1]: FRESET: Assert skipped
    PHB#0001[0:1]: FRESET: Deassert
    PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
    PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
    PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
    PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
    PHB#0001[0:1]: TRACE: Reached target state
    PHB#0001[0:1]: LINK: Start polling
    PHB#0001[0:1]: LINK: Electrical link detected
    PHB#0001[0:1]: LINK: Link is up
    PHB#0001[0:1]: LINK: Link is stable
    PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
    PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
    PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
    PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000

- npu2: Reset PID wildcard and refcounter when mapped to LPID

  Since 105d80f85b "npu2: Use unfiltered mode in XTS tables" we do not
  register every PID in the XTS table so the table has one entry per LPID.
  Then we added a reference counter to keep track of the entry use when
  switching GPU between the host and guest systems (the "Fixes:" tag below).

  The POWERNV platform setup creates such entries and references them
  at boot time when initializing IOMMUs and only removes them when
  a GPU is passed through to a guest. This creates a problem as POWERNV
  boots via kexec and no dereferencing happens; the XTS table state remains
  undefined. So when the host kernel boots, skiboot thinks there are valid
  XTS entries and does not update the XTS table, which breaks ATS.

  This adds a reset of the reference counter and the XTS entry when a GPU
  is assigned to an LPID, since we cannot rely on the kernel to clean that up.

- hw/phb4: Use read/write_reg in assert_perst

  While the PHB is fenced we can't use the MMIO interface to access PHB
  registers. While processing a complete reset we inject a PHB fence to
  isolate the PHB from the rest of the system because the PHB won't
  respond to MMIOs from the rest of the system while being reset.

  We assert PERST after the fence has been erected which requires us to
  use the XSCOM indirect interface to access the PHB registers rather than
  the MMIO interface. Previously we did that when asserting PERST in the
  CRESET path. However, in b8b4c79d4419 ("hw/phb4: Factor out PERST
  control") this was rewritten to use the raw in_be64() accessor. This
  means that PERST would not be asserted in the reset path. On some
  Mellanox cards this would prevent them from re-loading their firmware
  when the system was fast-reset.

  This patch fixes the problem by replacing the raw {in|out}_be64()
  accessors with the phb4_{read|write}_reg() functions.

- opal-prd: Fix prd message size issue

  If the prd message buffer is too small, the read_prd_msg() call fails
  with the error below, and the caller does not reallocate a sufficiently
  large buffer. It is also hard to guess the required size.

  Sample log: ::

    Mar 28 03:31:43 zz24p1 opal-prd: FW: error reading from firmware: alloc 32 rc -1: Invalid argument
    Mar 28 03:31:43 zz24p1 opal-prd: FW: error reading from firmware: alloc 32 rc -1: Invalid argument
    Mar 28 03:31:43 zz24p1 opal-prd: FW: error reading from firmware: alloc 32 rc -1: Invalid argument
    ....

  Let's use the opal-msg-size device tree property to allocate memory
  for the prd message.

- npu2: Fix clearing the FIR bits

  FIR registers are SCOM-only so they cannot be accessed with the indirect
  write, and yet we use SCOM-based addresses for these; fix this.

- opal-gard: Account for ECC size when clearing partition

  When 'opal-gard clear all' is run, it works by erasing the GUARD partition
  then using blocklevel_smart_write() to write nothing to the partition. This
  second write call is needed because we rely on libflash to set the ECC
  bits appropriately when the partition contained ECCed data.

  The API for this is a little odd with the caller specifying how much
  actual data to write, and libflash writing size + size/8 bytes
  since there is one additional ECC byte for every eight bytes of data.

  We currently do not account for the extra space consumed by the ECC data
  in reset_partition() which is used to handle the 'clear all' command.
  This results in the partition following the GUARD partition being
  partially overwritten when the command is used. This patch fixes the
  problem by reducing the length we would normally write by the number
  of ECC bytes required.

- nvram: Flag dangerous NVRAM options

  Most nvram options used by skiboot are just for debug or testing for
  regressions. They should never be used long term.

  We've hit a number of issues in testing and the field where nvram
  options have been set "temporarily" but haven't been properly cleared
  after, resulting in crashes or real bugs being masked.

  This patch marks most nvram options used by skiboot as dangerous and
  prints a chicken to remind users of the problem.

- devicetree: Don't set path to dtc in makefile

  By setting the path we fail to build under buildroot which has its own
  set of host tools in PATH, but not at /usr/bin.

  Keep the variable so it can be set if need be but default to whatever
  'dtc' is in the user's path.
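
As an illustration of the "hw/phb4: Use read/write_reg in assert_perst" fix above, the following is a minimal sketch of an accessor pair that picks the access path depending on whether the PHB is fenced. The toy_phb structure, the toy_* function names and the all-ones result for a fenced MMIO read are assumptions made for this example; this is not skiboot's phb4_read_reg()/phb4_write_reg() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_REGS 4

/* Hypothetical stand-in for a PHB; field names are illustrative only. */
struct toy_phb {
    bool fenced;                  /* true while the PHB fence is up */
    uint64_t regs[TOY_NUM_REGS];  /* backing store standing in for hardware */
    unsigned mmio_accesses;       /* accesses issued over the MMIO path */
    unsigned xscom_accesses;      /* accesses issued over the indirect path */
};

/* Raw MMIO path: modelled as having no effect while the PHB is fenced. */
static uint64_t toy_mmio_read(struct toy_phb *p, unsigned idx)
{
    p->mmio_accesses++;
    return p->fenced ? UINT64_MAX : p->regs[idx];
}

static void toy_mmio_write(struct toy_phb *p, unsigned idx, uint64_t val)
{
    p->mmio_accesses++;
    if (!p->fenced)
        p->regs[idx] = val;
}

/* Indirect (XSCOM-like) path: modelled as working regardless of the fence. */
static uint64_t toy_xscom_read(struct toy_phb *p, unsigned idx)
{
    p->xscom_accesses++;
    return p->regs[idx];
}

static void toy_xscom_write(struct toy_phb *p, unsigned idx, uint64_t val)
{
    p->xscom_accesses++;
    p->regs[idx] = val;
}

/* Wrappers that callers use instead of touching the raw MMIO path. */
uint64_t toy_read_reg(struct toy_phb *p, unsigned idx)
{
    assert(idx < TOY_NUM_REGS);
    return p->fenced ? toy_xscom_read(p, idx) : toy_mmio_read(p, idx);
}

void toy_write_reg(struct toy_phb *p, unsigned idx, uint64_t val)
{
    assert(idx < TOY_NUM_REGS);
    if (p->fenced)
        toy_xscom_write(p, idx, val);
    else
        toy_mmio_write(p, idx, val);
}
```

In this model a caller that goes through toy_write_reg() still updates the register while the fence is up, whereas a direct toy_mmio_write() would be silently dropped, which mirrors the failure described in the release note.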
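
The "opal-prd: Fix prd message size issue" entry above sizes the message buffer from the opal-msg-size device tree property. A minimal sketch of that idea follows. It assumes the property holds a single 32-bit big-endian cell (device tree cells are big-endian) and that the caller has already read the raw property bytes; the function names and the 64-byte fallback are hypothetical and not taken from opal-prd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fallback used when the property is missing or malformed. */
#define FALLBACK_MSG_SIZE 64u

/* Decode one big-endian 32-bit device tree cell. */
uint32_t dt_cell_to_u32(const unsigned char cell[4])
{
    return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
           ((uint32_t)cell[2] << 8)  |  (uint32_t)cell[3];
}

/* Choose the buffer size from the raw property bytes. */
size_t prd_msg_size(const unsigned char *prop, size_t prop_len)
{
    uint32_t size;

    if (prop == NULL || prop_len != 4)
        return FALLBACK_MSG_SIZE;

    size = dt_cell_to_u32(prop);
    return size != 0 ? size : FALLBACK_MSG_SIZE;
}

/* Allocate a zeroed message buffer and report its size to the caller. */
void *alloc_prd_msg(const unsigned char *prop, size_t prop_len,
                    size_t *size_out)
{
    assert(size_out != NULL);
    *size_out = prd_msg_size(prop, prop_len);
    return calloc(1, *size_out);
}
```

With this approach the buffer size comes from what firmware advertises rather than from a guess, so a read of a full-sized message should no longer fail for lack of space.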
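
The size arithmetic in the "opal-gard: Account for ECC size when clearing partition" entry above can be sketched as follows. This only illustrates the "size + size/8" relationship stated in that entry; the helper names are hypothetical and are not libflash's API, and the 20480-byte partition in the usage note is an assumed figure.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Flash bytes consumed when data_len bytes are written with ECC:
 * one extra ECC byte for every eight bytes of data.
 */
size_t ecc_flash_size(size_t data_len)
{
    assert(data_len % 8 == 0);
    return data_len + data_len / 8;
}

/*
 * Largest data length (a multiple of eight) whose ECC-protected form
 * still fits inside a partition of part_len bytes.
 */
size_t ecc_max_data_len(size_t part_len)
{
    return (part_len / 9) * 8;
}
```

For an assumed 20480-byte partition, passing the full partition length as the data length would consume ecc_flash_size(20480) = 23040 bytes and spill 2560 bytes into the next partition, while ecc_max_data_len(20480) = 18200 data bytes occupy 20475 bytes and stay inside it.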

Linux and buildroot
-------------------

Move to Linux v5.1.15-openpower1 and buildroot 2019.02.3

This updates to an in-support stable Linux release, resolving potential
security and stability issues. Notably, this includes fixes for
CVE-2019-12817, CVE-2019-11477, CVE-2019-11478, and CVE-2019-11479.

Buildroot stays on the same major version with the .2 and .3 stable
releases added in.

The skiroot defconfig is updated to ensure we still run the MMU in Radix
mode (see http://git.kernel.org/torvalds/c/8adddf349fda0). It also
disables xmon by default.

Hostboot
--------

Point op-build P8 hostboot at commit to report cache-count-disabled OS flag

This points op-build at the P8 hostboot package commit that enables reporting
to the OS that the cache-count-disabled Spectre workaround is available.