2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4
5<html xmlns="http://www.w3.org/1999/xhtml">
6 <head>
7 <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
9 <title>Release Notes for OpenPower Firmware v2.3.1 &#8212; OpenPOWER Firmware v2.6-257-g5b5624c2
10 documentation</title>
11 <link rel="stylesheet" href="../_static/alabaster.css" type="text/css" />
12 <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
13 <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
14 <script type="text/javascript" src="../_static/jquery.js"></script>
15 <script type="text/javascript" src="../_static/underscore.js"></script>
16 <script type="text/javascript" src="../_static/doctools.js"></script>
17 <link rel="index" title="Index" href="../genindex.html" />
18 <link rel="search" title="Search" href="../search.html" />
19 <link rel="next" title="Release Notes for OpenPower Firmware v2.4-rc1" href="v2.4-rc1.html" />
20 <link rel="prev" title="Release Notes for OpenPower Firmware v2.3" href="v2.3.html" />
21
22 <link rel="stylesheet" href="../_static/custom.css" type="text/css" />
23
24
25 <meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />
26
27 </head><body>
28
29
30 <div class="document">
31 <div class="documentwrapper">
32 <div class="bodywrapper">
33
34
35 <div class="body" role="main">
36
37 <div class="section" id="release-notes-for-openpower-firmware-v2-3-1">
38<h1>Release Notes for OpenPower Firmware v2.3.1<a class="headerlink" href="#release-notes-for-openpower-firmware-v2-3-1" title="Permalink to this headline"></a></h1>
39<p>op-build v2.3.1 was released on July 9th, 2019 and contains several important
40fixes for POWER8 and POWER9 systems.</p>
<p>For POWER8 and POWER9 systems there are updated versions of skiboot, Linux, and buildroot.
There’s also an updated hostboot for POWER8 systems.</p>
43<div class="section" id="skiboot">
44<h2>skiboot<a class="headerlink" href="#skiboot" title="Permalink to this headline"></a></h2>
45<p>Bug fixes included in this release are:</p>
46<ul>
47<li><p class="first">npu2: Purge cache when resetting a GPU</p>
<p>After putting all of a GPU’s links in reset, do a cache purge in case we
have CPU cache lines belonging to the now-inaccessible GPU memory.</p>
50</li>
51<li><p class="first">npu2: Reset NVLinks when resetting a GPU</p>
<p>Resetting a V100 GPU brings its NVLinks down, and if an NPU then tries to use
them, an HMI occurs. We were lucky not to observe this: bare metal
does not normally reset a GPU, and when GPUs are passed through they usually
come before NPUs in the QEMU command line or Libvirt XML, so the NPUs
are naturally reset first. However, a simple change of the device order
triggers HMIs.</p>
<p>This defines a bus control filter for a PCI slot holding a GPU with NVLinks,
so that when the host system issues a secondary bus reset to the slot, the
associated NVLinks are reset as well.</p>
61</li>
62<li><p class="first">hw/phb4: Assert Link Disable bit after ETU init</p>
63<p>The cursed RAID card in ozrom1 has a bug where it ignores PERST being
64asserted. The PCIe Base spec is a little vague about what happens
65while PERST is asserted, but it does clearly specify that when
66PERST is de-asserted the Link Training and Status State Machine
67(LTSSM) of a device should return to the initial state (Detect)
68defined in the spec and the link training process should restart.</p>
<p>This bug was worked around in 9078f8268922 (“phb4: Delay training till
after PERST is deasserted”) by setting the link disable bit at the
start of the FRESET process and clearing it after PERST was
de-asserted. Although this fixed the bug, the patch offered no
explanation of why the fix worked.</p>
<p>In b8b4c79d4419 (“hw/phb4: Factor out PERST control”) the link disable
workaround was moved into phb4_assert_perst(). This is always called
in the CRESET case, but a following patch resulted in
phb4_assert_perst() not being called if phb4_freset() was entered following a
CRESET, since p-&gt;skip_perst was set in the CRESET handler. This is bad
because a side effect of the CRESET is that the Link Disable bit is
cleared.</p>
<p>This, combined with the RAID card ignoring PERST, results in the PCIe
link being trained by the PHB while we’re waiting out the 100ms
ETU reset time. If we hack skiboot to print a DLP trace after returning
from phb4_init_hw() we get:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">PHB</span><span class="c1">#0001[0:1]: Initialization complete</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: CRESET: wait_time = 100</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Starts</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Prepare for link down</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Assert skipped</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Deassert</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE: Reached target state</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Start polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Electrical link detected</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Link is up</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Went down waiting for stabilty</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: DLP train control: 0x0000105101000000</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: CRESET: Starts</span>
</pre></div>
</div>
<p>What has happened here is that the link trained to x8 Gen3 33ms after
we returned from phb4_init_hw(), before we had waited out the 100ms
that we normally wait after re-initialising the ETU. When we “deassert”
PERST later on in the FRESET handler, the link is already in the L0 (normal) state. At
this point we try to read from the Vendor/Device ID register to verify
that the link is stable and immediately get a PHB fence due to a PCIe
Completion Timeout. Skiboot attempts to recover by doing another CRESET,
but this will encounter the same issue.</p>
117<p>This patch fixes the problem by setting the Link Disable bit (by calling
118phb4_assert_perst()) immediately after we return from phb4_init_hw().
119This prevents the link from being trained while PERST is asserted which
120seems to avoid the Completion Timeout. With the patch applied we get:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">PHB</span><span class="c1">#0001[0:1]: Initialization complete</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: CRESET: wait_time = 100</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Starts</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Prepare for link down</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Assert skipped</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: FRESET: Deassert</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: TRACE: Reached target state</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Start polling</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Electrical link detected</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Link is up</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Link is stable</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08</span>
<span class="n">PHB</span><span class="c1">#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000</span>
</pre></div>
</div>
150</li>
151<li><p class="first">npu2: Reset PID wildcard and refcounter when mapped to LPID</p>
152<p>Since 105d80f85b “npu2: Use unfiltered mode in XTS tables” we do not
153register every PID in the XTS table so the table has one entry per LPID.
154Then we added a reference counter to keep track of the entry use when
155switching GPU between the host and guest systems (the “Fixes:” tag below).</p>
<p>The POWERNV platform setup creates such entries and references them
at boot time when initializing IOMMUs, and only removes them when
a GPU is passed through to a guest. This creates a problem as POWERNV
boots via kexec and no dereferencing happens; the XTS table state remains
undefined. So when the host kernel boots, skiboot thinks there are valid
XTS entries and does not update the XTS table, which breaks ATS.</p>
<p>This adds a reset of the reference counter and the XTS entry when a GPU is
assigned to an LPID, since we cannot rely on the kernel to clean that up.</p>
164</li>
165<li><p class="first">hw/phb4: Use read/write_reg in assert_perst</p>
166<p>While the PHB is fenced we can’t use the MMIO interface to access PHB
167registers. While processing a complete reset we inject a PHB fence to
168isolate the PHB from the rest of the system because the PHB won’t
169respond to MMIOs from the rest of the system while being reset.</p>
<p>We assert PERST after the fence has been erected, which requires us to
use the XSCOM indirect interface to access the PHB registers rather than
the MMIO interface. Previously we did that when asserting PERST in the
CRESET path. However, in b8b4c79d4419 (“hw/phb4: Factor out PERST
control”) this was re-written to use the raw in_be64() accessor. This
means that PERST would not be asserted in the reset path. On some
Mellanox cards this would prevent them from re-loading their firmware
when the system was fast-reset.</p>
178<p>This patch fixes the problem by replacing the raw {in|out}_be64()
179accessors with the phb4_{read|write}_reg() functions.</p>
180</li>
181<li><p class="first">opal-prd: Fix prd message size issue</p>
<p>If the prd message buffer is too small, the read_prd_msg() call fails with
the error below. The caller does not reallocate a sufficiently large buffer, and
it is hard to guess the required size.</p>
185<p>Mar 28 03:31:43 zz24p1 opal-prd: FW: error reading from firmware: alloc 32 rc -1: Invalid argument
186Mar 28 03:31:43 zz24p1 opal-prd: FW: error reading from firmware: alloc 32 rc -1: Invalid argument
187Mar 28 03:31:43 zz24p1 opal-prd: FW: error reading from firmware: alloc 32 rc -1: Invalid argument
188….</p>
<p>Use the opal-msg-size device tree property to allocate memory
for the prd message.</p>
191</li>
192<li><p class="first">npu2: Fix clearing the FIR bits</p>
<p>FIR registers are SCOM-only so they cannot be accessed with the indirect
write, and yet we use SCOM-based addresses for these; fix this.</p>
195</li>
196<li><p class="first">opal-gard: Account for ECC size when clearing partition</p>
<p>When ‘opal-gard clear all’ is run, it works by erasing the GUARD partition, then
using blocklevel_smart_write() to write nothing to the partition. This
second write call is needed because we rely on libflash to set the ECC
bits appropriately when the partition contained ECCed data.</p>
201<p>The API for this is a little odd with the caller specifying how much
202actual data to write, and libflash writing size + size/8 bytes
203since there is one additional ECC byte for every eight bytes of data.</p>
<p>We currently do not account for the extra space consumed by the ECC data
in reset_partition(), which is used to handle the ‘clear all’ command.
This results in the partition following the GUARD partition being
partially overwritten when the command is used. This patch fixes the
problem by reducing the length we would normally write by the number
of ECC bytes required.</p>
210</li>
211<li><p class="first">nvram: Flag dangerous NVRAM options</p>
212<p>Most nvram options used by skiboot are just for debug or testing for
213regressions. They should never be used long term.</p>
214<p>We’ve hit a number of issues in testing and the field where nvram
215options have been set “temporarily” but haven’t been properly cleared
216after, resulting in crashes or real bugs being masked.</p>
217<p>This patch marks most nvram options used by skiboot as dangerous and
218prints a chicken to remind users of the problem.</p>
219</li>
220<li><p class="first">devicetree: Don’t set path to dtc in makefile</p>
<p>By setting the path we fail to build under buildroot, which has its own
set of host tools in PATH, but not at /usr/bin.</p>
<p>Keep the variable so it can be set if need be, but default to whatever
‘dtc’ is in the user’s path.</p>
225</li>
226</ul>
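<p>The ECC arithmetic behind the opal-gard fix above can be sketched as follows. This is only an illustration, not the libflash code: the function names and the 0x5000 partition size are hypothetical, and the only fact taken from the notes is the ratio of one ECC byte per eight data bytes.</p>

```python
ECC_RATIO = 8  # one ECC byte protects eight data bytes (ratio from the notes)


def ecc_buffer_size(data_len: int) -> int:
    """Bytes consumed on flash when data_len bytes are written with ECC."""
    return data_len + data_len // ECC_RATIO


def data_len_for_partition(part_len: int) -> int:
    """A data length whose ECC-expanded size still fits in part_len bytes."""
    return part_len * ECC_RATIO // (ECC_RATIO + 1)


part_len = 0x5000  # hypothetical GUARD partition size (20480 bytes)

# Writing part_len bytes of data overruns the partition by the ECC bytes.
overrun = ecc_buffer_size(part_len) - part_len

# Shrinking the data length first keeps the write inside the partition.
safe_len = data_len_for_partition(part_len)
fits = ecc_buffer_size(safe_len) <= part_len

print(f"overrun without the fix: {overrun} bytes")
print(f"data length with the fix: {safe_len} bytes, fits: {fits}")
```

<p>With these assumed numbers, the unadjusted write would spill one eighth of the partition size into the following partition, which matches the partial overwrite described in the opal-gard entry.</p>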
227</div>
228<div class="section" id="linux-and-buildroot">
229<h2>Linux and buildroot<a class="headerlink" href="#linux-and-buildroot" title="Permalink to this headline"></a></h2>
230<p>Move to Linux v5.1.15-openpower1 and buildroot 2019.02.3</p>
<p>This updates to an in-support stable Linux release, resolving potential
security and stability issues. Notably, this includes fixes for
CVE-2019-12817, CVE-2019-11477, CVE-2019-11478, and CVE-2019-11479.</p>
234<p>Buildroot stays on the same major version with the .2 and .3 stable
235releases added in.</p>
236<p>The skiroot defconfig is updated to ensure we still run the MMU in Radix
237mode (see <a class="reference external" href="http://git.kernel.org/torvalds/c/8adddf349fda0">http://git.kernel.org/torvalds/c/8adddf349fda0</a>). It also
238disables xmon by default.</p>
239</div>
240<div class="section" id="hostboot">
241<h2>Hostboot<a class="headerlink" href="#hostboot" title="Permalink to this headline"></a></h2>
242<p>Point op-build P8 hostboot at commit to report cache-count-disabled OS flag</p>
<p>Points op-build at the P8 hostboot package commit which enables reporting to the OS
that the cache-count-disabled Spectre workaround is available.</p>
245</div>
246</div>
247
248
249 </div>
250
251 </div>
252 </div>
319 <div class="clearer"></div>
320 </div>
321 <div class="footer">
322 &copy;2017, OpenPOWER Foundation System Software Work Group.
323
324 |
325 Powered by <a href="http://sphinx-doc.org/">Sphinx 1.7.9</a>
326 &amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>
327
328 |
329 <a href="../_sources/release-notes/v2.3.1.rst.txt"
330 rel="nofollow">Page source</a>
331 </div>
332
333
334
335
336 </body>
337</html>