<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[<!ENTITY % poky SYSTEM "../poky.ent"> %poky; ] >

<chapter id='profile-manual-usage'>

<title>Basic Usage (with examples) for each of the Yocto Tracing Tools</title>

<para>
    This chapter presents basic usage examples for each of the tracing
    tools.
</para>

<section id='profile-manual-perf'>
    <title>perf</title>

    <para>
        The 'perf' tool is the profiling and tracing tool that comes
        bundled with the Linux kernel.
    </para>

    <para>
        Don't let the fact that it's part of the kernel fool you into thinking
        that it's only for tracing and profiling the kernel.
        You can indeed use it to trace and profile just the kernel, but you
        can also use it to profile specific applications separately (with or
        without kernel context), or to trace and profile the kernel and all
        applications on the system simultaneously to gain a system-wide view
        of what's going on.
    </para>

    <para>
        In many ways, perf aims to be a superset of all the tracing and profiling
        tools available in Linux today, including all the other tools covered
        in this HOWTO. The past couple of years have seen perf subsume a lot
        of the functionality of those other tools and, at the same time, those
        other tools have removed large portions of their previous functionality
        and replaced it with calls to the equivalent functionality now
        implemented by the perf subsystem. Extrapolation suggests that at
        some point those other tools will simply become completely redundant
        and go away; until then, we'll cover those other tools in these pages
        and in many cases show how the same things can be accomplished in
        perf and the other tools when it seems useful to do so.
    </para>

    <para>
        The coverage below details some of the most common ways you'll likely
        want to apply the tool; full documentation can be found either within
        the tool itself or in the man pages at
        <ulink url='http://linux.die.net/man/1/perf'>perf(1)</ulink>.
    </para>

    <section id='perf-setup'>
        <title>Setup</title>

        <para>
            For this section, we'll assume you've already performed the basic
            setup outlined in the General Setup section.
        </para>

        <para>
            In particular, you'll get the most mileage out of perf if you
            profile an image built with the following in your
            <filename>local.conf</filename> file:
            <literallayout class='monospaced'>
     <ulink url='&YOCTO_DOCS_REF_URL;#var-INHIBIT_PACKAGE_STRIP'>INHIBIT_PACKAGE_STRIP</ulink> = "1"
            </literallayout>
        </para>

        <para>
            perf runs on the target system for the most part.
            You can archive profile data and copy it to the host for
            analysis, but for the rest of this document we assume you've
            logged in to the target over ssh and will be running the perf
            commands on the target.
        </para>
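
        <para>
            As a rough sketch of the host-side workflow, the following
            assumes that the host has a perf binary able to read data
            recorded on the target. That is not guaranteed across
            architectures and perf versions. The sketch also reuses the
            target address and host prompt from the examples later in this
            chapter.
            'perf archive' packs the object files whose build-ids appear in
            perf.data into an archive, by default perf.data.tar.bz2.
            That archive can then be unpacked into ~/.debug on the host
            before running 'perf report' there:
            <literallayout class='monospaced'>
     root@crownbay:~# perf record -g wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     root@crownbay:~# perf archive

     [trz@empanada ~]$ scp root@192.168.1.31:perf.data* .
     [trz@empanada ~]$ mkdir -p ~/.debug
     [trz@empanada ~]$ tar xvf perf.data.tar.bz2 -C ~/.debug
     [trz@empanada ~]$ perf report -i perf.data
            </literallayout>
        </para>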
    </section>

    <section id='perf-basic-usage'>
        <title>Basic Usage</title>

        <para>
            The perf tool is pretty much self-documenting. To remind yourself
            of the available commands, simply type 'perf', which will show you
            basic usage along with the available perf subcommands:
            <literallayout class='monospaced'>
     root@crownbay:~# perf

      usage: perf [--version] [--help] COMMAND [ARGS]

      The most commonly used perf commands are:
        annotate        Read perf.data (created by perf record) and display annotated code
        archive         Create archive with object files with build-ids found in perf.data file
        bench           General framework for benchmark suites
        buildid-cache   Manage build-id cache.
        buildid-list    List the buildids in a perf.data file
        diff            Read two perf.data files and display the differential profile
        evlist          List the event names in a perf.data file
        inject          Filter to augment the events stream with additional information
        kmem            Tool to trace/measure kernel memory(slab) properties
        kvm             Tool to trace/measure kvm guest os
        list            List all symbolic event types
        lock            Analyze lock events
        probe           Define new dynamic tracepoints
        record          Run a command and record its profile into perf.data
        report          Read perf.data (created by perf record) and display the profile
        sched           Tool to trace/measure scheduler properties (latencies)
        script          Read perf.data (created by perf record) and display trace output
        stat            Run a command and gather performance counter statistics
        test            Runs sanity tests.
        timechart       Tool to visualize total system behavior during a workload
        top             System profiling tool.

      See 'perf help COMMAND' for more information on a specific command.
            </literallayout>
        </para>
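
        <para>
            The last line of that output points to per-command documentation.
            For example, 'perf help record' displays the full man page, which
            assumes the perf man pages are installed on the target.
            'perf record -h' prints a shorter option summary that does not
            depend on the man pages:
            <literallayout class='monospaced'>
     root@crownbay:~# perf help record
     root@crownbay:~# perf record -h
            </literallayout>
        </para>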

        <section id='using-perf-to-do-basic-profiling'>
            <title>Using perf to do Basic Profiling</title>

            <para>
                As a simple test case, we'll profile the 'wget' of a fairly
                large file.
                This is a minimally interesting case because it has both file
                and network I/O aspects.
                Also, at least in standard Yocto images, wget is implemented
                as part of busybox, so the methods we use to analyze it can be
                applied in a very similar way to the whole host of busybox
                applets supported in Yocto.
                <literallayout class='monospaced'>
     root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \
                      wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
                </literallayout>
                The quickest and easiest way to get some basic overall data
                about what's going on for a particular workload is to profile
                it using 'perf stat'.
                'perf stat' basically profiles using a few default counters
                and displays the summed counts at the end of the run:
                <literallayout class='monospaced'>
     root@crownbay:~# perf stat wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |***************************************************| 41727k  0:00:00 ETA

      Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

            4597.223902 task-clock                #    0.077 CPUs utilized
                  23568 context-switches          #    0.005 M/sec
                     68 CPU-migrations            #    0.015 K/sec
                    241 page-faults               #    0.052 K/sec
             3045817293 cycles                    #    0.663 GHz
        &lt;not supported&gt; stalled-cycles-frontend
        &lt;not supported&gt; stalled-cycles-backend
              858909167 instructions              #    0.28  insns per cycle
              165441165 branches                  #   35.987 M/sec
               19550329 branch-misses             #   11.82% of all branches

           59.836627620 seconds time elapsed
                </literallayout>
                Many times such a simple-minded test doesn't yield much of
                interest, but sometimes it does (see Real-world Yocto bug
                (slow loop-mounted write speed)).
            </para>

            <para>
                Also, note that 'perf stat' isn't restricted to a fixed set of
                counters: basically any event listed in the output of
                'perf list' can be tallied by 'perf stat'.
                For example, suppose we wanted to see a summary of all the
                events related to kernel memory allocation/freeing along with
                cache hits and misses:
                <literallayout class='monospaced'>
     root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |***************************************************| 41727k  0:00:00 ETA

      Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

                   5566 kmem:kmalloc
                 125517 kmem:kmem_cache_alloc
                      0 kmem:kmalloc_node
                      0 kmem:kmem_cache_alloc_node
                  34401 kmem:kfree
                  69920 kmem:kmem_cache_free
                    133 kmem:mm_page_free
                     41 kmem:mm_page_free_batched
                  11502 kmem:mm_page_alloc
                  11375 kmem:mm_page_alloc_zone_locked
                      0 kmem:mm_page_pcpu_drain
                      0 kmem:mm_page_alloc_extfrag
               66848602 cache-references
                2917740 cache-misses              #    4.365 % of all cache refs

           44.831023415 seconds time elapsed
                </literallayout>
                So 'perf stat' gives us a nice, easy way to get a quick
                overview of what might be happening for a set of events, but
                normally we'd need a little more detail in order to understand
                what's going on in a form we can act on.
            </para>
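
            <para>
                'perf stat' prints its counts on standard error, so they are
                easy to capture and post-process on the target.
                The shell function below sketches that idea. The name
                rank_counts is a hypothetical helper of our own, not part of
                perf. It assumes the counts are printed without thousands
                separators, as in the output above.
                The function keeps the lines whose first field is a number,
                drops the "seconds time elapsed" summary, and sorts the events
                from most to least frequent.
                Quoting wildcards such as 'kmem:*' on the command line also
                stops the shell from trying to expand them as filename
                patterns:
                <literallayout class='monospaced'>
     # Hypothetical helper (not part of perf): rank 'perf stat' counters by count.
     # Usage sketch:
     #   perf stat -e 'kmem:*' wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 2&gt; stat.txt
     #   rank_counts stat.txt
     rank_counts() {
         # Field 1 is the count and field 2 the event name; lines such as
         # "&lt;not supported&gt; ..." never start with a number and are skipped.
         awk '$1 ~ /^[0-9.]+$/ { if ($2 != "seconds") print $1, $2 }' "$@" |
             sort -k1,1nr
     }
                </literallayout>
            </para>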

            <para>
                To dive down to the next level of detail, we can use
                'perf record'/'perf report', which collect profiling data and
                present it to us using an interactive text-based UI (or simply
                as text if we specify --stdio to 'perf report').
            </para>

            <para>
                As our first attempt at profiling this workload, we'll simply
                run 'perf record', handing it the workload we want to profile.
                Everything after 'perf record' and any perf options we hand it
                (here none) is executed as the profiled command.
                perf collects samples until the process exits and records them
                in a file named 'perf.data' in the current working directory.
                <literallayout class='monospaced'>
     root@crownbay:~# perf record wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>

     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |************************************************| 41727k  0:00:00 ETA
     [ perf record: Woken up 1 times to write data ]
     [ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ]
                </literallayout>
                To see the results in a 'text-based UI' (tui), simply run
                'perf report', which will read the perf.data file in the
                current working directory and display the results in an
                interactive UI:
                <literallayout class='monospaced'>
     root@crownbay:~# perf report
                </literallayout>
            </para>
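
            <para>
                To keep a plain-text copy of the profile, for example to save
                it or compare runs, you can have 'perf report' print the same
                data non-interactively with --stdio, as mentioned above.
                In this sketch, the output file name is arbitrary:
                <literallayout class='monospaced'>
     root@crownbay:~# perf report --stdio
     root@crownbay:~# perf report --stdio &gt; wget-profile.txt
                </literallayout>
            </para>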

            <para>
                <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                The above screenshot displays a 'flat' profile, one entry for
                each 'bucket' corresponding to the functions that were profiled
                during the profiling run, ordered from the most popular to the
                least.
                perf has options to sort in various orders and keys, to display
                only entries above a certain threshold, and so on; see the perf
                documentation for details.
                Note that this includes both userspace functions (entries
                containing a [.]) and kernel functions accounted to the process
                (entries containing a [k]).
                perf also has command-line modifiers that can be used to
                restrict the profiling to kernel or userspace, among others.
            </para>
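
            <para>
                The commands below sketch those options. The exact sort keys and
                modifiers can differ between perf versions, so check
                'perf report -h' and 'perf list --help' on your target.
                --sort groups the flat profile by the given keys.
                The :u and :k event modifiers restrict sampling to userspace
                and kernel, respectively:
                <literallayout class='monospaced'>
     root@crownbay:~# perf report --stdio --sort comm,dso,symbol
     root@crownbay:~# perf record -e cycles:u wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     root@crownbay:~# perf record -e cycles:k wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
                </literallayout>
            </para>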

            <para>
                Notice also that the above report shows an entry for
                'busybox', which is the executable that implements 'wget' in
                Yocto. Instead of a useful function name, however, that entry
                displays a not-so-friendly hex value.
                The steps below will show how to fix that problem.
            </para>

            <para>
                Before we do that, however, let's try running a different
                profile, one which shows something a little more interesting.
                The only difference between the new profile and the previous
                one is that we'll add the -g option, which will record not
                just the address of a sampled function, but the entire
                callchain to the sampled function as well:
                <literallayout class='monospaced'>
     root@crownbay:~# perf record -g wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |************************************************| 41727k  0:00:00 ETA
     [ perf record: Woken up 3 times to write data ]
     [ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ]

     root@crownbay:~# perf report
                </literallayout>
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                Using the callgraph view, we can actually see not only which
                functions took the most time, but we can also see a summary of
                how those functions were called and learn something about how
                the program interacts with the kernel in the process.
            </para>

            <para>
                Notice that each entry in the above screenshot now contains a
                '+' on the left-hand side.
                This means that we can expand the entry and drill down into
                the callchains that feed into that entry.
                Pressing 'enter' on any one of them will expand the callchain
                (you can also press 'E' to expand them all at the same time or
                'C' to collapse them all).
            </para>

            <para>
                In the screenshot above, we've toggled the __copy_to_user_ll()
                entry and several subnodes all the way down.
                This lets us see which callchains contributed to the profiled
                __copy_to_user_ll() function, which contributed 1.77% to the
                total profile.
            </para>

            <para>
                As a bit of background explanation for these callchains, think
                about what happens at a high level when you run wget to fetch
                a file from the network.
                The data comes into the kernel via the network connection
                (socket) and is passed to the userspace program 'wget' (which
                is actually a part of busybox, but that's not important for
                now). wget takes the buffers the kernel passes to it and writes
                them to a disk file to save the download.
            </para>

            <para>
                The part of this process that we're looking at in the above
                call stacks is the part where the kernel passes the data it's
                read from the socket down to wget, i.e., a copy-to-user.
            </para>

            <para>
                Notice also that there's a case where a hex value is displayed
                in the callstack, here in the expanded sys_clock_gettime()
                function.
                Later we'll see it resolve to a userspace function call in
                busybox.
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                The above screenshot shows the other half of the journey for
                the data: from the wget program's userspace buffers to disk.
                To get the buffers to disk, the wget program issues a
                write(2), which does a copy-from-user to the kernel.
                The kernel then gets the data safely to disk via some
                circuitous path (probably also present somewhere in the profile
                data).
            </para>

            <para>
                Now that we've seen the basic layout of the profile data and
                the basics of how to extract useful information out of it,
                let's get back to the task at hand and see if we can get some
                basic idea about where the time is spent in the program we're
                profiling, wget.
                Remember that wget is actually implemented as an applet in
                busybox, so while the process name is 'wget', the executable
                we're actually interested in is busybox.
                So let's expand the first entry containing busybox:
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                Again, before we expanded, we saw that the function was labeled
                with a hex value instead of a symbol, unlike most of the kernel
                entries.
                Expanding the busybox entry doesn't make it any better.
            </para>

            <para>
                The problem is that perf can't find the symbol information for
                the busybox binary, which is actually stripped out by the
                Yocto build system.
            </para>

            <para>
                One way around that is to put the following in your
                <filename>local.conf</filename> file when you build the image:
                <literallayout class='monospaced'>
     <ulink url='&YOCTO_DOCS_REF_URL;#var-INHIBIT_PACKAGE_STRIP'>INHIBIT_PACKAGE_STRIP</ulink> = "1"
                </literallayout>
                However, we already have an image with the binaries stripped,
                so what can we do to get perf to resolve the symbols?
                Basically, we need to install the debuginfo for the busybox
                package.
            </para>

            <para>
                To generate the debug info for the packages in the image, we
                can add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf.
                For example:
                <literallayout class='monospaced'>
     EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"
                </literallayout>
                Additionally, in order to generate the type of debuginfo that
                perf understands, we also need to add the following to
                local.conf:
                <literallayout class='monospaced'>
     PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory'
                </literallayout>
                Once we've done that, we can install the debuginfo for
                busybox.
                The debug packages, once built, can be found in
                build/tmp/deploy/rpm/* on the host system.
                Find the busybox-dbg-...rpm file and copy it to the target.
                For example:
                <literallayout class='monospaced'>
     [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm root@192.168.1.31:
     root@192.168.1.31's password:
     busybox-dbg-1.20.2-r2.core2_32.rpm                     100% 1826KB   1.8MB/s   00:01
                </literallayout>
                Now install the debug rpm on the target:
                <literallayout class='monospaced'>
     root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2_32.rpm
                </literallayout>
                Now that the debuginfo is installed, we see that the busybox
                entries display their functions symbolically:
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>
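
            <para>
                If symbols still don't resolve, two checks may help.
                The first command lists the files in the debug package. With
                PACKAGE_DEBUG_SPLIT_STYLE set to 'debug-file-directory', we'd
                expect them under /usr/lib/debug.
                The second command shows the build-ids recorded in perf.data,
                which perf can use to match the profiled binaries with their
                debug information:
                <literallayout class='monospaced'>
     root@crownbay:~# rpm -qlp busybox-dbg-1.20.2-r2.core2_32.rpm
     root@crownbay:~# perf buildid-list -i perf.data
                </literallayout>
            </para>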

            <para>
                If we expand one of the entries and press 'enter' on a leaf
                node, we're presented with a menu of actions we can take to
                get more information related to that entry:
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="2in" align="center" scalefit="1" />
            </para>

            <para>
                One of these actions displays a busybox-centric view of the
                profiled functions (in this case we've also expanded all the
                nodes using the 'E' key):
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                Finally, now that the busybox debuginfo is installed, the
                previously unresolved symbol in the sys_clock_gettime() entry
                mentioned earlier is resolved.
                It shows that the sys_clock_gettime system call that was the
                source of 6.75% of the copy-to-user overhead was initiated by
                the handle_input() busybox function:
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                At the lowest level of detail, we can dive down to the
                assembly level and see which instructions caused the most
                overhead in a function.
                Pressing 'enter' on the 'udhcpc_main' function, we're again
                presented with a menu:
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="2in" align="center" scalefit="1" />
            </para>

            <para>
                Selecting 'Annotate udhcpc_main', we get a detailed listing of
                percentages by instruction for the udhcpc_main function.
                From the display, we can see that over 50% of the time spent
                in this function is taken up by a couple of tests and the move
                of a constant (1) to a register:
            </para>

            <para>
                <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>
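
            <para>
                The same per-instruction breakdown is also available without
                the interactive UI.
                The command below assumes that perf.data from the profile
                above is in the current directory and that the busybox
                debuginfo is installed:
                <literallayout class='monospaced'>
     root@crownbay:~# perf annotate --stdio udhcpc_main
                </literallayout>
            </para>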

            <para>
                As a segue into tracing, let's try another profile using a
                different counter, something other than the default 'cycles'.
            </para>

            <para>
                The tracing and profiling infrastructure in Linux has become
                unified in a way that allows us to use the same tool with a
                completely different set of counters, not just the standard
                hardware counters that traditional tools have had to restrict
                themselves to (of course the traditional tools can also make
                use of the expanded possibilities now available to them, and
                in some cases have, as mentioned previously).
            </para>

            <para>
                We can get a list of the available events that can be used to
                profile a workload via 'perf list':
                <literallayout class='monospaced'>
     root@crownbay:~# perf list

     List of pre-defined events (to be used in -e):
       cpu-cycles OR cycles                            [Hardware event]
       stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
       stalled-cycles-backend OR idle-cycles-backend   [Hardware event]
       instructions                                    [Hardware event]
       cache-references                                [Hardware event]
       cache-misses                                    [Hardware event]
       branch-instructions OR branches                 [Hardware event]
       branch-misses                                   [Hardware event]
       bus-cycles                                      [Hardware event]
       ref-cycles                                      [Hardware event]

       cpu-clock                                       [Software event]
       task-clock                                      [Software event]
       page-faults OR faults                           [Software event]
       minor-faults                                    [Software event]
       major-faults                                    [Software event]
       context-switches OR cs                          [Software event]
       cpu-migrations OR migrations                    [Software event]
       alignment-faults                                [Software event]
       emulation-faults                                [Software event]

       L1-dcache-loads                                 [Hardware cache event]
       L1-dcache-load-misses                           [Hardware cache event]
       L1-dcache-prefetch-misses                       [Hardware cache event]
       L1-icache-loads                                 [Hardware cache event]
       L1-icache-load-misses                           [Hardware cache event]
       .
       .
       .
       rNNN                                            [Raw hardware event descriptor]
       cpu/t1=v1[,t2=v2,t3 ...]/modifier               [Raw hardware event descriptor]
        (see 'perf list --help' on how to encode it)

       mem:&lt;addr&gt;[:access]                             [Hardware breakpoint]

       sunrpc:rpc_call_status                          [Tracepoint event]
       sunrpc:rpc_bind_status                          [Tracepoint event]
       sunrpc:rpc_connect_status                       [Tracepoint event]
       sunrpc:rpc_task_begin                           [Tracepoint event]
       skb:kfree_skb                                   [Tracepoint event]
       skb:consume_skb                                 [Tracepoint event]
       skb:skb_copy_datagram_iovec                     [Tracepoint event]
       net:net_dev_xmit                                [Tracepoint event]
       net:net_dev_queue                               [Tracepoint event]
       net:netif_receive_skb                           [Tracepoint event]
       net:netif_rx                                    [Tracepoint event]
       napi:napi_poll                                  [Tracepoint event]
       sock:sock_rcvqueue_full                         [Tracepoint event]
       sock:sock_exceed_buf_limit                      [Tracepoint event]
       udp:udp_fail_queue_rcv_skb                      [Tracepoint event]
       hda:hda_send_cmd                                [Tracepoint event]
       hda:hda_get_response                            [Tracepoint event]
       hda:hda_bus_reset                               [Tracepoint event]
       scsi:scsi_dispatch_cmd_start                    [Tracepoint event]
       scsi:scsi_dispatch_cmd_error                    [Tracepoint event]
       scsi:scsi_eh_wakeup                             [Tracepoint event]
       drm:drm_vblank_event                            [Tracepoint event]
       drm:drm_vblank_event_queued                     [Tracepoint event]
       drm:drm_vblank_event_delivered                  [Tracepoint event]
       random:mix_pool_bytes                           [Tracepoint event]
       random:mix_pool_bytes_nolock                    [Tracepoint event]
       random:credit_entropy_bits                      [Tracepoint event]
       gpio:gpio_direction                             [Tracepoint event]
       gpio:gpio_value                                 [Tracepoint event]
       block:block_rq_abort                            [Tracepoint event]
       block:block_rq_requeue                          [Tracepoint event]
       block:block_rq_issue                            [Tracepoint event]
       block:block_bio_bounce                          [Tracepoint event]
       block:block_bio_complete                        [Tracepoint event]
       block:block_bio_backmerge                       [Tracepoint event]
       .
       .
       writeback:writeback_wake_thread                 [Tracepoint event]
       writeback:writeback_wake_forker_thread          [Tracepoint event]
       writeback:writeback_bdi_register                [Tracepoint event]
       .
       .
       writeback:writeback_single_inode_requeue        [Tracepoint event]
       writeback:writeback_single_inode                [Tracepoint event]
       kmem:kmalloc                                    [Tracepoint event]
       kmem:kmem_cache_alloc                           [Tracepoint event]
       kmem:mm_page_alloc                              [Tracepoint event]
       kmem:mm_page_alloc_zone_locked                  [Tracepoint event]
       kmem:mm_page_pcpu_drain                         [Tracepoint event]
       kmem:mm_page_alloc_extfrag                      [Tracepoint event]
       vmscan:mm_vmscan_kswapd_sleep                   [Tracepoint event]
       vmscan:mm_vmscan_kswapd_wake                    [Tracepoint event]
       vmscan:mm_vmscan_wakeup_kswapd                  [Tracepoint event]
       vmscan:mm_vmscan_direct_reclaim_begin           [Tracepoint event]
       .
       .
       module:module_get                               [Tracepoint event]
       module:module_put                               [Tracepoint event]
       module:module_request                           [Tracepoint event]
       sched:sched_kthread_stop                        [Tracepoint event]
       sched:sched_wakeup                              [Tracepoint event]
       sched:sched_wakeup_new                          [Tracepoint event]
       sched:sched_process_fork                        [Tracepoint event]
       sched:sched_process_exec                        [Tracepoint event]
       sched:sched_stat_runtime                        [Tracepoint event]
       rcu:rcu_utilization                             [Tracepoint event]
       workqueue:workqueue_queue_work                  [Tracepoint event]
       workqueue:workqueue_execute_end                 [Tracepoint event]
       signal:signal_generate                          [Tracepoint event]
       signal:signal_deliver                           [Tracepoint event]
       timer:timer_init                                [Tracepoint event]
       timer:timer_start                               [Tracepoint event]
       timer:hrtimer_cancel                            [Tracepoint event]
       timer:itimer_state                              [Tracepoint event]
       timer:itimer_expire                             [Tracepoint event]
       irq:irq_handler_entry                           [Tracepoint event]
       irq:irq_handler_exit                            [Tracepoint event]
       irq:softirq_entry                               [Tracepoint event]
       irq:softirq_exit                                [Tracepoint event]
       irq:softirq_raise                               [Tracepoint event]
       printk:console                                  [Tracepoint event]
       task:task_newtask                               [Tracepoint event]
       task:task_rename                                [Tracepoint event]
       syscalls:sys_enter_socketcall                   [Tracepoint event]
       syscalls:sys_exit_socketcall                    [Tracepoint event]
       .
       .
       .
       syscalls:sys_enter_unshare                      [Tracepoint event]
       syscalls:sys_exit_unshare                       [Tracepoint event]
       raw_syscalls:sys_enter                          [Tracepoint event]
       raw_syscalls:sys_exit                           [Tracepoint event]
                </literallayout>
            </para>
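
            <para>
                Tracepoint names follow a subsystem:event pattern, with the
                subsystem being the string before the colon.
                A quick way to see which subsystems are available, and how many
                events each provides, is to summarize the 'perf list' output.
                The function below is a hypothetical helper of our own, not
                part of perf. It assumes that tracepoint lines look like the
                ones above, with the event name first and
                "[Tracepoint event]" at the end:
                <literallayout class='monospaced'>
     # Hypothetical helper (not part of perf): count tracepoints per subsystem.
     # Usage sketch:  perf list | tracepoints_per_subsystem
     tracepoints_per_subsystem() {
         # The subsystem is the part of the event name before the colon.
         awk '/\[Tracepoint event\]/ { split($1, part, ":"); count[part[1]]++ }
              END { for (s in count) print count[s], s }' "$@" |
             sort -k1,1nr -k2,2
     }
                </literallayout>
            </para>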

            <informalexample>
                <emphasis>Tying it Together:</emphasis> These are exactly the same set of events defined
                by the trace event subsystem and exposed by
                ftrace/tracecmd/kernelshark as files in
                /sys/kernel/debug/tracing/events, by SystemTap as
                kernel.trace("tracepoint_name") and (partially) accessed by LTTng.
            </informalexample>
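
            <para>
                To look at the same tracepoints from the ftrace side, you can
                browse them under debugfs.
                The mount step is only needed if debugfs isn't already mounted
                on your image:
                <literallayout class='monospaced'>
     root@crownbay:~# mount -t debugfs nodev /sys/kernel/debug
     root@crownbay:~# ls /sys/kernel/debug/tracing/events/sched
     root@crownbay:~# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
                </literallayout>
                The format file describes the fields each event records,
                roughly corresponding to what perf prints for tracepoint
                samples.
            </para>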

            <para>
                Only a subset of these would be of interest to us when looking
                at this workload, so let's choose the most likely subsystems
                (identified by the string before the colon in the Tracepoint
                events) and do a 'perf stat' run using only those wildcarded
                subsystems:
                <literallayout class='monospaced'>
     root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
      Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

                  23323 skb:kfree_skb
                      0 skb:consume_skb
                  49897 skb:skb_copy_datagram_iovec
                   6217 net:net_dev_xmit
                   6217 net:net_dev_queue
                   7962 net:netif_receive_skb
                      2 net:netif_rx
                   8340 napi:napi_poll
                      0 sched:sched_kthread_stop
                      0 sched:sched_kthread_stop_ret
                   3749 sched:sched_wakeup
                      0 sched:sched_wakeup_new
                      0 sched:sched_switch
                     29 sched:sched_migrate_task
                      0 sched:sched_process_free
                      1 sched:sched_process_exit
                      0 sched:sched_wait_task
                      0 sched:sched_process_wait
                      0 sched:sched_process_fork
                      1 sched:sched_process_exec
                      0 sched:sched_stat_wait
          2106519415641 sched:sched_stat_sleep
                      0 sched:sched_stat_iowait
              147453613 sched:sched_stat_blocked
            12903026955 sched:sched_stat_runtime
                      0 sched:sched_pi_setprio
                   3574 workqueue:workqueue_queue_work
                   3574 workqueue:workqueue_activate_work
                      0 workqueue:workqueue_execute_start
                      0 workqueue:workqueue_execute_end
                  16631 irq:irq_handler_entry
                  16631 irq:irq_handler_exit
                  28521 irq:softirq_entry
                  28521 irq:softirq_exit
                  28728 irq:softirq_raise
                      1 syscalls:sys_enter_sendmmsg
                      1 syscalls:sys_exit_sendmmsg
                      0 syscalls:sys_enter_recvmmsg
                      0 syscalls:sys_exit_recvmmsg
                     14 syscalls:sys_enter_socketcall
                     14 syscalls:sys_exit_socketcall
                        .
                        .
                        .
                  16965 syscalls:sys_enter_read
                  16965 syscalls:sys_exit_read
                  12854 syscalls:sys_enter_write
                  12854 syscalls:sys_exit_write
                        .
                        .
                        .

           58.029710972 seconds time elapsed
                </literallayout>
                Let's pick one of these tracepoints and tell perf to do a
                profile using it as the sampling event:
                <literallayout class='monospaced'>
     root@crownbay:~# perf record -g -e sched:sched_wakeup wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
                </literallayout>
            </para>

            <para>
                <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" />
            </para>

            <para>
                The screenshot above shows the results of running a profile
                using the sched:sched_wakeup tracepoint, which shows the
                relative costs of various paths to sched_wakeup.
                Note that sched_wakeup is the name of the tracepoint; it's
                actually defined just inside ttwu_do_wakeup(), which accounts
                for the function name actually displayed in the profile:
                <literallayout class='monospaced'>
     /*
      * Mark the task runnable and perform wakeup-preemption.
      */
     static void
     ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
     {
          trace_sched_wakeup(p, true);
          .
          .
          .
     }
                </literallayout>
                A couple of the more interesting callchains are expanded and
                displayed above, basically some network receive paths that
                presumably end up waking up wget (busybox) when network data
                is ready.
            </para>

            <para>
                Note that because tracepoints are normally used for tracing,
                the default sampling period for tracepoints is 1, i.e., for
                tracepoints perf will sample on every event occurrence (this
                can be changed using the -c option).
                This is in contrast to hardware counters such as the default
                'cycles' hardware counter used for normal profiling, where
                sampling periods are much higher (in the thousands) because
                profiling should have as low an overhead as possible and
                sampling on every cycle would be prohibitively expensive.
            </para>
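
            <para>
                The commands below adjust the sampling period in both
                directions.
                -c sets a fixed period; here perf records only every tenth
                sched_wakeup event.
                -F instead asks perf to sample the default hardware counter at
                roughly the given frequency, in samples per second.
                The values 10 and 1000 are arbitrary:
                <literallayout class='monospaced'>
     root@crownbay:~# perf record -g -c 10 -e sched:sched_wakeup wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     root@crownbay:~# perf record -g -F 1000 wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
                </literallayout>
            </para>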
        </section>

        <section id='using-perf-to-do-basic-tracing'>
            <title>Using perf to do Basic Tracing</title>

            <para>
                Profiling is a great tool for solving many problems or for
                getting a high-level view of what's going on with a workload or
                across the system.
                It is, however, by definition an approximation, as suggested
                by the most prominent word associated with it: 'sampling'.
                On the one hand, it allows a representative picture of what's
                going on in the system to be cheaply taken, but on the other
                hand, that cheapness limits its utility when that data suggests
                a need to 'dive down' more deeply to discover what's really
                going on.
                In such cases, the only way to see what's really going on is
                to be able to look at (or summarize more intelligently) the
                individual steps that go into the higher-level behavior exposed
                by the coarse-grained profiling data.
            </para>
749
<para>
As a concrete example, we can trace all the events we think might
be applicable to our workload:
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* \
-e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write \
wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
</literallayout>
We can look at the raw trace output using 'perf script' with no
arguments:
<literallayout class='monospaced'>
root@crownbay:~# perf script

perf 1262 [000] 11624.857082: sys_exit_read: 0x0
perf 1262 [000] 11624.857193: sched_wakeup: comm=migration/0 pid=6 prio=0 success=1 target_cpu=000
wget 1262 [001] 11624.858021: softirq_raise: vec=1 [action=TIMER]
wget 1262 [001] 11624.858074: softirq_entry: vec=1 [action=TIMER]
wget 1262 [001] 11624.858081: softirq_exit: vec=1 [action=TIMER]
wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
wget 1262 [001] 11624.858177: sys_exit_read: 0x200
wget 1262 [001] 11624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0 location=0xc15a5308
wget 1262 [001] 11624.858945: kfree_skb: skbaddr=0xeb248000 protocol=0 location=0xc15a5308
wget 1262 [001] 11624.859020: softirq_raise: vec=1 [action=TIMER]
wget 1262 [001] 11624.859076: softirq_entry: vec=1 [action=TIMER]
wget 1262 [001] 11624.859083: softirq_exit: vec=1 [action=TIMER]
wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7
wget 1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859233: sys_exit_read: 0x0
wget 1262 [001] 11624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200
wget 1262 [001] 11624.859584: sys_exit_read: 0x200
wget 1262 [001] 11624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859888: sys_exit_read: 0x400
wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859944: sys_exit_read: 0x400
</literallayout>
This gives us a detailed, timestamped sequence of the selected
events as they occurred during the workload.
</para>
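
<para>
'perf script' output is plain text, so it can also be post-processed
outside perf. The following Python sketch splits lines in the layout
shown above into their columns. It also splits 'key=value' payloads,
such as those of the sched events, into a dictionary. The regular
expression is an assumption based only on the sample output above.
Other perf versions or output options may use a different layout, and
a command name containing spaces would not parse. parse_line() and
parse_fields() are hypothetical helper names.
<literallayout class='monospaced'>
```python
import re

# Assumed layout of one default 'perf script' line for a tracepoint:
#   comm  pid  [cpu]  seconds.fraction:  event_name:  payload
LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s+\[(\d+)\]\s+(\d+\.\d+):\s+(\w+):\s*(.*)$")


def parse_line(line):
    """Split one 'perf script' line into a dict, or return None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    comm, pid, cpu, secs, event, payload = m.groups()
    return {
        "comm": comm,            # task that was running when the event fired
        "pid": int(pid),
        "cpu": int(cpu),
        "time": float(secs),     # seconds, as printed by perf script
        "event": event,
        "payload": payload,      # event-specific text, left unparsed
    }


def parse_fields(payload):
    """Turn a 'key=value key=value ...' payload into a dict of strings.

    Tokens without a key, such as the '==>' in sched_switch output,
    are skipped. Syscall payloads like 'fd: 0x0003, ...' yield {}.
    """
    fields = {}
    for token in payload.split():
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
    return fields


ev = parse_line("wget 1262 [001] 11624.858177: sys_exit_read: 0x200")
print(ev["comm"], ev["event"], ev["payload"])  # wget sys_exit_read 0x200
```
</literallayout>
</para>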

<para>
In many ways, profiling can be viewed as a subset of tracing.
Theoretically, if you have a set of trace events that's sufficient
to capture all the important aspects of a workload, you can derive
any of the results or views that a profiling run can.
</para>

<para>
Another aspect of traditional profiling is that, while powerful in
many ways, it's limited by the granularity of the underlying data.
Profiling tools offer various ways of sorting and presenting the
sample data, which make it much more useful and amenable to user
experimentation. In the end, though, the data can't be used in an
open-ended way to extract information that simply isn't there.
Conceptually, most of it was thrown away when the samples were taken.
</para>

<para>
Full-blown detailed tracing data, however, offers the opportunity
to manipulate and present the information collected during a
tracing run in a nearly unlimited variety of ways.
</para>

<para>
Another way to look at it is that there are only so many ways that
the 'primitive' counters can be used on their own to generate
interesting output. Getting anything more complicated than simple
counts requires some amount of additional logic, which is typically
very specific to the problem at hand. For example, suppose we wanted
a 'counter' that maps to the time difference between when a process
was woken up and made ready to run on a processor and when it
actually ran. We wouldn't expect such a counter to exist on its own,
but we could derive one, called say 'wakeup_latency', and use it to
extract a useful view of that metric from trace data. Likewise, we
really can't figure out from standard profiling tools how much data
every process on the system reads and writes, along with how many of
those reads and writes fail completely. With sufficient trace data
and the right tools, however, we could easily extract and present
that information. We'd simply need something other than pre-canned
profiling tools to do it.
</para>
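
<para>
The second example above could be handled with a Python function like
the one in the following sketch. For each process, it totals the bytes
returned by read and write syscalls and counts the calls that failed.
It assumes the events have already been reduced to hypothetical
(comm, event_name, ret) tuples from sys_exit_read and sys_exit_write
records, where a negative ret is an error code. It illustrates the
idea and is not a perf feature.
<literallayout class='monospaced'>
```python
from collections import defaultdict


def summarize_io(events):
    """Per-process bytes read/written and failed read/write calls.

    `events` is assumed to be an iterable of (comm, event_name, ret)
    tuples built from sys_exit_read / sys_exit_write records, where
    `ret` is the syscall return value as an int and a negative value
    is an error code. This tuple layout is a hypothetical simplification.
    """
    totals = defaultdict(lambda: {"read": 0, "write": 0,
                                  "read_failed": 0, "write_failed": 0})
    for comm, event_name, ret in events:
        if event_name == "sys_exit_read":
            kind = "read"
        elif event_name == "sys_exit_write":
            kind = "write"
        else:
            continue                      # not a read/write exit event
        if ret >= 0:
            totals[comm][kind] += ret     # bytes actually transferred
        else:
            totals[comm][kind + "_failed"] += 1
    return dict(totals)


# Return values from the wget trace above (0x200, 0x1d7, 0x0), plus one
# hypothetical failure (-11, i.e. -EAGAIN) to exercise the error path.
sample = [
    ("wget", "sys_exit_read", 512),
    ("wget", "sys_exit_read", 471),
    ("wget", "sys_exit_read", 0),
    ("wget", "sys_exit_read", -11),
]
print(summarize_io(sample)["wget"])
# {'read': 983, 'write': 0, 'read_failed': 1, 'write_failed': 0}
```
</literallayout>
</para>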

<para>
Luckily, there is a general-purpose way to handle such needs,
called 'programming languages'. Making a programming language
easily applicable to a specific data format is called a
'programming language binding' for that data and language.
Perf supports two programming language bindings, one for Python
and one for Perl.
</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> Language bindings for manipulating and
aggregating trace data are of course not a new
idea. One of the first projects to do this was IBM's DProbes
dpcc compiler, an ANSI C compiler which targeted a low-level
assembly language running on an in-kernel interpreter on the
target system. This is exactly analogous to what Sun's DTrace
did, except that DTrace invented its own language for the purpose.
SystemTap, heavily inspired by DTrace, also created its own
one-off language. However, rather than running the product on an
in-kernel interpreter, it created elaborate compiler-based
machinery to translate its language into kernel modules written
in C.
</informalexample>

<para>
Now that we have the trace data in perf.data, we can use
'perf script -g python' to generate a skeleton script with a
handler for each event type we recorded:
<literallayout class='monospaced'>
root@crownbay:~# perf script -g python
generated Python script: perf-script.py
</literallayout>
The skeleton script simply creates a Python function for each
event type in the perf.data file. The body of each function simply
prints the event name along with its parameters. For example:
<literallayout class='monospaced'>
def net__netif_rx(event_name, context, common_cpu,
        common_secs, common_nsecs, common_pid, common_comm,
        skbaddr, len, name):
                print_header(event_name, common_cpu, common_secs, common_nsecs,
                        common_pid, common_comm)

                print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name),
</literallayout>
We can run that script directly to print all of the events
contained in the perf.data file:
<literallayout class='monospaced'>
root@crownbay:~# perf script -s perf-script.py

in trace_begin
syscalls__sys_exit_read 0 11624.857082795 1262 perf nr=3, ret=0
sched__sched_wakeup 0 11624.857193498 1262 perf comm=migration/0, pid=6, prio=0, success=1, target_cpu=0
irq__softirq_raise 1 11624.858021635 1262 wget vec=TIMER
irq__softirq_entry 1 11624.858074075 1262 wget vec=TIMER
irq__softirq_exit 1 11624.858081389 1262 wget vec=TIMER
syscalls__sys_enter_read 1 11624.858166434 1262 wget nr=3, fd=3, buf=3213019456, count=512
syscalls__sys_exit_read 1 11624.858177924 1262 wget nr=3, ret=512
skb__kfree_skb 1 11624.858878188 1262 wget skbaddr=3945041280, location=3243922184, protocol=0
skb__kfree_skb 1 11624.858945608 1262 wget skbaddr=3945037824, location=3243922184, protocol=0
irq__softirq_raise 1 11624.859020942 1262 wget vec=TIMER
irq__softirq_entry 1 11624.859076935 1262 wget vec=TIMER
irq__softirq_exit 1 11624.859083469 1262 wget vec=TIMER
syscalls__sys_enter_read 1 11624.859167565 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859192533 1262 wget nr=3, ret=471
syscalls__sys_enter_read 1 11624.859228072 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859233707 1262 wget nr=3, ret=0
syscalls__sys_enter_read 1 11624.859573008 1262 wget nr=3, fd=3, buf=3213018496, count=512
syscalls__sys_exit_read 1 11624.859584818 1262 wget nr=3, ret=512
syscalls__sys_enter_read 1 11624.859864562 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024
syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024
</literallayout>
That in itself isn't very useful. After all, we can accomplish
pretty much the same thing by simply running 'perf script'
without arguments in the same directory as the perf.data file.
</para>

<para>
We can, however, replace the print statements in the generated
function bodies with whatever we want, and thereby make the script
far more useful.
</para>

<para>
As a simple example, let's replace the print statements in
the function bodies with a call to a simple function that does
nothing but increment a per-event count. When the program is run
against a perf.data file, each time a particular event is
encountered, a tally is incremented for that event. For example:
<literallayout class='monospaced'>
def net__netif_rx(event_name, context, common_cpu,
        common_secs, common_nsecs, common_pid, common_comm,
        skbaddr, len, name):
                inc_counts(event_name)
</literallayout>
Each event handler function in the generated code is modified
to do this. For convenience, we define a common function called
inc_counts() that each handler calls. inc_counts() simply tallies
a count for each event using the 'counts' hash, which is a
specialized dictionary that does Perl-like autovivification. This
capability is extremely useful for the kinds of multi-level
aggregation commonly used in processing traces (see perf's
documentation on the Python language binding for details):
<literallayout class='monospaced'>
counts = autodict()

def inc_counts(event_name):
    try:
        counts[event_name] += 1
    except TypeError:
        counts[event_name] = 1
</literallayout>
Finally, at the end of the trace processing run, we want to
print the result of all the per-event tallies. For that, we
use the special 'trace_end()' function:
<literallayout class='monospaced'>
def trace_end():
    for event_name, count in counts.items():
        print("%-40s %10s" % (event_name, count))
</literallayout>
The end result is a summary of all the events recorded in the
trace:
<literallayout class='monospaced'>
skb__skb_copy_datagram_iovec    13148
irq__softirq_entry               4796
irq__irq_handler_exit            3805
irq__softirq_exit                4795
syscalls__sys_enter_write        8990
net__net_dev_xmit                 652
skb__kfree_skb                   4047
sched__sched_wakeup              1155
irq__irq_handler_entry           3804
irq__softirq_raise               4799
net__net_dev_queue                652
syscalls__sys_enter_read        17599
net__netif_receive_skb           1743
syscalls__sys_exit_read         17598
net__netif_rx                       2
napi__napi_poll                  1877
syscalls__sys_exit_write         8990
</literallayout>
Note that this is pretty much exactly the same information we get
from 'perf stat'. That goes a little way toward supporting the idea
mentioned previously: given the right kind of trace data,
higher-level profiling-type summaries can be derived from it.
</para>
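
<para>
The following standalone Python sketch reproduces the counting script
so that it can be tried outside perf. The autodict() defined here is a
stand-in that is assumed to behave like the one perf's scripts import:
a dictionary whose missing keys spring into existence as nested
dictionaries. The sketch also shows a two-level, per-process tally and
a trace_end() that prints the results largest first. The event names
come from the trace above, but the handful of calls here is only for
illustration. inc_per_comm() is a hypothetical helper.
<literallayout class='monospaced'>
```python
from collections import defaultdict


def autodict():
    # Stand-in assumed to behave like the autodict() used by perf's
    # Python scripts: missing keys spring into existence as nested
    # autodicts (Perl-like autovivification).
    return defaultdict(autodict)


counts = autodict()
per_comm = autodict()


def inc_counts(event_name):
    try:
        counts[event_name] += 1
    except TypeError:
        # First occurrence: the lookup autovivified an empty autodict,
        # and autodict + 1 raises TypeError, so start the tally at 1.
        counts[event_name] = 1


def inc_per_comm(comm, event_name):
    # Two-level aggregation: per process, then per event.
    try:
        per_comm[comm][event_name] += 1
    except TypeError:
        per_comm[comm][event_name] = 1


def trace_end():
    # Print the per-event tallies, largest first.
    for event_name, count in sorted(counts.items(),
                                    key=lambda item: item[1], reverse=True):
        print("%-40s %10s" % (event_name, count))


# Feed a few event names by hand, as the generated handlers would.
for comm, name in [("wget", "syscalls__sys_enter_read"),
                   ("wget", "syscalls__sys_exit_read"),
                   ("wget", "syscalls__sys_enter_read"),
                   ("perf", "sched__sched_wakeup")]:
    inc_counts(name)
    inc_per_comm(comm, name)
trace_end()
```
</literallayout>
The try/except form keeps each handler to a single line. Another
option is a plain dictionary with get(), but autovivification makes
deeper nesting, such as per-process and per-event counts, equally
short.
</para>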

<para>
For more details, see the documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.
</para>
</section>

<section id='system-wide-tracing-and-profiling'>
<title>System-Wide Tracing and Profiling</title>

<para>
The examples so far have focused on tracing a particular program or
workload. In other words, every profiling run has specified the
program to profile on the command line, e.g. 'perf record wget ...'.
</para>

<para>
It's also possible, and more interesting in many cases, to run a
system-wide profile or trace while running the workload in a
separate shell.
</para>

<para>
To do system-wide profiling or tracing, you typically use
the -a flag to 'perf record'.
</para>

<para>
To demonstrate this, open up one window and start the profile
using the -a flag (press Ctrl-C to stop tracing):
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -a
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]
</literallayout>
In another window, run the wget test:
<literallayout class='monospaced'>
root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>
In the resulting profile, viewed with 'perf report', we see entries
not only for our wget load, but for other processes running on the
system as well:
</para>

<para>
<imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>

<para>
In the snapshot above, we can see callchains that originate in
libc, and a callchain from Xorg that demonstrates that we're
using a proprietary X driver in userspace (notice the presence
of 'PVR' and some other unresolvable symbols in the expanded
Xorg callchain).
</para>

<para>
Note also that we have both kernel and userspace entries in the
above snapshot. We can also tell perf to focus on userspace by
providing a modifier, in this case 'u', to the 'cycles' hardware
counter when we record a profile:
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -a -e cycles:u
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]
</literallayout>
</para>

<para>
<imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>

<para>
Notice that in the screenshot above we see only userspace entries ([.]).
</para>

<para>
Finally, we can press 'Enter' on a leaf node and select the 'Zoom
into DSO' menu item to show only entries associated with a
specific DSO. In the screenshot below, we've zoomed into the
'libc' DSO, which shows all the entries associated with the
libc-xxx.so DSO.
</para>

<para>
<imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>

<para>
We can also use the system-wide -a switch to do system-wide
tracing. Here we'll trace a couple of scheduler events:
<literallayout class='monospaced'>
root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]
</literallayout>
We can look at the raw output using 'perf script' with no
arguments:
<literallayout class='monospaced'>
root@crownbay:~# perf script

perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
</literallayout>
</para>
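
<para>
These sched_wakeup and sched_switch events contain everything needed
for the 'wakeup_latency' counter mentioned earlier. The following
Python sketch derives it. It records when each pid was woken up and
subtracts that time from the moment the pid is next switched in. It
assumes time-ordered (timestamp, event_name, fields) tuples, for
example produced by parsing 'perf script' output. The tuple layout and
the wakeup_latencies() name are hypothetical. The sample timestamps
are taken from the trace above.
<literallayout class='monospaced'>
```python
def wakeup_latencies(events):
    """Derive per-task wakeup latency from sched_wakeup/sched_switch events.

    `events` is assumed to be a time-ordered iterable of
    (timestamp_secs, event_name, fields) tuples, where `fields` maps the
    tracepoint's field names to their values as strings.
    Latency = time the task is switched in - time it was woken up.
    """
    woken_at = {}     # pid -> timestamp of its most recent wakeup
    latencies = []    # (comm, pid, latency_secs)
    for ts, name, fields in events:
        if name == "sched_wakeup":
            woken_at[fields["pid"]] = ts
        elif name == "sched_switch":
            pid = fields["next_pid"]
            start = woken_at.pop(pid, None)
            if start is not None:   # only switches that follow a wakeup
                latencies.append((fields["next_comm"], int(pid), ts - start))
    return latencies


# A few events from the system-wide trace above, reduced to the fields used.
trace = [
    (6171.460045, "sched_wakeup", {"comm": "kworker/1:1", "pid": "21"}),
    (6171.460066, "sched_switch", {"next_comm": "kworker/1:1", "next_pid": "21"}),
    (6171.468063, "sched_wakeup", {"comm": "kworker/0:3", "pid": "1209"}),
    (6171.468107, "sched_switch", {"next_comm": "kworker/0:3", "next_pid": "1209"}),
]
for comm, pid, latency in wakeup_latencies(trace):
    print("%-12s %5d %8.1f usecs" % (comm, pid, latency * 1e6))
```
</literallayout>
For these four events, the sketch reports roughly 21 and 44
microseconds for kworker/1:1 and kworker/0:3 respectively.
</para>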

<section id='perf-filtering'>
<title>Filtering</title>

<para>
Notice that there are a lot of events that don't really have
anything to do with what we're interested in, namely events
that schedule 'perf' itself in and out or that wake perf up.
We can get rid of those by using the '--filter' option.
For each event we specify using -e, we can add a --filter
after it to filter out trace events that contain fields
with specific values:
<literallayout class='monospaced'>
root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf &amp;&amp; prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]

root@crownbay:~# perf script

swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
</literallayout>
In this case, we've filtered out all events that have 'perf'
in their 'comm', 'prev_comm', or 'next_comm' fields. Notice
that there are still events recorded for perf, but those
events don't have the value 'perf' in any of the filtered
fields. In the remaining sched_wakeup events, for example, perf is
the task doing the waking, while the 'comm' field names the task
being woken (kworker/1:1). Completely filtering out anything from
perf would require a bit more work, but for the purpose of
demonstrating how to use filters, it's close enough.
</para>
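
<para>
To make the filter semantics concrete, the following sketch gives
user-space Python equivalents of the two filter expressions used
above. Each one is applied to an event's fields as a dictionary. In
perf, the kernel evaluates the filters before the events ever reach
user space, so these functions only illustrate the logic. Their names
are hypothetical.
<literallayout class='monospaced'>
```python
# User-space equivalents of the two --filter expressions used above.
# perf hands the expressions to the kernel, which drops non-matching
# events before they reach user space; this only illustrates the logic.


def keep_sched_switch(fields):
    # --filter 'next_comm != perf &amp;&amp; prev_comm != perf'
    return fields["next_comm"] != "perf" and fields["prev_comm"] != "perf"


def keep_sched_wakeup(fields):
    # --filter 'comm != perf'
    return fields["comm"] != "perf"


# perf (the running task) waking kworker/1:1: the filtered 'comm' field
# names the task being woken, so the event is kept.
print(keep_sched_wakeup({"comm": "kworker/1:1", "pid": "21"}))                # True
# perf being switched out: dropped.
print(keep_sched_switch({"prev_comm": "perf", "next_comm": "kworker/1:1"}))  # False
```
</literallayout>
The first call shows why the sched_wakeup events recorded while perf
is the running task still pass the filter.
</para>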

<informalexample>
<emphasis>Tying it Together:</emphasis> These are exactly the same
event filters as those defined by the trace event subsystem. See the
ftrace/trace-cmd/kernelshark section for more discussion about
these event filters.
</informalexample>

<informalexample>
<emphasis>Tying it Together:</emphasis> These event filters are implemented by a
special-purpose pseudo-interpreter in the kernel and are an
integral and indispensable part of the perf design as it
relates to tracing. Kernel-based event filters provide a
mechanism to precisely throttle the event stream that appears
in user space, where it makes sense to provide bindings to real
programming languages for postprocessing the event stream.
This architecture allows for the intelligent and flexible
partitioning of processing between the kernel and user space.
Contrast this with other tools such as SystemTap, which does
all of its processing in the kernel and as such requires a
special project-defined language in order to accommodate that
design, or LTTng, where everything is sent to userspace and
as such requires a super-efficient kernel-to-userspace
transport mechanism in order to function properly. While
perf can certainly benefit from advances in, for instance,
the design of the transport, it doesn't fundamentally depend
on them. Basically, if you find that your perf tracing
application is causing buffer I/O overruns, it probably
means that you aren't taking enough advantage of the
kernel filtering engine.
</informalexample>
</section>
</section>

<section id='using-dynamic-tracepoints'>
<title>Using Dynamic Tracepoints</title>

<para>
perf isn't restricted to the fixed set of static tracepoints
listed by 'perf list'. Users can also add their own 'dynamic'
tracepoints anywhere in the kernel. For instance, suppose we
want to define our own tracepoint on do_fork(). We can do that
using the 'perf probe' subcommand:
<literallayout class='monospaced'>
root@crownbay:~# perf probe do_fork
Added new event:
  probe:do_fork (on do_fork)

You can now use it in all perf tools, such as:

        perf record -e probe:do_fork -aR sleep 1
</literallayout>
Adding a new tracepoint via 'perf probe' results in an event
with all the expected files and format in
/sys/kernel/debug/tracing/events, just the same as for static
tracepoints, as discussed in more detail in the trace events
subsystem section:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al
drwxr-xr-x 2 root root 0 Oct 28 11:42 .
drwxr-xr-x 3 root root 0 Oct 28 11:42 ..
-rw-r--r-- 1 root root 0 Oct 28 11:42 enable
-rw-r--r-- 1 root root 0 Oct 28 11:42 filter
-r--r--r-- 1 root root 0 Oct 28 11:42 format
-r--r--r-- 1 root root 0 Oct 28 11:42 id

root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format
name: do_fork
ID: 944
format:
        field:unsigned short common_type; offset:0; size:2; signed:0;
        field:unsigned char common_flags; offset:2; size:1; signed:0;
        field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
        field:int common_pid; offset:4; size:4; signed:1;
        field:int common_padding; offset:8; size:4; signed:1;

        field:unsigned long __probe_ip; offset:12; size:4; signed:0;

print fmt: "(%lx)", REC->__probe_ip
</literallayout>
We can list all dynamic tracepoints currently in existence:
<literallayout class='monospaced'>
root@crownbay:~# perf probe -l
  probe:do_fork (on do_fork)
  probe:schedule (on schedule)
</literallayout>
Let's record system-wide. Running 'sleep 30' is a trick for
recording system-wide for a fixed period: the sleep command itself
does basically nothing and simply exits after 30 seconds, which ends
the recording:
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]
</literallayout>
Using 'perf script' we can see each do_fork event that fired:
<literallayout class='monospaced'>
root@crownbay:~# perf script

# ========
# captured on: Sun Oct 28 11:55:18 2012
# hostname : crownbay
# os release : 3.4.11-yocto-standard
# perf version : 3.4.11
# arch : i686
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz
# cpuid : GenuineIntel,6,38,1
# total memory : 1017184 kB
# cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30
# event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 5, 6 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# ========
#
matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)
matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460)
pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)
pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)
matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460)
gthumb 1300 [001] 34217.697451: do_fork: (c1028460)
gthumb 1300 [001] 34219.085734: do_fork: (c1028460)
gthumb 1300 [000] 34219.121351: do_fork: (c1028460)
gthumb 1300 [001] 34219.264551: do_fork: (c1028460)
pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460)
matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460)
matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460)
matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460)
matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460)
matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460)
gaku 1312 [000] 34237.202388: do_fork: (c1028460)
</literallayout>
Using 'perf report' on the same file, we can see the
callgraphs from starting a few programs during those 30 seconds:
</para>

<para>
<imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>
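
<para>
Because a tracepoint's 'format' file is plain text, it is easy to read
from a program. The following Python sketch extracts the event name
and the field layout (type, name, offset, size, and signedness) from a
format file like the one shown earlier for probe:do_fork. The regular
expression is an assumption based on that sample and may need
adjusting for other kinds of field declarations. parse_format() is a
hypothetical helper.
<literallayout class='monospaced'>
```python
import re

# Matches one "field:" line of a tracepoint format file, e.g.
#   field:unsigned long __probe_ip; offset:12; size:4; signed:0;
# Groups: type, name, optional array suffix, offset, size, signed.
FIELD_RE = re.compile(
    r"field:(.+?)\s+(\w+)(\[\d*\])?;\s*offset:(\d+);\s*size:(\d+);\s*signed:(\d+);"
)


def parse_format(text):
    """Return (event_name, fields) parsed from a tracepoint 'format' file."""
    name = None
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
            continue
        m = FIELD_RE.match(line)
        if m:
            ftype, fname, array, offset, size, signed = m.groups()
            fields.append({
                "type": ftype + (array or ""),
                "name": fname,
                "offset": int(offset),
                "size": int(size),
                "signed": signed == "1",
            })
    return name, fields


# The probe:do_fork format file shown above.
FORMAT = """name: do_fork
ID: 944
format:
        field:unsigned short common_type; offset:0; size:2; signed:0;
        field:unsigned char common_flags; offset:2; size:1; signed:0;
        field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
        field:int common_pid; offset:4; size:4; signed:1;
        field:int common_padding; offset:8; size:4; signed:1;

        field:unsigned long __probe_ip; offset:12; size:4; signed:0;

print fmt: "(%lx)", REC->__probe_ip
"""

name, fields = parse_format(FORMAT)
print(name)
for f in fields:
    print("%-22s %-16s offset=%-3d size=%d" % (f["name"], f["type"], f["offset"], f["size"]))
```
</literallayout>
The output shows that the dynamic probe adds only one field of its
own, __probe_ip, after the common fields shared by all events.
</para>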

<informalexample>
<emphasis>Tying it Together:</emphasis> The trace events subsystem accommodates static
and dynamic tracepoints in exactly the same way. There's no
difference as far as the infrastructure is concerned. See the
ftrace section for more details on the trace event subsystem.
</informalexample>

<informalexample>
<emphasis>Tying it Together:</emphasis> Dynamic tracepoints are implemented under the
covers by kprobes and uprobes. kprobes and uprobes are also used
by, and in fact are the main focus of, SystemTap.
</informalexample>
</section>
</section>

<section id='perf-documentation'>
<title>Documentation</title>

<para>
Online versions of the man pages for the commands discussed in this
section can be found here:
<itemizedlist>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-stat'>'perf stat' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-record'>'perf record' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-report'>'perf report' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-probe'>'perf probe' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-script'>'perf script' manpage</ulink>.
</para></listitem>
<listitem><para>Documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.
</para></listitem>
<listitem><para>The top-level
<ulink url='http://linux.die.net/man/1/perf'>perf(1) manpage</ulink>.
</para></listitem>
</itemizedlist>
</para>

<para>
Normally, you should be able to invoke the man pages via perf
itself, e.g. 'perf help' or 'perf help record'.
</para>

<para>
However, by default Yocto doesn't install man pages, and perf
relies on the man pages for most of its help functionality. This is a
bug, and it is being tracked as a Yocto bug:
<ulink url='https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388'>Bug 3388 - perf: enable man pages for basic 'help' functionality</ulink>.
</para>

<para>
The man pages in text form, along with some other files, such as
a set of examples, can be found in the 'perf' directory of the
kernel tree:
<literallayout class='monospaced'>
tools/perf/Documentation
</literallayout>
There's also a nice perf tutorial on the perf wiki that goes
into more detail than we do here in certain areas:
<ulink url='https://perf.wiki.kernel.org/index.php/Tutorial'>Perf Tutorial</ulink>
</para>
</section>
</section>

<section id='profile-manual-ftrace'>
<title>ftrace</title>

<para>
'ftrace' literally refers to the 'ftrace function tracer', but in
reality the term encompasses a number of related tracers along with
the infrastructure that they all make use of.
</para>

<section id='ftrace-setup'>
<title>Setup</title>

<para>
For this section, we'll assume you've already performed the basic
setup outlined in the General Setup section.
</para>

<para>
ftrace, trace-cmd, and kernelshark run on the target system
and are ready to go out-of-the-box. No additional setup is
necessary. For the rest of this section, we assume you've ssh'ed
from the host to the target and will be running ftrace on the
target. kernelshark is a GUI application. If you use the '-X'
option to ssh, you can have the kernelshark GUI run on the target
but display remotely on the host if you want.
</para>
</section>

1375 <section id='basic-ftrace-usage'>
1376 <title>Basic ftrace usage</title>
1377
1378 <para>
1379 'ftrace' essentially refers to everything included in
1380 the /tracing directory of the mounted debugfs filesystem
1381 (Yocto follows the standard convention and mounts it
1382 at /sys/kernel/debug). Here's a listing of all the files
1383 found in /sys/kernel/debug/tracing on a Yocto system:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# ls
     README                      kprobe_events       trace
     available_events            kprobe_profile      trace_clock
     available_filter_functions  options             trace_marker
     available_tracers           per_cpu             trace_options
     buffer_size_kb              printk_formats      trace_pipe
     buffer_total_size_kb        saved_cmdlines      tracing_cpumask
     current_tracer              set_event           tracing_enabled
     dyn_ftrace_total_info       set_ftrace_filter   tracing_on
     enabled_functions           set_ftrace_notrace  tracing_thresh
     events                      set_ftrace_pid
     free_buffer                 set_graph_function
            </literallayout>
            The files listed above are used for various purposes:
            some relate directly to the tracers themselves, others are
            used to set tracing options, and yet others actually contain
            the tracing output when a tracer is in effect. The purpose
            of some of these files can be guessed from their names, while
            others need explanation; we'll cover some of them below, but
            for an explanation of the others, please see the ftrace
            documentation.
        </para>

        <para>
            We'll start by looking at some of the available built-in
            tracers.
        </para>

        <para>
            cat'ing the 'available_tracers' file lists the set of
            available tracers:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers
     blk function_graph function nop
            </literallayout>
            The 'current_tracer' file contains the tracer currently in
            effect:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
     nop
            </literallayout>
            The above listing of current_tracer shows that
            the 'nop' tracer is in effect, which is just another
            way of saying that there's actually no tracer
            currently in effect.
        </para>
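
        <para>
            The same files can also be driven from a program rather than
            from the shell. The following is only a minimal Python 3 sketch
            of these cat/echo steps. The helper names and the TRACING_DIR
            default are illustrative assumptions, not part of any Yocto or
            kernel API. Writing to current_tracer requires root:
            <literallayout class='monospaced'>
```python
from pathlib import Path

# Assumed location: debugfs mounted at /sys/kernel/debug, as in this section.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def available_tracers(tracing_dir=TRACING_DIR):
    """Return the tracer names listed in 'available_tracers'."""
    return (Path(tracing_dir) / "available_tracers").read_text().split()


def current_tracer(tracing_dir=TRACING_DIR):
    """Return the tracer currently in effect ('nop' means no tracer)."""
    return (Path(tracing_dir) / "current_tracer").read_text().strip()


def set_current_tracer(name, tracing_dir=TRACING_DIR):
    """Like 'echo NAME > current_tracer', but refuse unknown tracer names."""
    if name not in available_tracers(tracing_dir):
        raise ValueError(f"{name!r} is not listed in available_tracers")
    (Path(tracing_dir) / "current_tracer").write_text(name + "\n")
```
            </literallayout>
            For example, set_current_tracer("function") corresponds to the
            echo command shown next. Checking available_tracers first simply
            makes an unknown name fail early with a descriptive error.
        </para>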

        <para>
            echo'ing one of the available_tracers into current_tracer
            makes the specified tracer the current tracer:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer
     root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
     function
            </literallayout>
            The above sets the current tracer to be the
            'function tracer'. This tracer traces every function
            call in the kernel and makes it available as the
            contents of the 'trace' file. Reading the 'trace' file
            lists the currently buffered function calls that have been
            traced by the function tracer:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

     # tracer: function
     #
     # entries-in-buffer/entries-written: 310629/766471   #P:8
     #
     #                              _-----=&gt; irqs-off
     #                             / _----=&gt; need-resched
     #                            | / _---=&gt; hardirq/softirq
     #                            || / _--=&gt; preempt-depth
     #                            ||| /     delay
     #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
     #              | |       |   ||||       |         |
               &lt;idle&gt;-0     [004] d..1   470.867169: ktime_get_real &lt;-intel_idle
               &lt;idle&gt;-0     [004] d..1   470.867170: getnstimeofday &lt;-ktime_get_real
               &lt;idle&gt;-0     [004] d..1   470.867171: ns_to_timeval &lt;-intel_idle
               &lt;idle&gt;-0     [004] d..1   470.867171: ns_to_timespec &lt;-ns_to_timeval
               &lt;idle&gt;-0     [004] d..1   470.867172: smp_apic_timer_interrupt &lt;-apic_timer_interrupt
               &lt;idle&gt;-0     [004] d..1   470.867172: native_apic_mem_write &lt;-smp_apic_timer_interrupt
               &lt;idle&gt;-0     [004] d..1   470.867172: irq_enter &lt;-smp_apic_timer_interrupt
               &lt;idle&gt;-0     [004] d..1   470.867172: rcu_irq_enter &lt;-irq_enter
               &lt;idle&gt;-0     [004] d..1   470.867173: rcu_idle_exit_common.isra.33 &lt;-rcu_irq_enter
               &lt;idle&gt;-0     [004] d..1   470.867173: local_bh_disable &lt;-irq_enter
               &lt;idle&gt;-0     [004] d..1   470.867173: add_preempt_count &lt;-local_bh_disable
               &lt;idle&gt;-0     [004] d.s1   470.867174: tick_check_idle &lt;-irq_enter
               &lt;idle&gt;-0     [004] d.s1   470.867174: tick_check_oneshot_broadcast &lt;-tick_check_idle
               &lt;idle&gt;-0     [004] d.s1   470.867174: ktime_get &lt;-tick_check_idle
               &lt;idle&gt;-0     [004] d.s1   470.867174: tick_nohz_stop_idle &lt;-tick_check_idle
               &lt;idle&gt;-0     [004] d.s1   470.867175: update_ts_time_stats &lt;-tick_nohz_stop_idle
               &lt;idle&gt;-0     [004] d.s1   470.867175: nr_iowait_cpu &lt;-update_ts_time_stats
               &lt;idle&gt;-0     [004] d.s1   470.867175: tick_do_update_jiffies64 &lt;-tick_check_idle
               &lt;idle&gt;-0     [004] d.s1   470.867175: _raw_spin_lock &lt;-tick_do_update_jiffies64
               &lt;idle&gt;-0     [004] d.s1   470.867176: add_preempt_count &lt;-_raw_spin_lock
               &lt;idle&gt;-0     [004] d.s2   470.867176: do_timer &lt;-tick_do_update_jiffies64
               &lt;idle&gt;-0     [004] d.s2   470.867176: _raw_spin_lock &lt;-do_timer
               &lt;idle&gt;-0     [004] d.s2   470.867176: add_preempt_count &lt;-_raw_spin_lock
               &lt;idle&gt;-0     [004] d.s3   470.867177: ntp_tick_length &lt;-do_timer
               &lt;idle&gt;-0     [004] d.s3   470.867177: _raw_spin_lock_irqsave &lt;-ntp_tick_length
     .
     .
     .
            </literallayout>
            Each line in the trace above shows what was happening in
            the kernel on a given CPU, to the level of detail of
            function calls. Each entry shows the function called,
            followed by its caller (after the arrow).
        </para>
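
        <para>
            Because every line has the same shape, the trace is easy to
            post-process. The following minimal Python 3 sketch splits one
            line of function tracer output into its fields. The regular
            expression is an assumption based on the sample above: it
            expects the four-character flags column (irqs-off, need-resched
            and so on) to be present. It is not an official parser:
            <literallayout class='monospaced'>
```python
import re
from typing import NamedTuple, Optional

# Fields: task, pid, cpu, flags, timestamp, function, caller.
# \x3c is the less-than sign, so "\x3c-" matches the caller arrow.
LINE_RE = re.compile(
    r"^\s*(.+)-(\d+)\s+\[(\d+)\]\s+(\S{4})\s+(\d+\.\d+):\s+(\S+)\s+\x3c-(\S+)\s*$"
)


class FunctionEntry(NamedTuple):
    task: str
    pid: int
    cpu: int
    flags: str
    timestamp: float
    function: str
    caller: str


def parse_function_line(line) -> Optional[FunctionEntry]:
    """Parse one trace line; return None for '#' headers and other lines."""
    if line.lstrip().startswith("#"):
        return None
    m = LINE_RE.match(line)
    if m is None:
        return None
    task, pid, cpu, flags, ts, func, caller = m.groups()
    return FunctionEntry(task, int(pid), int(cpu), flags, float(ts), func, caller)
```
            </literallayout>
            Counting the parsed entries by function or by caller is then a
            one-liner with collections.Counter.
        </para>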

        <para>
            The function tracer gives you an extremely detailed idea
            of what the kernel was doing at the point in time the trace
            was taken, and is a great way to learn about how the kernel
            code works in a dynamic sense.
        </para>

        <informalexample>
            <emphasis>Tying it Together:</emphasis> The ftrace function tracer is also
            available from within perf, as the ftrace:function tracepoint.
        </informalexample>

        <para>
            It is a little more difficult to follow the call chains than
            it needs to be; luckily there's a variant of the function
            tracer that displays the call chains explicitly, called the
            'function_graph' tracer:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# echo function_graph &gt; current_tracer
     root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

     # tracer: function_graph
     #
     # CPU  DURATION                  FUNCTION CALLS
     # |     |   |                     |   |   |   |
     7) 0.046 us | pick_next_task_fair();
     7) 0.043 us | pick_next_task_stop();
     7) 0.042 us | pick_next_task_rt();
     7) 0.032 us | pick_next_task_fair();
     7) 0.030 us | pick_next_task_idle();
     7) | _raw_spin_unlock_irq() {
     7) 0.033 us | sub_preempt_count();
     7) 0.258 us | }
     7) 0.032 us | sub_preempt_count();
     7) + 13.341 us | } /* __schedule */
     7) 0.095 us | } /* sub_preempt_count */
     7) | schedule() {
     7) | __schedule() {
     7) 0.060 us | add_preempt_count();
     7) 0.044 us | rcu_note_context_switch();
     7) | _raw_spin_lock_irq() {
     7) 0.033 us | add_preempt_count();
     7) 0.247 us | }
     7) | idle_balance() {
     7) | _raw_spin_unlock() {
     7) 0.031 us | sub_preempt_count();
     7) 0.246 us | }
     7) | update_shares() {
     7) 0.030 us | __rcu_read_lock();
     7) 0.029 us | __rcu_read_unlock();
     7) 0.484 us | }
     7) 0.030 us | __rcu_read_lock();
     7) | load_balance() {
     7) | find_busiest_group() {
     7) 0.031 us | idle_cpu();
     7) 0.029 us | idle_cpu();
     7) 0.035 us | idle_cpu();
     7) 0.906 us | }
     7) 1.141 us | }
     7) 0.022 us | msecs_to_jiffies();
     7) | load_balance() {
     7) | find_busiest_group() {
     7) 0.031 us | idle_cpu();
     .
     .
     .
     4) 0.062 us | msecs_to_jiffies();
     4) 0.062 us | __rcu_read_unlock();
     4) | _raw_spin_lock() {
     4) 0.073 us | add_preempt_count();
     4) 0.562 us | }
     4) + 17.452 us | }
     4) 0.108 us | put_prev_task_fair();
     4) 0.102 us | pick_next_task_fair();
     4) 0.084 us | pick_next_task_stop();
     4) 0.075 us | pick_next_task_rt();
     4) 0.062 us | pick_next_task_fair();
     4) 0.066 us | pick_next_task_idle();
     ------------------------------------------
     4) kworker-74 =&gt; &lt;idle&gt;-0
     ------------------------------------------

     4) | finish_task_switch() {
     4) | _raw_spin_unlock_irq() {
     4) 0.100 us | sub_preempt_count();
     4) 0.582 us | }
     4) 1.105 us | }
     4) 0.088 us | sub_preempt_count();
     4) ! 100.066 us | }
     .
     .
     .
     3) | sys_ioctl() {
     3) 0.083 us | fget_light();
     3) | security_file_ioctl() {
     3) 0.066 us | cap_file_ioctl();
     3) 0.562 us | }
     3) | do_vfs_ioctl() {
     3) | drm_ioctl() {
     3) 0.075 us | drm_ut_debug_printk();
     3) | i915_gem_pwrite_ioctl() {
     3) | i915_mutex_lock_interruptible() {
     3) 0.070 us | mutex_lock_interruptible();
     3) 0.570 us | }
     3) | drm_gem_object_lookup() {
     3) | _raw_spin_lock() {
     3) 0.080 us | add_preempt_count();
     3) 0.620 us | }
     3) | _raw_spin_unlock() {
     3) 0.085 us | sub_preempt_count();
     3) 0.562 us | }
     3) 2.149 us | }
     3) 0.133 us | i915_gem_object_pin();
     3) | i915_gem_object_set_to_gtt_domain() {
     3) 0.065 us | i915_gem_object_flush_gpu_write_domain();
     3) 0.065 us | i915_gem_object_wait_rendering();
     3) 0.062 us | i915_gem_object_flush_cpu_write_domain();
     3) 1.612 us | }
     3) | i915_gem_object_put_fence() {
     3) 0.097 us | i915_gem_object_flush_fence.constprop.36();
     3) 0.645 us | }
     3) 0.070 us | add_preempt_count();
     3) 0.070 us | sub_preempt_count();
     3) 0.073 us | i915_gem_object_unpin();
     3) 0.068 us | mutex_unlock();
     3) 9.924 us | }
     3) + 11.236 us | }
     3) + 11.770 us | }
     3) + 13.784 us | }
     3) | sys_ioctl() {
            </literallayout>
            As you can see, the function_graph display is much easier to
            follow. Also note that in addition to the function calls and
            associated braces, other events such as scheduler events
            are displayed in context. In fact, you can freely include
            any tracepoint available in the trace events subsystem described
            in the next section by simply enabling those events, and they'll
            appear in context in the function graph display. This makes
            function_graph quite a powerful tool for understanding kernel
            dynamics.
        </para>

        <para>
            Also notice that there are various annotations on the
            left-hand side of the display. For example, if the total time
            it took for a given function to execute is above a certain
            threshold, a plus sign (more than 10 microseconds) or an
            exclamation point (more than 100 microseconds) appears next to
            the duration. Please see the ftrace documentation for details
            on all these fields.
        </para>
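
        <para>
            The following minimal Python 3 sketch shows how a function_graph
            line that carries a duration can be split into its CPU, marker,
            duration and call text, and how the marker relates to the
            duration. The regular expression assumes the layout of the
            sample above. The thresholds follow the ftrace documentation for
            kernels of this era; newer kernels define additional markers for
            longer durations:
            <literallayout class='monospaced'>
```python
import re

# CPU, optional '+' or '!' marker, duration in microseconds, call text.
GRAPH_RE = re.compile(r"^\s*(\d+)\)\s+([+!]?)\s*(\d+\.\d+)\s+us\s+\|\s*(.*)$")


def duration_marker(duration_us):
    """Marker for a duration: '!' above 100 us, '+' above 10 us, else ''."""
    if duration_us > 100.0:
        return "!"
    if duration_us > 10.0:
        return "+"
    return ""


def parse_graph_line(line):
    """Return (cpu, marker, duration_us, text), or None if no duration.

    Function-entry lines such as "7) | schedule() {" carry no duration,
    because it is only known when the matching closing brace is printed.
    """
    m = GRAPH_RE.match(line)
    if m is None:
        return None
    cpu, marker, duration, text = m.groups()
    return int(cpu), marker, float(duration), text.strip()
```
            </literallayout>
            Applied to the sample, parse_graph_line("4) ! 100.066 us | }")
            yields a '!' marker, which agrees with duration_marker(100.066).
        </para>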
    </section>

    <section id='the-trace-events-subsystem'>
        <title>The 'trace events' Subsystem</title>

        <para>
            One especially important directory contained within
            the /sys/kernel/debug/tracing directory is the 'events'
            subdirectory, which contains representations of every
            tracepoint in the system. Listing out the contents of
            the 'events' subdirectory, we see mainly another set of
            subdirectories:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# cd events
     root@sugarbay:/sys/kernel/debug/tracing/events# ls -al
     drwxr-xr-x  38 root root 0 Nov 14 23:19 .
     drwxr-xr-x   5 root root 0 Nov 14 23:19 ..
     drwxr-xr-x  19 root root 0 Nov 14 23:19 block
     drwxr-xr-x  32 root root 0 Nov 14 23:19 btrfs
     drwxr-xr-x   5 root root 0 Nov 14 23:19 drm
     -rw-r--r--   1 root root 0 Nov 14 23:19 enable
     drwxr-xr-x  40 root root 0 Nov 14 23:19 ext3
     drwxr-xr-x  79 root root 0 Nov 14 23:19 ext4
     drwxr-xr-x  14 root root 0 Nov 14 23:19 ftrace
     drwxr-xr-x   8 root root 0 Nov 14 23:19 hda
     -r--r--r--   1 root root 0 Nov 14 23:19 header_event
     -r--r--r--   1 root root 0 Nov 14 23:19 header_page
     drwxr-xr-x  25 root root 0 Nov 14 23:19 i915
     drwxr-xr-x   7 root root 0 Nov 14 23:19 irq
     drwxr-xr-x  12 root root 0 Nov 14 23:19 jbd
     drwxr-xr-x  14 root root 0 Nov 14 23:19 jbd2
     drwxr-xr-x  14 root root 0 Nov 14 23:19 kmem
     drwxr-xr-x   7 root root 0 Nov 14 23:19 module
     drwxr-xr-x   3 root root 0 Nov 14 23:19 napi
     drwxr-xr-x   6 root root 0 Nov 14 23:19 net
     drwxr-xr-x   3 root root 0 Nov 14 23:19 oom
     drwxr-xr-x  12 root root 0 Nov 14 23:19 power
     drwxr-xr-x   3 root root 0 Nov 14 23:19 printk
     drwxr-xr-x   8 root root 0 Nov 14 23:19 random
     drwxr-xr-x   4 root root 0 Nov 14 23:19 raw_syscalls
     drwxr-xr-x   3 root root 0 Nov 14 23:19 rcu
     drwxr-xr-x   6 root root 0 Nov 14 23:19 rpm
     drwxr-xr-x  20 root root 0 Nov 14 23:19 sched
     drwxr-xr-x   7 root root 0 Nov 14 23:19 scsi
     drwxr-xr-x   4 root root 0 Nov 14 23:19 signal
     drwxr-xr-x   5 root root 0 Nov 14 23:19 skb
     drwxr-xr-x   4 root root 0 Nov 14 23:19 sock
     drwxr-xr-x  10 root root 0 Nov 14 23:19 sunrpc
     drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls
     drwxr-xr-x   4 root root 0 Nov 14 23:19 task
     drwxr-xr-x  14 root root 0 Nov 14 23:19 timer
     drwxr-xr-x   3 root root 0 Nov 14 23:19 udp
     drwxr-xr-x  21 root root 0 Nov 14 23:19 vmscan
     drwxr-xr-x   3 root root 0 Nov 14 23:19 vsyscall
     drwxr-xr-x   6 root root 0 Nov 14 23:19 workqueue
     drwxr-xr-x  26 root root 0 Nov 14 23:19 writeback
            </literallayout>
            Each one of these subdirectories corresponds to a
            'subsystem' and in turn contains further subdirectories,
            each of which corresponds to a single tracepoint.
            For example, here are the contents of the 'kmem' subsystem:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al
     drwxr-xr-x 14 root root 0 Nov 14 23:19 .
     drwxr-xr-x 38 root root 0 Nov 14 23:19 ..
     -rw-r--r--  1 root root 0 Nov 14 23:19 enable
     -rw-r--r--  1 root root 0 Nov 14 23:19 filter
     drwxr-xr-x  2 root root 0 Nov 14 23:19 kfree
     drwxr-xr-x  2 root root 0 Nov 14 23:19 kmalloc
     drwxr-xr-x  2 root root 0 Nov 14 23:19 kmalloc_node
     drwxr-xr-x  2 root root 0 Nov 14 23:19 kmem_cache_alloc
     drwxr-xr-x  2 root root 0 Nov 14 23:19 kmem_cache_alloc_node
     drwxr-xr-x  2 root root 0 Nov 14 23:19 kmem_cache_free
     drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_alloc
     drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag
     drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked
     drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_free
     drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_free_batched
     drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_pcpu_drain
            </literallayout>
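            Since subsystems and tracepoints are nothing more than nested
            directories, the full list of tracepoints can be gathered by
            walking the events directory. The following minimal Python 3
            sketch assumes the events path used in this section. The helper
            name is illustrative only:
            <literallayout class='monospaced'>
```python
from pathlib import Path

# Assumed location: debugfs mounted at /sys/kernel/debug, as in this section.
EVENTS_DIR = Path("/sys/kernel/debug/tracing/events")


def list_tracepoints(events_dir=EVENTS_DIR):
    """Map each subsystem name to the sorted names of its tracepoints.

    Subsystems are the subdirectories of events/, and tracepoints are the
    subdirectories of each subsystem; plain files such as 'enable',
    'filter', 'header_event' and 'header_page' are skipped.
    """
    result = {}
    for subsys in sorted(p for p in Path(events_dir).iterdir() if p.is_dir()):
        result[subsys.name] = sorted(t.name for t in subsys.iterdir() if t.is_dir())
    return result
```
            </literallayout>
            On the system above, list_tracepoints()["kmem"] would be
            expected to contain the twelve kmem tracepoint names shown,
            from kfree through mm_page_pcpu_drain.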
            Let's see what's inside the subdirectory for a specific
            tracepoint, in this case the one for kmalloc:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al
     drwxr-xr-x  2 root root 0 Nov 14 23:19 .
     drwxr-xr-x 14 root root 0 Nov 14 23:19 ..
     -rw-r--r--  1 root root 0 Nov 14 23:19 enable
     -rw-r--r--  1 root root 0 Nov 14 23:19 filter
     -r--r--r--  1 root root 0 Nov 14 23:19 format
     -r--r--r--  1 root root 0 Nov 14 23:19 id
            </literallayout>
            The 'format' file for the tracepoint describes the layout of
            the event in memory. The various tracing tools that make use of
            these tracepoints rely on it to parse the event and make sense
            of it. The file also includes a 'print fmt' field that allows
            tools like ftrace to display the event as text. Here's what the
            format of the kmalloc event looks like:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
     name: kmalloc
     ID: 313
     format:
             field:unsigned short common_type; offset:0; size:2; signed:0;
             field:unsigned char common_flags; offset:2; size:1; signed:0;
             field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
             field:int common_pid; offset:4; size:4; signed:1;
             field:int common_padding; offset:8; size:4; signed:1;

             field:unsigned long call_site; offset:16; size:8; signed:0;
             field:const void * ptr; offset:24; size:8; signed:0;
             field:size_t bytes_req; offset:32; size:8; signed:0;
             field:size_t bytes_alloc; offset:40; size:8; signed:0;
             field:gfp_t gfp_flags; offset:48; size:4; signed:0;

     print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,
     (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
     gfp_t)0x20000u) | (( gfp_t)0x02u) | (( gfp_t)0x08u)) | (( gfp_t)0x4000u) | (( gfp_t)0x10000u) | (( gfp_t)0x1000u) | (( gfp_t)0x200u) | ((
     gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((
     gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
     gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
     gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), "GFP_TEMPORARY"},
     {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),
     "GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((
     gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((
     gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned
     long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},
     {(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),
     "GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned
     long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},
     {(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"
            </literallayout>
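            Because the field lines follow a fixed pattern, a tool can
            recover each field's type, name, offset and size from this
            file. The following minimal Python 3 sketch parses only the
            'field:' lines and ignores the 'print fmt' expression. The names
            are illustrative, and the pattern is an assumption based on the
            output above:
            <literallayout class='monospaced'>
```python
import re
from typing import NamedTuple

# "field:DECL; offset:N; size:N; signed:0 or 1;" separated by tabs or spaces.
FIELD_RE = re.compile(
    r"^\s*field:(.+?);\s*offset:(\d+);\s*size:(\d+);\s*signed:(\d+);"
)


class Field(NamedTuple):
    ctype: str
    name: str
    offset: int
    size: int
    signed: bool


def parse_format(text):
    """Return the fields of a tracepoint 'format' file, in file order."""
    fields = []
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m is None:
            continue  # name:, ID:, format:, print fmt: and blank lines
        decl, offset, size, signed = m.groups()
        # The field name is the last word of the C declaration.
        ctype, _, name = decl.rpartition(" ")
        fields.append(Field(ctype.strip(), name, int(offset), int(size), signed == "1"))
    return fields
```
            </literallayout>
            For kmalloc, this would report call_site at offset 16 even
            though the common_ fields end at offset 12. The gap is padding
            that aligns the 8-byte field.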
            The 'enable' file in the tracepoint directory is what allows
            the user (or tools such as trace-cmd) to actually turn the
            tracepoint on and off. When the tracepoint is enabled, its
            events start appearing in the ftrace 'trace' file described
            previously. For example, this turns on the kmalloc
            tracepoint:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable
            </literallayout>
            At the moment, we're not interested in the function tracer or
            any other tracer that might be in effect, so we first turn it
            off by selecting the 'nop' tracer. Having done that, we still
            need to turn tracing on in order to see the events in the
            output buffer:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer
     root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on
            </literallayout>
            Now, if we look at the 'trace' file, we see nothing
            but the kmalloc events we just turned on:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
     # tracer: nop
     #
     # entries-in-buffer/entries-written: 1897/1897   #P:8
     #
     #                              _-----=&gt; irqs-off
     #                             / _----=&gt; need-resched
     #                            | / _---=&gt; hardirq/softirq
     #                            || / _--=&gt; preempt-depth
     #                            ||| /     delay
     #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
     #              | |       |   ||||       |         |
             dropbear-1465  [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
               &lt;idle&gt;-0     [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
               &lt;idle&gt;-0     [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
      matchbox-termin-1361  [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
                 Xorg-1264  [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
                 Xorg-1264  [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
                 Xorg-1264  [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
      matchbox-termin-1361  [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
                 Xorg-1264  [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
                 Xorg-1264  [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
                 Xorg-1264  [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
               &lt;idle&gt;-0     [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             dropbear-1465  [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
               &lt;idle&gt;-0     [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
               &lt;idle&gt;-0     [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             dropbear-1465  [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
               &lt;idle&gt;-0     [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
               &lt;idle&gt;-0     [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             dropbear-1465  [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
               &lt;idle&gt;-0     [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
      matchbox-termin-1361  [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
                 Xorg-1264  [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
                 Xorg-1264  [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
                 Xorg-1264  [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
               &lt;idle&gt;-0     [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             dropbear-1465  [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
               &lt;idle&gt;-0     [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
               &lt;idle&gt;-0     [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             dropbear-1465  [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
               &lt;idle&gt;-0     [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
      matchbox-termin-1361  [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
            </literallayout>
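            Output like this lends itself to simple analysis. For example,
            the bytes_req and bytes_alloc fields show how much memory each
            call site asked for versus how much it was actually given: 168
            versus 192 bytes for one of the Xorg call sites. The following
            minimal Python 3 sketch totals these per call_site. It assumes
            the kmalloc line format shown above, and the function name is
            illustrative:
            <literallayout class='monospaced'>
```python
import re
from collections import defaultdict

# Pull call_site, bytes_req and bytes_alloc out of a kmalloc event line.
KMALLOC_RE = re.compile(
    r"kmalloc: call_site=([0-9a-f]+) ptr=[0-9a-f]+ "
    r"bytes_req=(\d+) bytes_alloc=(\d+)"
)


def kmalloc_totals(lines):
    """Map call_site -> [calls, bytes requested, bytes allocated]."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        m = KMALLOC_RE.search(line)
        if m is None:
            continue  # '#' header lines and non-kmalloc events
        site, req, alloc = m.group(1), int(m.group(2)), int(m.group(3))
        entry = totals[site]
        entry[0] += 1
        entry[1] += req
        entry[2] += alloc
    return dict(totals)
```
            </literallayout>
            Subtracting the requested total from the allocated total for a
            call site gives the slack that call site was handed. Note that
            call_site is a raw kernel address, not a symbol name.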
            To again disable the kmalloc event, we need to send 0 to the
            enable file:
            <literallayout class='monospaced'>
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable
            </literallayout>
            You can enable any number of events or complete subsystems
            (by using the 'enable' file in the subsystem directory) and
            get an arbitrarily fine-grained idea of what's going on in the
            system by enabling as many of the appropriate tracepoints
            as applicable.
        </para>
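
        <para>
            Enabling and disabling events can likewise be scripted. The
            following minimal Python 3 sketch writes to either a
            tracepoint's 'enable' file or a whole subsystem's 'enable' file.
            The helper name and the EVENTS_DIR default are illustrative
            assumptions, and root access is needed on a real system:
            <literallayout class='monospaced'>
```python
from pathlib import Path

# Assumed location: debugfs mounted at /sys/kernel/debug, as in this section.
EVENTS_DIR = Path("/sys/kernel/debug/tracing/events")


def set_enabled(subsystem, event=None, on=True, events_dir=EVENTS_DIR):
    """Write 1 (on) or 0 (off) to an 'enable' file.

    set_enabled("kmem", "kmalloc") matches 'echo 1 > enable' in the
    kmem/kmalloc directory; set_enabled("kmem") uses the subsystem-level
    'enable' file and therefore affects every kmem tracepoint.
    """
    target = Path(events_dir) / subsystem
    if event is not None:
        target = target / event
    enable_file = target / "enable"
    if not enable_file.exists():
        raise FileNotFoundError(f"no such event or subsystem: {enable_file}")
    enable_file.write_text("1\n" if on else "0\n")
```
            </literallayout>
            As with the echo commands above, enabled events only show up
            in the 'trace' file while tracing_on is set to 1.
        </para>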

        <para>
            A number of the tools described in this HOWTO do just that,
            including trace-cmd and kernelshark in the next section.
        </para>

        <informalexample>
            <emphasis>Tying it Together:</emphasis> These tracepoints and their representation
            are used not only by ftrace, but by many of the other tools
            covered in this document, and they form a central point of
            integration for the various tracers available in Linux.
            They are a core part of the instrumentation for the
            following tools: perf, lttng, ftrace, blktrace and SystemTap.
        </informalexample>

        <informalexample>
            <emphasis>Tying it Together:</emphasis> Eventually all the special-purpose tracers
            currently available in /sys/kernel/debug/tracing will be
            removed and replaced with equivalent tracers based on the
            'trace events' subsystem.
        </informalexample>
    </section>

    <section id='trace-cmd-kernelshark'>
        <title>trace-cmd/kernelshark</title>

        <para>
            trace-cmd is essentially an extensive command-line 'wrapper'
            interface that hides the details of all the individual files
            in /sys/kernel/debug/tracing, allowing users to specify
            particular events within the /sys/kernel/debug/tracing/events/
            subdirectory and to collect traces without having to deal
            with those details directly.
        </para>
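
        <para>
            As a rough illustration of that wrapping, selecting events for
            'trace-cmd record' amounts to passing one -e option per event or
            subsystem. The following minimal Python 3 sketch only builds
            such a command line. The helper name is hypothetical, and the
            exact command kernelshark runs may include further options:
            <literallayout class='monospaced'>
```python
def trace_cmd_record_argv(events, output=None):
    """Build an argument list for 'trace-cmd record'.

    Each entry in events is passed with -e and may name a whole subsystem
    (such as "i915") or a single tracepoint (such as "kmem:kmalloc").
    If output is given, it is passed with -o; otherwise trace-cmd writes
    its default trace.dat file.
    """
    argv = ["trace-cmd", "record"]
    for event in events:
        argv += ["-e", event]
    if output is not None:
        argv += ["-o", output]
    return argv
```
            </literallayout>
            The resulting list could be handed to subprocess.run(). With no
            command following the options, trace-cmd keeps recording until
            it is interrupted with Ctrl-C.
        </para>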

        <para>
            As yet another layer on top of that, kernelshark provides a GUI
            that allows users to start and stop traces and specify sets
            of events using an intuitive interface, and view the
            output as both trace events and as a per-CPU graphical
            display. It directly uses 'trace-cmd' as the plumbing
            that accomplishes all that underneath the covers (and
            actually displays the trace-cmd command it uses, as we'll see).
        </para>

        <para>
            To start a trace using kernelshark, first start kernelshark:
            <literallayout class='monospaced'>
     root@sugarbay:~# kernelshark
            </literallayout>
            Then bring up the 'Capture' dialog by choosing from the
            kernelshark menu:
            <literallayout class='monospaced'>
     Capture | Record
            </literallayout>
            That will display the following dialog, which allows you to
            choose one or more events (or even one or more complete
            subsystems) to trace:
        </para>

        <para>
            <imagedata fileref="figures/kernelshark-choose-events.png" width="6in" depth="6in" align="center" scalefit="1" />
        </para>

        <para>
            Note that these are exactly the same sets of events described
            in the previous trace events subsystem section; in fact, that
            is where trace-cmd gets them for kernelshark.
        </para>

        <para>
            In the above screenshot, we've decided to explore the
            graphics subsystem a bit and so have chosen to trace all
            the tracepoints contained within the 'i915' and 'drm'
            subsystems.
        </para>

        <para>
            After doing that, we can start and stop the trace using
            the 'Run' button in the lower right corner of the dialog
            (the same button turns into the 'Stop' button after the
            trace has started):
        </para>

        <para>
            <imagedata fileref="figures/kernelshark-output-display.png" width="6in" depth="6in" align="center" scalefit="1" />
        </para>

        <para>
            Notice that the right-hand pane shows the exact trace-cmd
            command-line that's used to run the trace, along with the
            results of the trace-cmd run.
        </para>

        <para>
            Once the 'Stop' button is pressed, the graphical view magically
            fills up with a colorful per-cpu display of the trace data,
            along with the detailed event listing below that:
        </para>

        <para>
            <imagedata fileref="figures/kernelshark-i915-display.png" width="6in" depth="7in" align="center" scalefit="1" />
        </para>

        <para>
            Here's another example, this time a display resulting
            from tracing 'all events':
        </para>

        <para>
            <imagedata fileref="figures/kernelshark-all.png" width="6in" depth="7in" align="center" scalefit="1" />
        </para>

        <para>
            The tool is pretty self-explanatory, but for more detailed
            information on navigating through the data, see the
            <ulink url='http://rostedt.homelinux.com/kernelshark/'>kernelshark website</ulink>.
        </para>
    </section>

    <section id='ftrace-documentation'>
        <title>Documentation</title>

        <para>
            The documentation for ftrace can be found in the kernel
            Documentation directory:
            <literallayout class='monospaced'>
     Documentation/trace/ftrace.txt
            </literallayout>
            The documentation for the trace event subsystem can also
            be found in the kernel Documentation directory:
            <literallayout class='monospaced'>
     Documentation/trace/events.txt
            </literallayout>
            There is a nice series of articles on using
            ftrace and trace-cmd at LWN:
            <itemizedlist>
                <listitem><para><ulink url='http://lwn.net/Articles/365835/'>Debugging the kernel using Ftrace - part 1</ulink>
                    </para></listitem>
                <listitem><para><ulink url='http://lwn.net/Articles/366796/'>Debugging the kernel using Ftrace - part 2</ulink>
                    </para></listitem>
                <listitem><para><ulink url='http://lwn.net/Articles/370423/'>Secrets of the Ftrace function tracer</ulink>
                    </para></listitem>
                <listitem><para><ulink url='https://lwn.net/Articles/410200/'>trace-cmd: A front-end for Ftrace</ulink>
                    </para></listitem>
            </itemizedlist>
        </para>

        <para>
            More detailed documentation on kernelshark usage can be found here:
            <ulink url='http://rostedt.homelinux.com/kernelshark/'>KernelShark</ulink>
        </para>

        <para>
            An amusing yet useful README (a tracing mini-HOWTO) can be
            found in /sys/kernel/debug/tracing/README.
        </para>
    </section>
</section>

<section id='profile-manual-systemtap'>
    <title>systemtap</title>

    <para>
        SystemTap is a system-wide script-based tracing and profiling tool.
    </para>

    <para>
        SystemTap scripts are C-like programs that are executed in the
        kernel to gather/print/aggregate data extracted from the context
        they end up being invoked under.
    </para>

    <para>
        For example, this probe from the
        <ulink url='http://sourceware.org/systemtap/tutorial/'>SystemTap tutorial</ulink>
        simply prints a line every time any process on the system open()s
        a file. For each line, it prints the executable name of the
        program that opened the file, along with its PID, and the name
        of the file it opened (or tried to open), which it extracts
        from the open syscall's argstr.
        <literallayout class='monospaced'>
     probe syscall.open
     {
       printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)
     }

     probe timer.ms(4000) # after 4 seconds
     {
       exit ()
     }
        </literallayout>
        Normally, to execute this probe, you'd simply install
        systemtap on the system you want to probe and run the
        probe directly on that system. For example, assuming the
        file containing the above text is named trace_open.stp:
2048 <literallayout class='monospaced'>
2049 # stap trace_open.stp
2050 </literallayout>
        Under the covers, systemtap runs this probe by 1) parsing
        the probe and converting it to an equivalent 'C' form,
        2) compiling the 'C' form into a kernel module, 3) inserting
        the module into the kernel, which arms it, and 4) collecting
        the data generated by the probe and displaying it to the user.
2056 </para>
2057
2058 <para>
        In order to accomplish steps 1 and 2, the 'stap' program needs
        access to the kernel build system that produced the kernel
        the probed system is running. On a typical embedded system
        (the 'target'), that kernel build system usually isn't part
        of the image running on the target. It is, however, normally
        available on the 'host' system that produced the target
        image. In such cases, steps 1 and 2 are executed on the host
        system, and steps 3 and 4 are executed on the target system,
        using only the systemtap 'runtime'.
2069 </para>
2070
2071 <para>
2072 The systemtap support in Yocto assumes that only steps
2073 3 and 4 are run on the target; it is possible to do
2074 everything on the target, but this section assumes only
2075 the typical embedded use-case.
2076 </para>
2077
2078 <para>
        So, to run a systemtap script on the target, you basically
        need to 1) compile the probe on the host system into a kernel
        module that makes sense to the target, 2) copy the module
        onto the target system, 3) insert the module into the target
        kernel, which arms it, and 4) collect the data generated by
        the probe and display it to the user.
2085 </para>
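    <para>
        The four steps above can be sketched as the commands a
        host-side wrapper might issue. The following Python sketch
        is an illustration under stated assumptions, not a
        description of how 'crosstap' works: the helper name
        stap_cross_commands and all of the example paths, the
        architecture, and the toolchain prefix are hypothetical.
        The stap options it relies on are SystemTap's own: -p4 stops
        after pass 4 (compiling the module), -m names the module,
        -r points at the target kernel build tree, -a selects the
        target architecture, and -B passes a setting such as
        CROSS_COMPILE to the kernel build. staprun is the SystemTap
        runtime command that inserts the module on the target.
        The function only builds argument lists and runs nothing.
        <literallayout class='monospaced'>
import re
import shlex


def stap_cross_commands(script, kernel_build_dir, arch, cross_compile,
                        target, module=None):
    """Build (host_compile, copy, target_run) argument lists.

    Hypothetical helper: it only constructs commands and runs nothing.
    """
    if module is None:
        # Derive a module name from the script file name, replacing
        # characters that are not letters, digits, or underscores.
        base = script.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        module = re.sub(r"\W", "_", base)
    ko = module + ".ko"
    # Steps 1 and 2 (host): translate the script to C and build a
    # module against the target's kernel tree, stopping after pass 4.
    host_compile = [
        "stap", "-p4", "-m", module,
        "-r", kernel_build_dir,
        "-a", arch,
        "-B", "CROSS_COMPILE=" + cross_compile,
        script,
    ]
    # Copy the module into the target user's home directory.
    copy = ["scp", ko, target + ":"]
    # Steps 3 and 4 (target): staprun inserts the module, which arms
    # the probes, and its output is relayed back over ssh.
    target_run = ["ssh", target, "staprun " + shlex.quote(ko)]
    return host_compile, copy, target_run


# Hypothetical paths and toolchain prefix, for illustration only.
host, copy, run = stap_cross_commands(
    "trace_open.stp", "/path/to/target/kernel-build", "i386",
    "i586-poky-linux-", "root@192.168.7.2")
for cmd in (host, copy, run):
    print(shlex.join(cmd))
        </literallayout>
        Keeping command construction separate from execution makes
        it easy to print and review the sequence before passing each
        list to subprocess.run().
    </para>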
2086
2087 <section id='systemtap-setup'>
2088 <title>Setup</title>
2089
2090 <para>
2091 Those are a lot of steps and a lot of details, but
2092 fortunately Yocto includes a script called 'crosstap'
2093 that will take care of those details, allowing you to
2094 simply execute a systemtap script on the remote target,
2095 with arguments if necessary.
2096 </para>
2097
2098 <para>
            In order to do this from a remote host, however, you
            need access to the build for the image you booted.
            <note>
                The 'crosstap' script, which drives SystemTap, assumes
                you can establish an ssh connection to the remote target.
                Please refer to the crosstap wiki page for details on verifying
                ssh connections at
                <ulink url='https://wiki.yoctoproject.org/wiki/Tracing_and_Profiling#systemtap'></ulink>.
                Also, the ability to ssh into the target system is not enabled
                by default in *-minimal images.
            </note>
            If you run 'crosstap' on the host without having done a
            build, the script explains how to set one up:
2113 <literallayout class='monospaced'>
2114 $ crosstap root@192.168.1.88 trace_open.stp
2115
2116 Error: No target kernel build found.
2117 Did you forget to create a local build of your image?
2118
2119 'crosstap' requires a local sdk build of the target system
2120 (or a build that includes 'tools-profile') in order to build
2121 kernel modules that can probe the target system.
2122
2123 Practically speaking, that means you need to do the following:
2124 - If you're running a pre-built image, download the release
2125 and/or BSP tarballs used to build the image.
2126 - If you're working from git sources, just clone the metadata
2127 and BSP layers needed to build the image you'll be booting.
2128 - Make sure you're properly set up to build a new image (see
2129 the BSP README and/or the widely available basic documentation
2130 that discusses how to build images).
2131 - Build an -sdk version of the image e.g.:
2132 $ bitbake core-image-sato-sdk
2133 OR
2134 - Build a non-sdk image but include the profiling tools:
2135 [ edit local.conf and add 'tools-profile' to the end of
2136 the EXTRA_IMAGE_FEATURES variable ]
2137 $ bitbake core-image-sato
2138
2139 Once you've build the image on the host system, you're ready to
2140 boot it (or the equivalent pre-built image) and use 'crosstap'
2141 to probe it (you need to source the environment as usual first):
2142
2143 $ source oe-init-build-env
2144 $ cd ~/my/systemtap/scripts
2145 $ crosstap root@192.168.1.xxx myscript.stp
2146 </literallayout>
            So essentially what you need to do is build an SDK image or
            an image with 'tools-profile' as detailed in the
2149 "<link linkend='profile-manual-general-setup'>General Setup</link>"
2150 section of this manual, and boot the resulting target image.
2151 </para>
2152
2153 <note>
2154 If you have a build directory containing multiple machines,
2155 you need to have the MACHINE you're connecting to selected
2156 in local.conf, and the kernel in that machine's build
2157 directory must match the kernel on the booted system exactly,
2158 or you'll get the above 'crosstap' message when you try to
2159 invoke a script.
2160 </note>
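        <para>
            Because a mismatched MACHINE or a missing 'tools-profile'
            feature leads to the same 'crosstap' error shown above, a
            quick look at conf/local.conf can save a rebuild. The
            following Python sketch is a rough text scan, not a BitBake
            parser: the function name scan_local_conf is hypothetical,
            and it understands only simple '=', '?=', '??=', '+=', and
            '.=' assignments, ignoring overrides, :append, and included
            files. Treat its answer as a hint; 'bitbake -e' reports the
            values BitBake actually computes.
            <literallayout class='monospaced'>
import re

# Simple BitBake assignments such as
#   MACHINE ?= "qemux86"
#   EXTRA_IMAGE_FEATURES += "tools-profile"
# Overrides, :append/_append, and include/require lines are ignored.
ASSIGN = re.compile(r'^\s*([A-Z_]+)\s*(\?\?=|\?=|\+=|\.=|=)\s*"([^"]*)"')


def scan_local_conf(text):
    """Roughly return (machine, has_tools_profile) for local.conf text."""
    machine = None
    features = None
    for line in text.splitlines():
        m = ASSIGN.match(line)
        if not m:
            continue  # comments, blank lines, unsupported syntax
        name, op, value = m.groups()
        if name == "MACHINE":
            # A plain '=' always wins; default forms only fill a gap.
            if op == "=" or machine is None:
                machine = value
        elif name == "EXTRA_IMAGE_FEATURES":
            if op in ("+=", ".="):
                features = (features or []) + value.split()
            elif op == "=" or features is None:
                features = value.split()
    return machine, "tools-profile" in (features or [])


# Example, run from the build directory:
#   print(scan_local_conf(open("conf/local.conf").read()))
            </literallayout>
        </para>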
2161 </section>
2162
2163 <section id='running-a-script-on-a-target'>
2164 <title>Running a Script on a Target</title>
2165
2166 <para>
2167 Once you've done that, you should be able to run a systemtap
2168 script on the target:
2169 <literallayout class='monospaced'>
2170 $ cd /path/to/yocto
2171 $ source oe-init-build-env
2172
2173 ### Shell environment set up for builds. ###
2174
     You can now run 'bitbake &lt;target&gt;'

     Common targets are:
         core-image-minimal
         core-image-sato
         meta-toolchain
         meta-ide-support

     You can also run generated qemu images with a command like 'runqemu qemux86'

            </literallayout>
2186 Once you've done that, you can cd to whatever directory
2187 contains your scripts and use 'crosstap' to run the script:
2188 <literallayout class='monospaced'>
     $ cd /path/to/my/systemtap/script
2190 $ crosstap root@192.168.7.2 trace_open.stp
2191 </literallayout>
            If you get an error connecting to the target, for example:
2193 <literallayout class='monospaced'>
2194 $ crosstap root@192.168.7.2 trace_open.stp
2195 error establishing ssh connection on remote 'root@192.168.7.2'
2196 </literallayout>
2197 Try ssh'ing to the target and see what happens:
2198 <literallayout class='monospaced'>
2199 $ ssh root@192.168.7.2
2200 </literallayout>
            Connection problems are often due to specifying the wrong
            IP address or to a host key verification error.
2203 </para>
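        <para>
            Since 'crosstap' depends on ssh, it can help to separate
            network problems from authentication problems by first
            checking whether anything answers on the target's ssh port
            at all. The following Python sketch assumes the default ssh
            port 22; the function name ssh_port_open is hypothetical.
            A True result only means a TCP connection succeeded; host
            key and password problems still show up only when you run
            ssh by hand as shown above.
            <literallayout class='monospaced'>
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, unknown
        # names (socket.gaierror), and timeouts (socket.timeout).
        return False


# Example, from the host:
#   print(ssh_port_open("192.168.7.2"))
            </literallayout>
        </para>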
2204
2205 <para>
2206 If everything worked as planned, you should see something
2207 like this (enter the password when prompted, or press enter
2208 if it's set up to use no password):
2209 <literallayout class='monospaced'>
2210 $ crosstap root@192.168.7.2 trace_open.stp
2211 root@192.168.7.2's password:
2212 matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
2213 matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
2214 </literallayout>
2215 </para>
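        <para>
            Because trace_open.stp prints one line per open() call, its
            output is easy to summarize on the host. The following
            Python sketch assumes the exact printf format used in
            trace_open.stp above; the helper name count_opens is
            hypothetical. It counts opens per process and file and
            skips anything else, such as the password prompt. The
            process names are execname() values, which the kernel
            limits to 15 characters; that is why 'matchbox-terminal'
            appears as 'matchbox-termin'.
            <literallayout class='monospaced'>
import re
from collections import Counter

# One line of trace_open.stp output, for example:
#   matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT, 0600)
LINE = re.compile(r'^(.+)\((\d+)\) open \((.*)\)$')
# First double-quoted string in the argument list (the file name).
FIRST_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_opens(lines):
    """Count (execname, pid, path) tuples in trace_open.stp output."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip password prompts and other noise
        execname, pid, args = m.groups()
        path = FIRST_STRING.search(args)
        counts[(execname, int(pid), path.group(1) if path else None)] += 1
    return counts


# Example, assuming the crosstap output was saved to opens.txt:
#   for key, n in count_opens(open("opens.txt")).most_common(10):
#       print(n, *key)
            </literallayout>
        </para>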
2216 </section>
2217
2218 <section id='systemtap-documentation'>
2219 <title>Documentation</title>
2220
2221 <para>
2222 The SystemTap language reference can be found here:
2223 <ulink url='http://sourceware.org/systemtap/langref/'>SystemTap Language Reference</ulink>
2224 </para>
2225
2226 <para>
2227 Links to other SystemTap documents, tutorials, and examples can be
2228 found here:
2229 <ulink url='http://sourceware.org/systemtap/documentation.html'>SystemTap documentation page</ulink>
2230 </para>
2231 </section>
2232</section>
2233
<section id='profile-manual-sysprof'>
2235 <title>Sysprof</title>
2236
2237 <para>
        Sysprof is a very easy-to-use system-wide profiler that consists
2239 of a single window with three panes and a few buttons which allow
2240 you to start, stop, and view the profile from one place.
2241 </para>
2242
2243 <section id='sysprof-setup'>
2244 <title>Setup</title>
2245
2246 <para>
2247 For this section, we'll assume you've already performed the
2248 basic setup outlined in the General Setup section.
2249 </para>
2250
2251 <para>
            Sysprof is a GUI-based application that runs on the target
            system. For the rest of this section, we assume you've
            ssh'ed from the host to the target and will be running
            Sysprof on the target (if you want, you can use ssh's '-X'
            option to have the Sysprof GUI run on the target but
            display remotely on the host).
2258 </para>
2259 </section>
2260
2261 <section id='sysprof-basic-usage'>
2262 <title>Basic Usage</title>
2263
2264 <para>
2265 To start profiling the system, you simply press the 'Start'
2266 button. To stop profiling and to start viewing the profile data
2267 in one easy step, press the 'Profile' button.
2268 </para>
2269
2270 <para>
2271 Once you've pressed the profile button, the three panes will
2272 fill up with profiling data:
2273 </para>
2274
2275 <para>
2276 <imagedata fileref="figures/sysprof-copy-to-user.png" width="6in" depth="4in" align="center" scalefit="1" />
2277 </para>
2278
2279 <para>
2280 The left pane shows a list of functions and processes.
2281 Selecting one of those expands that function in the right
2282 pane, showing all its callees. Note that this caller-oriented
2283 display is essentially the inverse of perf's default
2284 callee-oriented callchain display.
2285 </para>
2286
2287 <para>
            In the screenshot above, we're focusing on __copy_to_user_ll().
            Looking up the callchain, we can see that one of the callers
            of __copy_to_user_ll() is sys_read(), along with the complete
            callpath between them. Notice that this is essentially a
            portion of the same information we saw in the perf display
            shown in the perf section of this manual.
2294 </para>
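        <para>
            The difference between the caller-oriented and
            callee-oriented views comes down to which end of each
            sampled call stack the samples are merged from. The
            following Python sketch illustrates the idea only and is
            not how Sysprof or perf are implemented: the helper name
            build_tree is hypothetical, and the sample stacks are made
            up, loosely echoing the functions named above. Each stack
            is listed innermost function first.
            <literallayout class='monospaced'>
def build_tree(stacks, caller_first):
    """Merge innermost-first call stacks into a tree of sample counts.

    caller_first=True roots the tree at the outermost callers and
    descends into callees (a caller-oriented view);
    caller_first=False roots it at the sampled functions and climbs
    toward their callers (a callee-oriented view).
    """
    root = {"count": 0, "children": {}}
    for stack in stacks:
        frames = reversed(stack) if caller_first else stack
        node = root
        node["count"] += 1
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"count": 0, "children": {}})
            node["count"] += 1
    return root


# Made-up samples, each listed innermost function first.
SAMPLES = [
    ["__copy_to_user_ll", "vfs_read", "sys_read", "syscall_call"],
    ["__copy_to_user_ll", "vfs_read", "sys_read", "syscall_call"],
    ["__mark_inode_dirty", "file_update_time", "sys_write", "syscall_call"],
]
by_caller = build_tree(SAMPLES, caller_first=True)
by_callee = build_tree(SAMPLES, caller_first=False)
print(sorted(by_caller["children"]))  # ['syscall_call']
print(sorted(by_callee["children"]))  # ['__copy_to_user_ll', '__mark_inode_dirty']
            </literallayout>
            The same samples produce both views; only the direction in
            which each stack is walked changes.
        </para>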
2295
2296 <para>
2297 <imagedata fileref="figures/sysprof-copy-from-user.png" width="6in" depth="4in" align="center" scalefit="1" />
2298 </para>
2299
2300 <para>
2301 Similarly, the above is a snapshot of the Sysprof display of a
2302 copy-from-user callchain.
2303 </para>
2304
2305 <para>
2306 Finally, looking at the third Sysprof pane in the lower left,
2307 we can see a list of all the callers of a particular function
2308 selected in the top left pane. In this case, the lower pane is
2309 showing all the callers of __mark_inode_dirty:
2310 </para>
2311
2312 <para>
2313 <imagedata fileref="figures/sysprof-callers.png" width="6in" depth="4in" align="center" scalefit="1" />
2314 </para>
2315
2316 <para>
2317 Double-clicking on one of those functions will in turn change the
2318 focus to the selected function, and so on.
2319 </para>
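        <para>
            The callers pane answers a narrower question: for a single
            function, which functions called it directly, and how often.
            A minimal Python sketch of that aggregation, again over
            made-up, innermost-first stacks and using a hypothetical
            helper name, direct_callers, might look like this:
            <literallayout class='monospaced'>
from collections import Counter


def direct_callers(stacks, function):
    """Count the immediate callers of function across innermost-first stacks."""
    callers = Counter()
    for stack in stacks:
        # In an innermost-first stack, frame i was called by frame i + 1.
        for i, frame in enumerate(stack[:-1]):
            if frame == function:
                callers[stack[i + 1]] += 1
    return callers


# Made-up stacks, innermost function first.
STACKS = [
    ["__mark_inode_dirty", "file_update_time", "sys_write"],
    ["__mark_inode_dirty", "touch_atime", "sys_read"],
    ["__mark_inode_dirty", "file_update_time", "sys_writev"],
]
print(direct_callers(STACKS, "__mark_inode_dirty").most_common())
# [('file_update_time', 2), ('touch_atime', 1)]
            </literallayout>
        </para>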
2320
2321 <informalexample>
2322 <emphasis>Tying it Together:</emphasis> If you like sysprof's 'caller-oriented'
2323 display, you may be able to approximate it in other tools as
2324 well. For example, 'perf report' has the -g (--call-graph)
2325 option that you can experiment with; one of the options is
2326 'caller' for an inverted caller-based callgraph display.
2327 </informalexample>
2328 </section>
2329
2330 <section id='sysprof-documentation'>
2331 <title>Documentation</title>
2332
2333 <para>
2334 There doesn't seem to be any documentation for Sysprof, but
2335 maybe that's because it's pretty self-explanatory.
2336 The Sysprof website, however, is here:
2337 <ulink url='http://sysprof.com/'>Sysprof, System-wide Performance Profiler for Linux</ulink>
2338 </para>
2339 </section>
2340</section>
2341
2342<section id='lttng-linux-trace-toolkit-next-generation'>
2343 <title>LTTng (Linux Trace Toolkit, next generation)</title>
2344
2345 <section id='lttng-setup'>
2346 <title>Setup</title>
2347
2348 <para>
2349 For this section, we'll assume you've already performed the
2350 basic setup outlined in the General Setup section.
2351 </para>
2352
2353 <para>
            LTTng runs on the target system, which you reach by
            ssh'ing to it. If you want to view the traces graphically,
            install Eclipse (see the note below) and follow the
            directions in the
            "<link linkend='manually-copying-a-trace-to-the-host-and-viewing-it-in-eclipse'>Manually copying a trace to the host and viewing it in Eclipse (i.e. using Eclipse without network support)</link>"
            section to copy traces to the host manually and view them there.
2360 </para>
2361
2362 <note>
            Be sure to download, install, and run the 'SR1' or later
            Juno release of Eclipse, for example:
2365 <ulink url='http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz'>http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz</ulink>
2366 </note>
2367 </section>
2368
2369 <section id='collecting-and-viewing-traces'>
2370 <title>Collecting and Viewing Traces</title>
2371
2372 <para>
            Once you've built and booted your image (you need to build
            the core-image-sato-sdk image or use one of the other
            methods described in the General Setup section), you're
            ready to start tracing.
2377 </para>
2378
2379 <section id='collecting-and-viewing-a-trace-on-the-target-inside-a-shell'>
2380 <title>Collecting and viewing a trace on the target (inside a shell)</title>
2381
2382 <para>
2383 First, from the host, ssh to the target:
2384 <literallayout class='monospaced'>
2385 $ ssh -l root 192.168.1.47
2386 The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
2387 RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
2388 Are you sure you want to continue connecting (yes/no)? yes
2389 Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
2390 root@192.168.1.47's password:
2391 </literallayout>
2392 Once on the target, use these steps to create a trace:
2393 <literallayout class='monospaced'>
2394 root@crownbay:~# lttng create
2395 Spawning a session daemon
2396 Session auto-20121015-232120 created.
2397 Traces will be written in /home/root/lttng-traces/auto-20121015-232120
2398 </literallayout>
2399 Enable the events you want to trace (in this case all
2400 kernel events):
2401 <literallayout class='monospaced'>
2402 root@crownbay:~# lttng enable-event --kernel --all
2403 All kernel events are enabled in channel channel0
2404 </literallayout>
2405 Start the trace:
2406 <literallayout class='monospaced'>
2407 root@crownbay:~# lttng start
2408 Tracing started for session auto-20121015-232120
2409 </literallayout>
                Then stop the trace after a while, or after running
                a particular workload that you want to trace:
2412 <literallayout class='monospaced'>
2413 root@crownbay:~# lttng stop
2414 Tracing stopped for session auto-20121015-232120
2415 </literallayout>
2416 You can now view the trace in text form on the target:
2417 <literallayout class='monospaced'>
2418 root@crownbay:~# lttng view
2419 [23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }
2420 [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
2421 [23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }
2422 [23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }
2423 [23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 }
2424 [23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }
2425 [23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }
2426 [23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 }
2427 [23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }
2428 [23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 }
2429 [23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 }
2430 [23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }
2431 [23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }
2432 [23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }
2433 [23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }
2434 [23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }
2435 [23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }
2436 [23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 }
2437 [23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 }
2438 [23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 }
2439 [23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }
2440 [23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
2441 [23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
2442 [23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 }
2443 [23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 }
2444 [23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }
2445 [23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 }
2446 [23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 }
2447 [23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 }
2448 [23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 }
2449 [23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }
2450 [23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 }
2451 [23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 }
2452 [23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 }
2453 [23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }
2454 [23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 }
2455 [23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 }
2456 .
2457 .
2458 .
2459 </literallayout>
2460 You can now safely destroy the trace session (note that
2461 this doesn't delete the trace - it's still there
2462 in ~/lttng-traces):
2463 <literallayout class='monospaced'>
2464 root@crownbay:~# lttng destroy
2465 Session auto-20121015-232120 destroyed at /home/root
2466 </literallayout>
                Note that the trace is saved under the ~/lttng-traces
                directory, in a subdirectory with the same name as the
                session reported by 'lttng create' (you can choose a
                different name by supplying your own to 'lttng create'):
2471 <literallayout class='monospaced'>
2472 root@crownbay:~# ls -al ~/lttng-traces
2473 drwxrwx--- 3 root root 1024 Oct 15 23:21 .
2474 drwxr-xr-x 5 root root 1024 Oct 15 23:57 ..
2475 drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120
2476 </literallayout>
2477 </para>
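            <para>
                The text that 'lttng view' prints is regular enough to
                summarize with a short script, for example to see which
                events dominate a trace. The following Python sketch
                assumes the kernel event layout shown above (timestamp,
                delta, event name, CPU context, then the event fields);
                other viewer versions or options may print extra
                context, and field values that themselves contain ', '
                are not split correctly. The function names parse_event
                and event_histogram are hypothetical.
                <literallayout class='monospaced'>
import re
from collections import Counter

# [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
EVENT = re.compile(
    r'^\[(\d+):(\d+):(\d+)\.(\d+)\] \(\+([\d.?]+)\) (\w+): '
    r'\{ (\d+) \}, \{\s*(.*?)\s*\}$')


def parse_event(line):
    """Parse one kernel event line; return a dict, or None if no match."""
    m = EVENT.match(line.strip())
    if not m:
        return None
    hh, mm, ss, frac, delta, name, cpu, body = m.groups()
    fields = {}
    for item in filter(None, body.split(", ")):
        key, _, value = item.partition(" = ")
        fields[key] = value  # values are kept as raw strings
    seconds = (int(hh) * 60 + int(mm)) * 60 + int(ss)
    return {
        # Nanoseconds since midnight, assuming a 9-digit fraction.
        "time_ns": seconds * 10**9 + int(frac.ljust(9, "0")),
        # The first event has no predecessor, shown as (+?.?????????).
        "delta": None if "?" in delta else float(delta),
        "event": name,
        "cpu": int(cpu),
        "fields": fields,
    }


def event_histogram(lines):
    """Count events by name, ignoring lines that do not parse."""
    return Counter(e["event"] for e in map(parse_event, lines) if e)


# Example, after saving the output with: lttng view > view.txt
#   for name, n in event_histogram(open("view.txt")).most_common(10):
#       print(n, name)
                </literallayout>
            </para>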
2478 </section>
2479
2480 <section id='collecting-and-viewing-a-userspace-trace-on-the-target-inside-a-shell'>
2481 <title>Collecting and viewing a userspace trace on the target (inside a shell)</title>
2482
2483 <para>
2484 For LTTng userspace tracing, you need to have a properly
2485 instrumented userspace program. For this example, we'll use
2486 the 'hello' test program generated by the lttng-ust build.
2487 </para>
2488
2489 <para>
2490 The 'hello' test program isn't installed on the rootfs by
2491 the lttng-ust build, so we need to copy it over manually.
2492 First cd into the build directory that contains the hello
2493 executable:
2494 <literallayout class='monospaced'>
2495 $ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs
2496 </literallayout>
2497 Copy that over to the target machine:
2498 <literallayout class='monospaced'>
     $ scp hello root@192.168.1.47:
2500 </literallayout>
2501 You now have the instrumented lttng 'hello world' test
2502 program on the target, ready to test.
2503 </para>
2504
2505 <para>
2506 First, from the host, ssh to the target:
2507 <literallayout class='monospaced'>
2508 $ ssh -l root 192.168.1.47
2509 The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
2510 RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
2511 Are you sure you want to continue connecting (yes/no)? yes
2512 Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
2513 root@192.168.1.47's password:
2514 </literallayout>
2515 Once on the target, use these steps to create a trace:
2516 <literallayout class='monospaced'>
2517 root@crownbay:~# lttng create
2518 Session auto-20190303-021943 created.
2519 Traces will be written in /home/root/lttng-traces/auto-20190303-021943
2520 </literallayout>
2521 Enable the events you want to trace (in this case all
2522 userspace events):
2523 <literallayout class='monospaced'>
2524 root@crownbay:~# lttng enable-event --userspace --all
2525 All UST events are enabled in channel channel0
2526 </literallayout>
2527 Start the trace:
2528 <literallayout class='monospaced'>
2529 root@crownbay:~# lttng start
2530 Tracing started for session auto-20190303-021943
2531 </literallayout>
2532 Run the instrumented hello world program:
2533 <literallayout class='monospaced'>
2534 root@crownbay:~# ./hello
2535 Hello, World!
2536 Tracing... done.
2537 </literallayout>
                Then stop the trace after a while, or after running a
                particular workload that you want to trace:
2540 <literallayout class='monospaced'>
2541 root@crownbay:~# lttng stop
2542 Tracing stopped for session auto-20190303-021943
2543 </literallayout>
2544 You can now view the trace in text form on the target:
2545 <literallayout class='monospaced'>
2546 root@crownbay:~# lttng view
2547 [02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2548 [02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2549 [02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2550 [02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2551 .
2552 .
2553 .
2554 </literallayout>
2555 You can now safely destroy the trace session (note that
2556 this doesn't delete the trace - it's still
2557 there in ~/lttng-traces):
2558 <literallayout class='monospaced'>
2559 root@crownbay:~# lttng destroy
2560 Session auto-20190303-021943 destroyed at /home/root
2561 </literallayout>
2562 </para>
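            <para>
                Both of the examples above follow the same create,
                enable, start, run, stop, and destroy sequence, so it
                can be wrapped in a small script on the target. The
                following Python sketch assumes the 'lttng' client is
                on the PATH and that you run it with sufficient
                privileges (for example, as root, as in the examples
                above); the function name trace_workload, the session
                name, and the workload command are placeholders. With
                dry_run=True it only returns the commands it would run.
                <literallayout class='monospaced'>
import subprocess


def trace_workload(session, workload, domain="--kernel", dry_run=False):
    """Trace all events of one domain while a workload command runs.

    domain is "--kernel" or "--userspace", as used with
    'lttng enable-event' in this section.  With dry_run=True the
    commands are returned without being run.
    """
    if domain not in ("--kernel", "--userspace"):
        raise ValueError("unsupported domain: " + domain)
    create = ["lttng", "create", session]
    enable = ["lttng", "enable-event", domain, "--all"]
    start = ["lttng", "start", session]
    run = list(workload)
    stop = ["lttng", "stop", session]
    destroy = ["lttng", "destroy", session]
    steps = [create, enable, start, run, stop, destroy]
    if dry_run:
        return steps
    subprocess.run(create, check=True)
    try:
        for cmd in (enable, start, run):
            subprocess.run(cmd, check=True)
    finally:
        # Always stop and tear down the session; destroying it
        # leaves the trace files in ~/lttng-traces.
        subprocess.run(stop, check=False)
        subprocess.run(destroy, check=False)
    return steps


# Example, on the target:
#   trace_workload("hello-trace", ["./hello"], domain="--userspace")
                </literallayout>
            </para>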
2563 </section>
2564
2565 <section id='manually-copying-a-trace-to-the-host-and-viewing-it-in-eclipse'>
2566 <title>Manually copying a trace to the host and viewing it in Eclipse (i.e. using Eclipse without network support)</title>
2567
2568 <para>
2569 If you already have an LTTng trace on a remote target and
2570 would like to view it in Eclipse on the host, you can easily
2571 copy it from the target to the host and import it into
                Eclipse to view it using the LTTng Eclipse plug-in already
                bundled with Eclipse (Juno SR1 or later).
2574 </para>
2575
2576 <para>
                Using the kernel trace we created earlier, archive
                it and copy it to your host system:
2579 <literallayout class='monospaced'>
2580 root@crownbay:~/lttng-traces# tar zcvf auto-20121015-232120.tar.gz auto-20121015-232120
2581 auto-20121015-232120/
2582 auto-20121015-232120/kernel/
2583 auto-20121015-232120/kernel/metadata
2584 auto-20121015-232120/kernel/channel0_1
2585 auto-20121015-232120/kernel/channel0_0
2586
2587 $ scp root@192.168.1.47:lttng-traces/auto-20121015-232120.tar.gz .
2588 root@192.168.1.47's password:
2589 auto-20121015-232120.tar.gz 100% 1566KB 1.5MB/s 00:01
2590 </literallayout>
2591 Unarchive it on the host:
2592 <literallayout class='monospaced'>
2593 $ gunzip -c auto-20121015-232120.tar.gz | tar xvf -
2594 auto-20121015-232120/
2595 auto-20121015-232120/kernel/
2596 auto-20121015-232120/kernel/metadata
2597 auto-20121015-232120/kernel/channel0_1
2598 auto-20121015-232120/kernel/channel0_0
2599 </literallayout>
2600 We can now import the trace into Eclipse and view it:
2601 <orderedlist>
                    <listitem><para>First, start Eclipse and open the
2603 'LTTng Kernel' perspective by selecting the following
2604 menu item:
2605 <literallayout class='monospaced'>
2606 Window | Open Perspective | Other...
2607 </literallayout></para></listitem>
2608 <listitem><para>In the dialog box that opens, select
2609 'LTTng Kernel' from the list.</para></listitem>
2610 <listitem><para>Back at the main menu, select the
2611 following menu item:
2612 <literallayout class='monospaced'>
2613 File | New | Project...
2614 </literallayout></para></listitem>
2615 <listitem><para>In the dialog box that opens, select
2616 the 'Tracing | Tracing Project' wizard and press
2617 'Next>'.</para></listitem>
2618 <listitem><para>Give the project a name and press
2619 'Finish'.</para></listitem>
2620 <listitem><para>In the 'Project Explorer' pane under
2621 the project you created, right click on the
2622 'Traces' item.</para></listitem>
                    <listitem><para>Select 'Import...' and in the dialog
                        that's displayed:</para></listitem>
                    <listitem><para>Browse the filesystem and select
                        the 'kernel' directory containing the trace
                        you copied from the target, for example
                        auto-20121015-232120/kernel.</para></listitem>
2629 <listitem><para>'Checkmark' the directory in the tree
2630 that's displayed for the trace</para></listitem>
2631 <listitem><para>Below that, select 'Common Trace Format:
2632 Kernel Trace' for the 'Trace Type'</para></listitem>
2633 <listitem><para>Press 'Finish' to close the dialog
2634 </para></listitem>
2635 <listitem><para>Back in the 'Project Explorer' pane,
2636 double-click on the 'kernel' item for the
2637 trace you just imported under 'Traces'
2638 </para></listitem>
2639 </orderedlist>
2640 You should now see your trace data displayed graphically
2641 in several different views in Eclipse:
2642 </para>
2643
2644 <para>
2645 <imagedata fileref="figures/lttngmain0.png" width="6in" depth="6in" align="center" scalefit="1" />
2646 </para>
2647
2648 <para>
2649 You can access extensive help information on how to use
2650 the LTTng plug-in to search and analyze captured traces via
2651 the Eclipse help system:
2652 <literallayout class='monospaced'>
2653 Help | Help Contents | LTTng Plug-in User Guide
2654 </literallayout>
2655 </para>
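            <para>
                The tar and gunzip commands used earlier in this
                section can also be scripted with Python's standard
                tarfile module, which may be convenient if you copy
                traces often. In the following sketch, the function
                names archive_trace and extract_trace and all paths are
                illustrative; the 'data' extraction filter is used only
                on Python versions that provide it.
                <literallayout class='monospaced'>
import os
import tarfile


def archive_trace(trace_dir, out_path):
    """Create a .tar.gz of trace_dir, keeping its base name as the top entry."""
    top = os.path.basename(os.path.normpath(trace_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(trace_dir, arcname=top)  # adds the tree recursively
    return out_path


def extract_trace(archive_path, dest_dir):
    """Unpack an archive made by archive_trace and return its member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # The "data" filter (Python 3.12+, backported to some earlier
        # releases) rejects absolute paths and other unsafe members.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


# Example:
#   On the target: archive_trace("auto-20121015-232120", "auto-20121015-232120.tar.gz")
#   On the host:   extract_trace("auto-20121015-232120.tar.gz", ".")
                </literallayout>
            </para>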
2656 </section>
2657
2658 <section id='collecting-and-viewing-a-trace-in-eclipse'>
2659 <title>Collecting and viewing a trace in Eclipse</title>
2660
2661 <note>
2662 This section on collecting traces remotely doesn't currently
2663 work because of Eclipse 'RSE' connectivity problems. Manually
2664 tracing on the target, copying the trace files to the host,
2665 and viewing the trace in Eclipse on the host as outlined in
2666 previous steps does work however - please use the manual
2667 steps outlined above to view traces in Eclipse.
2668 </note>
2669
2670 <para>
2671 In order to trace a remote target, you also need to add
2672 a 'tracing' group on the target and connect as a user
                who's part of that group, for example:
2674 <literallayout class='monospaced'>
2675 # adduser tomz
2676 # groupadd -r tracing
2677 # usermod -a -G tracing tomz
2678 </literallayout>
2679 <orderedlist>
                    <listitem><para>First, start Eclipse and open the
2681 'LTTng Kernel' perspective by selecting the following
2682 menu item:
2683 <literallayout class='monospaced'>
2684 Window | Open Perspective | Other...
2685 </literallayout></para></listitem>
2686 <listitem><para>In the dialog box that opens, select
2687 'LTTng Kernel' from the list.</para></listitem>
2688 <listitem><para>Back at the main menu, select the
2689 following menu item:
2690 <literallayout class='monospaced'>
2691 File | New | Project...
2692 </literallayout></para></listitem>
2693 <listitem><para>In the dialog box that opens, select
2694 the 'Tracing | Tracing Project' wizard and
2695 press 'Next>'.</para></listitem>
2696 <listitem><para>Give the project a name and press
2697 'Finish'. That should result in an entry in the
2698 'Project' subwindow.</para></listitem>
2699 <listitem><para>In the 'Control' subwindow just below
2700 it, press 'New Connection'.</para></listitem>
2701 <listitem><para>Add a new connection, giving it the
2702 hostname or IP address of the target system.
2703 </para></listitem>
2704 <listitem><para>Provide the username and password
2705 of a qualified user (a member of the 'tracing' group)
2706 or root account on the target system.
2707 </para></listitem>
2708 <listitem><para>Provide appropriate answers to whatever
2709 else is asked for e.g. 'secure storage password'
2710 can be anything you want.
2711 If you get an 'RSE Error' it may be due to proxies.
2712 It may be possible to get around the problem by
2713 changing the following setting:
2714 <literallayout class='monospaced'>
2715 Window | Preferences | Network Connections
2716 </literallayout>
2717 Switch 'Active Provider' to 'Direct'
2718 </para></listitem>
2719 </orderedlist>
2720 </para>
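            <para>
                To confirm that the account you connect with really is a
                member of the 'tracing' group, you can run
                'id -nG tomz' on the target.
                The following is a minimal sketch that performs the same check in a script.
                It assumes Python 3 is available in the target image, which may not
                be the case for a minimal image.
                The function name user_in_group and the script name
                check_tracing_group.py are hypothetical.
                The sketch checks both the group's member list and the user's
                primary group, because 'usermod -a -G' adds only
                supplementary groups:
                <literallayout class='monospaced'>
```python
# check_tracing_group.py (hypothetical name): is a user in a group?
import grp
import pwd
import sys


def user_in_group(user: str, group: str = "tracing") -> bool:
    """Return True if user belongs to group, either as a supplementary
    member or through its primary group ID."""
    try:
        entry = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist on this system
    if user in entry.gr_mem:
        return True  # supplementary member, e.g. added with usermod -a -G
    try:
        return pwd.getpwnam(user).pw_gid == entry.gr_gid
    except KeyError:
        return False  # unknown user name


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "tomz"
    print(name, "in tracing group:", user_in_group(name))
```
                </literallayout>
                Running 'python3 check_tracing_group.py tomz' on the
                target prints whether that user is in the 'tracing' group.
            </para>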
2721 </section>
2722 </section>
2723
2724 <section id='lltng-documentation'>
2725 <title>Documentation</title>
2726
2727 <para>
2728 You can find the primary LTTng Documentation on the
2729 <ulink url='https://lttng.org/docs/'>LTTng Documentation</ulink>
2730 site.
2731 The documentation on this site is appropriate for intermediate to
2732 advanced software developers who are working in a Linux environment
2733 and are interested in efficient software tracing.
2734 </para>
2735
2736 <para>
2737 For information on LTTng in general, visit the
2738 <ulink url='http://lttng.org/lttng2.0'>LTTng Project</ulink>
2739 site.
2740 You can find a "Getting Started" link on this site that takes
2741 you to an LTTng Quick Start.
2742 </para>
2743
2744 <para>
2745 Finally, you can access extensive help information on how to use
2746 the LTTng plug-in to search and analyze captured traces via the
2747 Eclipse help system:
2748 <literallayout class='monospaced'>
2749 Help | Help Contents | LTTng Plug-in User Guide
2750 </literallayout>
2751 </para>
2752 </section>
2753</section>
2754
2755<section id='profile-manual-blktrace'>
2756 <title>blktrace</title>
2757
2758 <para>
2759 blktrace is a tool for tracing and reporting low-level disk I/O.
2760 blktrace provides the tracing half of the equation; its output can
2761 be piped into the blkparse program, which renders the data in a
        human-readable form and does some basic analysis.
2763 </para>
2764
2765 <section id='blktrace-setup'>
2766 <title>Setup</title>
2767
2768 <para>
2769 For this section, we'll assume you've already performed the
2770 basic setup outlined in the
2771 "<link linkend='profile-manual-general-setup'>General Setup</link>"
2772 section.
2773 </para>
2774
2775 <para>
2776 blktrace is an application that runs on the target system.
2777 You can run the entire blktrace and blkparse pipeline on the
2778 target, or you can run blktrace in 'listen' mode on the target
2779 and have blktrace and blkparse collect and analyze the data on
2780 the host (see the
2781 "<link linkend='using-blktrace-remotely'>Using blktrace Remotely</link>"
2782 section below).
            For the rest of this section, we assume you have ssh'ed to the
            target and will be running blktrace there.
2785 </para>
2786 </section>
2787
2788 <section id='blktrace-basic-usage'>
2789 <title>Basic Usage</title>
2790
2791 <para>
2792 To record a trace, simply run the 'blktrace' command, giving it
2793 the name of the block device you want to trace activity on:
2794 <literallayout class='monospaced'>
2795 root@crownbay:~# blktrace /dev/sdc
2796 </literallayout>
2797 In another shell, execute a workload you want to trace.
2798 <literallayout class='monospaced'>
2799 root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>; sync
2800 Connecting to downloads.yoctoproject.org (140.211.169.59:80)
2801 linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
2802 </literallayout>
2803 Press Ctrl-C in the blktrace shell to stop the trace. It will
2804 display how many events were logged, along with the per-cpu file
2805 sizes (blktrace records traces in per-cpu kernel buffers and
2806 simply dumps them to userspace for blkparse to merge and sort
2807 later).
2808 <literallayout class='monospaced'>
2809 ^C=== sdc ===
2810 CPU 0: 7082 events, 332 KiB data
2811 CPU 1: 1578 events, 74 KiB data
2812 Total: 8660 events (dropped 0), 406 KiB data
2813 </literallayout>
2814 If you examine the files saved to disk, you see multiple files,
2815 one per CPU and with the device name as the first part of the
2816 filename:
2817 <literallayout class='monospaced'>
2818 root@crownbay:~# ls -al
2819 drwxr-xr-x 6 root root 1024 Oct 27 22:39 .
2820 drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
2821 -rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0
2822 -rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1
2823 </literallayout>
2824 To view the trace events, simply invoke 'blkparse' in the
2825 directory containing the trace files, giving it the device name
2826 that forms the first part of the filenames:
2827 <literallayout class='monospaced'>
2828 root@crownbay:~# blkparse sdc
2829
2830 8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8]
2831 8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8]
2832 8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]
2833 8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]
2834 8,32 1 0 0.000057270 0 m N cfq1225 insert_request
2835 8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr
2836 8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1
2837 8,32 1 0 0.000088559 0 m N cfq workload slice:150
2838 8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1
2839 8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1
2840 8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null)
2841 8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert
2842 8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request
2843 8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1
2844 8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8]
2845 8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]
2846 8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8]
2847 8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8]
2848 8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8]
2849 8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8]
2850 8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8]
2851 8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8]
2852 8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8]
2853 8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8]
2854 8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8]
2855 8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8]
2856 8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8]
2857 8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8]
2858 8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8]
2859 8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8]
2860 8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8]
2861 8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8]
2862 8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]
2863 8,32 1 0 0.000483022 0 m N cfq1225 insert_request
2864 8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1
2865 8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1
2866 8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert
2867 8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request
2868 8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2
2869 .
2870 .
2871 .
2872 8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1
2873 8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]
2874 8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1
2875 8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0
2876 8,32 0 0 58.516636933 0 m N cfq schedule dispatch
2877 8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0
2878 8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80
2879 8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr
2880 8,32 1 0 58.516990819 0 m N cfq3551 put_queue
2881
2882 CPU0 (sdc):
2883 Reads Queued: 0, 0KiB Writes Queued: 331, 26,284KiB
2884 Read Dispatches: 0, 0KiB Write Dispatches: 485, 40,484KiB
2885 Reads Requeued: 0 Writes Requeued: 0
2886 Reads Completed: 0, 0KiB Writes Completed: 511, 41,000KiB
2887 Read Merges: 0, 0KiB Write Merges: 13, 160KiB
2888 Read depth: 0 Write depth: 2
2889 IO unplugs: 23 Timer unplugs: 0
2890 CPU1 (sdc):
2891 Reads Queued: 0, 0KiB Writes Queued: 249, 15,800KiB
2892 Read Dispatches: 0, 0KiB Write Dispatches: 42, 1,600KiB
2893 Reads Requeued: 0 Writes Requeued: 0
2894 Reads Completed: 0, 0KiB Writes Completed: 16, 1,084KiB
2895 Read Merges: 0, 0KiB Write Merges: 40, 276KiB
2896 Read depth: 0 Write depth: 2
2897 IO unplugs: 30 Timer unplugs: 1
2898
2899 Total (sdc):
2900 Reads Queued: 0, 0KiB Writes Queued: 580, 42,084KiB
2901 Read Dispatches: 0, 0KiB Write Dispatches: 527, 42,084KiB
2902 Reads Requeued: 0 Writes Requeued: 0
2903 Reads Completed: 0, 0KiB Writes Completed: 527, 42,084KiB
2904 Read Merges: 0, 0KiB Write Merges: 53, 436KiB
2905 IO unplugs: 53 Timer unplugs: 1
2906
2907 Throughput (R/W): 0KiB/s / 719KiB/s
2908 Events (sdc): 6,592 entries
2909 Skips: 0 forward (0 - 0.0%)
2910 Input file sdc.blktrace.0 added
2911 Input file sdc.blktrace.1 added
2912 </literallayout>
2913 The report shows each event that was found in the blktrace data,
2914 along with a summary of the overall block I/O traffic during
2915 the run. You can look at the
2916 <ulink url='http://linux.die.net/man/1/blkparse'>blkparse</ulink>
2917 manpage to learn the
2918 meaning of each field displayed in the trace listing.
2919 </para>
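        <para>
            To post-process the listing, for example to count how many
            events of each type a run produced, it helps to split each
            line into its fields.
            The following is a minimal sketch that assumes blkparse's default output
            format as shown above: device, CPU, sequence number, timestamp,
            PID, action, RWBS, and then action-specific details.
            It skips the summary section.
            It does not handle custom formats selected with blkparse's
            -f option.
            The script name blkcount.py is hypothetical:
            <literallayout class='monospaced'>
```python
# blkcount.py (hypothetical name): count blkparse events per action code.
import re
import sys
from collections import Counter
from typing import Optional

# Every default-format event line starts with the device as "major,minor".
DEVICE_FIELD = re.compile(r"^\d+,\d+$")


def parse_event(line: str) -> Optional[dict]:
    """Split one default-format blkparse event line into named fields.

    Returns None for blank lines, '.' lines and the summary section."""
    fields = line.split(None, 7)
    if len(fields) not in (7, 8) or not DEVICE_FIELD.match(fields[0]):
        return None
    try:
        return {
            "dev": fields[0],          # e.g. "8,32"
            "cpu": int(fields[1]),
            "seq": int(fields[2]),     # 0 for 'm' message lines
            "time": float(fields[3]),  # seconds relative to trace start
            "pid": int(fields[4]),
            "action": fields[5],       # e.g. Q (queued), D (issued), C (complete)
            "rwbs": fields[6],         # e.g. WS (write, synchronous)
            "rest": fields[7].rstrip() if len(fields) == 8 else "",
        }
    except ValueError:
        return None


def count_actions(lines) -> Counter:
    """Count how many events of each action code appear in lines."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event is not None:
            counts[event["action"]] += 1
    return counts


if __name__ == "__main__":
    for action, n in sorted(count_actions(sys.stdin).items()):
        print(action, n)
```
            </literallayout>
            For example, 'blkparse sdc | python3 blkcount.py' prints one
            line per action code along with the number of events for that code.
        </para>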
2920
2921 <section id='blktrace-live-mode'>
2922 <title>Live Mode</title>
2923
2924 <para>
2925 blktrace and blkparse are designed from the ground up to
2926 be able to operate together in a 'pipe mode' where the
2927 stdout of blktrace can be fed directly into the stdin of
2928 blkparse:
2929 <literallayout class='monospaced'>
2930 root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i -
2931 </literallayout>
2932 This enables long-lived tracing sessions to run without
2933 writing anything to disk, and allows the user to look for
2934 certain conditions in the trace data in 'real-time' by
2935 viewing the trace output as it scrolls by on the screen or
2936 by passing it along to yet another program in the pipeline
2937 such as grep which can be used to identify and capture
2938 conditions of interest.
2939 </para>
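            <para>
                grep is usually enough, but matching a specific field is
                more precise than matching text anywhere in the line.
                The following filter is a minimal sketch. It keeps only the events whose action
                code, the sixth field in blkparse's default format, is one
                of the codes named on the command line.
                If no codes are given, it keeps completions ('C').
                The script name select_actions.py is hypothetical.
                Also note that blkparse may buffer its own output when
                writing to a pipe, so lines can arrive in bursts:
                <literallayout class='monospaced'>
```python
# select_actions.py (hypothetical name): keep events with given action codes.
import sys


def select_actions(lines, actions):
    """Yield only event lines whose action field is in actions.

    Assumes blkparse's default format, where the action code is the
    sixth whitespace-separated field, e.g. the 'C' in
    '8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]'."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 7 and fields[5] in actions:
            yield line


if __name__ == "__main__":
    # Default to completion events if no action codes are given.
    wanted = set(sys.argv[1:]) or {"C"}
    for line in select_actions(sys.stdin, wanted):
        sys.stdout.write(line)
        sys.stdout.flush()  # pass each matching line on immediately
```
                </literallayout>
                For example, use
                'blktrace /dev/sdc -o - | blkparse -i - | python3 select_actions.py C D'
                to show only dispatch and completion events.
            </para>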
2940
2941 <para>
2942 There's actually another blktrace command that implements
2943 the above pipeline as a single command, so the user doesn't
2944 have to bother typing in the above command sequence:
2945 <literallayout class='monospaced'>
2946 root@crownbay:~# btrace /dev/sdc
2947 </literallayout>
2948 </para>
2949 </section>
2950
2951 <section id='using-blktrace-remotely'>
2952 <title>Using blktrace Remotely</title>
2953
2954 <para>
2955 Because blktrace traces block I/O and at the same time
2956 normally writes its trace data to a block device, and
2957 in general because it's not really a great idea to make
2958 the device being traced the same as the device the tracer
2959 writes to, blktrace provides a way to trace without
2960 perturbing the traced device at all by providing native
2961 support for sending all trace data over the network.
2962 </para>
2963
2964 <para>
2965 To have blktrace operate in this mode, start blktrace on
2966 the target system being traced with the -l option, along with
2967 the device to trace:
2968 <literallayout class='monospaced'>
2969 root@crownbay:~# blktrace -l /dev/sdc
2970 server: waiting for connections...
2971 </literallayout>
2972 On the host system, use the -h option to connect to the
2973 target system, also passing it the device to trace:
2974 <literallayout class='monospaced'>
2975 $ blktrace -d /dev/sdc -h 192.168.1.43
2976 blktrace: connecting to 192.168.1.43
2977 blktrace: connected!
2978 </literallayout>
2979 On the target system, you should see this:
2980 <literallayout class='monospaced'>
2981 server: connection from 192.168.1.43
2982 </literallayout>
2983 In another shell, execute a workload you want to trace.
2984 <literallayout class='monospaced'>
2985 root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>; sync
2986 Connecting to downloads.yoctoproject.org (140.211.169.59:80)
2987 linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
2988 </literallayout>
2989 When it's done, do a Ctrl-C on the host system to
2990 stop the trace:
2991 <literallayout class='monospaced'>
2992 ^C=== sdc ===
2993 CPU 0: 7691 events, 361 KiB data
2994 CPU 1: 4109 events, 193 KiB data
2995 Total: 11800 events (dropped 0), 554 KiB data
2996 </literallayout>
2997 On the target system, you should also see a trace
2998 summary for the trace just ended:
2999 <literallayout class='monospaced'>
3000 server: end of run for 192.168.1.43:sdc
3001 === sdc ===
3002 CPU 0: 7691 events, 361 KiB data
3003 CPU 1: 4109 events, 193 KiB data
3004 Total: 11800 events (dropped 0), 554 KiB data
3005 </literallayout>
3006 The blktrace instance on the host will save the target
3007 output inside a hostname-timestamp directory:
3008 <literallayout class='monospaced'>
3009 $ ls -al
3010 drwxr-xr-x 10 root root 1024 Oct 28 02:40 .
3011 drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
3012 drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56
3013 </literallayout>
3014 cd into that directory to see the output files:
3015 <literallayout class='monospaced'>
3016 $ ls -l
3017 -rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0
3018 -rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1
3019 </literallayout>
3020 And run blkparse on the host system using the device name:
3021 <literallayout class='monospaced'>
3022 $ blkparse sdc
3023
3024 8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls]
3025 8,32 1 0 0.000036038 0 m N cfq1263 alloced
3026 8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls]
3027 8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls]
3028 8,32 1 0 0.000056152 0 m N cfq1263 insert_request
3029 8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr
3030 8,32 1 0 0.000075498 0 m N cfq workload slice:300
3031 .
3032 .
3033 .
3034 8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0
3035 8,32 0 0 177.266388140 0 m N cfq schedule dispatch
3036 8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0
3037 8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56
3038 8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr
3039 8,32 1 0 177.266696560 0 m N cfq1267 put_queue
3040
3041 CPU0 (sdc):
3042 Reads Queued: 0, 0KiB Writes Queued: 270, 21,708KiB
3043 Read Dispatches: 59, 2,628KiB Write Dispatches: 495, 39,964KiB
3044 Reads Requeued: 0 Writes Requeued: 0
3045 Reads Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB
3046 Read Merges: 0, 0KiB Write Merges: 9, 344KiB
3047 Read depth: 2 Write depth: 2
3048 IO unplugs: 20 Timer unplugs: 1
3049 CPU1 (sdc):
3050 Reads Queued: 688, 2,752KiB Writes Queued: 381, 20,652KiB
3051 Read Dispatches: 31, 124KiB Write Dispatches: 59, 2,396KiB
3052 Reads Requeued: 0 Writes Requeued: 0
3053 Reads Completed: 0, 0KiB Writes Completed: 11, 764KiB
3054 Read Merges: 598, 2,392KiB Write Merges: 88, 448KiB
3055 Read depth: 2 Write depth: 2
3056 IO unplugs: 52 Timer unplugs: 0
3057
3058 Total (sdc):
3059 Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB
3060 Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB
3061 Reads Requeued: 0 Writes Requeued: 0
3062 Reads Completed: 90, 2,752KiB Writes Completed: 554, 42,360KiB
3063 Read Merges: 598, 2,392KiB Write Merges: 97, 792KiB
3064 IO unplugs: 72 Timer unplugs: 1
3065
3066 Throughput (R/W): 15KiB/s / 238KiB/s
3067 Events (sdc): 9,301 entries
3068 Skips: 0 forward (0 - 0.0%)
3069 </literallayout>
3070 You should see the trace events and summary just as
3071 you would have if you'd run the same command on the target.
3072 </para>
3073 </section>
3074
3075 <section id='tracing-block-io-via-ftrace'>
3076 <title>Tracing Block I/O via 'ftrace'</title>
3077
3078 <para>
3079 It's also possible to trace block I/O using only
3080 <link linkend='the-trace-events-subsystem'>trace events subsystem</link>,
3081 which can be useful for casual tracing
3082 if you don't want to bother dealing with the userspace tools.
3083 </para>
3084
3085 <para>
3086 To enable tracing for a given device, use
3087 /sys/block/xxx/trace/enable, where xxx is the device name.
3088 This for example enables tracing for /dev/sdc:
3089 <literallayout class='monospaced'>
3090 root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable
3091 </literallayout>
3092 Once you've selected the device(s) you want to trace,
3093 selecting the 'blk' tracer will turn the blk tracer on:
3094 <literallayout class='monospaced'>
3095 root@crownbay:/sys/kernel/debug/tracing# cat available_tracers
3096 blk function_graph function nop
3097
3098 root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer
3099 </literallayout>
3100 Execute the workload you're interested in:
3101 <literallayout class='monospaced'>
3102 root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt
3103 </literallayout>
3104 And look at the output (note here that we're using
3105 'trace_pipe' instead of trace to capture this trace -
3106 this allows us to wait around on the pipe for data to
3107 appear):
3108 <literallayout class='monospaced'>
3109 root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe
3110 cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat]
3111 cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced
3112 cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat]
3113 cat-3587 [001] d..1 3023.276424: 8,32 P N [cat]
3114 cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat]
3115 cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request
3116 cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr
3117 cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1
3118 cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150
3119 cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2
3120 cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null)
3121 cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert
3122 cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request
3123 cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1
3124 cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat]
3125 </literallayout>
3126 And this turns off tracing for the specified device:
3127 <literallayout class='monospaced'>
3128 root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable
3129 </literallayout>
3130 </para>
3131 </section>
3132 </section>
3133
3134 <section id='blktrace-documentation'>
3135 <title>Documentation</title>
3136
3137 <para>
3138 Online versions of the man pages for the commands discussed
3139 in this section can be found here:
3140 <itemizedlist>
3141 <listitem><para><ulink url='http://linux.die.net/man/8/blktrace'>http://linux.die.net/man/8/blktrace</ulink>
3142 </para></listitem>
3143 <listitem><para><ulink url='http://linux.die.net/man/1/blkparse'>http://linux.die.net/man/1/blkparse</ulink>
3144 </para></listitem>
3145 <listitem><para><ulink url='http://linux.die.net/man/8/btrace'>http://linux.die.net/man/8/btrace</ulink>
3146 </para></listitem>
3147 </itemizedlist>
3148 </para>
3149
3150 <para>
3151 The above manpages, along with manpages for the other
3152 blktrace utilities (btt, blkiomon, etc) can be found in the
3153 /doc directory of the blktrace tools git repo:
3154 <literallayout class='monospaced'>
3155 $ git clone git://git.kernel.dk/blktrace.git
3156 </literallayout>
3157 </para>
3158 </section>
3159</section>
3160</chapter>
3161<!--
3162vim: expandtab tw=80 ts=4
3163-->