Andrew Geissler | f034379 | 2020-11-18 10:42:21 -0600 | [diff] [blame] | 1 | .. SPDX-License-Identifier: CC-BY-SA-2.0-UK |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2 | .. highlight:: shell |
| 3 | |
| 4 | *************************************************************** |
| 5 | Basic Usage (with examples) for each of the Yocto Tracing Tools |
| 6 | *************************************************************** |
| 7 | |
| 8 | | |
| 9 | |
| 10 | This chapter presents basic usage examples for each of the tracing |
| 11 | tools. |
| 12 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 13 | perf |
| 14 | ==== |
| 15 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 16 | The perf tool is the profiling and tracing tool that comes bundled |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 17 | with the Linux kernel. |
| 18 | |
| 19 | Don't let the fact that it's part of the kernel fool you into thinking |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 20 | that it's only for tracing and profiling the kernel --- you can indeed use |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 21 | it to trace and profile just the kernel, but you can also use it to |
| 22 | profile specific applications separately (with or without kernel |
| 23 | context), and you can also use it to trace and profile the kernel and |
| 24 | all applications on the system simultaneously to gain a system-wide view |
| 25 | of what's going on. |
| 26 | |
| 27 | In many ways, perf aims to be a superset of all the tracing and |
| 28 | profiling tools available in Linux today, including all the other tools |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 29 | covered in this How-to. The past couple of years have seen perf subsume a |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 30 | lot of the functionality of those other tools and, at the same time, |
| 31 | those other tools have removed large portions of their previous |
| 32 | functionality and replaced it with calls to the equivalent functionality |
| 33 | now implemented by the perf subsystem. Extrapolation suggests that at |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 34 | some point those other tools will become completely redundant and |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 35 | go away; until then, we'll cover those other tools in these pages and in |
| 36 | many cases show how the same things can be accomplished in perf and the |
| 37 | other tools when it seems useful to do so. |
| 38 | |
| 39 | The coverage below details some of the most common ways you'll likely |
| 40 | want to apply the tool; full documentation can be found either within |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 41 | the tool itself or in the manual pages at |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 42 | `perf(1) <https://linux.die.net/man/1/perf>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 43 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 44 | perf Setup |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 45 | ---------- |
| 46 | |
| 47 | For this section, we'll assume you've already performed the basic setup |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 48 | outlined in the ":ref:`profile-manual/intro:General Setup`" section. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 49 | |
| 50 | In particular, you'll get the most mileage out of perf if you profile an |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 51 | image built with the following in your ``local.conf`` file:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 52 | |
| 53 | INHIBIT_PACKAGE_STRIP = "1" |
| 54 | |
| 55 | perf runs on the target system for the most part. You can archive |
| 56 | profile data and copy it to the host for analysis, but for the rest of |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 57 | this document we assume you're connected to the host through SSH and will be |
| 58 | running the perf commands on the target. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 59 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 60 | Basic perf Usage |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 61 | ---------------- |
| 62 | |
| 63 | The perf tool is pretty much self-documenting. To remind yourself of the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 64 | available commands, just type ``perf``, which will show you basic usage |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 65 | along with the available perf subcommands:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 66 | |
| 67 | root@crownbay:~# perf |
| 68 | |
| 69 | usage: perf [--version] [--help] COMMAND [ARGS] |
| 70 | |
| 71 | The most commonly used perf commands are: |
| 72 | annotate Read perf.data (created by perf record) and display annotated code |
| 73 | archive Create archive with object files with build-ids found in perf.data file |
| 74 | bench General framework for benchmark suites |
| 75 | buildid-cache Manage build-id cache. |
| 76 | buildid-list List the buildids in a perf.data file |
| 77 | diff Read two perf.data files and display the differential profile |
| 78 | evlist List the event names in a perf.data file |
| 79 | inject Filter to augment the events stream with additional information |
| 80 | kmem Tool to trace/measure kernel memory(slab) properties |
| 81 | kvm Tool to trace/measure kvm guest os |
| 82 | list List all symbolic event types |
| 83 | lock Analyze lock events |
| 84 | probe Define new dynamic tracepoints |
| 85 | record Run a command and record its profile into perf.data |
| 86 | report Read perf.data (created by perf record) and display the profile |
| 87 | sched Tool to trace/measure scheduler properties (latencies) |
| 88 | script Read perf.data (created by perf record) and display trace output |
| 89 | stat Run a command and gather performance counter statistics |
| 90 | test Runs sanity tests. |
| 91 | timechart Tool to visualize total system behavior during a workload |
| 92 | top System profiling tool. |
| 93 | |
| 94 | See 'perf help COMMAND' for more information on a specific command. |
| 95 | |
| 96 | |
| 97 | Using perf to do Basic Profiling |
| 98 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 99 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 100 | As a simple test case, we'll profile the ``wget`` of a fairly large file, |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 101 | which is a minimally interesting case because it has both file and |
| 102 | network I/O aspects, and at least in the case of standard Yocto images, |
Andrew Geissler | 3b8a17c | 2021-04-15 15:55:55 -0500 | [diff] [blame] | 103 | it's implemented as part of BusyBox, so the methods we use to analyze it |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 104 | can be used in a similar way to the whole host of supported BusyBox |
| 105 | applets in Yocto:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 106 | |
| 107 | root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 108 | wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 109 | |
| 110 | The quickest and easiest way to get some basic overall data about what's |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 111 | going on for a particular workload is to profile it using ``perf stat``. |
| 112 | This command basically profiles using a few default counters and displays |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 113 | the summed counts at the end of the run:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 114 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 115 | root@crownbay:~# perf stat wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 116 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 117 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA |
| 118 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 119 | Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2': |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 120 | |
| 121 | 4597.223902 task-clock # 0.077 CPUs utilized |
| 122 | 23568 context-switches # 0.005 M/sec |
| 123 | 68 CPU-migrations # 0.015 K/sec |
| 124 | 241 page-faults # 0.052 K/sec |
| 125 | 3045817293 cycles # 0.663 GHz |
| 126 | <not supported> stalled-cycles-frontend |
| 127 | <not supported> stalled-cycles-backend |
| 128 | 858909167 instructions # 0.28 insns per cycle |
| 129 | 165441165 branches # 35.987 M/sec |
| 130 | 19550329 branch-misses # 11.82% of all branches |
| 131 | |
| 132 | 59.836627620 seconds time elapsed |
| 133 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 134 | Such a simple-minded test doesn't always yield much of interest, but sometimes |
| 135 | it does (see the :yocto_bugs:`Slow write speed on live images with denzil |
| 136 | </show_bug.cgi?id=3049>` bug report). |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 137 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 138 | Also, note that ``perf stat`` isn't restricted to a fixed set of counters |
| 139 | --- basically any event listed in the output of ``perf list`` can be tallied |
| 140 | by ``perf stat``. For example, suppose we wanted to see a summary of all |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 141 | the events related to kernel memory allocation/freeing along with cache |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 142 | hits and misses:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 143 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 144 | root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 145 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 146 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA |
| 147 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 148 | Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2': |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 149 | |
| 150 | 5566 kmem:kmalloc |
| 151 | 125517 kmem:kmem_cache_alloc |
| 152 | 0 kmem:kmalloc_node |
| 153 | 0 kmem:kmem_cache_alloc_node |
| 154 | 34401 kmem:kfree |
| 155 | 69920 kmem:kmem_cache_free |
| 156 | 133 kmem:mm_page_free |
| 157 | 41 kmem:mm_page_free_batched |
| 158 | 11502 kmem:mm_page_alloc |
| 159 | 11375 kmem:mm_page_alloc_zone_locked |
| 160 | 0 kmem:mm_page_pcpu_drain |
| 161 | 0 kmem:mm_page_alloc_extfrag |
| 162 | 66848602 cache-references |
| 163 | 2917740 cache-misses # 4.365 % of all cache refs |
| 164 | |
| 165 | 44.831023415 seconds time elapsed |
| 166 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 167 | As you can see, ``perf stat`` gives us a nice easy |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 168 | way to get a quick overview of what might be happening for a set of |
| 169 | events, but normally we'd need a little more detail in order to |
| 170 | understand what's going on in a way that we can act on in a useful way. |
| 171 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 172 | To dive down into a next level of detail, we can use ``perf record`` / |
| 173 | ``perf report`` which will collect profiling data and present it to use using an |
| 174 | interactive text-based UI (or just as text if we specify ``--stdio`` to |
| 175 | ``perf report``). |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 176 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 177 | As our first attempt at profiling this workload, we'll just run ``perf |
| 178 | record``, handing it the workload we want to profile (everything after |
| 179 | ``perf record`` and any perf options we hand it --- here none, will be |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 180 | executed in a new shell). perf collects samples until the process exits |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 181 | and records them in a file named ``perf.data`` in the current working |
| 182 | directory:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 183 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 184 | root@crownbay:~# perf record wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 185 | |
| 186 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 187 | linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA |
| 188 | [ perf record: Woken up 1 times to write data ] |
| 189 | [ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ] |
| 190 | |
| 191 | To see the results in a |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 192 | "text-based UI" (tui), just run ``perf report``, which will read the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 193 | perf.data file in the current working directory and display the results |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 194 | in an interactive UI:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 195 | |
| 196 | root@crownbay:~# perf report |
| 197 | |
| 198 | .. image:: figures/perf-wget-flat-stripped.png |
| 199 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 200 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 201 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 202 | The above screenshot displays a "flat" profile, one entry for each |
| 203 | "bucket" corresponding to the functions that were profiled during the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 204 | profiling run, ordered from the most popular to the least (perf has |
| 205 | options to sort in various orders and keys as well as display entries |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 206 | only above a certain threshold and so on --- see the perf documentation |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 207 | for details). Note that this includes both user space functions (entries |
| 208 | containing a ``[.]``) and kernel functions accounted to the process (entries |
| 209 | containing a ``[k]``). perf has command-line modifiers that can be used to |
| 210 | restrict the profiling to kernel or user space, among others. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 211 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 212 | Notice also that the above report shows an entry for ``busybox``, which is |
| 213 | the executable that implements ``wget`` in Yocto, but that instead of a |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 214 | useful function name in that entry, it displays a not-so-friendly hex |
| 215 | value instead. The steps below will show how to fix that problem. |
| 216 | |
| 217 | Before we do that, however, let's try running a different profile, one |
| 218 | which shows something a little more interesting. The only difference |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 219 | between the new profile and the previous one is that we'll add the ``-g`` |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 220 | option, which will record not just the address of a sampled function, |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 221 | but the entire call chain to the sampled function as well:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 222 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 223 | root@crownbay:~# perf record -g wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 224 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 225 | linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA |
| 226 | [ perf record: Woken up 3 times to write data ] |
| 227 | [ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ] |
| 228 | |
| 229 | |
| 230 | root@crownbay:~# perf report |
| 231 | |
| 232 | .. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png |
| 233 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 234 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 235 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 236 | Using the call graph view, we can actually see not only which functions |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 237 | took the most time, but we can also see a summary of how those functions |
| 238 | were called and learn something about how the program interacts with the |
| 239 | kernel in the process. |
| 240 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 241 | Notice that each entry in the above screenshot now contains a ``+`` on the |
| 242 | left side. This means that we can expand the entry and drill down |
| 243 | into the call chains that feed into that entry. Pressing ``Enter`` on any |
| 244 | one of them will expand the call chain (you can also press ``E`` to expand |
| 245 | them all at the same time or ``C`` to collapse them all). |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 246 | |
| 247 | In the screenshot above, we've toggled the ``__copy_to_user_ll()`` entry |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 248 | and several subnodes all the way down. This lets us see which call chains |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 249 | contributed to the profiled ``__copy_to_user_ll()`` function which |
| 250 | contributed 1.77% to the total profile. |
| 251 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 252 | As a bit of background explanation for these call chains, think about |
| 253 | what happens at a high level when you run ``wget`` to get a file out on the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 254 | network. Basically what happens is that the data comes into the kernel |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 255 | via the network connection (socket) and is passed to the user space |
| 256 | program ``wget`` (which is actually a part of BusyBox, but that's not |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 257 | important for now), which takes the buffers the kernel passes to it and |
| 258 | writes it to a disk file to save it. |
| 259 | |
| 260 | The part of this process that we're looking at in the above call stacks |
Andrew Geissler | 3b8a17c | 2021-04-15 15:55:55 -0500 | [diff] [blame] | 261 | is the part where the kernel passes the data it has read from the socket |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 262 | down to wget i.e. a ``copy-to-user``. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 263 | |
| 264 | Notice also that here there's also a case where the hex value is |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 265 | displayed in the call stack, here in the expanded ``sys_clock_gettime()`` |
| 266 | function. Later we'll see it resolve to a user space function call in |
| 267 | BusyBox. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 268 | |
| 269 | .. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png |
| 270 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 271 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 272 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 273 | The above screenshot shows the other half of the journey for the data --- |
| 274 | from the ``wget`` program's user space buffers to disk. To get the buffers to |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 275 | disk, the wget program issues a ``write(2)``, which does a ``copy-from-user`` to |
| 276 | the kernel, which then takes care via some circuitous path (probably |
| 277 | also present somewhere in the profile data), to get it safely to disk. |
| 278 | |
| 279 | Now that we've seen the basic layout of the profile data and the basics |
| 280 | of how to extract useful information out of it, let's get back to the |
| 281 | task at hand and see if we can get some basic idea about where the time |
| 282 | is spent in the program we're profiling, wget. Remember that wget is |
Andrew Geissler | 3b8a17c | 2021-04-15 15:55:55 -0500 | [diff] [blame] | 283 | actually implemented as an applet in BusyBox, so while the process name |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 284 | is ``wget``, the executable we're actually interested in is ``busybox``. |
| 285 | Therefore, let's expand the first entry containing BusyBox: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 286 | |
| 287 | .. image:: figures/perf-wget-busybox-expanded-stripped.png |
| 288 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 289 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 290 | |
| 291 | Again, before we expanded we saw that the function was labeled with a |
| 292 | hex value instead of a symbol as with most of the kernel entries. |
Andrew Geissler | 3b8a17c | 2021-04-15 15:55:55 -0500 | [diff] [blame] | 293 | Expanding the BusyBox entry doesn't make it any better. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 294 | |
| 295 | The problem is that perf can't find the symbol information for the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 296 | ``busybox`` binary, which is actually stripped out by the Yocto build |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 297 | system. |
| 298 | |
| 299 | One way around that is to put the following in your ``local.conf`` file |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 300 | when you build the image:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 301 | |
| 302 | INHIBIT_PACKAGE_STRIP = "1" |
| 303 | |
| 304 | However, we already have an image with the binaries stripped, so |
| 305 | what can we do to get perf to resolve the symbols? Basically we need to |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 306 | install the debugging information for the BusyBox package. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 307 | |
| 308 | To generate the debug info for the packages in the image, we can add |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 309 | ``dbg-pkgs`` to :term:`EXTRA_IMAGE_FEATURES` in ``local.conf``. For example:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 310 | |
| 311 | EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" |
| 312 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 313 | Additionally, in order to generate the type of debugging information that perf |
| 314 | understands, we also need to set :term:`PACKAGE_DEBUG_SPLIT_STYLE` |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 315 | in the ``local.conf`` file:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 316 | |
| 317 | PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' |
| 318 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 319 | Once we've done that, we can install the debugging information for BusyBox. The |
| 320 | debug packages once built can be found in ``build/tmp/deploy/rpm/*`` |
| 321 | on the host system. Find the ``busybox-dbg-...rpm`` file and copy it |
| 322 | to the target. For example:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 323 | |
| 324 | [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm root@192.168.1.31: |
| 325 | busybox-dbg-1.20.2-r2.core2_32.rpm 100% 1826KB 1.8MB/s 00:01 |
| 326 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 327 | Now install the debug RPM on the target:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 328 | |
| 329 | root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2_32.rpm |
| 330 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 331 | Now that the debugging information is installed, we see that the BusyBox entries now display |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 332 | their functions symbolically: |
| 333 | |
| 334 | .. image:: figures/perf-wget-busybox-debuginfo.png |
| 335 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 336 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 337 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 338 | If we expand one of the entries and press ``Enter`` on a leaf node, we're |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 339 | presented with a menu of actions we can take to get more information |
| 340 | related to that entry: |
| 341 | |
| 342 | .. image:: figures/perf-wget-busybox-dso-zoom-menu.png |
| 343 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 344 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 345 | |
| 346 | One of these actions allows us to show a view that displays a |
| 347 | busybox-centric view of the profiled functions (in this case we've also |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 348 | expanded all the nodes using the ``E`` key): |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 349 | |
| 350 | .. image:: figures/perf-wget-busybox-dso-zoom.png |
| 351 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 352 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 353 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 354 | Finally, we can see that now that the BusyBox debugging information is installed, |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 355 | the previously unresolved symbol in the ``sys_clock_gettime()`` entry |
| 356 | mentioned previously is now resolved, and shows that the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 357 | ``sys_clock_gettime`` system call that was the source of 6.75% of the |
| 358 | ``copy-to-user`` overhead was initiated by the ``handle_input()`` BusyBox |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 359 | function: |
| 360 | |
| 361 | .. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png |
| 362 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 363 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 364 | |
| 365 | At the lowest level of detail, we can dive down to the assembly level |
| 366 | and see which instructions caused the most overhead in a function. |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 367 | Pressing ``Enter`` on the ``udhcpc_main`` function, we're again presented |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 368 | with a menu: |
| 369 | |
| 370 | .. image:: figures/perf-wget-busybox-annotate-menu.png |
| 371 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 372 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 373 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 374 | Selecting ``Annotate udhcpc_main``, we get a detailed listing of |
| 375 | percentages by instruction for the ``udhcpc_main`` function. From the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 376 | display, we can see that over 50% of the time spent in this function is |
| 377 | taken up by a couple tests and the move of a constant (1) to a register: |
| 378 | |
| 379 | .. image:: figures/perf-wget-busybox-annotate-udhcpc.png |
| 380 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 381 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 382 | |
| 383 | As a segue into tracing, let's try another profile using a different |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 384 | counter, something other than the default ``cycles``. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 385 | |
| 386 | The tracing and profiling infrastructure in Linux has become unified in |
| 387 | a way that allows us to use the same tool with a completely different |
| 388 | set of counters, not just the standard hardware counters that |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 389 | traditional tools have had to restrict themselves to (the |
| 390 | traditional tools can now actually make use of the expanded possibilities now |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 391 | available to them, and in some cases have, as mentioned previously). |
| 392 | |
| 393 | We can get a list of the available events that can be used to profile a |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 394 | workload via ``perf list``:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 395 | |
| 396 | root@crownbay:~# perf list |
| 397 | |
| 398 | List of pre-defined events (to be used in -e): |
| 399 | cpu-cycles OR cycles [Hardware event] |
| 400 | stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] |
| 401 | stalled-cycles-backend OR idle-cycles-backend [Hardware event] |
| 402 | instructions [Hardware event] |
| 403 | cache-references [Hardware event] |
| 404 | cache-misses [Hardware event] |
| 405 | branch-instructions OR branches [Hardware event] |
| 406 | branch-misses [Hardware event] |
| 407 | bus-cycles [Hardware event] |
| 408 | ref-cycles [Hardware event] |
| 409 | |
| 410 | cpu-clock [Software event] |
| 411 | task-clock [Software event] |
| 412 | page-faults OR faults [Software event] |
| 413 | minor-faults [Software event] |
| 414 | major-faults [Software event] |
| 415 | context-switches OR cs [Software event] |
| 416 | cpu-migrations OR migrations [Software event] |
| 417 | alignment-faults [Software event] |
| 418 | emulation-faults [Software event] |
| 419 | |
| 420 | L1-dcache-loads [Hardware cache event] |
| 421 | L1-dcache-load-misses [Hardware cache event] |
| 422 | L1-dcache-prefetch-misses [Hardware cache event] |
| 423 | L1-icache-loads [Hardware cache event] |
| 424 | L1-icache-load-misses [Hardware cache event] |
| 425 | . |
| 426 | . |
| 427 | . |
| 428 | rNNN [Raw hardware event descriptor] |
| 429 | cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] |
| 430 | (see 'perf list --help' on how to encode it) |
| 431 | |
| 432 | mem:<addr>[:access] [Hardware breakpoint] |
| 433 | |
| 434 | sunrpc:rpc_call_status [Tracepoint event] |
| 435 | sunrpc:rpc_bind_status [Tracepoint event] |
| 436 | sunrpc:rpc_connect_status [Tracepoint event] |
| 437 | sunrpc:rpc_task_begin [Tracepoint event] |
| 438 | skb:kfree_skb [Tracepoint event] |
| 439 | skb:consume_skb [Tracepoint event] |
| 440 | skb:skb_copy_datagram_iovec [Tracepoint event] |
| 441 | net:net_dev_xmit [Tracepoint event] |
| 442 | net:net_dev_queue [Tracepoint event] |
| 443 | net:netif_receive_skb [Tracepoint event] |
| 444 | net:netif_rx [Tracepoint event] |
| 445 | napi:napi_poll [Tracepoint event] |
| 446 | sock:sock_rcvqueue_full [Tracepoint event] |
| 447 | sock:sock_exceed_buf_limit [Tracepoint event] |
| 448 | udp:udp_fail_queue_rcv_skb [Tracepoint event] |
| 449 | hda:hda_send_cmd [Tracepoint event] |
| 450 | hda:hda_get_response [Tracepoint event] |
| 451 | hda:hda_bus_reset [Tracepoint event] |
| 452 | scsi:scsi_dispatch_cmd_start [Tracepoint event] |
| 453 | scsi:scsi_dispatch_cmd_error [Tracepoint event] |
| 454 | scsi:scsi_eh_wakeup [Tracepoint event] |
| 455 | drm:drm_vblank_event [Tracepoint event] |
| 456 | drm:drm_vblank_event_queued [Tracepoint event] |
| 457 | drm:drm_vblank_event_delivered [Tracepoint event] |
| 458 | random:mix_pool_bytes [Tracepoint event] |
| 459 | random:mix_pool_bytes_nolock [Tracepoint event] |
| 460 | random:credit_entropy_bits [Tracepoint event] |
| 461 | gpio:gpio_direction [Tracepoint event] |
| 462 | gpio:gpio_value [Tracepoint event] |
| 463 | block:block_rq_abort [Tracepoint event] |
| 464 | block:block_rq_requeue [Tracepoint event] |
| 465 | block:block_rq_issue [Tracepoint event] |
| 466 | block:block_bio_bounce [Tracepoint event] |
| 467 | block:block_bio_complete [Tracepoint event] |
| 468 | block:block_bio_backmerge [Tracepoint event] |
| 469 | . |
| 470 | . |
| 471 | writeback:writeback_wake_thread [Tracepoint event] |
| 472 | writeback:writeback_wake_forker_thread [Tracepoint event] |
| 473 | writeback:writeback_bdi_register [Tracepoint event] |
| 474 | . |
| 475 | . |
| 476 | writeback:writeback_single_inode_requeue [Tracepoint event] |
| 477 | writeback:writeback_single_inode [Tracepoint event] |
| 478 | kmem:kmalloc [Tracepoint event] |
| 479 | kmem:kmem_cache_alloc [Tracepoint event] |
| 480 | kmem:mm_page_alloc [Tracepoint event] |
| 481 | kmem:mm_page_alloc_zone_locked [Tracepoint event] |
| 482 | kmem:mm_page_pcpu_drain [Tracepoint event] |
| 483 | kmem:mm_page_alloc_extfrag [Tracepoint event] |
| 484 | vmscan:mm_vmscan_kswapd_sleep [Tracepoint event] |
| 485 | vmscan:mm_vmscan_kswapd_wake [Tracepoint event] |
| 486 | vmscan:mm_vmscan_wakeup_kswapd [Tracepoint event] |
| 487 | vmscan:mm_vmscan_direct_reclaim_begin [Tracepoint event] |
| 488 | . |
| 489 | . |
| 490 | module:module_get [Tracepoint event] |
| 491 | module:module_put [Tracepoint event] |
| 492 | module:module_request [Tracepoint event] |
| 493 | sched:sched_kthread_stop [Tracepoint event] |
| 494 | sched:sched_wakeup [Tracepoint event] |
| 495 | sched:sched_wakeup_new [Tracepoint event] |
| 496 | sched:sched_process_fork [Tracepoint event] |
| 497 | sched:sched_process_exec [Tracepoint event] |
| 498 | sched:sched_stat_runtime [Tracepoint event] |
| 499 | rcu:rcu_utilization [Tracepoint event] |
| 500 | workqueue:workqueue_queue_work [Tracepoint event] |
| 501 | workqueue:workqueue_execute_end [Tracepoint event] |
| 502 | signal:signal_generate [Tracepoint event] |
| 503 | signal:signal_deliver [Tracepoint event] |
| 504 | timer:timer_init [Tracepoint event] |
| 505 | timer:timer_start [Tracepoint event] |
| 506 | timer:hrtimer_cancel [Tracepoint event] |
| 507 | timer:itimer_state [Tracepoint event] |
| 508 | timer:itimer_expire [Tracepoint event] |
| 509 | irq:irq_handler_entry [Tracepoint event] |
| 510 | irq:irq_handler_exit [Tracepoint event] |
| 511 | irq:softirq_entry [Tracepoint event] |
| 512 | irq:softirq_exit [Tracepoint event] |
| 513 | irq:softirq_raise [Tracepoint event] |
| 514 | printk:console [Tracepoint event] |
| 515 | task:task_newtask [Tracepoint event] |
| 516 | task:task_rename [Tracepoint event] |
| 517 | syscalls:sys_enter_socketcall [Tracepoint event] |
| 518 | syscalls:sys_exit_socketcall [Tracepoint event] |
| 519 | . |
| 520 | . |
| 521 | . |
| 522 | syscalls:sys_enter_unshare [Tracepoint event] |
| 523 | syscalls:sys_exit_unshare [Tracepoint event] |
| 524 | raw_syscalls:sys_enter [Tracepoint event] |
| 525 | raw_syscalls:sys_exit [Tracepoint event] |
| 526 | |
| 527 | .. admonition:: Tying it Together |
| 528 | |
| 529 | These are exactly the same set of events defined by the trace event |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 530 | subsystem and exposed by ftrace / trace-cmd / KernelShark as files in |
| 531 | ``/sys/kernel/debug/tracing/events``, by SystemTap as |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 532 | kernel.trace("tracepoint_name") and (partially) accessed by LTTng. |
| 533 | |
| 534 | Only a subset of these would be of interest to us when looking at this |
| 535 | workload, so let's choose the most likely subsystems (identified by the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 536 | string before the colon in the ``Tracepoint`` events) and do a ``perf stat`` |
| 537 | run using only those subsystem wildcards:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 538 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 539 | root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 540 | Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2': |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 541 | |
| 542 | 23323 skb:kfree_skb |
| 543 | 0 skb:consume_skb |
| 544 | 49897 skb:skb_copy_datagram_iovec |
| 545 | 6217 net:net_dev_xmit |
| 546 | 6217 net:net_dev_queue |
| 547 | 7962 net:netif_receive_skb |
| 548 | 2 net:netif_rx |
| 549 | 8340 napi:napi_poll |
| 550 | 0 sched:sched_kthread_stop |
| 551 | 0 sched:sched_kthread_stop_ret |
| 552 | 3749 sched:sched_wakeup |
| 553 | 0 sched:sched_wakeup_new |
| 554 | 0 sched:sched_switch |
| 555 | 29 sched:sched_migrate_task |
| 556 | 0 sched:sched_process_free |
| 557 | 1 sched:sched_process_exit |
| 558 | 0 sched:sched_wait_task |
| 559 | 0 sched:sched_process_wait |
| 560 | 0 sched:sched_process_fork |
| 561 | 1 sched:sched_process_exec |
| 562 | 0 sched:sched_stat_wait |
| 563 | 2106519415641 sched:sched_stat_sleep |
| 564 | 0 sched:sched_stat_iowait |
| 565 | 147453613 sched:sched_stat_blocked |
| 566 | 12903026955 sched:sched_stat_runtime |
| 567 | 0 sched:sched_pi_setprio |
| 568 | 3574 workqueue:workqueue_queue_work |
| 569 | 3574 workqueue:workqueue_activate_work |
| 570 | 0 workqueue:workqueue_execute_start |
| 571 | 0 workqueue:workqueue_execute_end |
| 572 | 16631 irq:irq_handler_entry |
| 573 | 16631 irq:irq_handler_exit |
| 574 | 28521 irq:softirq_entry |
| 575 | 28521 irq:softirq_exit |
| 576 | 28728 irq:softirq_raise |
| 577 | 1 syscalls:sys_enter_sendmmsg |
| 578 | 1 syscalls:sys_exit_sendmmsg |
| 579 | 0 syscalls:sys_enter_recvmmsg |
| 580 | 0 syscalls:sys_exit_recvmmsg |
| 581 | 14 syscalls:sys_enter_socketcall |
| 582 | 14 syscalls:sys_exit_socketcall |
| 583 | . |
| 584 | . |
| 585 | . |
| 586 | 16965 syscalls:sys_enter_read |
| 587 | 16965 syscalls:sys_exit_read |
| 588 | 12854 syscalls:sys_enter_write |
| 589 | 12854 syscalls:sys_exit_write |
| 590 | . |
| 591 | . |
| 592 | . |
| 593 | |
| 594 | 58.029710972 seconds time elapsed |
| 595 | |
| 596 | |
| 597 | |
| 598 | Let's pick one of these tracepoints |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 599 | and tell perf to do a profile using it as the sampling event:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 600 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 601 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 602 | |
| 603 | .. image:: figures/sched-wakeup-profile.png |
| 604 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 605 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 606 | |
| 607 | The screenshot above shows the results of running a profile using |
| 608 | sched:sched_switch tracepoint, which shows the relative costs of various |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 609 | paths to ``sched_wakeup`` (note that ``sched_wakeup`` is the name of the |
| 610 | tracepoint --- it's actually defined just inside ``ttwu_do_wakeup()``, which |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 611 | accounts for the function name actually displayed in the profile: |
| 612 | |
| 613 | .. code-block:: c |
| 614 | |
| 615 | /* |
| 616 | * Mark the task runnable and perform wakeup-preemption. |
| 617 | */ |
| 618 | static void |
| 619 | ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags) |
| 620 | { |
| 621 | trace_sched_wakeup(p, true); |
| 622 | . |
| 623 | . |
| 624 | . |
| 625 | } |
| 626 | |
| 627 | A couple of the more interesting |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 628 | call chains are expanded and displayed above, basically some network |
| 629 | receive paths that presumably end up waking up wget (BusyBox) when |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 630 | network data is ready. |
| 631 | |
| 632 | Note that because tracepoints are normally used for tracing, the default |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 633 | sampling period for tracepoints is ``1`` i.e. for tracepoints perf will |
| 634 | sample on every event occurrence (this can be changed using the ``-c`` |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 635 | option). This is in contrast to hardware counters such as for example |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 636 | the default ``cycles`` hardware counter used for normal profiling, where |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 637 | sampling periods are much higher (in the thousands) because profiling |
| 638 | should have as low an overhead as possible and sampling on every cycle |
| 639 | would be prohibitively expensive. |
| 640 | |
| 641 | Using perf to do Basic Tracing |
| 642 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 643 | |
| 644 | Profiling is a great tool for solving many problems or for getting a |
| 645 | high-level view of what's going on with a workload or across the system. |
| 646 | It is however by definition an approximation, as suggested by the most |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 647 | prominent word associated with it, ``sampling``. On the one hand, it |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 648 | allows a representative picture of what's going on in the system to be |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 649 | cheaply taken, but alternatively, that cheapness limits its utility |
| 650 | when that data suggests a need to "dive down" more deeply to discover |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 651 | what's really going on. In such cases, the only way to see what's really |
| 652 | going on is to be able to look at (or summarize more intelligently) the |
| 653 | individual steps that go into the higher-level behavior exposed by the |
| 654 | coarse-grained profiling data. |
| 655 | |
| 656 | As a concrete example, we can trace all the events we think might be |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 657 | applicable to our workload:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 658 | |
| 659 | root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* |
| 660 | -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 661 | wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 662 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 663 | We can look at the raw trace output using ``perf script`` with no |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 664 | arguments:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 665 | |
| 666 | root@crownbay:~# perf script |
| 667 | |
| 668 | perf 1262 [000] 11624.857082: sys_exit_read: 0x0 |
| 669 | perf 1262 [000] 11624.857193: sched_wakeup: comm=migration/0 pid=6 prio=0 success=1 target_cpu=000 |
| 670 | wget 1262 [001] 11624.858021: softirq_raise: vec=1 [action=TIMER] |
| 671 | wget 1262 [001] 11624.858074: softirq_entry: vec=1 [action=TIMER] |
| 672 | wget 1262 [001] 11624.858081: softirq_exit: vec=1 [action=TIMER] |
| 673 | wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200 |
| 674 | wget 1262 [001] 11624.858177: sys_exit_read: 0x200 |
| 675 | wget 1262 [001] 11624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0 location=0xc15a5308 |
| 676 | wget 1262 [001] 11624.858945: kfree_skb: skbaddr=0xeb248000 protocol=0 location=0xc15a5308 |
| 677 | wget 1262 [001] 11624.859020: softirq_raise: vec=1 [action=TIMER] |
| 678 | wget 1262 [001] 11624.859076: softirq_entry: vec=1 [action=TIMER] |
| 679 | wget 1262 [001] 11624.859083: softirq_exit: vec=1 [action=TIMER] |
| 680 | wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 |
| 681 | wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7 |
| 682 | wget 1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 |
| 683 | wget 1262 [001] 11624.859233: sys_exit_read: 0x0 |
| 684 | wget 1262 [001] 11624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200 |
| 685 | wget 1262 [001] 11624.859584: sys_exit_read: 0x200 |
| 686 | wget 1262 [001] 11624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 |
| 687 | wget 1262 [001] 11624.859888: sys_exit_read: 0x400 |
| 688 | wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 |
| 689 | wget 1262 [001] 11624.859944: sys_exit_read: 0x400 |
| 690 | |
| 691 | This gives us a detailed timestamped sequence of events that occurred within the |
| 692 | workload with respect to those events. |
| 693 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 694 | In many ways, profiling can be viewed as a subset of tracing --- |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 695 | theoretically, if you have a set of trace events that's sufficient to |
| 696 | capture all the important aspects of a workload, you can derive any of |
| 697 | the results or views that a profiling run can. |
| 698 | |
| 699 | Another aspect of traditional profiling is that while powerful in many |
| 700 | ways, it's limited by the granularity of the underlying data. Profiling |
| 701 | tools offer various ways of sorting and presenting the sample data, |
| 702 | which make it much more useful and amenable to user experimentation, but |
| 703 | in the end it can't be used in an open-ended way to extract data that |
| 704 | just isn't present as a consequence of the fact that conceptually, most |
| 705 | of it has been thrown away. |
| 706 | |
| 707 | Full-blown detailed tracing data does however offer the opportunity to |
| 708 | manipulate and present the information collected during a tracing run in |
| 709 | an infinite variety of ways. |
| 710 | |
| 711 | Another way to look at it is that there are only so many ways that the |
| 712 | 'primitive' counters can be used on their own to generate interesting |
| 713 | output; to get anything more complicated than simple counts requires |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 714 | some amount of additional logic, which is typically specific to the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 715 | problem at hand. For example, if we wanted to make use of a 'counter' |
| 716 | that maps to the value of the time difference between when a process was |
| 717 | scheduled to run on a processor and the time it actually ran, we |
| 718 | wouldn't expect such a counter to exist on its own, but we could derive |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 719 | one called say ``wakeup_latency`` and use it to extract a useful view of |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 720 | that metric from trace data. Likewise, we really can't figure out from |
| 721 | standard profiling tools how much data every process on the system reads |
| 722 | and writes, along with how many of those reads and writes fail |
| 723 | completely. If we have sufficient trace data, however, we could with the |
| 724 | right tools easily extract and present that information, but we'd need |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 725 | something other than ready-made profiling tools to do that. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 726 | |
| 727 | Luckily, there is a general-purpose way to handle such needs, called |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 728 | "programming languages". Making programming languages easily available |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 729 | to apply to such problems given the specific format of data is called a |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 730 | 'programming language binding' for that data and language. perf supports |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 731 | two programming language bindings, one for Python and one for Perl. |
| 732 | |
| 733 | .. admonition:: Tying it Together |
| 734 | |
| 735 | Language bindings for manipulating and aggregating trace data are of |
| 736 | course not a new idea. One of the first projects to do this was IBM's |
| 737 | DProbes dpcc compiler, an ANSI C compiler which targeted a low-level |
| 738 | assembly language running on an in-kernel interpreter on the target |
| 739 | system. This is exactly analogous to what Sun's DTrace did, except |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 740 | that DTrace invented its own language for the purpose. SystemTap, |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 741 | heavily inspired by DTrace, also created its own one-off language, |
| 742 | but rather than running the product on an in-kernel interpreter, |
| 743 | created an elaborate compiler-based machinery to translate its |
| 744 | language into kernel modules written in C. |
| 745 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 746 | Now that we have the trace data in ``perf.data``, we can use ``perf script |
| 747 | -g`` to generate a skeleton script with handlers for the read / write |
| 748 | entry / exit events we recorded:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 749 | |
| 750 | root@crownbay:~# perf script -g python |
| 751 | generated Python script: perf-script.py |
| 752 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 753 | The skeleton script just creates a Python function for each event type in the |
| 754 | ``perf.data`` file. The body of each function just prints the event name along |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 755 | with its parameters. For example: |
| 756 | |
| 757 | .. code-block:: python |
| 758 | |
| 759 | def net__netif_rx(event_name, context, common_cpu, |
| 760 | common_secs, common_nsecs, common_pid, common_comm, |
| 761 | skbaddr, len, name): |
| 762 | print_header(event_name, common_cpu, common_secs, common_nsecs, |
| 763 | common_pid, common_comm) |
| 764 | |
| 765 | print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), |
| 766 | |
| 767 | We can run that script directly to print all of the events contained in the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 768 | ``perf.data`` file:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 769 | |
| 770 | root@crownbay:~# perf script -s perf-script.py |
| 771 | |
| 772 | in trace_begin |
| 773 | syscalls__sys_exit_read 0 11624.857082795 1262 perf nr=3, ret=0 |
| 774 | sched__sched_wakeup 0 11624.857193498 1262 perf comm=migration/0, pid=6, prio=0, success=1, target_cpu=0 |
| 775 | irq__softirq_raise 1 11624.858021635 1262 wget vec=TIMER |
| 776 | irq__softirq_entry 1 11624.858074075 1262 wget vec=TIMER |
| 777 | irq__softirq_exit 1 11624.858081389 1262 wget vec=TIMER |
| 778 | syscalls__sys_enter_read 1 11624.858166434 1262 wget nr=3, fd=3, buf=3213019456, count=512 |
| 779 | syscalls__sys_exit_read 1 11624.858177924 1262 wget nr=3, ret=512 |
| 780 | skb__kfree_skb 1 11624.858878188 1262 wget skbaddr=3945041280, location=3243922184, protocol=0 |
| 781 | skb__kfree_skb 1 11624.858945608 1262 wget skbaddr=3945037824, location=3243922184, protocol=0 |
| 782 | irq__softirq_raise 1 11624.859020942 1262 wget vec=TIMER |
| 783 | irq__softirq_entry 1 11624.859076935 1262 wget vec=TIMER |
| 784 | irq__softirq_exit 1 11624.859083469 1262 wget vec=TIMER |
| 785 | syscalls__sys_enter_read 1 11624.859167565 1262 wget nr=3, fd=3, buf=3077701632, count=1024 |
| 786 | syscalls__sys_exit_read 1 11624.859192533 1262 wget nr=3, ret=471 |
| 787 | syscalls__sys_enter_read 1 11624.859228072 1262 wget nr=3, fd=3, buf=3077701632, count=1024 |
| 788 | syscalls__sys_exit_read 1 11624.859233707 1262 wget nr=3, ret=0 |
| 789 | syscalls__sys_enter_read 1 11624.859573008 1262 wget nr=3, fd=3, buf=3213018496, count=512 |
| 790 | syscalls__sys_exit_read 1 11624.859584818 1262 wget nr=3, ret=512 |
| 791 | syscalls__sys_enter_read 1 11624.859864562 1262 wget nr=3, fd=3, buf=3077701632, count=1024 |
| 792 | syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 |
| 793 | syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 |
| 794 | syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 |
| 795 | |
| 796 | That in itself isn't very useful; after all, we can accomplish pretty much the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 797 | same thing by just running ``perf script`` without arguments in the same |
| 798 | directory as the ``perf.data`` file. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 799 | |
| 800 | We can however replace the print statements in the generated function |
| 801 | bodies with whatever we want, and thereby make it infinitely more |
| 802 | useful. |
| 803 | |
| 804 | As a simple example, let's just replace the print statements in the |
| 805 | function bodies with a simple function that does nothing but increment a |
| 806 | per-event count. When the program is run against a perf.data file, each |
| 807 | time a particular event is encountered, a tally is incremented for that |
| 808 | event. For example: |
| 809 | |
| 810 | .. code-block:: python |
| 811 | |
| 812 | def net__netif_rx(event_name, context, common_cpu, |
| 813 | common_secs, common_nsecs, common_pid, common_comm, |
| 814 | skbaddr, len, name): |
| 815 | inc_counts(event_name) |
| 816 | |
| 817 | Each event handler function in the generated code |
| 818 | is modified to do this. For convenience, we define a common function |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 819 | called ``inc_counts()`` that each handler calls; ``inc_counts()`` just tallies |
| 820 | a count for each event using the ``counts`` hash, which is a specialized |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 821 | hash function that does Perl-like autovivification, a capability that's |
| 822 | extremely useful for kinds of multi-level aggregation commonly used in |
| 823 | processing traces (see perf's documentation on the Python language |
| 824 | binding for details): |
| 825 | |
| 826 | .. code-block:: python |
| 827 | |
| 828 | counts = autodict() |
| 829 | |
| 830 | def inc_counts(event_name): |
| 831 | try: |
| 832 | counts[event_name] += 1 |
| 833 | except TypeError: |
| 834 | counts[event_name] = 1 |
| 835 | |
| 836 | Finally, at the end of the trace processing run, we want to print the |
| 837 | result of all the per-event tallies. For that, we use the special |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 838 | ``trace_end()`` function: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 839 | |
| 840 | .. code-block:: python |
| 841 | |
| 842 | def trace_end(): |
| 843 | for event_name, count in counts.iteritems(): |
| 844 | print "%-40s %10s\n" % (event_name, count) |
| 845 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 846 | The end result is a summary of all the events recorded in the trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 847 | |
| 848 | skb__skb_copy_datagram_iovec 13148 |
| 849 | irq__softirq_entry 4796 |
| 850 | irq__irq_handler_exit 3805 |
| 851 | irq__softirq_exit 4795 |
| 852 | syscalls__sys_enter_write 8990 |
| 853 | net__net_dev_xmit 652 |
| 854 | skb__kfree_skb 4047 |
| 855 | sched__sched_wakeup 1155 |
| 856 | irq__irq_handler_entry 3804 |
| 857 | irq__softirq_raise 4799 |
| 858 | net__net_dev_queue 652 |
| 859 | syscalls__sys_enter_read 17599 |
| 860 | net__netif_receive_skb 1743 |
| 861 | syscalls__sys_exit_read 17598 |
| 862 | net__netif_rx 2 |
| 863 | napi__napi_poll 1877 |
| 864 | syscalls__sys_exit_write 8990 |
| 865 | |
| 866 | Note that this is |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 867 | pretty much exactly the same information we get from ``perf stat``, which |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 868 | goes a little way to support the idea mentioned previously that given |
| 869 | the right kind of trace data, higher-level profiling-type summaries can |
| 870 | be derived from it. |
| 871 | |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 872 | Documentation on using the `'perf script' Python |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 873 | binding <https://linux.die.net/man/1/perf-script-python>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 874 | |
| 875 | System-Wide Tracing and Profiling |
| 876 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 877 | |
| 878 | The examples so far have focused on tracing a particular program or |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 879 | workload --- that is, every profiling run has specified the program |
| 880 | to profile in the command-line e.g. ``perf record wget ...``. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 881 | |
| 882 | It's also possible, and more interesting in many cases, to run a |
| 883 | system-wide profile or trace while running the workload in a separate |
| 884 | shell. |
| 885 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 886 | To do system-wide profiling or tracing, you typically use the ``-a`` flag to |
| 887 | ``perf record``. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 888 | |
| 889 | To demonstrate this, open up one window and start the profile using the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 890 | ``-a`` flag (press ``Ctrl-C`` to stop tracing):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 891 | |
| 892 | root@crownbay:~# perf record -g -a |
| 893 | ^C[ perf record: Woken up 6 times to write data ] |
| 894 | [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] |
| 895 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 896 | In another window, run the ``wget`` test:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 897 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 898 | root@crownbay:~# wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 899 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 900 | linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA |
| 901 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 902 | Here we see entries not only for our ``wget`` load, but for |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 903 | other processes running on the system as well: |
| 904 | |
| 905 | .. image:: figures/perf-systemwide.png |
| 906 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 907 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 908 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 909 | In the snapshot above, we can see call chains that originate in ``libc``, and |
| 910 | a call chain from ``Xorg`` that demonstrates that we're using a proprietary X |
| 911 | driver in user space (notice the presence of ``PVR`` and some other |
| 912 | unresolvable symbols in the expanded ``Xorg`` call chain). |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 913 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 914 | Note also that we have both kernel and user space entries in the above |
| 915 | snapshot. We can also tell perf to focus on user space but providing a |
| 916 | modifier, in this case ``u``, to the ``cycles`` hardware counter when we |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 917 | record a profile:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 918 | |
| 919 | root@crownbay:~# perf record -g -a -e cycles:u |
| 920 | ^C[ perf record: Woken up 2 times to write data ] |
| 921 | [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] |
| 922 | |
| 923 | .. image:: figures/perf-report-cycles-u.png |
| 924 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 925 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 926 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 927 | Notice in the screenshot above, we see only user space entries (``[.]``) |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 928 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 929 | Finally, we can press ``Enter`` on a leaf node and select the ``Zoom into |
| 930 | DSO`` menu item to show only entries associated with a specific DSO. In |
| 931 | the screenshot below, we've zoomed into the ``libc`` DSO which shows all |
| 932 | the entries associated with the ``libc-xxx.so`` DSO. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 933 | |
| 934 | .. image:: figures/perf-systemwide-libc.png |
| 935 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 936 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 937 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 938 | We can also use the system-wide ``-a`` switch to do system-wide tracing. |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 939 | Here we'll trace a couple of scheduler events:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 940 | |
| 941 | root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup |
| 942 | ^C[ perf record: Woken up 38 times to write data ] |
| 943 | [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] |
| 944 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 945 | We can look at the raw output using ``perf script`` with no arguments:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 946 | |
| 947 | root@crownbay:~# perf script |
| 948 | |
| 949 | perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 950 | perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 |
| 951 | kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 |
| 952 | swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 |
| 953 | swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 |
| 954 | kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 |
| 955 | perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 956 | perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 |
| 957 | kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 |
| 958 | perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 959 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 960 | Filtering |
| 961 | ^^^^^^^^^ |
| 962 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 963 | Notice that there are many events that don't really have anything to |
| 964 | do with what we're interested in, namely events that schedule ``perf`` |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 965 | itself in and out or that wake perf up. We can get rid of those by using |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 966 | the ``--filter`` option --- for each event we specify using ``-e``, we can add a |
| 967 | ``--filter`` after that to filter out trace events that contain fields with |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 968 | specific values:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 969 | |
| 970 | root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' |
| 971 | ^C[ perf record: Woken up 38 times to write data ] |
| 972 | [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] |
| 973 | |
| 974 | |
| 975 | root@crownbay:~# perf script |
| 976 | |
| 977 | swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 |
| 978 | kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 |
| 979 | perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 980 | perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 981 | perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 982 | perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 983 | perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 984 | perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 985 | swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 986 | swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 |
| 987 | kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120 |
| 988 | swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 |
| 989 | swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 |
| 990 | kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 |
| 991 | |
| 992 | In this case, we've filtered out all events that have |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 993 | ``perf`` in their ``comm``, ``comm_prev`` or ``comm_next`` fields. Notice that |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 994 | there are still events recorded for perf, but notice that those events |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 995 | don't have values of ``perf`` for the filtered fields. To completely |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 996 | filter out anything from perf will require a bit more work, but for the |
| 997 | purpose of demonstrating how to use filters, it's close enough. |
| 998 | |
| 999 | .. admonition:: Tying it Together |
| 1000 | |
| 1001 | These are exactly the same set of event filters defined by the trace |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1002 | event subsystem. See the ftrace / trace-cmd / KernelShark section for more |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1003 | discussion about these event filters. |
| 1004 | |
| 1005 | .. admonition:: Tying it Together |
| 1006 | |
| 1007 | These event filters are implemented by a special-purpose |
| 1008 | pseudo-interpreter in the kernel and are an integral and |
| 1009 | indispensable part of the perf design as it relates to tracing. |
| 1010 | kernel-based event filters provide a mechanism to precisely throttle |
| 1011 | the event stream that appears in user space, where it makes sense to |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1012 | provide bindings to real programming languages for post-processing the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1013 | event stream. This architecture allows for the intelligent and |
| 1014 | flexible partitioning of processing between the kernel and user |
| 1015 | space. Contrast this with other tools such as SystemTap, which does |
| 1016 | all of its processing in the kernel and as such requires a special |
| 1017 | project-defined language in order to accommodate that design, or |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1018 | LTTng, where everything is sent to user space and as such requires a |
| 1019 | super-efficient kernel-to-user space transport mechanism in order to |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1020 | function properly. While perf certainly can benefit from for instance |
| 1021 | advances in the design of the transport, it doesn't fundamentally |
| 1022 | depend on them. Basically, if you find that your perf tracing |
| 1023 | application is causing buffer I/O overruns, it probably means that |
| 1024 | you aren't taking enough advantage of the kernel filtering engine. |
| 1025 | |
| 1026 | Using Dynamic Tracepoints |
| 1027 | ~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 1028 | |
| 1029 | perf isn't restricted to the fixed set of static tracepoints listed by |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1030 | ``perf list``. Users can also add their own "dynamic" tracepoints anywhere |
| 1031 | in the kernel. For example, suppose we want to define our own |
| 1032 | tracepoint on ``do_fork()``. We can do that using the ``perf probe`` perf |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1033 | subcommand:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1034 | |
| 1035 | root@crownbay:~# perf probe do_fork |
| 1036 | Added new event: |
| 1037 | probe:do_fork (on do_fork) |
| 1038 | |
| 1039 | You can now use it in all perf tools, such as: |
| 1040 | |
| 1041 | perf record -e probe:do_fork -aR sleep 1 |
| 1042 | |
| 1043 | Adding a new tracepoint via |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1044 | ``perf probe`` results in an event with all the expected files and format |
| 1045 | in ``/sys/kernel/debug/tracing/events``, just the same as for static |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1046 | tracepoints (as discussed in more detail in the trace events subsystem |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1047 | section:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1048 | |
| 1049 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al |
| 1050 | drwxr-xr-x 2 root root 0 Oct 28 11:42 . |
| 1051 | drwxr-xr-x 3 root root 0 Oct 28 11:42 .. |
| 1052 | -rw-r--r-- 1 root root 0 Oct 28 11:42 enable |
| 1053 | -rw-r--r-- 1 root root 0 Oct 28 11:42 filter |
| 1054 | -r--r--r-- 1 root root 0 Oct 28 11:42 format |
| 1055 | -r--r--r-- 1 root root 0 Oct 28 11:42 id |
| 1056 | |
| 1057 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format |
| 1058 | name: do_fork |
| 1059 | ID: 944 |
| 1060 | format: |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 1061 | field:unsigned short common_type; offset:0; size:2; signed:0; |
| 1062 | field:unsigned char common_flags; offset:2; size:1; signed:0; |
| 1063 | field:unsigned char common_preempt_count; offset:3; size:1; signed:0; |
| 1064 | field:int common_pid; offset:4; size:4; signed:1; |
| 1065 | field:int common_padding; offset:8; size:4; signed:1; |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1066 | |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 1067 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1068 | |
| 1069 | print fmt: "(%lx)", REC->__probe_ip |
| 1070 | |
| 1071 | We can list all dynamic tracepoints currently in |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1072 | existence:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1073 | |
| 1074 | root@crownbay:~# perf probe -l |
| 1075 | probe:do_fork (on do_fork) |
| 1076 | probe:schedule (on schedule) |
| 1077 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1078 | Let's record system-wide (``sleep 30`` is a |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1079 | trick for recording system-wide but basically do nothing and then wake |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1080 | up after 30 seconds):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1081 | |
| 1082 | root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 |
| 1083 | [ perf record: Woken up 1 times to write data ] |
| 1084 | [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] |
| 1085 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1086 | Using ``perf script`` we can see each ``do_fork`` event that fired:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1087 | |
| 1088 | root@crownbay:~# perf script |
| 1089 | |
| 1090 | # ======== |
| 1091 | # captured on: Sun Oct 28 11:55:18 2012 |
| 1092 | # hostname : crownbay |
| 1093 | # os release : 3.4.11-yocto-standard |
| 1094 | # perf version : 3.4.11 |
| 1095 | # arch : i686 |
| 1096 | # nrcpus online : 2 |
| 1097 | # nrcpus avail : 2 |
| 1098 | # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz |
| 1099 | # cpuid : GenuineIntel,6,38,1 |
| 1100 | # total memory : 1017184 kB |
| 1101 | # cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30 |
| 1102 | # event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern |
| 1103 | = 0, id = { 5, 6 } |
| 1104 | # HEADER_CPU_TOPOLOGY info available, use -I to display |
| 1105 | # ======== |
| 1106 | # |
| 1107 | matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460) |
| 1108 | matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460) |
| 1109 | pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460) |
| 1110 | pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460) |
| 1111 | matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460) |
| 1112 | matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460) |
| 1113 | gthumb 1300 [001] 34217.697451: do_fork: (c1028460) |
| 1114 | gthumb 1300 [001] 34219.085734: do_fork: (c1028460) |
| 1115 | gthumb 1300 [000] 34219.121351: do_fork: (c1028460) |
| 1116 | gthumb 1300 [001] 34219.264551: do_fork: (c1028460) |
| 1117 | pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460) |
| 1118 | matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460) |
| 1119 | matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460) |
| 1120 | matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460) |
| 1121 | matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460) |
| 1122 | matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460) |
| 1123 | matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) |
| 1124 | matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) |
| 1125 | gaku 1312 [000] 34237.202388: do_fork: (c1028460) |
| 1126 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1127 | And using ``perf report`` on the same file, we can see the |
| 1128 | call graphs from starting a few programs during those 30 seconds: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1129 | |
| 1130 | .. image:: figures/perf-probe-do_fork-profile.png |
| 1131 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1132 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1133 | |
| 1134 | .. admonition:: Tying it Together |
| 1135 | |
| 1136 | The trace events subsystem accommodate static and dynamic tracepoints |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 1137 | in exactly the same way --- there's no difference as far as the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1138 | infrastructure is concerned. See the ftrace section for more details |
| 1139 | on the trace event subsystem. |
| 1140 | |
| 1141 | .. admonition:: Tying it Together |
| 1142 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1143 | Dynamic tracepoints are implemented under the covers by Kprobes and |
| 1144 | Uprobes. Kprobes and Uprobes are also used by and in fact are the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1145 | main focus of SystemTap. |
| 1146 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1147 | perf Documentation |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1148 | ------------------ |
| 1149 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1150 | Online versions of the manual pages for the commands discussed in this |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1151 | section can be found here: |
| 1152 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1153 | - The `'perf stat' manual page <https://linux.die.net/man/1/perf-stat>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1154 | |
| 1155 | - The `'perf record' |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1156 | manual page <https://linux.die.net/man/1/perf-record>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1157 | |
| 1158 | - The `'perf report' |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1159 | manual page <https://linux.die.net/man/1/perf-report>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1160 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1161 | - The `'perf probe' manual page <https://linux.die.net/man/1/perf-probe>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1162 | |
| 1163 | - The `'perf script' |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1164 | manual page <https://linux.die.net/man/1/perf-script>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1165 | |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1166 | - Documentation on using the `'perf script' Python |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 1167 | binding <https://linux.die.net/man/1/perf-script-python>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1168 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1169 | - The top-level `perf(1) manual page <https://linux.die.net/man/1/perf>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1170 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1171 | Normally, you should be able to open the manual pages via perf itself |
| 1172 | e.g. ``perf help`` or ``perf help record``. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1173 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1174 | To have the perf manual pages installed on your target, modify your |
Patrick Williams | 213cb26 | 2021-08-07 19:21:33 -0500 | [diff] [blame] | 1175 | configuration as follows:: |
| 1176 | |
| 1177 | IMAGE_INSTALL:append = " perf perf-doc" |
| 1178 | DISTRO_FEATURES:append = " api-documentation" |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1179 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1180 | The manual pages in text form, along with some other files, such as a set |
| 1181 | of examples, can also be found in the ``perf`` directory of the kernel tree:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1182 | |
| 1183 | tools/perf/Documentation |
| 1184 | |
| 1185 | There's also a nice perf tutorial on the perf |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1186 | wiki that goes into more detail than we do here in certain areas: `perf |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1187 | Tutorial <https://perf.wiki.kernel.org/index.php/Tutorial>`__ |
| 1188 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1189 | ftrace |
| 1190 | ====== |
| 1191 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1192 | "ftrace" literally refers to the "ftrace function tracer" but in reality |
| 1193 | this encompasses several related tracers along with the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1194 | infrastructure that they all make use of. |
| 1195 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1196 | ftrace Setup |
| 1197 | ------------ |
| 1198 | |
| 1199 | For this section, we'll assume you've already performed the basic setup |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 1200 | outlined in the ":ref:`profile-manual/intro:General Setup`" section. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1201 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1202 | ftrace, trace-cmd, and KernelShark run on the target system, and are |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 1203 | ready to go out-of-the-box --- no additional setup is necessary. For the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1204 | rest of this section we assume you're connected to the host through SSH and |
| 1205 | will be running ftrace on the target. KernelShark is a GUI application and if |
| 1206 | you use the ``-X`` option to ``ssh`` you can have the KernelShark GUI run on |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1207 | the target but display remotely on the host if you want. |
| 1208 | |
| 1209 | Basic ftrace usage |
| 1210 | ------------------ |
| 1211 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1212 | "ftrace" essentially refers to everything included in the ``/tracing`` |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1213 | directory of the mounted debugfs filesystem (Yocto follows the standard |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1214 | convention and mounts it at ``/sys/kernel/debug``). All the files found in |
| 1215 | ``/sys/kernel/debug/tracing`` on a Yocto system are:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1216 | |
| 1217 | root@sugarbay:/sys/kernel/debug/tracing# ls |
| 1218 | README kprobe_events trace |
| 1219 | available_events kprobe_profile trace_clock |
| 1220 | available_filter_functions options trace_marker |
| 1221 | available_tracers per_cpu trace_options |
| 1222 | buffer_size_kb printk_formats trace_pipe |
| 1223 | buffer_total_size_kb saved_cmdlines tracing_cpumask |
| 1224 | current_tracer set_event tracing_enabled |
| 1225 | dyn_ftrace_total_info set_ftrace_filter tracing_on |
| 1226 | enabled_functions set_ftrace_notrace tracing_thresh |
| 1227 | events set_ftrace_pid |
| 1228 | free_buffer set_graph_function |
| 1229 | |
| 1230 | The files listed above are used for various purposes |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1231 | --- some relate directly to the tracers themselves, others are used to set |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1232 | tracing options, and yet others actually contain the tracing output when |
| 1233 | a tracer is in effect. Some of the functions can be guessed from their |
| 1234 | names, others need explanation; in any case, we'll cover some of the |
| 1235 | files we see here below but for an explanation of the others, please see |
| 1236 | the ftrace documentation. |
| 1237 | |
| 1238 | We'll start by looking at some of the available built-in tracers. |
| 1239 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1240 | The ``available_tracers`` file lists the set of available tracers:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1241 | |
| 1242 | root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers |
| 1243 | blk function_graph function nop |
| 1244 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1245 | The ``current_tracer`` file contains the tracer currently in effect:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1246 | |
| 1247 | root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer |
| 1248 | nop |
| 1249 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1250 | The above listing of ``current_tracer`` shows that the |
| 1251 | ``nop`` tracer is in effect, which is just another way of saying that |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1252 | there's actually no tracer currently in effect. |
| 1253 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1254 | Writing one of the available tracers into ``current_tracer`` makes the |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1255 | specified tracer the current tracer:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1256 | |
| 1257 | root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer |
| 1258 | root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer |
| 1259 | function |
| 1260 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1261 | The above sets the current tracer to be the ``function`` tracer. This tracer |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1262 | traces every function call in the kernel and makes it available as the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1263 | contents of the ``trace`` file. Reading the ``trace`` file lists the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1264 | currently buffered function calls that have been traced by the function |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1265 | tracer:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1266 | |
| 1267 | root@sugarbay:/sys/kernel/debug/tracing# cat trace | less |
| 1268 | |
| 1269 | # tracer: function |
| 1270 | # |
| 1271 | # entries-in-buffer/entries-written: 310629/766471 #P:8 |
| 1272 | # |
| 1273 | # _-----=> irqs-off |
| 1274 | # / _----=> need-resched |
| 1275 | # | / _---=> hardirq/softirq |
| 1276 | # || / _--=> preempt-depth |
| 1277 | # ||| / delay |
| 1278 | # TASK-PID CPU# |||| TIMESTAMP FUNCTION |
| 1279 | # | | | |||| | | |
| 1280 | <idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle |
| 1281 | <idle>-0 [004] d..1 470.867170: getnstimeofday <-ktime_get_real |
| 1282 | <idle>-0 [004] d..1 470.867171: ns_to_timeval <-intel_idle |
| 1283 | <idle>-0 [004] d..1 470.867171: ns_to_timespec <-ns_to_timeval |
| 1284 | <idle>-0 [004] d..1 470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt |
| 1285 | <idle>-0 [004] d..1 470.867172: native_apic_mem_write <-smp_apic_timer_interrupt |
| 1286 | <idle>-0 [004] d..1 470.867172: irq_enter <-smp_apic_timer_interrupt |
| 1287 | <idle>-0 [004] d..1 470.867172: rcu_irq_enter <-irq_enter |
| 1288 | <idle>-0 [004] d..1 470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter |
| 1289 | <idle>-0 [004] d..1 470.867173: local_bh_disable <-irq_enter |
| 1290 | <idle>-0 [004] d..1 470.867173: add_preempt_count <-local_bh_disable |
| 1291 | <idle>-0 [004] d.s1 470.867174: tick_check_idle <-irq_enter |
| 1292 | <idle>-0 [004] d.s1 470.867174: tick_check_oneshot_broadcast <-tick_check_idle |
| 1293 | <idle>-0 [004] d.s1 470.867174: ktime_get <-tick_check_idle |
| 1294 | <idle>-0 [004] d.s1 470.867174: tick_nohz_stop_idle <-tick_check_idle |
| 1295 | <idle>-0 [004] d.s1 470.867175: update_ts_time_stats <-tick_nohz_stop_idle |
| 1296 | <idle>-0 [004] d.s1 470.867175: nr_iowait_cpu <-update_ts_time_stats |
| 1297 | <idle>-0 [004] d.s1 470.867175: tick_do_update_jiffies64 <-tick_check_idle |
| 1298 | <idle>-0 [004] d.s1 470.867175: _raw_spin_lock <-tick_do_update_jiffies64 |
| 1299 | <idle>-0 [004] d.s1 470.867176: add_preempt_count <-_raw_spin_lock |
| 1300 | <idle>-0 [004] d.s2 470.867176: do_timer <-tick_do_update_jiffies64 |
| 1301 | <idle>-0 [004] d.s2 470.867176: _raw_spin_lock <-do_timer |
| 1302 | <idle>-0 [004] d.s2 470.867176: add_preempt_count <-_raw_spin_lock |
| 1303 | <idle>-0 [004] d.s3 470.867177: ntp_tick_length <-do_timer |
| 1304 | <idle>-0 [004] d.s3 470.867177: _raw_spin_lock_irqsave <-ntp_tick_length |
| 1305 | . |
| 1306 | . |
| 1307 | . |
| 1308 | |
| 1309 | Each line in the trace above shows what was happening in the kernel on a given |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1310 | CPU, to the level of detail of function calls. Each entry shows the function |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1311 | called, followed by its caller (after the arrow). |
| 1312 | |
| 1313 | The function tracer gives you an extremely detailed idea of what the |
| 1314 | kernel was doing at the point in time the trace was taken, and is a |
| 1315 | great way to learn about how the kernel code works in a dynamic sense. |
| 1316 | |
| 1317 | .. admonition:: Tying it Together |
| 1318 | |
| 1319 | The ftrace function tracer is also available from within perf, as the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1320 | ``ftrace:function`` tracepoint. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1321 | |
| 1322 | It is a little more difficult to follow the call chains than it needs to |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 1323 | be --- luckily there's a variant of the function tracer that displays the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1324 | call chains explicitly, called the ``function_graph`` tracer:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1325 | |
| 1326 | root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer |
| 1327 | root@sugarbay:/sys/kernel/debug/tracing# cat trace | less |
| 1328 | |
| 1329 | tracer: function_graph |
| 1330 | |
| 1331 | CPU DURATION FUNCTION CALLS |
| 1332 | | | | | | | | |
| 1333 | 7) 0.046 us | pick_next_task_fair(); |
| 1334 | 7) 0.043 us | pick_next_task_stop(); |
| 1335 | 7) 0.042 us | pick_next_task_rt(); |
| 1336 | 7) 0.032 us | pick_next_task_fair(); |
| 1337 | 7) 0.030 us | pick_next_task_idle(); |
| 1338 | 7) | _raw_spin_unlock_irq() { |
| 1339 | 7) 0.033 us | sub_preempt_count(); |
| 1340 | 7) 0.258 us | } |
| 1341 | 7) 0.032 us | sub_preempt_count(); |
| 1342 | 7) + 13.341 us | } /* __schedule */ |
| 1343 | 7) 0.095 us | } /* sub_preempt_count */ |
| 1344 | 7) | schedule() { |
| 1345 | 7) | __schedule() { |
| 1346 | 7) 0.060 us | add_preempt_count(); |
| 1347 | 7) 0.044 us | rcu_note_context_switch(); |
| 1348 | 7) | _raw_spin_lock_irq() { |
| 1349 | 7) 0.033 us | add_preempt_count(); |
| 1350 | 7) 0.247 us | } |
| 1351 | 7) | idle_balance() { |
| 1352 | 7) | _raw_spin_unlock() { |
| 1353 | 7) 0.031 us | sub_preempt_count(); |
| 1354 | 7) 0.246 us | } |
| 1355 | 7) | update_shares() { |
| 1356 | 7) 0.030 us | __rcu_read_lock(); |
| 1357 | 7) 0.029 us | __rcu_read_unlock(); |
| 1358 | 7) 0.484 us | } |
| 1359 | 7) 0.030 us | __rcu_read_lock(); |
| 1360 | 7) | load_balance() { |
| 1361 | 7) | find_busiest_group() { |
| 1362 | 7) 0.031 us | idle_cpu(); |
| 1363 | 7) 0.029 us | idle_cpu(); |
| 1364 | 7) 0.035 us | idle_cpu(); |
| 1365 | 7) 0.906 us | } |
| 1366 | 7) 1.141 us | } |
| 1367 | 7) 0.022 us | msecs_to_jiffies(); |
| 1368 | 7) | load_balance() { |
| 1369 | 7) | find_busiest_group() { |
| 1370 | 7) 0.031 us | idle_cpu(); |
| 1371 | . |
| 1372 | . |
| 1373 | . |
| 1374 | 4) 0.062 us | msecs_to_jiffies(); |
| 1375 | 4) 0.062 us | __rcu_read_unlock(); |
| 1376 | 4) | _raw_spin_lock() { |
| 1377 | 4) 0.073 us | add_preempt_count(); |
| 1378 | 4) 0.562 us | } |
| 1379 | 4) + 17.452 us | } |
| 1380 | 4) 0.108 us | put_prev_task_fair(); |
| 1381 | 4) 0.102 us | pick_next_task_fair(); |
| 1382 | 4) 0.084 us | pick_next_task_stop(); |
| 1383 | 4) 0.075 us | pick_next_task_rt(); |
| 1384 | 4) 0.062 us | pick_next_task_fair(); |
| 1385 | 4) 0.066 us | pick_next_task_idle(); |
| 1386 | ------------------------------------------ |
| 1387 | 4) kworker-74 => <idle>-0 |
| 1388 | ------------------------------------------ |
| 1389 | |
| 1390 | 4) | finish_task_switch() { |
| 1391 | 4) | _raw_spin_unlock_irq() { |
| 1392 | 4) 0.100 us | sub_preempt_count(); |
| 1393 | 4) 0.582 us | } |
| 1394 | 4) 1.105 us | } |
| 1395 | 4) 0.088 us | sub_preempt_count(); |
| 1396 | 4) ! 100.066 us | } |
| 1397 | . |
| 1398 | . |
| 1399 | . |
| 1400 | 3) | sys_ioctl() { |
| 1401 | 3) 0.083 us | fget_light(); |
| 1402 | 3) | security_file_ioctl() { |
| 1403 | 3) 0.066 us | cap_file_ioctl(); |
| 1404 | 3) 0.562 us | } |
| 1405 | 3) | do_vfs_ioctl() { |
| 1406 | 3) | drm_ioctl() { |
| 1407 | 3) 0.075 us | drm_ut_debug_printk(); |
| 1408 | 3) | i915_gem_pwrite_ioctl() { |
| 1409 | 3) | i915_mutex_lock_interruptible() { |
| 1410 | 3) 0.070 us | mutex_lock_interruptible(); |
| 1411 | 3) 0.570 us | } |
| 1412 | 3) | drm_gem_object_lookup() { |
| 1413 | 3) | _raw_spin_lock() { |
| 1414 | 3) 0.080 us | add_preempt_count(); |
| 1415 | 3) 0.620 us | } |
| 1416 | 3) | _raw_spin_unlock() { |
| 1417 | 3) 0.085 us | sub_preempt_count(); |
| 1418 | 3) 0.562 us | } |
| 1419 | 3) 2.149 us | } |
| 1420 | 3) 0.133 us | i915_gem_object_pin(); |
| 1421 | 3) | i915_gem_object_set_to_gtt_domain() { |
| 1422 | 3) 0.065 us | i915_gem_object_flush_gpu_write_domain(); |
| 1423 | 3) 0.065 us | i915_gem_object_wait_rendering(); |
| 1424 | 3) 0.062 us | i915_gem_object_flush_cpu_write_domain(); |
| 1425 | 3) 1.612 us | } |
| 1426 | 3) | i915_gem_object_put_fence() { |
| 1427 | 3) 0.097 us | i915_gem_object_flush_fence.constprop.36(); |
| 1428 | 3) 0.645 us | } |
| 1429 | 3) 0.070 us | add_preempt_count(); |
| 1430 | 3) 0.070 us | sub_preempt_count(); |
| 1431 | 3) 0.073 us | i915_gem_object_unpin(); |
| 1432 | 3) 0.068 us | mutex_unlock(); |
| 1433 | 3) 9.924 us | } |
| 1434 | 3) + 11.236 us | } |
| 1435 | 3) + 11.770 us | } |
| 1436 | 3) + 13.784 us | } |
| 1437 | 3) | sys_ioctl() { |
| 1438 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1439 | As you can see, the ``function_graph`` display is much easier |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1440 | to follow. Also note that in addition to the function calls and |
| 1441 | associated braces, other events such as scheduler events are displayed |
| 1442 | in context. In fact, you can freely include any tracepoint available in |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1443 | the trace events subsystem described in the next section by just |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1444 | enabling those events, and they'll appear in context in the function |
| 1445 | graph display. Quite a powerful tool for understanding kernel dynamics. |
| 1446 | |
| 1447 | Also notice that there are various annotations on the left hand side of |
| 1448 | the display. For example if the total time it took for a given function |
| 1449 | to execute is above a certain threshold, an exclamation point or plus |
| 1450 | sign appears on the left hand side. Please see the ftrace documentation |
| 1451 | for details on all these fields. |
| 1452 | |
| 1453 | The 'trace events' Subsystem |
| 1454 | ---------------------------- |
| 1455 | |
| 1456 | One especially important directory contained within the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1457 | ``/sys/kernel/debug/tracing`` directory is the ``events`` subdirectory, which |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1458 | contains representations of every tracepoint in the system. Listing out |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1459 | the contents of the ``events`` subdirectory, we see mainly another set of |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1460 | subdirectories:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1461 | |
| 1462 | root@sugarbay:/sys/kernel/debug/tracing# cd events |
| 1463 | root@sugarbay:/sys/kernel/debug/tracing/events# ls -al |
| 1464 | drwxr-xr-x 38 root root 0 Nov 14 23:19 . |
| 1465 | drwxr-xr-x 5 root root 0 Nov 14 23:19 .. |
| 1466 | drwxr-xr-x 19 root root 0 Nov 14 23:19 block |
| 1467 | drwxr-xr-x 32 root root 0 Nov 14 23:19 btrfs |
| 1468 | drwxr-xr-x 5 root root 0 Nov 14 23:19 drm |
| 1469 | -rw-r--r-- 1 root root 0 Nov 14 23:19 enable |
| 1470 | drwxr-xr-x 40 root root 0 Nov 14 23:19 ext3 |
| 1471 | drwxr-xr-x 79 root root 0 Nov 14 23:19 ext4 |
| 1472 | drwxr-xr-x 14 root root 0 Nov 14 23:19 ftrace |
| 1473 | drwxr-xr-x 8 root root 0 Nov 14 23:19 hda |
| 1474 | -r--r--r-- 1 root root 0 Nov 14 23:19 header_event |
| 1475 | -r--r--r-- 1 root root 0 Nov 14 23:19 header_page |
| 1476 | drwxr-xr-x 25 root root 0 Nov 14 23:19 i915 |
| 1477 | drwxr-xr-x 7 root root 0 Nov 14 23:19 irq |
| 1478 | drwxr-xr-x 12 root root 0 Nov 14 23:19 jbd |
| 1479 | drwxr-xr-x 14 root root 0 Nov 14 23:19 jbd2 |
| 1480 | drwxr-xr-x 14 root root 0 Nov 14 23:19 kmem |
| 1481 | drwxr-xr-x 7 root root 0 Nov 14 23:19 module |
| 1482 | drwxr-xr-x 3 root root 0 Nov 14 23:19 napi |
| 1483 | drwxr-xr-x 6 root root 0 Nov 14 23:19 net |
| 1484 | drwxr-xr-x 3 root root 0 Nov 14 23:19 oom |
| 1485 | drwxr-xr-x 12 root root 0 Nov 14 23:19 power |
| 1486 | drwxr-xr-x 3 root root 0 Nov 14 23:19 printk |
| 1487 | drwxr-xr-x 8 root root 0 Nov 14 23:19 random |
| 1488 | drwxr-xr-x 4 root root 0 Nov 14 23:19 raw_syscalls |
| 1489 | drwxr-xr-x 3 root root 0 Nov 14 23:19 rcu |
| 1490 | drwxr-xr-x 6 root root 0 Nov 14 23:19 rpm |
| 1491 | drwxr-xr-x 20 root root 0 Nov 14 23:19 sched |
| 1492 | drwxr-xr-x 7 root root 0 Nov 14 23:19 scsi |
| 1493 | drwxr-xr-x 4 root root 0 Nov 14 23:19 signal |
| 1494 | drwxr-xr-x 5 root root 0 Nov 14 23:19 skb |
| 1495 | drwxr-xr-x 4 root root 0 Nov 14 23:19 sock |
| 1496 | drwxr-xr-x 10 root root 0 Nov 14 23:19 sunrpc |
| 1497 | drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls |
| 1498 | drwxr-xr-x 4 root root 0 Nov 14 23:19 task |
| 1499 | drwxr-xr-x 14 root root 0 Nov 14 23:19 timer |
| 1500 | drwxr-xr-x 3 root root 0 Nov 14 23:19 udp |
| 1501 | drwxr-xr-x 21 root root 0 Nov 14 23:19 vmscan |
| 1502 | drwxr-xr-x 3 root root 0 Nov 14 23:19 vsyscall |
| 1503 | drwxr-xr-x 6 root root 0 Nov 14 23:19 workqueue |
| 1504 | drwxr-xr-x 26 root root 0 Nov 14 23:19 writeback |
| 1505 | |
| 1506 | Each one of these subdirectories |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1507 | corresponds to a "subsystem" and contains yet again more subdirectories, |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1508 | each one of those finally corresponding to a tracepoint. For example, |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1509 | here are the contents of the ``kmem`` subsystem:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1510 | |
| 1511 | root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem |
| 1512 | root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al |
| 1513 | drwxr-xr-x 14 root root 0 Nov 14 23:19 . |
| 1514 | drwxr-xr-x 38 root root 0 Nov 14 23:19 .. |
| 1515 | -rw-r--r-- 1 root root 0 Nov 14 23:19 enable |
| 1516 | -rw-r--r-- 1 root root 0 Nov 14 23:19 filter |
| 1517 | drwxr-xr-x 2 root root 0 Nov 14 23:19 kfree |
| 1518 | drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc |
| 1519 | drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc_node |
| 1520 | drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc |
| 1521 | drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc_node |
| 1522 | drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_free |
| 1523 | drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc |
| 1524 | drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag |
| 1525 | drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked |
| 1526 | drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free |
| 1527 | drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free_batched |
| 1528 | drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_pcpu_drain |
| 1529 | |
| 1530 | Let's see what's inside the subdirectory for a |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1531 | specific tracepoint, in this case the one for ``kmalloc``:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1532 | |
| 1533 | root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc |
| 1534 | root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al |
| 1535 | drwxr-xr-x 2 root root 0 Nov 14 23:19 . |
| 1536 | drwxr-xr-x 14 root root 0 Nov 14 23:19 .. |
| 1537 | -rw-r--r-- 1 root root 0 Nov 14 23:19 enable |
| 1538 | -rw-r--r-- 1 root root 0 Nov 14 23:19 filter |
| 1539 | -r--r--r-- 1 root root 0 Nov 14 23:19 format |
| 1540 | -r--r--r-- 1 root root 0 Nov 14 23:19 id |
| 1541 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1542 | The ``format`` file for the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1543 | tracepoint describes the event in memory, which is used by the various |
| 1544 | tracing tools that now make use of these tracepoint to parse the event |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1545 | and make sense of it, along with a ``print fmt`` field that allows tools |
| 1546 | like ftrace to display the event as text. The format of the |
| 1547 | ``kmalloc`` event looks like:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1548 | |
| 1549 | root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format |
| 1550 | name: kmalloc |
| 1551 | ID: 313 |
| 1552 | format: |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 1553 | field:unsigned short common_type; offset:0; size:2; signed:0; |
| 1554 | field:unsigned char common_flags; offset:2; size:1; signed:0; |
| 1555 | field:unsigned char common_preempt_count; offset:3; size:1; signed:0; |
| 1556 | field:int common_pid; offset:4; size:4; signed:1; |
| 1557 | field:int common_padding; offset:8; size:4; signed:1; |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1558 | |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 1559 | field:unsigned long call_site; offset:16; size:8; signed:0; |
| 1560 | field:const void * ptr; offset:24; size:8; signed:0; |
| 1561 | field:size_t bytes_req; offset:32; size:8; signed:0; |
| 1562 | field:size_t bytes_alloc; offset:40; size:8; signed:0; |
| 1563 | field:gfp_t gfp_flags; offset:48; size:4; signed:0; |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1564 | |
| 1565 | print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc, |
| 1566 | (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( |
| 1567 | gfp_t)0x20000u) | (( gfp_t)0x02u) | (( gfp_t)0x08u)) | (( gfp_t)0x4000u) | (( gfp_t)0x10000u) | (( gfp_t)0x1000u) | (( gfp_t)0x200u) | (( |
| 1568 | gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | (( |
| 1569 | gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( |
| 1570 | gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( |
| 1571 | gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), GFP_TEMPORARY"}, |
| 1572 | {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)), |
| 1573 | "GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)(( |
| 1574 | gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)(( |
| 1575 | gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned |
| 1576 | long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"}, |
| 1577 | {(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u), |
| 1578 | "GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned |
| 1579 | long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"}, |
| 1580 | {(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT" |
| 1581 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1582 | The ``enable`` file |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1583 | in the tracepoint directory is what allows the user (or tools such as |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1584 | ``trace-cmd``) to actually turn the tracepoint on and off. When enabled, the |
| 1585 | corresponding tracepoint will start appearing in the ftrace ``trace`` file |
| 1586 | described previously. For example, this turns on the ``kmalloc`` tracepoint:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1587 | |
| 1588 | root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable |
| 1589 | |
| 1590 | At the moment, we're not interested in the function tracer or |
| 1591 | some other tracer that might be in effect, so we first turn it off, but |
| 1592 | if we do that, we still need to turn tracing on in order to see the |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1593 | events in the output buffer:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1594 | |
| 1595 | root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer |
| 1596 | root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on |
| 1597 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1598 | Now, if we look at the ``trace`` file, we see nothing |
| 1599 | but the ``kmalloc`` events we just turned on:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1600 | |
| 1601 | root@sugarbay:/sys/kernel/debug/tracing# cat trace | less |
| 1602 | # tracer: nop |
| 1603 | # |
| 1604 | # entries-in-buffer/entries-written: 1897/1897 #P:8 |
| 1605 | # |
| 1606 | # _-----=> irqs-off |
| 1607 | # / _----=> need-resched |
| 1608 | # | / _---=> hardirq/softirq |
| 1609 | # || / _--=> preempt-depth |
| 1610 | # ||| / delay |
| 1611 | # TASK-PID CPU# |||| TIMESTAMP FUNCTION |
| 1612 | # | | | |||| | | |
| 1613 | dropbear-1465 [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL |
| 1614 | <idle>-0 [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1615 | <idle>-0 [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1616 | matchbox-termin-1361 [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT |
| 1617 | Xorg-1264 [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY |
| 1618 | Xorg-1264 [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO |
| 1619 | Xorg-1264 [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO |
| 1620 | matchbox-termin-1361 [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT |
| 1621 | Xorg-1264 [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY |
| 1622 | Xorg-1264 [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO |
| 1623 | Xorg-1264 [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO |
| 1624 | <idle>-0 [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1625 | dropbear-1465 [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL |
| 1626 | <idle>-0 [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1627 | <idle>-0 [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1628 | dropbear-1465 [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL |
| 1629 | <idle>-0 [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1630 | <idle>-0 [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1631 | dropbear-1465 [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL |
| 1632 | <idle>-0 [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1633 | matchbox-termin-1361 [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT |
| 1634 | Xorg-1264 [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY |
| 1635 | Xorg-1264 [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO |
| 1636 | Xorg-1264 [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO |
| 1637 | <idle>-0 [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1638 | dropbear-1465 [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL |
| 1639 | <idle>-0 [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1640 | <idle>-0 [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1641 | dropbear-1465 [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_http://rostedt.homelinux.com/kernelshark/req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL |
| 1642 | <idle>-0 [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC |
| 1643 | matchbox-termin-1361 [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT |
| 1644 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1645 | To again disable the ``kmalloc`` event, we need to send ``0`` to the ``enable`` file:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1646 | |
| 1647 | root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable |
| 1648 | |
| 1649 | You can enable any number of events or complete subsystems (by |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1650 | using the ``enable`` file in the subsystem directory) and get an |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1651 | arbitrarily fine-grained idea of what's going on in the system by |
| 1652 | enabling as many of the appropriate tracepoints as applicable. |
| 1653 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1654 | Several tools described in this How-to do just that, including |
| 1655 | ``trace-cmd`` and KernelShark in the next section. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1656 | |
| 1657 | .. admonition:: Tying it Together |
| 1658 | |
| 1659 | These tracepoints and their representation are used not only by |
| 1660 | ftrace, but by many of the other tools covered in this document and |
| 1661 | they form a central point of integration for the various tracers |
| 1662 | available in Linux. They form a central part of the instrumentation |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1663 | for the following tools: perf, LTTng, ftrace, blktrace and SystemTap |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1664 | |
| 1665 | .. admonition:: Tying it Together |
| 1666 | |
| 1667 | Eventually all the special-purpose tracers currently available in |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1668 | ``/sys/kernel/debug/tracing`` will be removed and replaced with |
| 1669 | equivalent tracers based on the "trace events" subsystem. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1670 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1671 | trace-cmd / KernelShark |
| 1672 | ----------------------- |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1673 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1674 | trace-cmd is essentially an extensive command-line "wrapper" interface |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1675 | that hides the details of all the individual files in |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1676 | ``/sys/kernel/debug/tracing``, allowing users to specify specific particular |
| 1677 | events within the ``/sys/kernel/debug/tracing/events/`` subdirectory and to |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1678 | collect traces and avoid having to deal with those details directly. |
| 1679 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1680 | As yet another layer on top of that, KernelShark provides a GUI that |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1681 | allows users to start and stop traces and specify sets of events using |
| 1682 | an intuitive interface, and view the output as both trace events and as |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1683 | a per-CPU graphical display. It directly uses trace-cmd as the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1684 | plumbing that accomplishes all that underneath the covers (and actually |
| 1685 | displays the trace-cmd command it uses, as we'll see). |
| 1686 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1687 | To start a trace using KernelShark, first start this tool:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1688 | |
| 1689 | root@sugarbay:~# kernelshark |
| 1690 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1691 | Then open up the ``Capture`` dialog by choosing from the KernelShark menu:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1692 | |
| 1693 | Capture | Record |
| 1694 | |
| 1695 | That will display the following dialog, which allows you to choose one or more |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1696 | events (or even entire subsystems) to trace: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1697 | |
| 1698 | .. image:: figures/kernelshark-choose-events.png |
| 1699 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1700 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1701 | |
| 1702 | Note that these are exactly the same sets of events described in the |
| 1703 | previous trace events subsystem section, and in fact is where trace-cmd |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1704 | gets them for KernelShark. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1705 | |
| 1706 | In the above screenshot, we've decided to explore the graphics subsystem |
| 1707 | a bit and so have chosen to trace all the tracepoints contained within |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1708 | the ``i915`` and ``drm`` subsystems. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1709 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1710 | After doing that, we can start and stop the trace using the ``Run`` and |
| 1711 | ``Stop`` button on the lower right corner of the dialog (the same button |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1712 | will turn into the 'Stop' button after the trace has started): |
| 1713 | |
| 1714 | .. image:: figures/kernelshark-output-display.png |
| 1715 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1716 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1717 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1718 | Notice that the right pane shows the exact trace-cmd command-line |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1719 | that's used to run the trace, along with the results of the trace-cmd |
| 1720 | run. |
| 1721 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1722 | Once the ``Stop`` button is pressed, the graphical view magically fills up |
| 1723 | with a colorful per-CPU display of the trace data, along with the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1724 | detailed event listing below that: |
| 1725 | |
| 1726 | .. image:: figures/kernelshark-i915-display.png |
| 1727 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1728 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1729 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1730 | Here's another example, this time a display resulting from tracing ``all |
| 1731 | events``: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1732 | |
| 1733 | .. image:: figures/kernelshark-all.png |
| 1734 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1735 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1736 | |
| 1737 | The tool is pretty self-explanatory, but for more detailed information |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1738 | on navigating through the data, see the `KernelShark |
Andrew Geissler | 6aa7eec | 2023-03-03 12:41:14 -0600 | [diff] [blame] | 1739 | website <https://kernelshark.org/Documentation.html>`__. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1740 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1741 | ftrace Documentation |
| 1742 | -------------------- |
| 1743 | |
| 1744 | The documentation for ftrace can be found in the kernel Documentation |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1745 | directory:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1746 | |
| 1747 | Documentation/trace/ftrace.txt |
| 1748 | |
| 1749 | The documentation for the trace event subsystem can also be found in the kernel |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1750 | Documentation directory:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1751 | |
| 1752 | Documentation/trace/events.txt |
| 1753 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1754 | A nice series of articles on using ftrace and trace-cmd are available at LWN: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1755 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1756 | - `Debugging the kernel using ftrace - part |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 1757 | 1 <https://lwn.net/Articles/365835/>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1758 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1759 | - `Debugging the kernel using ftrace - part |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 1760 | 2 <https://lwn.net/Articles/366796/>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1761 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1762 | - `Secrets of the ftrace function |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 1763 | tracer <https://lwn.net/Articles/370423/>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1764 | |
| 1765 | - `trace-cmd: A front-end for |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1766 | ftrace <https://lwn.net/Articles/410200/>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1767 | |
Andrew Geissler | 6aa7eec | 2023-03-03 12:41:14 -0600 | [diff] [blame] | 1768 | See also `KernelShark's documentation <https://kernelshark.org/Documentation.html>`__ |
| 1769 | for further usage details. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1770 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1771 | An amusing yet useful README (a tracing mini-How-to) can be found in |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1772 | ``/sys/kernel/debug/tracing/README``. |
| 1773 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1774 | SystemTap |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1775 | ========= |
| 1776 | |
| 1777 | SystemTap is a system-wide script-based tracing and profiling tool. |
| 1778 | |
| 1779 | SystemTap scripts are C-like programs that are executed in the kernel to |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1780 | gather / print / aggregate data extracted from the context they end up being |
| 1781 | called under. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1782 | |
| 1783 | For example, this probe from the `SystemTap |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1784 | tutorial <https://sourceware.org/systemtap/tutorial/>`__ just prints a |
| 1785 | line every time any process on the system runs ``open()`` on a file. For each line, |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1786 | it prints the executable name of the program that opened the file, along |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1787 | with its PID, and the name of the file it opened (or tried to open), which it |
| 1788 | extracts from the argument string (``argstr``) of the ``open`` system call. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1789 | |
| 1790 | .. code-block:: none |
| 1791 | |
| 1792 | probe syscall.open |
| 1793 | { |
| 1794 | printf ("%s(%d) open (%s)\n", execname(), pid(), argstr) |
| 1795 | } |
| 1796 | |
| 1797 | probe timer.ms(4000) # after 4 seconds |
| 1798 | { |
| 1799 | exit () |
| 1800 | } |
| 1801 | |
| 1802 | Normally, to execute this |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1803 | probe, you'd just install SystemTap on the system you want to probe, |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1804 | and directly run the probe on that system e.g. assuming the name of the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1805 | file containing the above text is ``trace_open.stp``:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1806 | |
| 1807 | # stap trace_open.stp |
| 1808 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1809 | What SystemTap does under the covers to run this probe is 1) parse and |
| 1810 | convert the probe to an equivalent "C" form, 2) compile the "C" form |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1811 | into a kernel module, 3) insert the module into the kernel, which arms |
| 1812 | it, and 4) collect the data generated by the probe and display it to the |
| 1813 | user. |
| 1814 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1815 | In order to accomplish steps 1 and 2, the ``stap`` program needs access to |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1816 | the kernel build system that produced the kernel that the probed system |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1817 | is running. In the case of a typical embedded system (the "target"), the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1818 | kernel build system unfortunately isn't typically part of the image |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1819 | running on the target. It is normally available on the "host" system |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1820 | that produced the target image however; in such cases, steps 1 and 2 are |
| 1821 | executed on the host system, and steps 3 and 4 are executed on the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1822 | target system, using only the SystemTap "runtime". |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1823 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1824 | The SystemTap support in Yocto assumes that only steps 3 and 4 are run |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1825 | on the target; it is possible to do everything on the target, but this |
| 1826 | section assumes only the typical embedded use-case. |
| 1827 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1828 | Therefore, what you need to do in order to run a SystemTap script on |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1829 | the target is to 1) on the host system, compile the probe into a kernel |
| 1830 | module that makes sense to the target, 2) copy the module onto the |
| 1831 | target system and 3) insert the module into the target kernel, which |
| 1832 | arms it, and 4) collect the data generated by the probe and display it |
| 1833 | to the user. |
| 1834 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1835 | SystemTap Setup |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1836 | --------------- |
| 1837 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1838 | Those are many steps and details, but fortunately Yocto |
| 1839 | includes a script called ``crosstap`` that will take care of those |
| 1840 | details, allowing you to just execute a SystemTap script on the remote |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1841 | target, with arguments if necessary. |
| 1842 | |
| 1843 | In order to do this from a remote host, however, you need to have access |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1844 | to the build for the image you booted. The ``crosstap`` script provides |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1845 | details on how to do this if you run the script on the host without |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1846 | having done a build:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1847 | |
| 1848 | $ crosstap root@192.168.1.88 trace_open.stp |
| 1849 | |
| 1850 | Error: No target kernel build found. |
| 1851 | Did you forget to create a local build of your image? |
| 1852 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1853 | 'crosstap' requires a local SDK build of the target system |
| 1854 | (or a build that includes 'tools-profile') in order to build |
| 1855 | kernel modules that can probe the target system. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1856 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1857 | Practically speaking, that means you need to do the following: |
| 1858 | |
| 1859 | - If you're running a pre-built image, download the release |
| 1860 | and/or BSP tarballs used to build the image. |
| 1861 | |
| 1862 | - If you're working from git sources, just clone the metadata |
| 1863 | and BSP layers needed to build the image you'll be booting. |
| 1864 | |
| 1865 | - Make sure you're properly set up to build a new image (see |
| 1866 | the BSP README and/or the widely available basic documentation |
| 1867 | that discusses how to build images). |
| 1868 | |
| 1869 | - Build an ``-sdk`` version of the image e.g.:: |
| 1870 | |
| 1871 | $ bitbake core-image-sato-sdk |
| 1872 | |
| 1873 | - Or build a non-SDK image but include the profiling tools |
| 1874 | (edit ``local.conf`` and add ``tools-profile`` to the end of |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 1875 | :term:`EXTRA_IMAGE_FEATURES` variable):: |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1876 | |
| 1877 | $ bitbake core-image-sato |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1878 | |
| 1879 | Once you've build the image on the host system, you're ready to |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1880 | boot it (or the equivalent pre-built image) and use ``crosstap`` |
| 1881 | to probe it (you need to source the environment as usual first):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1882 | |
| 1883 | $ source oe-init-build-env |
| 1884 | $ cd ~/my/systemtap/scripts |
| 1885 | $ crosstap root@192.168.1.xxx myscript.stp |
| 1886 | |
| 1887 | .. note:: |
| 1888 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1889 | SystemTap, which uses ``crosstap``, assumes you can establish an SSH |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1890 | connection to the remote target. Please refer to the crosstap wiki |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1891 | page for details on verifying SSH connections. Also, the ability to SSH |
| 1892 | into the target system is not enabled by default in ``*-minimal`` images. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1893 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1894 | Therefore, what you need to do is build an SDK image or image with |
| 1895 | ``tools-profile`` as detailed in the ":ref:`profile-manual/intro:General Setup`" |
| 1896 | section of this manual, and boot the resulting target image. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1897 | |
| 1898 | .. note:: |
| 1899 | |
Patrick Williams | 2390b1b | 2022-11-03 13:47:49 -0500 | [diff] [blame] | 1900 | If you have a :term:`Build Directory` containing multiple machines, you need |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1901 | to have the :term:`MACHINE` you're connecting to selected in ``local.conf``, and |
Patrick Williams | 2390b1b | 2022-11-03 13:47:49 -0500 | [diff] [blame] | 1902 | the kernel in that machine's :term:`Build Directory` must match the kernel on |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1903 | the booted system exactly, or you'll get the above ``crosstap`` message |
| 1904 | when you try to call a script. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1905 | |
| 1906 | Running a Script on a Target |
| 1907 | ---------------------------- |
| 1908 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1909 | Once you've done that, you should be able to run a SystemTap script on |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1910 | the target:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1911 | |
| 1912 | $ cd /path/to/yocto |
| 1913 | $ source oe-init-build-env |
| 1914 | |
| 1915 | ### Shell environment set up for builds. ### |
| 1916 | |
| 1917 | You can now run 'bitbake <target>' |
| 1918 | |
| 1919 | Common targets are: |
| 1920 | core-image-minimal |
| 1921 | core-image-sato |
| 1922 | meta-toolchain |
| 1923 | meta-ide-support |
| 1924 | |
Andrew Geissler | 3b8a17c | 2021-04-15 15:55:55 -0500 | [diff] [blame] | 1925 | You can also run generated QEMU images with a command like 'runqemu qemux86-64' |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1926 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1927 | Once you've done that, you can ``cd`` to whatever |
| 1928 | directory contains your scripts and use ``crosstap`` to run the script:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1929 | |
| 1930 | $ cd /path/to/my/systemap/script |
| 1931 | $ crosstap root@192.168.7.2 trace_open.stp |
| 1932 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 1933 | If you get an error connecting to the target e.g.:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1934 | |
| 1935 | $ crosstap root@192.168.7.2 trace_open.stp |
| 1936 | error establishing ssh connection on remote 'root@192.168.7.2' |
| 1937 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1938 | Try connecting to the target through SSH and see what happens:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1939 | |
| 1940 | $ ssh root@192.168.7.2 |
| 1941 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1942 | Connection problems are often due specifying a wrong IP address or having a ``host key |
| 1943 | verification error``. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1944 | |
| 1945 | If everything worked as planned, you should see something like this |
| 1946 | (enter the password when prompted, or press enter if it's set up to use |
| 1947 | no password): |
| 1948 | |
| 1949 | .. code-block:: none |
| 1950 | |
| 1951 | $ crosstap root@192.168.7.2 trace_open.stp |
| 1952 | root@192.168.7.2's password: |
| 1953 | matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600) |
| 1954 | matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600) |
| 1955 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1956 | SystemTap Documentation |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1957 | ----------------------- |
| 1958 | |
| 1959 | The SystemTap language reference can be found here: `SystemTap Language |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 1960 | Reference <https://sourceware.org/systemtap/langref/>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1961 | |
| 1962 | Links to other SystemTap documents, tutorials, and examples can be found |
| 1963 | here: `SystemTap documentation |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 1964 | page <https://sourceware.org/systemtap/documentation.html>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1965 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1966 | Sysprof |
| 1967 | ======= |
| 1968 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1969 | Sysprof is an easy to use system-wide profiler that consists of a |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1970 | single window with three panes and a few buttons which allow you to |
| 1971 | start, stop, and view the profile from one place. |
| 1972 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1973 | Sysprof Setup |
| 1974 | ------------- |
| 1975 | |
| 1976 | For this section, we'll assume you've already performed the basic setup |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 1977 | outlined in the ":ref:`profile-manual/intro:General Setup`" section. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1978 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1979 | Sysprof is a GUI-based application that runs on the target system. For the rest |
| 1980 | of this document we assume you're connected to the host through SSH and will be |
| 1981 | running Sysprof on the target (you can use the ``-X`` option to ``ssh`` and |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1982 | have the Sysprof GUI run on the target but display remotely on the host |
| 1983 | if you want). |
| 1984 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1985 | Basic Sysprof Usage |
| 1986 | ------------------- |
| 1987 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1988 | To start profiling the system, you just press the ``Start`` button. To |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1989 | stop profiling and to start viewing the profile data in one easy step, |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 1990 | press the ``Profile`` button. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1991 | |
| 1992 | Once you've pressed the profile button, the three panes will fill up |
| 1993 | with profiling data: |
| 1994 | |
| 1995 | .. image:: figures/sysprof-copy-to-user.png |
| 1996 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 1997 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 1998 | |
| 1999 | The left pane shows a list of functions and processes. Selecting one of |
| 2000 | those expands that function in the right pane, showing all its callees. |
| 2001 | Note that this caller-oriented display is essentially the inverse of |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2002 | perf's default callee-oriented call chain display. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2003 | |
| 2004 | In the screenshot above, we're focusing on ``__copy_to_user_ll()`` and |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2005 | looking up the call chain we can see that one of the callers of |
| 2006 | ``__copy_to_user_ll`` is ``sys_read()`` and the complete call path between them. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2007 | Notice that this is essentially a portion of the same information we saw |
| 2008 | in the perf display shown in the perf section of this page. |
| 2009 | |
| 2010 | .. image:: figures/sysprof-copy-from-user.png |
| 2011 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 2012 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2013 | |
| 2014 | Similarly, the above is a snapshot of the Sysprof display of a |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2015 | ``copy-from-user`` call chain. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2016 | |
| 2017 | Finally, looking at the third Sysprof pane in the lower left, we can see |
| 2018 | a list of all the callers of a particular function selected in the top |
| 2019 | left pane. In this case, the lower pane is showing all the callers of |
| 2020 | ``__mark_inode_dirty``: |
| 2021 | |
| 2022 | .. image:: figures/sysprof-callers.png |
| 2023 | :align: center |
Andrew Geissler | d583833 | 2022-05-27 11:33:10 -0500 | [diff] [blame] | 2024 | :width: 70% |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2025 | |
| 2026 | Double-clicking on one of those functions will in turn change the focus |
| 2027 | to the selected function, and so on. |
| 2028 | |
| 2029 | .. admonition:: Tying it Together |
| 2030 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2031 | If you like Sysprof's ``caller-oriented`` display, you may be able to |
| 2032 | approximate it in other tools as well. For example, ``perf report`` has |
| 2033 | the ``-g`` (``--call-graph``) option that you can experiment with; one of the |
| 2034 | options is ``caller`` for an inverted caller-based call graph display. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2035 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2036 | Sysprof Documentation |
| 2037 | --------------------- |
| 2038 | |
| 2039 | There doesn't seem to be any documentation for Sysprof, but maybe that's |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2040 | because it's pretty self-explanatory. The Sysprof website, however, is here: |
| 2041 | `Sysprof, System-wide Performance Profiler for Linux <http://sysprof.com/>`__ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2042 | |
| 2043 | LTTng (Linux Trace Toolkit, next generation) |
| 2044 | ============================================ |
| 2045 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2046 | LTTng Setup |
| 2047 | ----------- |
| 2048 | |
| 2049 | For this section, we'll assume you've already performed the basic setup |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 2050 | outlined in the ":ref:`profile-manual/intro:General Setup`" section. |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2051 | LTTng is run on the target system by connecting to it through SSH. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2052 | |
| 2053 | Collecting and Viewing Traces |
| 2054 | ----------------------------- |
| 2055 | |
| 2056 | Once you've applied the above commits and built and booted your image |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2057 | (you need to build the ``core-image-sato-sdk`` image or use one of the other |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 2058 | methods described in the ":ref:`profile-manual/intro:General Setup`" section), you're ready to start |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2059 | tracing. |
| 2060 | |
| 2061 | Collecting and viewing a trace on the target (inside a shell) |
| 2062 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 2063 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2064 | First, from the host, connect to the target through SSH:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2065 | |
| 2066 | $ ssh -l root 192.168.1.47 |
| 2067 | The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established. |
| 2068 | RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e. |
| 2069 | Are you sure you want to continue connecting (yes/no)? yes |
| 2070 | Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts. |
| 2071 | root@192.168.1.47's password: |
| 2072 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2073 | Once on the target, use these steps to create a trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2074 | |
| 2075 | root@crownbay:~# lttng create |
| 2076 | Spawning a session daemon |
| 2077 | Session auto-20121015-232120 created. |
| 2078 | Traces will be written in /home/root/lttng-traces/auto-20121015-232120 |
| 2079 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2080 | Enable the events you want to trace (in this case all kernel events):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2081 | |
| 2082 | root@crownbay:~# lttng enable-event --kernel --all |
| 2083 | All kernel events are enabled in channel channel0 |
| 2084 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2085 | Start the trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2086 | |
| 2087 | root@crownbay:~# lttng start |
| 2088 | Tracing started for session auto-20121015-232120 |
| 2089 | |
| 2090 | And then stop the trace after awhile or after running a particular workload that |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2091 | you want to trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2092 | |
| 2093 | root@crownbay:~# lttng stop |
| 2094 | Tracing stopped for session auto-20121015-232120 |
| 2095 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2096 | You can now view the trace in text form on the target:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2097 | |
| 2098 | root@crownbay:~# lttng view |
| 2099 | [23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { } |
| 2100 | [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 } |
| 2101 | [23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C } |
| 2102 | [23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 } |
| 2103 | [23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 } |
| 2104 | [23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 } |
| 2105 | [23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 } |
| 2106 | [23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 } |
| 2107 | [23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 } |
| 2108 | [23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 } |
| 2109 | [23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 } |
| 2110 | [23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 } |
| 2111 | [23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 } |
| 2112 | [23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 } |
| 2113 | [23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 } |
| 2114 | [23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 } |
| 2115 | [23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 } |
| 2116 | [23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 } |
| 2117 | [23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 } |
| 2118 | [23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 } |
| 2119 | [23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 } |
| 2120 | [23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 } |
| 2121 | [23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 } |
| 2122 | [23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 } |
| 2123 | [23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 } |
| 2124 | [23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 } |
| 2125 | [23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 } |
| 2126 | [23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 } |
| 2127 | [23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 } |
| 2128 | [23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 } |
| 2129 | [23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 } |
| 2130 | [23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 } |
| 2131 | [23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 } |
| 2132 | [23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 } |
| 2133 | [23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 } |
| 2134 | [23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 } |
| 2135 | [23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 } |
| 2136 | . |
| 2137 | . |
| 2138 | . |
| 2139 | |
| 2140 | You can now safely destroy the trace |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 2141 | session (note that this doesn't delete the trace --- it's still there in |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2142 | ``~/lttng-traces``):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2143 | |
| 2144 | root@crownbay:~# lttng destroy |
| 2145 | Session auto-20121015-232120 destroyed at /home/root |
| 2146 | |
| 2147 | Note that the trace is saved in a directory of the same name as returned by |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2148 | ``lttng create``, under the ``~/lttng-traces`` directory (note that you can change this by |
| 2149 | supplying your own name to ``lttng create``):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2150 | |
| 2151 | root@crownbay:~# ls -al ~/lttng-traces |
| 2152 | drwxrwx--- 3 root root 1024 Oct 15 23:21 . |
| 2153 | drwxr-xr-x 5 root root 1024 Oct 15 23:57 .. |
| 2154 | drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120 |
| 2155 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2156 | Collecting and viewing a user space trace on the target (inside a shell) |
| 2157 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2158 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2159 | For LTTng user space tracing, you need to have a properly instrumented |
| 2160 | user space program. For this example, we'll use the ``hello`` test program |
| 2161 | generated by the ``lttng-ust`` build. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2162 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2163 | The ``hello`` test program isn't installed on the root filesystem by the ``lttng-ust`` |
| 2164 | build, so we need to copy it over manually. First ``cd`` into the build |
| 2165 | directory that contains the ``hello`` executable:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2166 | |
| 2167 | $ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs |
| 2168 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2169 | Copy that over to the target machine:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2170 | |
| 2171 | $ scp hello root@192.168.1.20: |
| 2172 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2173 | You now have the instrumented LTTng "hello world" test program on the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2174 | target, ready to test. |
| 2175 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2176 | First, from the host, connect to the target through SSH:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2177 | |
| 2178 | $ ssh -l root 192.168.1.47 |
| 2179 | The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established. |
| 2180 | RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e. |
| 2181 | Are you sure you want to continue connecting (yes/no)? yes |
| 2182 | Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts. |
| 2183 | root@192.168.1.47's password: |
| 2184 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2185 | Once on the target, use these steps to create a trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2186 | |
| 2187 | root@crownbay:~# lttng create |
| 2188 | Session auto-20190303-021943 created. |
| 2189 | Traces will be written in /home/root/lttng-traces/auto-20190303-021943 |
| 2190 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2191 | Enable the events you want to trace (in this case all user space events):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2192 | |
| 2193 | root@crownbay:~# lttng enable-event --userspace --all |
| 2194 | All UST events are enabled in channel channel0 |
| 2195 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2196 | Start the trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2197 | |
| 2198 | root@crownbay:~# lttng start |
| 2199 | Tracing started for session auto-20190303-021943 |
| 2200 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2201 | Run the instrumented "hello world" program:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2202 | |
| 2203 | root@crownbay:~# ./hello |
| 2204 | Hello, World! |
| 2205 | Tracing... done. |
| 2206 | |
| 2207 | And then stop the trace after awhile or after running a particular workload |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2208 | that you want to trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2209 | |
| 2210 | root@crownbay:~# lttng stop |
| 2211 | Tracing stopped for session auto-20190303-021943 |
| 2212 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2213 | You can now view the trace in text form on the target:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2214 | |
| 2215 | root@crownbay:~# lttng view |
| 2216 | [02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } |
| 2217 | [02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } |
| 2218 | [02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } |
| 2219 | [02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } |
| 2220 | . |
| 2221 | . |
| 2222 | . |
| 2223 | |
| 2224 | You can now safely destroy the trace session (note that this doesn't delete the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2225 | trace --- it's still there in ``~/lttng-traces``):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2226 | |
| 2227 | root@crownbay:~# lttng destroy |
| 2228 | Session auto-20190303-021943 destroyed at /home/root |
| 2229 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2230 | LTTng Documentation |
| 2231 | ------------------- |
| 2232 | |
| 2233 | You can find the primary LTTng Documentation on the `LTTng |
| 2234 | Documentation <https://lttng.org/docs/>`__ site. The documentation on |
| 2235 | this site is appropriate for intermediate to advanced software |
| 2236 | developers who are working in a Linux environment and are interested in |
| 2237 | efficient software tracing. |
| 2238 | |
| 2239 | For information on LTTng in general, visit the `LTTng |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 2240 | Project <https://lttng.org/lttng2.0>`__ site. You can find a "Getting |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2241 | Started" link on this site that takes you to an LTTng Quick Start. |
| 2242 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2243 | blktrace |
| 2244 | ======== |
| 2245 | |
| 2246 | blktrace is a tool for tracing and reporting low-level disk I/O. |
| 2247 | blktrace provides the tracing half of the equation; its output can be |
| 2248 | piped into the blkparse program, which renders the data in a |
| 2249 | human-readable form and does some basic analysis: |
| 2250 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2251 | blktrace Setup |
| 2252 | -------------- |
| 2253 | |
| 2254 | For this section, we'll assume you've already performed the basic setup |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 2255 | outlined in the ":ref:`profile-manual/intro:General Setup`" |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2256 | section. |
| 2257 | |
| 2258 | blktrace is an application that runs on the target system. You can run |
| 2259 | the entire blktrace and blkparse pipeline on the target, or you can run |
| 2260 | blktrace in 'listen' mode on the target and have blktrace and blkparse |
| 2261 | collect and analyze the data on the host (see the |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 2262 | ":ref:`profile-manual/usage:Using blktrace Remotely`" section |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2263 | below). For the rest of this section we assume you've to the host through SSH |
| 2264 | and will be running blktrace on the target. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2265 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2266 | Basic blktrace Usage |
| 2267 | -------------------- |
| 2268 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2269 | To record a trace, just run the ``blktrace`` command, giving it the name |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2270 | of the block device you want to trace activity on:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2271 | |
| 2272 | root@crownbay:~# blktrace /dev/sdc |
| 2273 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2274 | In another shell, execute a workload you want to trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2275 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 2276 | root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2; sync |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2277 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 2278 | linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA |
| 2279 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2280 | Press ``Ctrl-C`` in the blktrace shell to stop the trace. It |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2281 | will display how many events were logged, along with the per-cpu file |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2282 | sizes (blktrace records traces in per-cpu kernel buffers and just |
| 2283 | dumps them to user space for blkparse to merge and sort later):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2284 | |
| 2285 | ^C=== sdc === |
| 2286 | CPU 0: 7082 events, 332 KiB data |
| 2287 | CPU 1: 1578 events, 74 KiB data |
| 2288 | Total: 8660 events (dropped 0), 406 KiB data |
| 2289 | |
| 2290 | If you examine the files saved to disk, you see multiple files, one per CPU and |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2291 | with the device name as the first part of the filename:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2292 | |
| 2293 | root@crownbay:~# ls -al |
| 2294 | drwxr-xr-x 6 root root 1024 Oct 27 22:39 . |
| 2295 | drwxr-sr-x 4 root root 1024 Oct 26 18:24 .. |
| 2296 | -rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0 |
| 2297 | -rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1 |
| 2298 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2299 | To view the trace events, just call ``blkparse`` in the directory |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2300 | containing the trace files, giving it the device name that forms the |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2301 | first part of the filenames:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2302 | |
| 2303 | root@crownbay:~# blkparse sdc |
| 2304 | |
| 2305 | 8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8] |
| 2306 | 8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8] |
| 2307 | 8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8] |
| 2308 | 8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8] |
| 2309 | 8,32 1 0 0.000057270 0 m N cfq1225 insert_request |
| 2310 | 8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr |
| 2311 | 8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1 |
| 2312 | 8,32 1 0 0.000088559 0 m N cfq workload slice:150 |
| 2313 | 8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1 |
| 2314 | 8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1 |
| 2315 | 8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null) |
| 2316 | 8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert |
| 2317 | 8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request |
| 2318 | 8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1 |
| 2319 | 8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8] |
| 2320 | 8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8] |
| 2321 | 8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8] |
| 2322 | 8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8] |
| 2323 | 8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8] |
| 2324 | 8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8] |
| 2325 | 8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8] |
| 2326 | 8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8] |
| 2327 | 8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8] |
| 2328 | 8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8] |
| 2329 | 8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8] |
| 2330 | 8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8] |
| 2331 | 8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8] |
| 2332 | 8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8] |
| 2333 | 8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8] |
| 2334 | 8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8] |
| 2335 | 8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8] |
| 2336 | 8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8] |
| 2337 | 8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8] |
| 2338 | 8,32 1 0 0.000483022 0 m N cfq1225 insert_request |
| 2339 | 8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1 |
| 2340 | 8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1 |
| 2341 | 8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert |
| 2342 | 8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request |
| 2343 | 8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2 |
| 2344 | . |
| 2345 | . |
| 2346 | . |
| 2347 | 8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1 |
| 2348 | 8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0] |
| 2349 | 8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1 |
| 2350 | 8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0 |
| 2351 | 8,32 0 0 58.516636933 0 m N cfq schedule dispatch |
| 2352 | 8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0 |
| 2353 | 8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80 |
| 2354 | 8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr |
| 2355 | 8,32 1 0 58.516990819 0 m N cfq3551 put_queue |
| 2356 | |
| 2357 | CPU0 (sdc): |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 2358 | Reads Queued: 0, 0KiB Writes Queued: 331, 26,284KiB |
| 2359 | Read Dispatches: 0, 0KiB Write Dispatches: 485, 40,484KiB |
| 2360 | Reads Requeued: 0 Writes Requeued: 0 |
| 2361 | Reads Completed: 0, 0KiB Writes Completed: 511, 41,000KiB |
| 2362 | Read Merges: 0, 0KiB Write Merges: 13, 160KiB |
| 2363 | Read depth: 0 Write depth: 2 |
| 2364 | IO unplugs: 23 Timer unplugs: 0 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2365 | CPU1 (sdc): |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 2366 | Reads Queued: 0, 0KiB Writes Queued: 249, 15,800KiB |
| 2367 | Read Dispatches: 0, 0KiB Write Dispatches: 42, 1,600KiB |
| 2368 | Reads Requeued: 0 Writes Requeued: 0 |
| 2369 | Reads Completed: 0, 0KiB Writes Completed: 16, 1,084KiB |
| 2370 | Read Merges: 0, 0KiB Write Merges: 40, 276KiB |
| 2371 | Read depth: 0 Write depth: 2 |
| 2372 | IO unplugs: 30 Timer unplugs: 1 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2373 | |
| 2374 | Total (sdc): |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 2375 | Reads Queued: 0, 0KiB Writes Queued: 580, 42,084KiB |
| 2376 | Read Dispatches: 0, 0KiB Write Dispatches: 527, 42,084KiB |
| 2377 | Reads Requeued: 0 Writes Requeued: 0 |
| 2378 | Reads Completed: 0, 0KiB Writes Completed: 527, 42,084KiB |
| 2379 | Read Merges: 0, 0KiB Write Merges: 53, 436KiB |
| 2380 | IO unplugs: 53 Timer unplugs: 1 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2381 | |
| 2382 | Throughput (R/W): 0KiB/s / 719KiB/s |
| 2383 | Events (sdc): 6,592 entries |
| 2384 | Skips: 0 forward (0 - 0.0%) |
| 2385 | Input file sdc.blktrace.0 added |
| 2386 | Input file sdc.blktrace.1 added |
| 2387 | |
| 2388 | The report shows each event that was |
| 2389 | found in the blktrace data, along with a summary of the overall block |
| 2390 | I/O traffic during the run. You can look at the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2391 | `blkparse <https://linux.die.net/man/1/blkparse>`__ manual page to learn the |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2392 | meaning of each field displayed in the trace listing. |
| 2393 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2394 | Live Mode |
| 2395 | ~~~~~~~~~ |
| 2396 | |
| 2397 | blktrace and blkparse are designed from the ground up to be able to |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2398 | operate together in a "pipe mode" where the standard output of blktrace can be |
| 2399 | fed directly into the standard input of blkparse:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2400 | |
| 2401 | root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i - |
| 2402 | |
| 2403 | This enables long-lived tracing sessions |
| 2404 | to run without writing anything to disk, and allows the user to look for |
| 2405 | certain conditions in the trace data in 'real-time' by viewing the trace |
| 2406 | output as it scrolls by on the screen or by passing it along to yet |
| 2407 | another program in the pipeline such as grep which can be used to |
| 2408 | identify and capture conditions of interest. |
| 2409 | |
| 2410 | There's actually another blktrace command that implements the above |
| 2411 | pipeline as a single command, so the user doesn't have to bother typing |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2412 | in the above command sequence:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2413 | |
| 2414 | root@crownbay:~# btrace /dev/sdc |
| 2415 | |
| 2416 | Using blktrace Remotely |
| 2417 | ~~~~~~~~~~~~~~~~~~~~~~~ |
| 2418 | |
| 2419 | Because blktrace traces block I/O and at the same time normally writes |
| 2420 | its trace data to a block device, and in general because it's not really |
| 2421 | a great idea to make the device being traced the same as the device the |
| 2422 | tracer writes to, blktrace provides a way to trace without perturbing |
| 2423 | the traced device at all by providing native support for sending all |
| 2424 | trace data over the network. |
| 2425 | |
Patrick Williams | 520786c | 2023-06-25 16:20:36 -0500 | [diff] [blame] | 2426 | To have blktrace operate in this mode, start blktrace in server mode on the |
| 2427 | host system, which is going to store the captured data:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2428 | |
Patrick Williams | 520786c | 2023-06-25 16:20:36 -0500 | [diff] [blame] | 2429 | $ blktrace -l |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2430 | server: waiting for connections... |
| 2431 | |
Patrick Williams | 520786c | 2023-06-25 16:20:36 -0500 | [diff] [blame] | 2432 | On the target system that is going to be traced, start blktrace in client |
| 2433 | mode with the -h option to connect to the host system, also passing it the |
| 2434 | device to trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2435 | |
Patrick Williams | 520786c | 2023-06-25 16:20:36 -0500 | [diff] [blame] | 2436 | root@crownbay:~# blktrace -d /dev/sdc -h 192.168.1.43 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2437 | blktrace: connecting to 192.168.1.43 |
| 2438 | blktrace: connected! |
| 2439 | |
Patrick Williams | 520786c | 2023-06-25 16:20:36 -0500 | [diff] [blame] | 2440 | On the host system, you should see this:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2441 | |
| 2442 | server: connection from 192.168.1.43 |
| 2443 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2444 | In another shell, execute a workload you want to trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2445 | |
Andrew Geissler | eff2747 | 2021-10-29 15:35:00 -0500 | [diff] [blame] | 2446 | root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2; sync |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2447 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 2448 | linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA |
| 2449 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2450 | When it's done, do a ``Ctrl-C`` on the target system to stop the |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2451 | trace:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2452 | |
| 2453 | ^C=== sdc === |
| 2454 | CPU 0: 7691 events, 361 KiB data |
| 2455 | CPU 1: 4109 events, 193 KiB data |
| 2456 | Total: 11800 events (dropped 0), 554 KiB data |
| 2457 | |
Patrick Williams | 520786c | 2023-06-25 16:20:36 -0500 | [diff] [blame] | 2458 | On the host system, you should also see a trace summary for the trace |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2459 | just ended:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2460 | |
| 2461 | server: end of run for 192.168.1.43:sdc |
| 2462 | === sdc === |
| 2463 | CPU 0: 7691 events, 361 KiB data |
| 2464 | CPU 1: 4109 events, 193 KiB data |
| 2465 | Total: 11800 events (dropped 0), 554 KiB data |
| 2466 | |
| 2467 | The blktrace instance on the host will |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2468 | save the target output inside a ``<hostname>-<timestamp>`` directory:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2469 | |
| 2470 | $ ls -al |
| 2471 | drwxr-xr-x 10 root root 1024 Oct 28 02:40 . |
| 2472 | drwxr-sr-x 4 root root 1024 Oct 26 18:24 .. |
| 2473 | drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56 |
| 2474 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2475 | ``cd`` into that directory to see the output files:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2476 | |
| 2477 | $ ls -l |
| 2478 | -rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0 |
| 2479 | -rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1 |
| 2480 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2481 | And run blkparse on the host system using the device name:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2482 | |
| 2483 | $ blkparse sdc |
| 2484 | |
| 2485 | 8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls] |
| 2486 | 8,32 1 0 0.000036038 0 m N cfq1263 alloced |
| 2487 | 8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls] |
| 2488 | 8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls] |
| 2489 | 8,32 1 0 0.000056152 0 m N cfq1263 insert_request |
| 2490 | 8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr |
| 2491 | 8,32 1 0 0.000075498 0 m N cfq workload slice:300 |
| 2492 | . |
| 2493 | . |
| 2494 | . |
| 2495 | 8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0 |
| 2496 | 8,32 0 0 177.266388140 0 m N cfq schedule dispatch |
| 2497 | 8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0 |
| 2498 | 8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56 |
| 2499 | 8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr |
| 2500 | 8,32 1 0 177.266696560 0 m N cfq1267 put_queue |
| 2501 | |
| 2502 | CPU0 (sdc): |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 2503 | Reads Queued: 0, 0KiB Writes Queued: 270, 21,708KiB |
| 2504 | Read Dispatches: 59, 2,628KiB Write Dispatches: 495, 39,964KiB |
| 2505 | Reads Requeued: 0 Writes Requeued: 0 |
| 2506 | Reads Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB |
| 2507 | Read Merges: 0, 0KiB Write Merges: 9, 344KiB |
| 2508 | Read depth: 2 Write depth: 2 |
| 2509 | IO unplugs: 20 Timer unplugs: 1 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2510 | CPU1 (sdc): |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 2511 | Reads Queued: 688, 2,752KiB Writes Queued: 381, 20,652KiB |
| 2512 | Read Dispatches: 31, 124KiB Write Dispatches: 59, 2,396KiB |
| 2513 | Reads Requeued: 0 Writes Requeued: 0 |
| 2514 | Reads Completed: 0, 0KiB Writes Completed: 11, 764KiB |
| 2515 | Read Merges: 598, 2,392KiB Write Merges: 88, 448KiB |
| 2516 | Read depth: 2 Write depth: 2 |
| 2517 | IO unplugs: 52 Timer unplugs: 0 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2518 | |
| 2519 | Total (sdc): |
Patrick Williams | 44b3caf | 2024-04-12 16:51:14 -0500 | [diff] [blame] | 2520 | Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB |
| 2521 | Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB |
| 2522 | Reads Requeued: 0 Writes Requeued: 0 |
| 2523 | Reads Completed: 90, 2,752KiB Writes Completed: 554, 42,360KiB |
| 2524 | Read Merges: 598, 2,392KiB Write Merges: 97, 792KiB |
| 2525 | IO unplugs: 72 Timer unplugs: 1 |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2526 | |
| 2527 | Throughput (R/W): 15KiB/s / 238KiB/s |
| 2528 | Events (sdc): 9,301 entries |
| 2529 | Skips: 0 forward (0 - 0.0%) |
| 2530 | |
| 2531 | You should see the trace events and summary just as you would have if you'd run |
| 2532 | the same command on the target. |
| 2533 | |
| 2534 | Tracing Block I/O via 'ftrace' |
| 2535 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 2536 | |
| 2537 | It's also possible to trace block I/O using only |
Andrew Geissler | 09209ee | 2020-12-13 08:44:15 -0600 | [diff] [blame] | 2538 | :ref:`profile-manual/usage:The 'trace events' Subsystem`, which |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2539 | can be useful for casual tracing if you don't want to bother dealing with the |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2540 | user space tools. |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2541 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2542 | To enable tracing for a given device, use ``/sys/block/xxx/trace/enable``, |
| 2543 | where ``xxx`` is the device name. This for example enables tracing for |
| 2544 | ``/dev/sdc``:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2545 | |
| 2546 | root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable |
| 2547 | |
| 2548 | Once you've selected the device(s) you want |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2549 | to trace, selecting the ``blk`` tracer will turn the blk tracer on:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2550 | |
| 2551 | root@crownbay:/sys/kernel/debug/tracing# cat available_tracers |
| 2552 | blk function_graph function nop |
| 2553 | |
| 2554 | root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer |
| 2555 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2556 | Execute the workload you're interested in:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2557 | |
| 2558 | root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt |
| 2559 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2560 | And look at the output (note here that we're using ``trace_pipe`` instead of |
Andrew Geissler | 615f2f1 | 2022-07-15 14:00:58 -0500 | [diff] [blame] | 2561 | trace to capture this trace --- this allows us to wait around on the pipe |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2562 | for data to appear):: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2563 | |
| 2564 | root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe |
| 2565 | cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat] |
| 2566 | cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced |
| 2567 | cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat] |
| 2568 | cat-3587 [001] d..1 3023.276424: 8,32 P N [cat] |
| 2569 | cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat] |
| 2570 | cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request |
| 2571 | cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr |
| 2572 | cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1 |
| 2573 | cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150 |
| 2574 | cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2 |
| 2575 | cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null) |
| 2576 | cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert |
| 2577 | cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request |
| 2578 | cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1 |
| 2579 | cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat] |
| 2580 | |
Andrew Geissler | c926e17 | 2021-05-07 16:11:35 -0500 | [diff] [blame] | 2581 | And this turns off tracing for the specified device:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2582 | |
| 2583 | root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable |
| 2584 | |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2585 | blktrace Documentation |
| 2586 | ---------------------- |
| 2587 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2588 | Online versions of the manual pages for the commands discussed in this |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2589 | section can be found here: |
| 2590 | |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 2591 | - https://linux.die.net/man/8/blktrace |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2592 | |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 2593 | - https://linux.die.net/man/1/blkparse |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2594 | |
Andrew Geissler | d1e8949 | 2021-02-12 15:35:20 -0600 | [diff] [blame] | 2595 | - https://linux.die.net/man/8/btrace |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2596 | |
Patrick Williams | 03514f1 | 2024-04-05 07:04:11 -0500 | [diff] [blame] | 2597 | The above manual pages, along with manuals for the other blktrace utilities |
| 2598 | (``btt``, ``blkiomon``, etc) can be found in the ``/doc`` directory of the blktrace |
| 2599 | tools git repository:: |
Andrew Geissler | c9f7865 | 2020-09-18 14:11:35 -0500 | [diff] [blame] | 2600 | |
| 2601 | $ git clone git://git.kernel.dk/blktrace.git |