Blame - poky/documentation/profile-manual/usage.rst - openbmc/openbmc

2021-02-12 15:35:20 -0600

binding <https://linux.die.net/man/1/perf-script-python>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

System-Wide Tracing and Profiling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The examples so far have focused on tracing a particular program or
workload. In other words, every profiling run has specified the program
to profile on the command line, e.g. 'perf record wget ...'.

It's also possible, and more interesting in many cases, to run a
system-wide profile or trace while running the workload in a separate
shell.

To do system-wide profiling or tracing, you typically use the -a flag to
'perf record'.

To demonstrate this, open up one window and start the profile using the
-a flag (press Ctrl-C to stop tracing): ::

   root@crownbay:~# perf record -g -a
   ^C[ perf record: Woken up 6 times to write data ]
   [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]

In another window, run the wget test: ::

   root@crownbay:~# wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
   linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA

Viewing the resulting profile with 'perf report', we see entries not
only for our wget load, but for other processes running on the system
as well:

.. image:: figures/perf-systemwide.png
   :align: center

In the snapshot above, we can see callchains that originate in libc, and
a callchain from Xorg that demonstrates that we're using a proprietary X
driver in userspace (notice the presence of 'PVR' and some other
unresolvable symbols in the expanded Xorg callchain).

Note also that we have both kernel and userspace entries in the above
snapshot. We can also tell perf to focus on userspace by providing a
modifier, in this case 'u', to the 'cycles' hardware counter when we
record a profile: ::

   root@crownbay:~# perf record -g -a -e cycles:u
   ^C[ perf record: Woken up 2 times to write data ]
   [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]

.. image:: figures/perf-report-cycles-u.png
   :align: center

Notice that in the screenshot above, we see only userspace entries ([.]).

Finally, we can press 'enter' on a leaf node and select the 'Zoom into
DSO' menu item to show only entries associated with a specific DSO. In
the screenshot below, we've zoomed into the 'libc' DSO which shows all
the entries associated with the libc-xxx.so DSO.

.. image:: figures/perf-systemwide-libc.png
   :align: center

We can also use the system-wide -a switch to do system-wide tracing.
Here we'll trace a couple of scheduler events: ::

   root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
   ^C[ perf record: Woken up 38 times to write data ]
   [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]

We can look at the raw output using 'perf script' with no arguments: ::

   root@crownbay:~# perf script

   perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
   kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
   swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
   swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
   kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
   perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
   kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
   perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

Filtering
^^^^^^^^^

Notice that there are a lot of events that don't really have anything to
do with what we're interested in, namely events that schedule 'perf'
itself in and out or that wake perf up. We can get rid of those by using
the '--filter' option. For each event we specify using -e, we can add a
--filter after it to filter out trace events that contain fields with
specific values: ::

   root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'
   ^C[ perf record: Woken up 38 times to write data ]
   [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]


   root@crownbay:~# perf script

   swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
   kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
   perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
   swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
   kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
   swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
   swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
   kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

In this case, we've filtered out all events that have 'perf' in their
'comm', 'prev_comm' or 'next_comm' fields. Notice that there are still
events recorded for perf (it shows up as the running command in the first
column), but those events don't have values of 'perf' for the filtered
fields. Completely filtering out anything from perf would require a bit
more work, but for the purpose of demonstrating how to use filters, it's
close enough.
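
To give an idea of the 'bit more work' involved, here is a minimal sketch
that drops the remaining perf events in user space by post-processing the
'perf script' text output. The helper name drop_perf_events is
hypothetical, and the sketch assumes the column layout shown in the
listings above, with the running command in the first column and no
spaces in the command name; it is not a perf feature: ::

   # Hypothetical helper: drop every line whose first column (the command
   # that was running when the event fired) is 'perf'.
   drop_perf_events() {
       awk '$1 != "perf"'
   }

   # Sample input: two lines copied from the 'perf script' output above.
   printf '%s\n' \
       'perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001' \
       'swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001' |
       drop_perf_events

Only the 'swapper' line survives. On a live system the same function
could be fed directly from perf, i.e. 'perf script | drop_perf_events'.
Filtering in the kernel with --filter remains preferable wherever it is
possible, since it keeps the unwanted events out of the buffer in the
first place.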

.. admonition:: Tying it Together

   These are exactly the same set of event filters defined by the trace
   event subsystem. See the ftrace/tracecmd/kernelshark section for more
   discussion about these event filters.

.. admonition:: Tying it Together

   These event filters are implemented by a special-purpose
   pseudo-interpreter in the kernel and are an integral and
   indispensable part of the perf design as it relates to tracing.
   Kernel-based event filters provide a mechanism to precisely throttle
   the event stream that appears in user space, where it makes sense to
   provide bindings to real programming languages for postprocessing the
   event stream. This architecture allows for the intelligent and
   flexible partitioning of processing between the kernel and user
   space. Contrast this with other tools such as SystemTap, which does
   all of its processing in the kernel and as such requires a special
   project-defined language in order to accommodate that design, or
   LTTng, where everything is sent to userspace and as such requires a
   super-efficient kernel-to-userspace transport mechanism in order to
   function properly. While perf certainly can benefit from, for
   instance, advances in the design of the transport, it doesn't
   fundamentally depend on them. Basically, if you find that your perf
   tracing application is causing buffer I/O overruns, it probably means
   that you aren't taking enough advantage of the kernel filtering engine.

Using Dynamic Tracepoints
~~~~~~~~~~~~~~~~~~~~~~~~~

perf isn't restricted to the fixed set of static tracepoints listed by
'perf list'. Users can also add their own 'dynamic' tracepoints anywhere
in the kernel. For instance, suppose we want to define our own
tracepoint on do_fork(). We can do that using the 'perf probe' perf
subcommand: ::

   root@crownbay:~# perf probe do_fork
   Added new event:
     probe:do_fork (on do_fork)

   You can now use it in all perf tools, such as:

     perf record -e probe:do_fork -aR sleep 1

Adding a new tracepoint via
'perf probe' results in an event with all the expected files and format
in /sys/kernel/debug/tracing/events, just the same as for static
tracepoints (as discussed in more detail in the trace events subsystem
section): ::

   root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al
   drwxr-xr-x 2 root root 0 Oct 28 11:42 .
   drwxr-xr-x 3 root root 0 Oct 28 11:42 ..
   -rw-r--r-- 1 root root 0 Oct 28 11:42 enable
   -rw-r--r-- 1 root root 0 Oct 28 11:42 filter
   -r--r--r-- 1 root root 0 Oct 28 11:42 format
   -r--r--r-- 1 root root 0 Oct 28 11:42 id

   root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format
   name: do_fork
   ID: 944
   format:
           field:unsigned short common_type; offset:0; size:2; signed:0;
           field:unsigned char common_flags; offset:2; size:1; signed:0;
           field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
           field:int common_pid; offset:4; size:4; signed:1;
           field:int common_padding; offset:8; size:4; signed:1;

           field:unsigned long __probe_ip; offset:12; size:4; signed:0;

   print fmt: "(%lx)", REC->__probe_ip
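
Because the 'format' file is plain text, its field list is easy to pull
apart with standard tools. The following is only a sketch: the helper
name list_fields is hypothetical, and it assumes each field line has the
'field:...; offset:...; size:...;' shape shown in the listing above: ::

   # Hypothetical helper: print the declaration, offset and size of every
   # 'field:' line in a trace event format file (path is the first argument).
   list_fields() {
       # Split on ':' and ';' so that $2 is the declaration, $4 the offset
       # and $6 the size.
       awk -F'[:;]' '/field:/ { print $2 ": offset=" $4 ", size=" $6 }' "$1"
   }

On the target this could be called as
'list_fields /sys/kernel/debug/tracing/events/probe/do_fork/format',
which would print one line per field, for example
'int common_pid: offset=4, size=4'.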

We can list all dynamic tracepoints currently in
existence: ::

   root@crownbay:~# perf probe -l
    probe:do_fork (on do_fork)
    probe:schedule (on schedule)

Let's record system-wide ('sleep 30' is a trick for recording
system-wide while the traced command itself does essentially nothing
and simply wakes up after 30 seconds): ::

   root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]

Using 'perf script' we can see each do_fork event that fired: ::

   root@crownbay:~# perf script

   # ========
   # captured on: Sun Oct 28 11:55:18 2012
   # hostname : crownbay
   # os release : 3.4.11-yocto-standard
   # perf version : 3.4.11
   # arch : i686
   # nrcpus online : 2
   # nrcpus avail : 2
   # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz
   # cpuid : GenuineIntel,6,38,1
   # total memory : 1017184 kB
   # cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30
   # event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern
    = 0, id = { 5, 6 }
   # HEADER_CPU_TOPOLOGY info available, use -I to display
   # ========
   #
   matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)
   matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460)
   pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)
   pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)
   matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)
   matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460)
   gthumb 1300 [001] 34217.697451: do_fork: (c1028460)
   gthumb 1300 [001] 34219.085734: do_fork: (c1028460)
   gthumb 1300 [000] 34219.121351: do_fork: (c1028460)
   gthumb 1300 [001] 34219.264551: do_fork: (c1028460)
   pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460)
   matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460)
   matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460)
   matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460)
   matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460)
   matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460)
   matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460)
   matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460)
   gaku 1312 [000] 34237.202388: do_fork: (c1028460)
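
Since the 'perf script' output is plain text, it can be summarized with
standard tools. The following sketch tallies the events per command. The
helper name count_by_comm is hypothetical, and the sketch assumes the
command name is the first column and contains no spaces: ::

   # Hypothetical helper: count events per command name (first column),
   # skipping the '#' header lines and blank lines that 'perf script' prints.
   count_by_comm() {
       awk '!/^#/ && NF { n[$1]++ } END { for (c in n) print n[c], c }' |
           sort -k1,1nr -k2,2
   }

   # Sample input: one header line and four events from the listing above.
   printf '%s\n' \
       '# hostname : crownbay' \
       'matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)' \
       'pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)' \
       'pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)' \
       'gthumb 1300 [001] 34217.697451: do_fork: (c1028460)' |
       count_by_comm

For the sample input this prints '2 pcmanfm', then '1 gthumb' and
'1 matchbox-deskto'. On the target, 'perf script | count_by_comm' would
give the same kind of summary for the whole recording.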

And using 'perf report' on the same file, we can see the
callgraphs from starting a few programs during those 30 seconds:

.. image:: figures/perf-probe-do_fork-profile.png
   :align: center

.. admonition:: Tying it Together

   The trace events subsystem accommodates static and dynamic tracepoints
   in exactly the same way - there's no difference as far as the
   infrastructure is concerned. See the ftrace section for more details
   on the trace event subsystem.

.. admonition:: Tying it Together

   Dynamic tracepoints are implemented under the covers by kprobes and
   uprobes. kprobes and uprobes are also used by and in fact are the
   main focus of SystemTap.

Perf Documentation
------------------

Online versions of the man pages for the commands discussed in this
section can be found here:

- The `'perf stat' manpage <https://linux.die.net/man/1/perf-stat>`__.

- The `'perf record'
  manpage <https://linux.die.net/man/1/perf-record>`__.

- The `'perf report'
  manpage <https://linux.die.net/man/1/perf-report>`__.

- The `'perf probe' manpage <https://linux.die.net/man/1/perf-probe>`__.

- The `'perf script'
  manpage <https://linux.die.net/man/1/perf-script>`__.

- Documentation on using the `'perf script' python
  binding <https://linux.die.net/man/1/perf-script-python>`__.

- The top-level `perf(1) manpage <https://linux.die.net/man/1/perf>`__.

Normally, you should be able to invoke the man pages via perf itself
e.g. 'perf help' or 'perf help record'.

However, by default Yocto doesn't install man pages, but perf invokes
the man pages for most help functionality. This is a bug and is being
addressed by a Yocto bug: :yocto_bugs:`Bug 3388 - perf: enable man pages for
basic 'help' functionality </show_bug.cgi?id=3388>`.

The man pages in text form, along with some other files, such as a set
of examples, can be found in the 'perf' directory of the kernel tree: ::

   tools/perf/Documentation

There's also a nice perf tutorial on the perf
wiki that goes into more detail than we do here in certain areas: `Perf
Tutorial <https://perf.wiki.kernel.org/index.php/Tutorial>`__

ftrace
======

'ftrace' literally refers to the 'ftrace function tracer' but in reality
this encompasses a number of related tracers along with the
infrastructure that they all make use of.

ftrace Setup
------------

For this section, we'll assume you've already performed the basic setup
outlined in the ":ref:`profile-manual/intro:General Setup`" section.

ftrace, trace-cmd, and kernelshark run on the target system, and are
ready to go out-of-the-box - no additional setup is necessary. For the
rest of this section we assume you've ssh'ed from the host to the target
and will be running ftrace on the target. kernelshark is a GUI
application, and if you use the '-X' option to ssh you can have the
kernelshark GUI run on the target but display remotely on the host if
you want.

Basic ftrace usage
------------------

'ftrace' essentially refers to everything included in the /tracing
directory of the mounted debugfs filesystem (Yocto follows the standard
convention and mounts it at /sys/kernel/debug). Here's a listing of all
the files found in /sys/kernel/debug/tracing on a Yocto system: ::

   root@sugarbay:/sys/kernel/debug/tracing# ls
   README                      kprobe_events       trace
   available_events            kprobe_profile      trace_clock
   available_filter_functions  options             trace_marker
   available_tracers           per_cpu             trace_options
   buffer_size_kb              printk_formats      trace_pipe
   buffer_total_size_kb        saved_cmdlines      tracing_cpumask
   current_tracer              set_event           tracing_enabled
   dyn_ftrace_total_info       set_ftrace_filter   tracing_on
   enabled_functions           set_ftrace_notrace  tracing_thresh
   events                      set_ftrace_pid
   free_buffer                 set_graph_function

The files listed above are used for various purposes: some relate
directly to the tracers themselves, others are used to set tracing
options, and yet others actually contain the tracing output when a
tracer is in effect. Some of the functions can be guessed from their
names, others need explanation; in any case, we'll cover some of the
files we see here below, but for an explanation of the others, please
see the ftrace documentation.

We'll start by looking at some of the available built-in tracers.

cat'ing the 'available_tracers' file lists the set of available tracers: ::

   root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers
   blk function_graph function nop

The 'current_tracer' file contains the tracer currently in effect: ::

   root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
   nop

The above listing of current_tracer shows that the
'nop' tracer is in effect, which is just another way of saying that
there's actually no tracer currently in effect.

echo'ing one of the available_tracers into current_tracer makes the
specified tracer the current tracer: ::

   root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer
   root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
   function

The above sets the current tracer to be the 'function tracer'. This tracer
traces every function call in the kernel and makes it available as the
contents of the 'trace' file. Reading the 'trace' file lists the
currently buffered function calls that have been traced by the function
tracer: ::

   root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

   # tracer: function
   #
   # entries-in-buffer/entries-written: 310629/766471   #P:8
   #
   #                              _-----=> irqs-off
   #                             / _----=> need-resched
   #                            | / _---=> hardirq/softirq
   #                            || / _--=> preempt-depth
   #                            ||| /     delay
   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
   #              | |       |   ||||       |         |
             <idle>-0     [004] d..1   470.867169: ktime_get_real <-intel_idle
             <idle>-0     [004] d..1   470.867170: getnstimeofday <-ktime_get_real
             <idle>-0     [004] d..1   470.867171: ns_to_timeval <-intel_idle
             <idle>-0     [004] d..1   470.867171: ns_to_timespec <-ns_to_timeval
             <idle>-0     [004] d..1   470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt
             <idle>-0     [004] d..1   470.867172: native_apic_mem_write <-smp_apic_timer_interrupt
             <idle>-0     [004] d..1   470.867172: irq_enter <-smp_apic_timer_interrupt
             <idle>-0     [004] d..1   470.867172: rcu_irq_enter <-irq_enter
             <idle>-0     [004] d..1   470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter
             <idle>-0     [004] d..1   470.867173: local_bh_disable <-irq_enter
             <idle>-0     [004] d..1   470.867173: add_preempt_count <-local_bh_disable
             <idle>-0     [004] d.s1   470.867174: tick_check_idle <-irq_enter
             <idle>-0     [004] d.s1   470.867174: tick_check_oneshot_broadcast <-tick_check_idle
             <idle>-0     [004] d.s1   470.867174: ktime_get <-tick_check_idle
             <idle>-0     [004] d.s1   470.867174: tick_nohz_stop_idle <-tick_check_idle
             <idle>-0     [004] d.s1   470.867175: update_ts_time_stats <-tick_nohz_stop_idle
             <idle>-0     [004] d.s1   470.867175: nr_iowait_cpu <-update_ts_time_stats
             <idle>-0     [004] d.s1   470.867175: tick_do_update_jiffies64 <-tick_check_idle
             <idle>-0     [004] d.s1   470.867175: _raw_spin_lock <-tick_do_update_jiffies64
             <idle>-0     [004] d.s1   470.867176: add_preempt_count <-_raw_spin_lock
             <idle>-0     [004] d.s2   470.867176: do_timer <-tick_do_update_jiffies64
             <idle>-0     [004] d.s2   470.867176: _raw_spin_lock <-do_timer
             <idle>-0     [004] d.s2   470.867176: add_preempt_count <-_raw_spin_lock
             <idle>-0     [004] d.s3   470.867177: ntp_tick_length <-do_timer
             <idle>-0     [004] d.s3   470.867177: _raw_spin_lock_irqsave <-ntp_tick_length
             .
             .
             .

Each line in the trace above shows what was happening in the kernel on a given
CPU, to the level of detail of function calls. Each entry shows the function
called, followed by its caller (after the arrow).

The function tracer gives you an extremely detailed idea of what the
kernel was doing at the point in time the trace was taken, and is a
great way to learn about how the kernel code works in a dynamic sense.
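
The steps used so far (check 'available_tracers', then echo a name into
'current_tracer') can be wrapped in a small shell function that refuses
names the kernel doesn't list. This is only a sketch: the function name
set_tracer is hypothetical, and the optional second argument exists
purely so the function can be pointed at a different directory than the
/sys/kernel/debug/tracing path used in this section: ::

   # Hypothetical helper: select a tracer only if it appears in
   # available_tracers. Usage: set_tracer NAME [TRACING_DIR]
   set_tracer() {
       tracer=$1
       dir=${2:-/sys/kernel/debug/tracing}
       # available_tracers is a single space-separated line.
       for t in $(cat "$dir/available_tracers"); do
           if [ "$t" = "$tracer" ]; then
               echo "$tracer" > "$dir/current_tracer"
               return 0
           fi
       done
       echo "tracer '$tracer' not listed in $dir/available_tracers" >&2
       return 1
   }

On the target, 'set_tracer function' would then have the same effect as
the echo shown earlier, and 'set_tracer nop' switches tracing back off
once the 'trace' file has been read.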

.. admonition:: Tying it Together

   The ftrace function tracer is also available from within perf, as the
   ftrace:function tracepoint.

With the function tracer, it is a little more difficult to follow the
call chains than it needs to be - luckily there's a variant of the
function tracer that displays the callchains explicitly, called the
'function_graph' tracer: ::

   root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer
   root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

    tracer: function_graph

    CPU  DURATION       FUNCTION CALLS
    |     |   |          |   |   |   |
    7)     0.046 us |      pick_next_task_fair();
    7)     0.043 us |      pick_next_task_stop();
    7)     0.042 us |      pick_next_task_rt();
    7)     0.032 us |      pick_next_task_fair();
    7)     0.030 us |      pick_next_task_idle();
    7)              |      _raw_spin_unlock_irq() {
    7)     0.033 us |        sub_preempt_count();
    7)     0.258 us |      }
    7)     0.032 us |      sub_preempt_count();
    7) +  13.341 us |    } /* __schedule */
    7)     0.095 us |  } /* sub_preempt_count */
    7)              |  schedule() {
    7)              |    __schedule() {
    7)     0.060 us |      add_preempt_count();
    7)     0.044 us |      rcu_note_context_switch();
    7)              |      _raw_spin_lock_irq() {
    7)     0.033 us |        add_preempt_count();
    7)     0.247 us |      }
    7)              |      idle_balance() {
    7)              |        _raw_spin_unlock() {
    7)     0.031 us |          sub_preempt_count();
    7)     0.246 us |        }
    7)              |        update_shares() {
    7)     0.030 us |          __rcu_read_lock();
    7)     0.029 us |          __rcu_read_unlock();
    7)     0.484 us |        }
    7)     0.030 us |        __rcu_read_lock();
    7)              |        load_balance() {
    7)              |          find_busiest_group() {
    7)     0.031 us |            idle_cpu();
    7)     0.029 us |            idle_cpu();
    7)     0.035 us |            idle_cpu();
    7)     0.906 us |          }
    7)     1.141 us |        }
    7)     0.022 us |        msecs_to_jiffies();
    7)              |        load_balance() {
    7)              |          find_busiest_group() {
    7)     0.031 us |            idle_cpu();
    .
    .
    .
    4)     0.062 us |      msecs_to_jiffies();
    4)     0.062 us |      __rcu_read_unlock();
    4)              |      _raw_spin_lock() {
    4)     0.073 us |        add_preempt_count();
    4)     0.562 us |      }
    4) +  17.452 us |    }
    4)     0.108 us |    put_prev_task_fair();
    4)     0.102 us |    pick_next_task_fair();
    4)     0.084 us |    pick_next_task_stop();
    4)     0.075 us |    pick_next_task_rt();
    4)     0.062 us |    pick_next_task_fair();
    4)     0.066 us |    pick_next_task_idle();
    ------------------------------------------
    4)  kworker-74   =>    <idle>-0
    ------------------------------------------

    4)              |    finish_task_switch() {
    4)              |      _raw_spin_unlock_irq() {
    4)     0.100 us |        sub_preempt_count();
    4)     0.582 us |      }
    4)     1.105 us |    }
    4)     0.088 us |    sub_preempt_count();
    4) ! 100.066 us |  }
    .
    .
    .
    3)              |  sys_ioctl() {
    3)     0.083 us |    fget_light();
    3)              |    security_file_ioctl() {
    3)     0.066 us |      cap_file_ioctl();
    3)     0.562 us |    }
    3)              |    do_vfs_ioctl() {
    3)              |      drm_ioctl() {
    3)     0.075 us |        drm_ut_debug_printk();
    3)              |        i915_gem_pwrite_ioctl() {
    3)              |          i915_mutex_lock_interruptible() {
    3)     0.070 us |            mutex_lock_interruptible();
    3)     0.570 us |          }
    3)              |          drm_gem_object_lookup() {
    3)              |            _raw_spin_lock() {
    3)     0.080 us |              add_preempt_count();
    3)     0.620 us |            }
    3)              |            _raw_spin_unlock() {
    3)     0.085 us |              sub_preempt_count();
    3)     0.562 us |            }
    3)     2.149 us |          }
    3)     0.133 us |          i915_gem_object_pin();
    3)              |          i915_gem_object_set_to_gtt_domain() {
    3)     0.065 us |            i915_gem_object_flush_gpu_write_domain();
    3)     0.065 us |            i915_gem_object_wait_rendering();
    3)     0.062 us |            i915_gem_object_flush_cpu_write_domain();
    3)     1.612 us |          }
    3)              |          i915_gem_object_put_fence() {
    3)     0.097 us |            i915_gem_object_flush_fence.constprop.36();
    3)     0.645 us |          }
    3)     0.070 us |          add_preempt_count();
    3)     0.070 us |          sub_preempt_count();
    3)     0.073 us |          i915_gem_object_unpin();
    3)     0.068 us |          mutex_unlock();
    3)     9.924 us |        }
    3) +  11.236 us |      }
    3) +  11.770 us |    }
    3) +  13.784 us |  }
    3)              |  sys_ioctl() {

As you can see, the function_graph display is much easier
to follow. Also note that in addition to the function calls and
associated braces, other events such as scheduler events are displayed
in context. In fact, you can freely include any tracepoint available in
the trace events subsystem described in the next section by simply
enabling those events, and they'll appear in context in the function
graph display. This makes it quite a powerful tool for understanding
kernel dynamics.

Also notice that there are various annotations on the left-hand side of
the display. For example, if the total time it took for a given function
to execute is above a certain threshold, an exclamation point or plus
sign appears on the left-hand side. Please see the ftrace documentation
for details on all these fields.
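
The annotations make it easy to pick the slower calls out of a long
trace. As a small sketch (the helper name slow_calls is hypothetical, and
only the '+' and '!' markers that appear in the listing above are
handled), the following keeps just the annotated lines: ::

   # Hypothetical helper: keep only function_graph lines whose second
   # whitespace-separated column is a '+' or '!' duration marker.
   slow_calls() {
       awk '$2 == "+" || $2 == "!"'
   }

   # Sample input: three lines taken from the function_graph output above.
   printf '%s\n' \
       '7) 0.032 us | sub_preempt_count();' \
       '7) + 13.341 us | } /* __schedule */' \
       '4) ! 100.066 us | }' |
       slow_calls

The unannotated first line is dropped and the other two are printed
unchanged. On the target, the same function could be used from the
tracing directory as 'cat trace | slow_calls'.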

1437

1438

The 'trace events' Subsystem

1439

----------------------------

1440

1441

One especially important directory contained within the

1442

/sys/kernel/debug/tracing directory is the 'events' subdirectory, which

1443

contains representations of every tracepoint in the system. Listing out

1444

the contents of the 'events' subdirectory, we see mainly another set of

1445

subdirectories: ::

1446

1447

root@sugarbay:/sys/kernel/debug/tracing# cd events

1448

root@sugarbay:/sys/kernel/debug/tracing/events# ls -al

1449

drwxr-xr-x 38 root root 0 Nov 14 23:19 .

1450

drwxr-xr-x 5 root root 0 Nov 14 23:19 ..

1451

drwxr-xr-x 19 root root 0 Nov 14 23:19 block

1452

drwxr-xr-x 32 root root 0 Nov 14 23:19 btrfs

1453

drwxr-xr-x 5 root root 0 Nov 14 23:19 drm

1454

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1455

drwxr-xr-x 40 root root 0 Nov 14 23:19 ext3

1456

drwxr-xr-x 79 root root 0 Nov 14 23:19 ext4

1457

drwxr-xr-x 14 root root 0 Nov 14 23:19 ftrace

1458

drwxr-xr-x 8 root root 0 Nov 14 23:19 hda

1459

-r--r--r-- 1 root root 0 Nov 14 23:19 header_event

1460

-r--r--r-- 1 root root 0 Nov 14 23:19 header_page

1461

drwxr-xr-x 25 root root 0 Nov 14 23:19 i915

1462

drwxr-xr-x 7 root root 0 Nov 14 23:19 irq

1463

drwxr-xr-x 12 root root 0 Nov 14 23:19 jbd

1464

drwxr-xr-x 14 root root 0 Nov 14 23:19 jbd2

1465

drwxr-xr-x 14 root root 0 Nov 14 23:19 kmem

1466

drwxr-xr-x 7 root root 0 Nov 14 23:19 module

1467

drwxr-xr-x 3 root root 0 Nov 14 23:19 napi

1468

drwxr-xr-x 6 root root 0 Nov 14 23:19 net

1469

drwxr-xr-x 3 root root 0 Nov 14 23:19 oom

1470

drwxr-xr-x 12 root root 0 Nov 14 23:19 power

1471

drwxr-xr-x 3 root root 0 Nov 14 23:19 printk

1472

drwxr-xr-x 8 root root 0 Nov 14 23:19 random

1473

drwxr-xr-x 4 root root 0 Nov 14 23:19 raw_syscalls

1474

drwxr-xr-x 3 root root 0 Nov 14 23:19 rcu

1475

drwxr-xr-x 6 root root 0 Nov 14 23:19 rpm

1476

drwxr-xr-x 20 root root 0 Nov 14 23:19 sched

1477

drwxr-xr-x 7 root root 0 Nov 14 23:19 scsi

1478

drwxr-xr-x 4 root root 0 Nov 14 23:19 signal

1479

drwxr-xr-x 5 root root 0 Nov 14 23:19 skb

1480

drwxr-xr-x 4 root root 0 Nov 14 23:19 sock

1481

drwxr-xr-x 10 root root 0 Nov 14 23:19 sunrpc

1482

drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls

1483

drwxr-xr-x 4 root root 0 Nov 14 23:19 task

1484

drwxr-xr-x 14 root root 0 Nov 14 23:19 timer

1485

drwxr-xr-x 3 root root 0 Nov 14 23:19 udp

1486

drwxr-xr-x 21 root root 0 Nov 14 23:19 vmscan

1487

drwxr-xr-x 3 root root 0 Nov 14 23:19 vsyscall

1488

drwxr-xr-x 6 root root 0 Nov 14 23:19 workqueue

1489

drwxr-xr-x 26 root root 0 Nov 14 23:19 writeback

Each of these subdirectories corresponds to a 'subsystem' and contains
further subdirectories, each of which corresponds to a tracepoint. For
example, here are the contents of the 'kmem' subsystem: ::

   root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem
   root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al
   drwxr-xr-x 14 root root 0 Nov 14 23:19 .
   drwxr-xr-x 38 root root 0 Nov 14 23:19 ..
   -rw-r--r-- 1 root root 0 Nov 14 23:19 enable
   -rw-r--r-- 1 root root 0 Nov 14 23:19 filter
   drwxr-xr-x 2 root root 0 Nov 14 23:19 kfree
   drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc
   drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc_node
   drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc
   drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc_node
   drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_free
   drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc
   drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag
   drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked
   drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free
   drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free_batched
   drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_pcpu_drain
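
The subsystem/tracepoint layout shown above can also be walked from a
script. The following is a minimal sketch, not part of any tool described
here: the function name ``list_tracepoints`` is hypothetical, and the
directory to scan is passed in as a parameter so that nothing is assumed
about where debugfs is mounted.

```python
from pathlib import Path


def list_tracepoints(events_dir):
    """Map each subsystem directory to a sorted list of its tracepoints."""
    result = {}
    for subsystem in sorted(Path(events_dir).iterdir()):
        if not subsystem.is_dir():
            continue  # skip top-level files such as 'enable' or 'header_page'
        # Inside a subsystem, every subdirectory is one tracepoint; plain
        # files ('enable', 'filter') are control files and are ignored.
        result[subsystem.name] = sorted(
            entry.name for entry in subsystem.iterdir() if entry.is_dir()
        )
    return result

# On a target, as root, with the path used in this section:
# list_tracepoints("/sys/kernel/debug/tracing/events")["kmem"]
```

Called on the 'events' directory listed above, the entry for 'kmem' would
contain the tracepoint names from the listing, such as 'kfree' and
'kmalloc'.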

Let's see what's inside the subdirectory for a
specific tracepoint, in this case the one for kmalloc: ::

   root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc
   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al
   drwxr-xr-x 2 root root 0 Nov 14 23:19 .
   drwxr-xr-x 14 root root 0 Nov 14 23:19 ..
   -rw-r--r-- 1 root root 0 Nov 14 23:19 enable
   -rw-r--r-- 1 root root 0 Nov 14 23:19 filter
   -r--r--r-- 1 root root 0 Nov 14 23:19 format
   -r--r--r-- 1 root root 0 Nov 14 23:19 id

The 'format' file for the tracepoint describes the layout of the event
in memory. The various tracing tools that make use of these tracepoints
rely on it to parse the event and make sense of it. It also contains a
'print fmt' field that allows tools like ftrace to display the event as
text. Here's what the format of the kmalloc event looks like: ::

   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
   name: kmalloc
   ID: 313
   format:
       field:unsigned short common_type; offset:0; size:2; signed:0;
       field:unsigned char common_flags; offset:2; size:1; signed:0;
       field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
       field:int common_pid; offset:4; size:4; signed:1;
       field:int common_padding; offset:8; size:4; signed:1;

       field:unsigned long call_site; offset:16; size:8; signed:0;
       field:const void * ptr; offset:24; size:8; signed:0;
       field:size_t bytes_req; offset:32; size:8; signed:0;
       field:size_t bytes_alloc; offset:40; size:8; signed:0;
       field:gfp_t gfp_flags; offset:48; size:4; signed:0;

   print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,
   (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
   gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((
   gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
   gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
   gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), "GFP_TEMPORARY"},
   {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),
   "GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((
   gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((
   gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned
   long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},
   {(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),
   "GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned
   long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},
   {(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"
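
To illustrate how a tool might consume the 'field:' lines of a 'format'
file, here is a small sketch that turns them into a list of field
descriptions. It is only an approximation for illustration: the names
``parse_format`` and ``SAMPLE_FORMAT`` are made up for this example, the
sample is an abbreviated copy of the kmalloc listing above, and array
fields and the 'print fmt' expression are deliberately not handled.

```python
import re

# Abbreviated copy of the kmalloc 'format' listing (assumption: whitespace
# between the entries may be tabs or spaces, so the pattern accepts both).
SAMPLE_FORMAT = """\
name: kmalloc
ID: 313
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:unsigned long call_site; offset:16; size:8; signed:0;
    field:size_t bytes_req; offset:32; size:8; signed:0;
"""

FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Return the event name, ID, and a list of field descriptions."""
    event = {"name": None, "id": None, "fields": []}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            event["name"] = line.split(":", 1)[1].strip()
        elif line.startswith("ID:"):
            event["id"] = int(line.split(":", 1)[1])
        else:
            match = FIELD_RE.match(line)
            if match:
                # The declaration is "<C type> <field name>"; the field
                # name is the last whitespace-separated token.
                ctype, _, name = match["decl"].rpartition(" ")
                event["fields"].append({
                    "name": name,
                    "type": ctype,
                    "offset": int(match["offset"]),
                    "size": int(match["size"]),
                    "signed": match["signed"] == "1",
                })
    return event


event = parse_format(SAMPLE_FORMAT)
for field in event["fields"]:
    print(f'{field["name"]:<12} offset={field["offset"]:<3} size={field["size"]}')
```

The offsets and sizes are what let a consumer pick each field out of the
raw binary event record without knowing the kernel's C structure.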

The 'enable' file in the tracepoint directory is what allows the user
(or tools such as trace-cmd) to actually turn the tracepoint on and off.
When enabled, events from the corresponding tracepoint will start
appearing in the ftrace 'trace' file described previously. For example,
this turns on the kmalloc tracepoint: ::

   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable

At the moment, we're not interested in the function tracer or any other
tracer that might be in effect, so we first turn it off by selecting the
'nop' tracer. Having done that, we still need to turn tracing on in
order to see the events in the output buffer: ::

   root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer
   root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on
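
The same writes can be scripted. The sketch below wraps them in two
hypothetical helpers, ``start_event_trace`` and ``stop_event_trace``
(names invented for this example); the tracing directory is a parameter
rather than a hard-coded path, and on a real target the calls would need
to run as root.

```python
from pathlib import Path


def start_event_trace(tracing_dir, subsystem, event):
    """Enable one tracepoint, select the 'nop' tracer, and turn tracing on."""
    root = Path(tracing_dir)
    # Equivalent to: echo 1 > events/<subsystem>/<event>/enable
    (root / "events" / subsystem / event / "enable").write_text("1\n")
    # Equivalent to: echo nop > current_tracer
    (root / "current_tracer").write_text("nop\n")
    # Equivalent to: echo 1 > tracing_on
    (root / "tracing_on").write_text("1\n")


def stop_event_trace(tracing_dir, subsystem, event):
    """Turn the tracepoint off again by writing 0 to its 'enable' file."""
    root = Path(tracing_dir)
    (root / "events" / subsystem / event / "enable").write_text("0\n")

# On a target, as root, with the path used in this section:
# start_event_trace("/sys/kernel/debug/tracing", "kmem", "kmalloc")
```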

Now, if we look at the 'trace' file, we see nothing
but the kmalloc events we just turned on: ::

   root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
   # tracer: nop
   #
   # entries-in-buffer/entries-written: 1897/1897   #P:8
   #
   #                              _-----=> irqs-off
   #                             / _----=> need-resched
   #                            | / _---=> hardirq/softirq
   #                            || / _--=> preempt-depth
   #                            ||| /     delay
   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
   #              | |       |   ||||       |         |
           dropbear-1465  [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
             <idle>-0     [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             <idle>-0     [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
    matchbox-termin-1361  [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
               Xorg-1264  [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
               Xorg-1264  [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
               Xorg-1264  [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
    matchbox-termin-1361  [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
               Xorg-1264  [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
               Xorg-1264  [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
               Xorg-1264  [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
             <idle>-0     [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
           dropbear-1465  [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
             <idle>-0     [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             <idle>-0     [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
           dropbear-1465  [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
             <idle>-0     [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             <idle>-0     [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
           dropbear-1465  [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
             <idle>-0     [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
    matchbox-termin-1361  [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
               Xorg-1264  [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
               Xorg-1264  [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
               Xorg-1264  [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
             <idle>-0     [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
           dropbear-1465  [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
             <idle>-0     [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
             <idle>-0     [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
           dropbear-1465  [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
             <idle>-0     [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
    matchbox-termin-1361  [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
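
Because each event is printed as ``key=value`` pairs according to the
'print fmt' of the tracepoint, the text output is easy to post-process.
As a rough sketch, the following sums the requested and allocated bytes
per task from a few of the lines shown above. The names
``summarize_kmalloc`` and ``SAMPLE_TRACE`` are invented for this example,
and the regular expression assumes the line layout of the listing above.

```python
import re
from collections import defaultdict

# Four event lines copied from the 'trace' output, plus one header line.
SAMPLE_TRACE = """\
# tracer: nop
        dropbear-1465  [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
            Xorg-1264  [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
            Xorg-1264  [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
"""

# TASK-PID, [CPU], flags, TIMESTAMP, then the event name and its fields.
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+): kmalloc: (?P<fields>.*)$"
)


def summarize_kmalloc(text):
    """Sum bytes_req and bytes_alloc per task name for kmalloc events."""
    totals = defaultdict(lambda: {"events": 0, "bytes_req": 0, "bytes_alloc": 0})
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header comments and events other than kmalloc
        fields = dict(item.split("=", 1) for item in match["fields"].split())
        entry = totals[match["task"]]
        entry["events"] += 1
        entry["bytes_req"] += int(fields["bytes_req"])
        entry["bytes_alloc"] += int(fields["bytes_alloc"])
    return dict(totals)


summary = summarize_kmalloc(SAMPLE_TRACE)
for task, entry in sorted(summary.items()):
    slack = entry["bytes_alloc"] - entry["bytes_req"]
    print(f'{task:<10} events={entry["events"]} requested={entry["bytes_req"]} slack={slack}')
```

The difference between ``bytes_alloc`` and ``bytes_req`` gives a quick
view of how much memory is lost to allocator rounding for each task.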

To disable the kmalloc event again, we need to write 0 to the 'enable' file: ::

   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable

You can enable any number of events or complete subsystems (by
using the 'enable' file in the subsystem directory) and get an
arbitrarily fine-grained idea of what's going on in the system by
enabling as many of the appropriate tracepoints as applicable.

A number of the tools described in this HOWTO do just that, including
trace-cmd and kernelshark in the next section.

.. admonition:: Tying it Together

   These tracepoints and their representation are used not only by
   ftrace, but by many of the other tools covered in this document and
   they form a central point of integration for the various tracers
   available in Linux. They form a central part of the instrumentation
   for the following tools: perf, lttng, ftrace, blktrace and SystemTap.

.. admonition:: Tying it Together

   Eventually all the special-purpose tracers currently available in
   /sys/kernel/debug/tracing will be removed and replaced with
   equivalent tracers based on the 'trace events' subsystem.

trace-cmd/kernelshark
---------------------

trace-cmd is essentially an extensive command-line 'wrapper' interface
that hides the details of all the individual files in
/sys/kernel/debug/tracing, allowing users to specify particular
events within the /sys/kernel/debug/tracing/events/ subdirectory and to
collect traces without having to deal with those details directly.

As yet another layer on top of that, kernelshark provides a GUI that
allows users to start and stop traces and specify sets of events using
an intuitive interface, and view the output as both trace events and as
a per-CPU graphical display. It directly uses 'trace-cmd' as the
plumbing that accomplishes all that underneath the covers (and actually
displays the trace-cmd command it uses, as we'll see).

To start a trace using kernelshark, first start kernelshark: ::

   root@sugarbay:~# kernelshark

Then bring up the 'Capture' dialog by
choosing from the kernelshark menu: ::

   Capture | Record

That will display the following dialog, which allows you to choose one or more
events (or even one or more complete subsystems) to trace:

.. image:: figures/kernelshark-choose-events.png
   :align: center

Note that these are exactly the same sets of events described in the
previous trace events subsystem section, and in fact that is where
trace-cmd gets them for kernelshark.

In the above screenshot, we've decided to explore the graphics subsystem
a bit and so have chosen to trace all the tracepoints contained within
the 'i915' and 'drm' subsystems.

After doing that, we can start the trace using the 'Run' button in the
lower right corner of the dialog (the same button turns into the 'Stop'
button once the trace has started):

.. image:: figures/kernelshark-output-display.png
   :align: center

Notice that the right-hand pane shows the exact trace-cmd command-line
that's used to run the trace, along with the results of the trace-cmd
run.
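
For comparison, a similar selection of subsystems can be expressed
directly as a 'trace-cmd record' invocation. The sketch below only builds
the argument list; the helper name ``trace_cmd_record_argv`` is
hypothetical, and the command kernelshark actually displays may contain
additional options, so treat this as an approximation rather than the
exact command from the screenshot.

```python
import shlex


def trace_cmd_record_argv(subsystems, output="trace.dat"):
    """Build a 'trace-cmd record' command line enabling whole subsystems."""
    argv = ["trace-cmd", "record", "-o", output]
    for name in subsystems:
        # '-e' takes either a single event or a whole subsystem name.
        argv += ["-e", name]
    return argv


argv = trace_cmd_record_argv(["i915", "drm"])
print(shlex.join(argv))
```

Run on the target as root, a command of this form records until it is
interrupted and writes the result to the named output file.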

Once the 'Stop' button is pressed, the graphical view magically fills up
with a colorful per-cpu display of the trace data, along with the
detailed event listing below that:

.. image:: figures/kernelshark-i915-display.png
   :align: center

Here's another example, this time a display resulting from tracing 'all
events':

.. image:: figures/kernelshark-all.png
   :align: center

The tool is pretty self-explanatory, but for more detailed information
on navigating through the data, see the `kernelshark
website <https://rostedt.homelinux.com/kernelshark/>`__.

ftrace Documentation
--------------------

The documentation for ftrace can be found in the kernel Documentation
directory: ::

   Documentation/trace/ftrace.txt

The documentation for the trace event subsystem can also be found in the kernel
Documentation directory: ::

   Documentation/trace/events.txt

There is a nice series of articles on using ftrace and trace-cmd at LWN:

-  `Debugging the kernel using Ftrace - part
   1 <https://lwn.net/Articles/365835/>`__

-  `Debugging the kernel using Ftrace - part
   2 <https://lwn.net/Articles/366796/>`__

-  `Secrets of the Ftrace function
   tracer <https://lwn.net/Articles/370423/>`__

-  `trace-cmd: A front-end for
   Ftrace <https://lwn.net/Articles/410200/>`__

There's more detailed documentation on kernelshark usage here:
`KernelShark <https://rostedt.homelinux.com/kernelshark/>`__

An amusing yet useful README (a tracing mini-HOWTO) can be found in
``/sys/kernel/debug/tracing/README``.

systemtap
=========

SystemTap is a system-wide script-based tracing and profiling tool.

SystemTap scripts are C-like programs that are executed in the kernel to
gather/print/aggregate data extracted from the context they end up being
invoked under.

For example, this probe from the `SystemTap
tutorial <https://sourceware.org/systemtap/tutorial/>`__ simply prints a
line every time any process on the system open()s a file. For each line,
it prints the executable name of the program that opened the file, along
with its PID, and the name of the file it opened (or tried to open),
which it extracts from the open syscall's argstr.

.. code-block:: none

   probe syscall.open
   {
           printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)
   }

   probe timer.ms(4000) # after 4 seconds
   {
           exit ()
   }

Normally, to execute this
probe, you'd simply install systemtap on the system you want to probe,
and directly run the probe on that system e.g. assuming the name of the
file containing the above text is trace_open.stp: ::

   # stap trace_open.stp

What systemtap does under the covers to run this probe is 1) parse and
convert the probe to an equivalent 'C' form, 2) compile the 'C' form
into a kernel module, 3) insert the module into the kernel, which arms
it, and 4) collect the data generated by the probe and display it to the
user.

In order to accomplish steps 1 and 2, the 'stap' program needs access to
the kernel build system that produced the kernel that the probed system
is running. In the case of a typical embedded system (the 'target'), the
kernel build system unfortunately isn't typically part of the image
running on the target. It is, however, normally available on the 'host'
system that produced the target image; in such cases, steps 1 and 2 are
executed on the host system, and steps 3 and 4 are executed on the
target system, using only the systemtap 'runtime'.

The systemtap support in Yocto assumes that only steps 3 and 4 are run
on the target; it is possible to do everything on the target, but this
section assumes only the typical embedded use-case.

So basically what you need to do in order to run a systemtap script on
the target is to 1) on the host system, compile the probe into a kernel
module that makes sense to the target, 2) copy the module onto the
target system, 3) insert the module into the target kernel, which
arms it, and 4) collect the data generated by the probe and display it
to the user.
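
The host/target split described above can be pictured as three commands.
The following sketch only assembles them as argument lists and is not a
description of how any existing script does it: the function name
``cross_stap_plan``, the module name, the ``/tmp`` destination and the
kernel build path are all placeholders chosen for this example. The
'stap' options used are '-p4' (stop after compiling the module), '-m'
(module name), '-a' (target architecture) and '-r' (kernel build tree); a
real cross build typically needs further toolchain settings that are left
out here.

```python
def cross_stap_plan(script, target, kernel_build_dir, arch, module="probe_module"):
    """Return host-side and target-side commands for running one script."""
    ko = f"{module}.ko"
    return {
        # Host: translate the script and compile it into a kernel module
        # for the target's kernel, stopping before the module is loaded.
        "compile_on_host": ["stap", "-p4", "-m", module, "-a", arch,
                            "-r", kernel_build_dir, script],
        # Host: copy the resulting module to the target (placeholder path).
        "copy_to_target": ["scp", ko, f"{target}:/tmp/{ko}"],
        # Target: insert the module and display the data it generates,
        # using only the systemtap runtime.
        "run_on_target": ["ssh", target, "staprun", f"/tmp/{ko}"],
    }


plan = cross_stap_plan("trace_open.stp", "root@192.168.7.2",
                       "/path/to/kernel-build", "x86_64")
for step, argv in plan.items():
    print(step, " ".join(argv))
```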

systemtap Setup
---------------

Those are a lot of steps and a lot of details, but fortunately Yocto
includes a script called 'crosstap' that will take care of those
details, allowing you to simply execute a systemtap script on the remote
target, with arguments if necessary.

In order to do this from a remote host, however, you need to have access
to the build for the image you booted. The 'crosstap' script provides
details on how to do this if you run the script on the host without
having done a build: ::

   $ crosstap root@192.168.1.88 trace_open.stp

   Error: No target kernel build found.
   Did you forget to create a local build of your image?

   'crosstap' requires a local sdk build of the target system
   (or a build that includes 'tools-profile') in order to build
   kernel modules that can probe the target system.

   Practically speaking, that means you need to do the following:
    - If you're running a pre-built image, download the release
      and/or BSP tarballs used to build the image.
    - If you're working from git sources, just clone the metadata
      and BSP layers needed to build the image you'll be booting.
    - Make sure you're properly set up to build a new image (see
      the BSP README and/or the widely available basic documentation
      that discusses how to build images).
    - Build an -sdk version of the image e.g.:
        $ bitbake core-image-sato-sdk
    OR
    - Build a non-sdk image but include the profiling tools:
        [ edit local.conf and add 'tools-profile' to the end of
          the EXTRA_IMAGE_FEATURES variable ]
        $ bitbake core-image-sato

    Once you've build the image on the host system, you're ready to
    boot it (or the equivalent pre-built image) and use 'crosstap'
    to probe it (you need to source the environment as usual first):

       $ source oe-init-build-env
       $ cd ~/my/systemtap/scripts
       $ crosstap root@192.168.1.xxx myscript.stp

.. note::

   'crosstap' assumes you can establish an ssh connection to the remote
   target. Please refer to the crosstap wiki page for details on
   verifying ssh connections. Also, the ability to ssh into the target
   system is not enabled by default in \*-minimal images.

So essentially what you need to
do is build an SDK image or image with 'tools-profile' as detailed in
the ":ref:`profile-manual/intro:General Setup`" section of this
manual, and boot the resulting target image.

.. note::

   If you have a build directory containing multiple machines, you need
   to have the MACHINE you're connecting to selected in local.conf, and
   the kernel in that machine's build directory must match the kernel on
   the booted system exactly, or you'll get the above 'crosstap' message
   when you try to invoke a script.

Running a Script on a Target
----------------------------

Once you've done that, you should be able to run a systemtap script on
the target: ::

   $ cd /path/to/yocto
   $ source oe-init-build-env

   ### Shell environment set up for builds. ###

   You can now run 'bitbake <target>'

   Common targets are:
            core-image-minimal
            core-image-sato
            meta-toolchain
            meta-ide-support

   You can also run generated QEMU images with a command like 'runqemu qemux86-64'

Once you've done that, you can cd to whatever
directory contains your scripts and use 'crosstap' to run the script: ::

   $ cd /path/to/my/systemtap/script
   $ crosstap root@192.168.7.2 trace_open.stp

If you get an error connecting to the target e.g.: ::

   $ crosstap root@192.168.7.2 trace_open.stp
   error establishing ssh connection on remote 'root@192.168.7.2'

Try ssh'ing to the target and see what happens: ::

   $ ssh root@192.168.7.2

A lot of the time, connection
problems are due to specifying a wrong IP address or having a 'host key
verification error'.

If everything worked as planned, you should see something like this
(enter the password when prompted, or press enter if it's set up to use
no password):

.. code-block:: none

   $ crosstap root@192.168.7.2 trace_open.stp
   root@192.168.7.2's password:
   matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
   matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
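
Output in this form lends itself to simple post-processing on the host.
As a small illustration, the sketch below counts open() calls per process
from captured text. The names ``count_opens`` and ``SAMPLE_OUTPUT`` are
invented for this example, and the pattern assumes the exact line layout
produced by the ``printf`` in trace_open.stp.

```python
import re
from collections import Counter

# Captured output, including the password prompt that should be ignored.
SAMPLE_OUTPUT = """\
root@192.168.7.2's password:
matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
"""

# Matches "<execname>(<pid>) open (<argstr>)" as printed by the probe.
OPEN_RE = re.compile(r"^(?P<exe>.+)\((?P<pid>\d+)\) open \((?P<args>.*)\)$")


def count_opens(text):
    """Count open() calls per (executable name, PID) pair."""
    counts = Counter()
    for line in text.splitlines():
        match = OPEN_RE.match(line)
        if match:
            counts[(match["exe"], int(match["pid"]))] += 1
    return counts


counts = count_opens(SAMPLE_OUTPUT)
for (exe, pid), total in counts.most_common():
    print(f"{exe}({pid}): {total} open() calls")
```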

systemtap Documentation
-----------------------

The SystemTap language reference can be found here: `SystemTap Language
Reference <https://sourceware.org/systemtap/langref/>`__

Links to other SystemTap documents, tutorials, and examples can be found
here: `SystemTap documentation
page <https://sourceware.org/systemtap/documentation.html>`__

Sysprof
=======

Sysprof is a very easy to use system-wide profiler that consists of a
single window with three panes and a few buttons which allow you to
start, stop, and view the profile from one place.

Sysprof Setup
-------------

For this section, we'll assume you've already performed the basic setup
outlined in the ":ref:`profile-manual/intro:General Setup`" section.

Sysprof is a GUI-based application that runs on the target system. For
the rest of this document we assume you've ssh'ed from the host to the
target and will be running Sysprof on the target (you can use the '-X'
option to ssh and have the Sysprof GUI run on the target but display
remotely on the host if you want).

Basic Sysprof Usage
-------------------

To start profiling the system, you simply press the 'Start' button. To
stop profiling and to start viewing the profile data in one easy step,
press the 'Profile' button.

Once you've pressed the profile button, the three panes will fill up
with profiling data:

.. image:: figures/sysprof-copy-to-user.png
   :align: center

The left pane shows a list of functions and processes. Selecting one of
those expands that function in the right pane, showing all its callees.
Note that this caller-oriented display is essentially the inverse of
perf's default callee-oriented callchain display.

In the screenshot above, we're focusing on ``__copy_to_user_ll()``.
Looking up the callchain, we can see that one of the callers of
``__copy_to_user_ll()`` is ``sys_read()``, along with the complete
callpath between them. Notice that this is essentially a portion of the
same information we saw in the perf display shown in the perf section of
this page.

.. image:: figures/sysprof-copy-from-user.png
   :align: center

Similarly, the above is a snapshot of the Sysprof display of a
copy-from-user callchain.

Finally, looking at the third Sysprof pane in the lower left, we can see
a list of all the callers of a particular function selected in the top
left pane. In this case, the lower pane is showing all the callers of
``__mark_inode_dirty``:

.. image:: figures/sysprof-callers.png
   :align: center

Double-clicking on one of those functions will in turn change the focus
to the selected function, and so on.

.. admonition:: Tying it Together

   If you like sysprof's 'caller-oriented' display, you may be able to
   approximate it in other tools as well. For example, 'perf report' has
   the -g (--call-graph) option that you can experiment with; one of the
   options is 'caller' for an inverted caller-based callgraph display.

Sysprof Documentation
---------------------

There doesn't seem to be any documentation for Sysprof, but maybe that's
because it's pretty self-explanatory. The Sysprof website, however, is
here: `Sysprof, System-wide Performance Profiler for
Linux <http://sysprof.com/>`__

LTTng (Linux Trace Toolkit, next generation)
============================================

LTTng Setup
-----------

For this section, we'll assume you've already performed the basic setup
outlined in the ":ref:`profile-manual/intro:General Setup`" section.
LTTng is run on the target system by ssh'ing to it.

Collecting and Viewing Traces
-----------------------------

Once you've built and booted your image
(you need to build the core-image-sato-sdk image or use one of the other
methods described in the ":ref:`profile-manual/intro:General Setup`" section), you're ready to start
tracing.

Collecting and viewing a trace on the target (inside a shell)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, from the host, ssh to the target: ::

   $ ssh -l root 192.168.1.47
   The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
   RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
   Are you sure you want to continue connecting (yes/no)? yes
   Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
   root@192.168.1.47's password:

Once on the target, use these steps to create a trace: ::

   root@crownbay:~# lttng create
   Spawning a session daemon
   Session auto-20121015-232120 created.
   Traces will be written in /home/root/lttng-traces/auto-20121015-232120

Enable the events you want to trace (in this case all kernel events): ::

   root@crownbay:~# lttng enable-event --kernel --all
   All kernel events are enabled in channel channel0

Start the trace: ::

   root@crownbay:~# lttng start
   Tracing started for session auto-20121015-232120

And then stop the trace after a while or after running a particular workload that
you want to trace: ::

   root@crownbay:~# lttng stop
   Tracing stopped for session auto-20121015-232120
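
When the same create, enable, start and stop sequence has to be repeated
around different workloads, it can be wrapped in a small script. The
following is a sketch only: ``trace_workload`` and ``LTTNG_STEPS`` are
names made up for this example, and the ``run`` parameter exists purely
so the sequence can be exercised without lttng installed. The lttng
commands themselves are the ones shown above.

```python
import subprocess

# The session setup commands, in the order used above.
LTTNG_STEPS = [
    ["lttng", "create"],
    ["lttng", "enable-event", "--kernel", "--all"],
    ["lttng", "start"],
]


def trace_workload(workload, run=subprocess.run):
    """Create a session, trace while the workload runs, then stop tracing."""
    for step in LTTNG_STEPS:
        run(step, check=True)
    try:
        run(workload, check=True)
    finally:
        # Stop tracing even if the workload itself fails.
        run(["lttng", "stop"], check=True)

# On the target, as root, for example:
# trace_workload(["wget", "http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2"])
```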

You can now view the trace in text form on the target: ::

   root@crownbay:~# lttng view
   [23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }
   [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
   [23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }
   [23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }
   [23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 }
   [23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }
   [23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }
   [23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 }
   [23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }
   [23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 }
   [23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 }
   [23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }
   [23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }
   [23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }
   [23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }
   [23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }
   [23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }
   [23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 }
   [23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 }
   [23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 }
   [23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }
   [23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
   [23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
   [23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 }
   [23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 }
   [23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }
   [23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 }
   [23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 }
   [23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 }
   [23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 }
   [23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }
   [23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 }
   [23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 }
   [23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 }
   [23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }
   [23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 }
   [23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 }
   .
   .
   .
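
A full kernel trace is long, so it often helps to summarize it before reading
it line by line. The following is a minimal sketch, not something LTTng
provides: the ``count_events`` function name is made up for illustration, and
it assumes the kernel-event line format shown above, where the third
whitespace-separated field is the event name followed by a colon.

```shell
# Hypothetical helper: count events by name in 'lttng view' text output.
# Assumes each event line starts with '[' and that field 3 is "name:".
count_events() {
    awk '/^\[/ { name = $3; sub(/:$/, "", name); n[name]++ }
         END   { for (e in n) print n[e], e }' | sort -rn
}

# Example use on the target:
#   lttng view | count_events | head
```

The most frequent events end up at the top, which gives a quick idea of what
dominated the traced interval.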

You can now safely destroy the trace
session (note that this doesn't delete the trace - it's still there in
~/lttng-traces): ::

   root@crownbay:~# lttng destroy
   Session auto-20121015-232120 destroyed at /home/root

Note that the trace is saved in a directory of the same name as returned by
'lttng create', under the ~/lttng-traces directory (note that you can change this by
supplying your own name to 'lttng create'): ::

   root@crownbay:~# ls -al ~/lttng-traces
   drwxrwx--- 3 root root 1024 Oct 15 23:21 .
   drwxr-xr-x 5 root root 1024 Oct 15 23:57 ..
   drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120
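
The steps in this section can also be collected into a small shell helper.
This is only a sketch under stated assumptions: the ``trace_kernel`` function
name, the ``LTTNG`` variable and the session name in the usage comment are
made up for illustration and are not part of LTTng. It uses only the lttng
subcommands shown above, and passes an explicit session name to 'lttng create'
so that the trace directory is named after it (LTTng may append a timestamp).

```shell
# Hypothetical helper (not part of LTTng): trace all kernel events
# while a given command runs, following the steps shown above.
LTTNG=${LTTNG:-lttng}   # assumption: lets you point at another lttng binary

trace_kernel() {
    session=$1
    shift
    # Create the session; give up if that fails.
    "$LTTNG" create "$session" || return 1
    # Enable all kernel events, start tracing, then run the workload.
    "$LTTNG" enable-event --kernel --all &&
        "$LTTNG" start &&
        "$@"
    status=$?
    # Always stop and destroy the session; the trace data stays on disk.
    "$LTTNG" stop
    "$LTTNG" destroy "$session"
    return "$status"
}

# Example use on the target (the session name is arbitrary):
#   trace_kernel wget-trace wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
```

The helper returns the exit status of the workload, so it can be dropped into
a larger test script without hiding failures.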

Collecting and viewing a userspace trace on the target (inside a shell)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For LTTng userspace tracing, you need to have a properly instrumented
userspace program. For this example, we'll use the 'hello' test program
generated by the lttng-ust build.

The 'hello' test program isn't installed on the rootfs by the lttng-ust
build, so we need to copy it over manually. First cd into the build
directory that contains the hello executable: ::

   $ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs

Copy that over to the target machine: ::

   $ scp hello root@192.168.1.20:

You now have the instrumented lttng 'hello world' test program on the
target, ready to test.

First, from the host, ssh to the target: ::

   $ ssh -l root 192.168.1.47
   The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
   RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
   Are you sure you want to continue connecting (yes/no)? yes
   Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
   root@192.168.1.47's password:

Once on the target, use these steps to create a trace: ::

   root@crownbay:~# lttng create
   Session auto-20190303-021943 created.
   Traces will be written in /home/root/lttng-traces/auto-20190303-021943

Enable the events you want to trace (in this case all userspace events): ::

   root@crownbay:~# lttng enable-event --userspace --all
   All UST events are enabled in channel channel0

Start the trace: ::

   root@crownbay:~# lttng start
   Tracing started for session auto-20190303-021943

Run the instrumented hello world program: ::

   root@crownbay:~# ./hello
   Hello, World!
   Tracing... done.

Then stop the trace once the workload you want to trace has finished: ::

   root@crownbay:~# lttng stop
   Tracing stopped for session auto-20190303-021943

You can now view the trace in text form on the target: ::

   root@crownbay:~# lttng view
   [02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
   [02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
   [02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
   [02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
   .
   .
   .
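
Userspace events carry a lot of payload, and you're often only interested in
one field. As a rough sketch, the hypothetical ``field_values`` function below
(the name is ours, not LTTng's) pulls a single decimal integer field out of
each event line. It assumes the "name = value" payload layout shown above and
deliberately ignores hex, string and array fields.

```shell
# Hypothetical helper: print the value of one decimal integer field from
# each event line of 'lttng view' text output read on stdin.
# Assumes fields appear as " name = value," or " name = value }".
field_values() {
    field=$1
    sed -n "s/.*[{ ]$field = \([0-9][0-9]*\)[ ,].*/\1/p"
}

# Example use on the target:
#   lttng view | field_values intfield
```

Because the pattern requires a space or an opening brace before the field
name, asking for ``intfield`` doesn't accidentally match ``netintfield``.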

You can now safely destroy the trace session (note that this doesn't delete the
trace - it's still there in ~/lttng-traces): ::

   root@crownbay:~# lttng destroy
   Session auto-20190303-021943 destroyed at /home/root

LTTng Documentation
-------------------

You can find the primary LTTng Documentation on the `LTTng
Documentation <https://lttng.org/docs/>`__ site. The documentation on
this site is appropriate for intermediate to advanced software
developers who are working in a Linux environment and are interested in
efficient software tracing.

For information on LTTng in general, visit the `LTTng
Project <https://lttng.org/lttng2.0>`__ site. You can find a "Getting
Started" link on this site that takes you to an LTTng Quick Start.

blktrace
========

blktrace is a tool for tracing and reporting low-level disk I/O.
blktrace provides the tracing half of the equation; its output can be
piped into the blkparse program, which renders the data in a
human-readable form and does some basic analysis.

blktrace Setup
--------------

For this section, we'll assume you've already performed the basic setup
outlined in the ":ref:`profile-manual/intro:General Setup`"
section.

blktrace is an application that runs on the target system. You can run
the entire blktrace and blkparse pipeline on the target, or you can run
blktrace in 'listen' mode on the target and have blktrace and blkparse
collect and analyze the data on the host (see the
":ref:`profile-manual/usage:Using blktrace Remotely`" section
below). For the rest of this section we assume you've ssh'ed to the target and
will be running blktrace there.

Basic blktrace Usage
--------------------

To record a trace, simply run the 'blktrace' command, giving it the name
of the block device you want to trace activity on: ::

   root@crownbay:~# blktrace /dev/sdc

In another shell, execute a workload you want to trace: ::

   root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2; sync
   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
   linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA

Press Ctrl-C in the blktrace shell to stop the trace. It
will display how many events were logged, along with the per-cpu file
sizes (blktrace records traces in per-cpu kernel buffers and simply
dumps them to userspace for blkparse to merge and sort later). ::

   ^C=== sdc ===
    CPU 0: 7082 events, 332 KiB data
    CPU 1: 1578 events, 74 KiB data
    Total: 8660 events (dropped 0), 406 KiB data

If you examine the files saved to disk, you see multiple files, one per CPU and
with the device name as the first part of the filename: ::

   root@crownbay:~# ls -al
   drwxr-xr-x 6 root root 1024 Oct 27 22:39 .
   drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
   -rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0
   -rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1

To view the trace events, simply invoke 'blkparse' in the directory
containing the trace files, giving it the device name that forms the
first part of the filenames: ::

   root@crownbay:~# blkparse sdc

   8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8]
   8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8]
   8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]
   8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]
   8,32 1 0 0.000057270 0 m N cfq1225 insert_request
   8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr
   8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1
   8,32 1 0 0.000088559 0 m N cfq workload slice:150
   8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1
   8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1
   8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null)
   8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert
   8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request
   8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1
   8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8]
   8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]
   8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8]
   8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8]
   8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8]
   8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8]
   8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8]
   8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8]
   8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8]
   8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8]
   8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8]
   8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8]
   8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8]
   8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8]
   8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8]
   8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8]
   8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8]
   8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8]
   8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]
   8,32 1 0 0.000483022 0 m N cfq1225 insert_request
   8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1
   8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1
   8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert
   8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request
   8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2
   .
   .
   .

   8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1
   8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]
   8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1
   8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0
   8,32 0 0 58.516636933 0 m N cfq schedule dispatch
   8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0
   8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80
   8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr
   8,32 1 0 58.516990819 0 m N cfq3551 put_queue

   CPU0 (sdc):
    Reads Queued:       0,      0KiB  Writes Queued:     331, 26,284KiB
    Read Dispatches:    0,      0KiB  Write Dispatches:  485, 40,484KiB
    Reads Requeued:     0             Writes Requeued:     0
    Reads Completed:    0,      0KiB  Writes Completed:  511, 41,000KiB
    Read Merges:        0,      0KiB  Write Merges:       13,    160KiB
    Read depth:         0             Write depth:         2
    IO unplugs:        23             Timer unplugs:       0
   CPU1 (sdc):
    Reads Queued:       0,      0KiB  Writes Queued:     249, 15,800KiB
    Read Dispatches:    0,      0KiB  Write Dispatches:   42,  1,600KiB
    Reads Requeued:     0             Writes Requeued:     0
    Reads Completed:    0,      0KiB  Writes Completed:   16,  1,084KiB
    Read Merges:        0,      0KiB  Write Merges:       40,    276KiB
    Read depth:         0             Write depth:         2
    IO unplugs:        30             Timer unplugs:       1

   Total (sdc):
    Reads Queued:       0,      0KiB  Writes Queued:     580, 42,084KiB
    Read Dispatches:    0,      0KiB  Write Dispatches:  527, 42,084KiB
    Reads Requeued:     0             Writes Requeued:     0
    Reads Completed:    0,      0KiB  Writes Completed:  527, 42,084KiB
    Read Merges:        0,      0KiB  Write Merges:       53,    436KiB
    IO unplugs:        53             Timer unplugs:       1

   Throughput (R/W): 0KiB/s / 719KiB/s
   Events (sdc): 6,592 entries
   Skips: 0 forward (0 - 0.0%)
   Input file sdc.blktrace.0 added
   Input file sdc.blktrace.1 added

The report shows each event that was
found in the blktrace data, along with a summary of the overall block
I/O traffic during the run. You can look at the
`blkparse <https://linux.die.net/man/1/blkparse>`__ manpage to learn the
meaning of each field displayed in the trace listing.
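
As a quick way to get oriented in a long listing, you can tally the events by
action code. The sketch below is an illustration only: ``count_actions`` is a
hypothetical name, and it assumes blkparse's default output format, in which
the sixth field of each event line is the action (for example Q for queued,
D for issued to the driver, C for completed, and m for a message).

```shell
# Hypothetical helper: count blkparse events per action code.
# Only lines whose first field looks like "major,minor" are counted,
# so the per-CPU summary at the end of the report is skipped.
count_actions() {
    awk '$1 ~ /^[0-9]+,[0-9]+$/ && NF >= 6 { n[$6]++ }
         END { for (a in n) print a, n[a] }' | sort
}

# Example use in the directory containing the trace files:
#   blkparse sdc | count_actions
```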

Live Mode
~~~~~~~~~

blktrace and blkparse are designed from the ground up to be able to
operate together in a 'pipe mode' where the stdout of blktrace can be
fed directly into the stdin of blkparse: ::

   root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i -

This enables long-lived tracing sessions to run without writing anything
to disk. It also allows the user to look for certain conditions in the
trace data in 'real-time', either by viewing the trace output as it
scrolls by on the screen or by passing it along to yet another program in
the pipeline, such as grep, which can be used to identify and capture
conditions of interest.
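
For example, a small awk filter can stand in for grep when the condition
depends on specific fields. This is a minimal sketch: ``completed_writes`` is
a hypothetical name, and it assumes blkparse's default output format, where
field 6 is the action and field 7 holds the RWBS flags.

```shell
# Hypothetical filter: pass through only completed write requests
# (action C, with a W in the RWBS field).
completed_writes() {
    awk '$6 == "C" && $7 ~ /W/'
}

# Example use on the target, in live mode:
#   blktrace /dev/sdc -o - | blkparse -i - | completed_writes
```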

There's actually another blktrace command, btrace, that implements the above
pipeline as a single command, so the user doesn't have to bother typing
in the above command sequence: ::

   root@crownbay:~# btrace /dev/sdc

Using blktrace Remotely
~~~~~~~~~~~~~~~~~~~~~~~

blktrace traces block I/O, yet it normally also writes its trace data to a
block device. Because it's generally not a great idea to make the device
being traced the same as the device the tracer writes to, blktrace
provides native support for sending all trace data over the network. This
lets you trace without perturbing the traced device at all.

To have blktrace operate in this mode, start blktrace on the target
system being traced with the -l option, along with the device to trace: ::

   root@crownbay:~# blktrace -l /dev/sdc
   server: waiting for connections...

On the host system, use the -h option to connect to the target system,
also passing it the device to trace: ::

   $ blktrace -d /dev/sdc -h 192.168.1.43
   blktrace: connecting to 192.168.1.43
   blktrace: connected!

On the target system, you should see this: ::

   server: connection from 192.168.1.43

In another shell, execute a workload you want to trace: ::

   root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2; sync
   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
   linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA

When it's done, press Ctrl-C on the host system to stop the
trace: ::

   ^C=== sdc ===
    CPU 0: 7691 events, 361 KiB data
    CPU 1: 4109 events, 193 KiB data
    Total: 11800 events (dropped 0), 554 KiB data

On the target system, you should also see a trace summary for the trace
just ended: ::

   server: end of run for 192.168.1.43:sdc
   === sdc ===
    CPU 0: 7691 events, 361 KiB data
    CPU 1: 4109 events, 193 KiB data
    Total: 11800 events (dropped 0), 554 KiB data

The blktrace instance on the host will
save the target output inside a hostname-timestamp directory: ::

   $ ls -al
   drwxr-xr-x 10 root root 1024 Oct 28 02:40 .
   drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
   drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56

cd into that directory to see the output files: ::

   $ ls -l
   -rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0
   -rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1

And run blkparse on the host system using the device name: ::

   $ blkparse sdc

   8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls]
   8,32 1 0 0.000036038 0 m N cfq1263 alloced
   8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls]
   8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls]
   8,32 1 0 0.000056152 0 m N cfq1263 insert_request
   8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr
   8,32 1 0 0.000075498 0 m N cfq workload slice:300
   .
   .
   .
   8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0
   8,32 0 0 177.266388140 0 m N cfq schedule dispatch
   8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0
   8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56
   8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr
   8,32 1 0 177.266696560 0 m N cfq1267 put_queue

   CPU0 (sdc):
    Reads Queued:       0,      0KiB  Writes Queued:     270, 21,708KiB
    Read Dispatches:   59,  2,628KiB  Write Dispatches:  495, 39,964KiB
    Reads Requeued:     0             Writes Requeued:     0
    Reads Completed:   90,  2,752KiB  Writes Completed:  543, 41,596KiB
    Read Merges:        0,      0KiB  Write Merges:        9,    344KiB
    Read depth:         2             Write depth:         2
    IO unplugs:        20             Timer unplugs:       1
   CPU1 (sdc):
    Reads Queued:     688,  2,752KiB  Writes Queued:     381, 20,652KiB
    Read Dispatches:   31,    124KiB  Write Dispatches:   59,  2,396KiB
    Reads Requeued:     0             Writes Requeued:     0
    Reads Completed:    0,      0KiB  Writes Completed:   11,    764KiB
    Read Merges:      598,  2,392KiB  Write Merges:       88,    448KiB
    Read depth:         2             Write depth:         2
    IO unplugs:        52             Timer unplugs:       0

   Total (sdc):
    Reads Queued:     688,  2,752KiB  Writes Queued:     651, 42,360KiB
    Read Dispatches:   90,  2,752KiB  Write Dispatches:  554, 42,360KiB
    Reads Requeued:     0             Writes Requeued:     0
    Reads Completed:   90,  2,752KiB  Writes Completed:  554, 42,360KiB
    Read Merges:      598,  2,392KiB  Write Merges:       97,    792KiB
    IO unplugs:        72             Timer unplugs:       1

   Throughput (R/W): 15KiB/s / 238KiB/s
   Events (sdc): 9,301 entries
   Skips: 0 forward (0 - 0.0%)

You should see the trace events and summary just as you would have if you'd run
the same command on the target.
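
With the trace files on the host, you can also post-process the blkparse text
output there rather than on the target. The following rough sketch estimates
how long each request spent in the device by pairing issue (D) and completion
(C) events on their starting sector. The ``d2c_latency`` name is hypothetical,
the field positions assume blkparse's default output format, and the pairing
is deliberately naive: it ignores requeues and requests that share a starting
sector while in flight.

```shell
# Hypothetical helper: print "sector latency_seconds" for each request,
# measured from issue (D) to completion (C).
# Assumes field 4 = timestamp, field 6 = action, field 8 = start sector.
d2c_latency() {
    awk '$6 == "D" { issued[$8] = $4 }
         $6 == "C" && ($8 in issued) {
             printf "%s %.6f\n", $8, $4 - issued[$8]
             delete issued[$8]
         }'
}

# Example use on the host, in the directory containing the trace files:
#   blkparse sdc | d2c_latency | sort -k2 -rn | head
```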

Tracing Block I/O via 'ftrace'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's also possible to trace block I/O using only
:ref:`profile-manual/usage:The 'trace events' Subsystem`, which
can be useful for casual tracing if you don't want to bother dealing with the
userspace tools.

To enable tracing for a given device, use /sys/block/xxx/trace/enable,
where xxx is the device name. For example, this enables tracing for
/dev/sdc: ::

   root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable

Once you've selected the device(s) you want
to trace, selecting the 'blk' tracer will turn the blk tracer on: ::

   root@crownbay:/sys/kernel/debug/tracing# cat available_tracers
   blk function_graph function nop

   root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer

Execute the workload you're interested in: ::

   root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt

And look at the output (note here that we're using 'trace_pipe' instead of
'trace' to capture this trace - this allows us to wait around on the pipe
for data to appear): ::

   root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe
   cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat]
   cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced
   cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat]
   cat-3587 [001] d..1 3023.276424: 8,32 P N [cat]
   cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat]
   cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request
   cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr
   cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1
   cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150
   cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2
   cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null)
   cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert
   cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request
   cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1
   cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat]

And this turns off tracing for the specified device: ::

   root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable
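
If you switch this on and off often, the enable and disable steps can be
wrapped in a pair of shell functions. This is a sketch under stated
assumptions: the function names and the ``SYSFS`` and ``TRACING`` variables
are made up for illustration, and they default to the paths used in this
section. Turning tracing off also selects the 'nop' tracer listed in
available_tracers above, which is a choice we add here rather than a step from
the walkthrough.

```shell
# Hypothetical helpers wrapping the ftrace block-tracing steps shown above.
# SYSFS and TRACING are assumptions; override them if your system mounts
# sysfs or the tracing directory somewhere else.
SYSFS=${SYSFS:-/sys}
TRACING=${TRACING:-/sys/kernel/debug/tracing}

blk_ftrace_on() {
    # $1 is the block device name, for example sdc
    echo 1 > "$SYSFS/block/$1/trace/enable" &&
        echo blk > "$TRACING/current_tracer"
}

blk_ftrace_off() {
    # Switch back to the 'nop' tracer, then stop tracing the device.
    echo nop > "$TRACING/current_tracer"
    echo 0 > "$SYSFS/block/$1/trace/enable"
}

# Example use on the target:
#   blk_ftrace_on sdc
#   cat "$TRACING/trace_pipe"      # press Ctrl-C when done
#   blk_ftrace_off sdc
```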

blktrace Documentation
----------------------

Online versions of the man pages for the commands discussed in this
section can be found here:

- https://linux.die.net/man/8/blktrace

- https://linux.die.net/man/1/blkparse

- https://linux.die.net/man/8/btrace