Blame - poky/documentation/profile-manual/usage.rst - openbmc/openbmc

2021-02-12 15:55:30 -0600

[diff] [blame^]

863

binding <http://linux.die.net/man/1/perf-script-python>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

864

865

System-Wide Tracing and Profiling

866

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

867

868

The examples so far have focused on tracing a particular program or

869

workload - in other words, every profiling run has specified the program

870

to profile in the command-line e.g. 'perf record wget ...'.

871

872

It's also possible, and more interesting in many cases, to run a

873

system-wide profile or trace while running the workload in a separate

874

shell.

875

876

To do system-wide profiling or tracing, you typically use the -a flag to

877

'perf record'.

878

879

To demonstrate this, open up one window and start the profile using the

880

-a flag (press Ctrl-C to stop tracing): ::

881

882

root@crownbay:~# perf record -g -a

883

^C[ perf record: Woken up 6 times to write data ]

884

[ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]

885

886

In another window, run the wget test: ::

887

888

root@crownbay:~# wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2

889

Connecting to downloads.yoctoproject.org (140.211.169.59:80)

890

linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA

891

892

Here we see entries not only for our wget load, but for

893

other processes running on the system as well:

894

895

.. image:: figures/perf-systemwide.png

896

:align: center

897

898

In the snapshot above, we can see callchains that originate in libc, and

899

a callchain from Xorg that demonstrates that we're using a proprietary X

900

driver in userspace (notice the presence of 'PVR' and some other

901

unresolvable symbols in the expanded Xorg callchain).

902

903

Note also that we have both kernel and userspace entries in the above

904

snapshot. We can also tell perf to focus on userspace but providing a

905

modifier, in this case 'u', to the 'cycles' hardware counter when we

906

record a profile: ::

907

908

root@crownbay:~# perf record -g -a -e cycles:u

909

^C[ perf record: Woken up 2 times to write data ]

910

[ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]

911

912

.. image:: figures/perf-report-cycles-u.png

913

:align: center

914

915

Notice in the screenshot above, we see only userspace entries ([.])

916

917

Finally, we can press 'enter' on a leaf node and select the 'Zoom into

918

DSO' menu item to show only entries associated with a specific DSO. In

919

the screenshot below, we've zoomed into the 'libc' DSO which shows all

920

the entries associated with the libc-xxx.so DSO.

921

922

.. image:: figures/perf-systemwide-libc.png

923

:align: center

924

925

We can also use the system-wide -a switch to do system-wide tracing.

926

Here we'll trace a couple of scheduler events: ::

927

928

root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup

929

^C[ perf record: Woken up 38 times to write data ]

930

[ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]

931

932

We can look at the raw output using 'perf script' with no arguments: ::

933

934

root@crownbay:~# perf script

935

936

perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

937

perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120

938

kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120

939

swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000

940

swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120

941

kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

942

perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

943

perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120

944

kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120

945

perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

946

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Filtering

^^^^^^^^^

Notice that there are a lot of events that don't really have anything to

951

do with what we're interested in, namely events that schedule 'perf'

952

itself in and out or that wake perf up. We can get rid of those by using

953

the '--filter' option - for each event we specify using -e, we can add a

954

--filter after that to filter out trace events that contain fields with

955

specific values: ::

956

957

root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'

958

^C[ perf record: Woken up 38 times to write data ]

959

[ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]

960

961

962

root@crownbay:~# perf script

963

964

swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120

965

kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

966

perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

967

perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

968

perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

969

perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

970

perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

971

perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

972

swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

973

swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120

974

kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

975

swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000

976

swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120

977

kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

978

979

In this case, we've filtered out all events that have

980

'perf' in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice that

981

there are still events recorded for perf, but notice that those events

982

don't have values of 'perf' for the filtered fields. To completely

983

filter out anything from perf will require a bit more work, but for the

984

purpose of demonstrating how to use filters, it's close enough.

985

986

.. admonition:: Tying it Together

987

988

These are exactly the same set of event filters defined by the trace

989

event subsystem. See the ftrace/tracecmd/kernelshark section for more

990

discussion about these event filters.

991

992

.. admonition:: Tying it Together

993

994

These event filters are implemented by a special-purpose

995

pseudo-interpreter in the kernel and are an integral and

996

indispensable part of the perf design as it relates to tracing.

997

kernel-based event filters provide a mechanism to precisely throttle

998

the event stream that appears in user space, where it makes sense to

999

provide bindings to real programming languages for postprocessing the

1000

event stream. This architecture allows for the intelligent and

1001

flexible partitioning of processing between the kernel and user

1002

space. Contrast this with other tools such as SystemTap, which does

1003

all of its processing in the kernel and as such requires a special

1004

project-defined language in order to accommodate that design, or

1005

LTTng, where everything is sent to userspace and as such requires a

1006

super-efficient kernel-to-userspace transport mechanism in order to

1007

function properly. While perf certainly can benefit from for instance

1008

advances in the design of the transport, it doesn't fundamentally

1009

depend on them. Basically, if you find that your perf tracing

1010

application is causing buffer I/O overruns, it probably means that

1011

you aren't taking enough advantage of the kernel filtering engine.

1012

1013

Using Dynamic Tracepoints

1014

~~~~~~~~~~~~~~~~~~~~~~~~~

1015

1016

perf isn't restricted to the fixed set of static tracepoints listed by

1017

'perf list'. Users can also add their own 'dynamic' tracepoints anywhere

1018

in the kernel. For instance, suppose we want to define our own

1019

tracepoint on do_fork(). We can do that using the 'perf probe' perf

1020

subcommand: ::

1021

1022

root@crownbay:~# perf probe do_fork

1023

Added new event:

1024

probe:do_fork (on do_fork)

1025

1026

You can now use it in all perf tools, such as:

1027

1028

perf record -e probe:do_fork -aR sleep 1

1029

1030

Adding a new tracepoint via

1031

'perf probe' results in an event with all the expected files and format

1032

in /sys/kernel/debug/tracing/events, just the same as for static

1033

tracepoints (as discussed in more detail in the trace events subsystem

1034

section: ::

1035

1036

root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al

1037

drwxr-xr-x 2 root root 0 Oct 28 11:42 .

1038

drwxr-xr-x 3 root root 0 Oct 28 11:42 ..

1039

-rw-r--r-- 1 root root 0 Oct 28 11:42 enable

1040

-rw-r--r-- 1 root root 0 Oct 28 11:42 filter

1041

-r--r--r-- 1 root root 0 Oct 28 11:42 format

1042

-r--r--r-- 1 root root 0 Oct 28 11:42 id

1043

1044

root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format

name: do_fork

ID: 944

format:

field:unsigned short common_type; offset:0; size:2; signed:0;

1049

field:unsigned char common_flags; offset:2; size:1; signed:0;

1050

field:unsigned char common_preempt_count; offset:3; size:1; signed:0;

1051

field:int common_pid; offset:4; size:4; signed:1;

1052

field:int common_padding; offset:8; size:4; signed:1;

1053

1054

field:unsigned long __probe_ip; offset:12; size:4; signed:0;

1055

1056

print fmt: "(%lx)", REC->__probe_ip

1057

1058

We can list all dynamic tracepoints currently in

1059

existence: ::

1060

1061

root@crownbay:~# perf probe -l

1062

probe:do_fork (on do_fork)

1063

probe:schedule (on schedule)

1064

1065

Let's record system-wide ('sleep 30' is a

1066

trick for recording system-wide but basically do nothing and then wake

1067

up after 30 seconds): ::

1068

1069

root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30

1070

[ perf record: Woken up 1 times to write data ]

1071

[ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]

1072

1073

Using 'perf script' we can see each do_fork event that fired: ::

1074

1075

root@crownbay:~# perf script

1076

1077

# ========

1078

# captured on: Sun Oct 28 11:55:18 2012

1079

# hostname : crownbay

1080

# os release : 3.4.11-yocto-standard

1081

# perf version : 3.4.11

# arch : i686

# nrcpus online : 2

# nrcpus avail : 2

# cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz

1086

# cpuid : GenuineIntel,6,38,1

1087

# total memory : 1017184 kB

1088

# cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30

1089

# event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern

1090

= 0, id = { 5, 6 }

1091

# HEADER_CPU_TOPOLOGY info available, use -I to display

1092

# ========

1093

#

1094

matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)

1095

matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460)

1096

pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)

1097

pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)

1098

matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)

1099

matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460)

1100

gthumb 1300 [001] 34217.697451: do_fork: (c1028460)

1101

gthumb 1300 [001] 34219.085734: do_fork: (c1028460)

1102

gthumb 1300 [000] 34219.121351: do_fork: (c1028460)

1103

gthumb 1300 [001] 34219.264551: do_fork: (c1028460)

1104

pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460)

1105

matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460)

1106

matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460)

1107

matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460)

1108

matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460)

1109

matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460)

1110

matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460)

1111

matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460)

1112

gaku 1312 [000] 34237.202388: do_fork: (c1028460)

1113

1114

And using 'perf report' on the same file, we can see the

1115

callgraphs from starting a few programs during those 30 seconds:

1116

1117

.. image:: figures/perf-probe-do_fork-profile.png

1118

:align: center

1119

1120

.. admonition:: Tying it Together

1121

1122

The trace events subsystem accommodate static and dynamic tracepoints

1123

in exactly the same way - there's no difference as far as the

1124

infrastructure is concerned. See the ftrace section for more details

1125

on the trace event subsystem.

1126

1127

.. admonition:: Tying it Together

1128

1129

Dynamic tracepoints are implemented under the covers by kprobes and

1130

uprobes. kprobes and uprobes are also used by and in fact are the

1131

main focus of SystemTap.

1132

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Perf Documentation

------------------

Online versions of the man pages for the commands discussed in this

1137

section can be found here:

1138

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1139

- The `'perf stat' manpage <http://linux.die.net/man/1/perf-stat>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1140

1141

- The `'perf record'

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1142

manpage <http://linux.die.net/man/1/perf-record>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1143

1144

- The `'perf report'

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1145

manpage <http://linux.die.net/man/1/perf-report>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1146

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1147

- The `'perf probe' manpage <http://linux.die.net/man/1/perf-probe>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1148

1149

- The `'perf script'

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1150

manpage <http://linux.die.net/man/1/perf-script>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1151

1152

- Documentation on using the `'perf script' python

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1153

binding <http://linux.die.net/man/1/perf-script-python>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1154

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1155

- The top-level `perf(1) manpage <http://linux.die.net/man/1/perf>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1156

1157

Normally, you should be able to invoke the man pages via perf itself

1158

e.g. 'perf help' or 'perf help record'.

1159

1160

However, by default Yocto doesn't install man pages, but perf invokes

1161

the man pages for most help functionality. This is a bug and is being

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1162

addressed by a Yocto bug: `Bug 3388 - perf: enable man pages for basic

1163

'help'

1164

functionality <https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1165

1166

The man pages in text form, along with some other files, such as a set

1167

of examples, can be found in the 'perf' directory of the kernel tree: ::

1168

1169

tools/perf/Documentation

1170

1171

There's also a nice perf tutorial on the perf

1172

wiki that goes into more detail than we do here in certain areas: `Perf

1173

Tutorial <https://perf.wiki.kernel.org/index.php/Tutorial>`__

1174

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

ftrace

======

'ftrace' literally refers to the 'ftrace function tracer' but in reality

1179

this encompasses a number of related tracers along with the

1180

infrastructure that they all make use of.

1181

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

ftrace Setup

------------

For this section, we'll assume you've already performed the basic setup

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

1186

outlined in the ":ref:`profile-manual/intro:General Setup`" section.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1187

1188

ftrace, trace-cmd, and kernelshark run on the target system, and are

1189

ready to go out-of-the-box - no additional setup is necessary. For the

1190

rest of this section we assume you've ssh'ed to the host and will be

1191

running ftrace on the target. kernelshark is a GUI application and if

1192

you use the '-X' option to ssh you can have the kernelshark GUI run on

1193

the target but display remotely on the host if you want.

Basic ftrace usage

------------------

'ftrace' essentially refers to everything included in the /tracing

1199

directory of the mounted debugfs filesystem (Yocto follows the standard

1200

convention and mounts it at /sys/kernel/debug). Here's a listing of all

1201

the files found in /sys/kernel/debug/tracing on a Yocto system: ::

1202

1203

root@sugarbay:/sys/kernel/debug/tracing# ls

1204

README kprobe_events trace

1205

available_events kprobe_profile trace_clock

1206

available_filter_functions options trace_marker

1207

available_tracers per_cpu trace_options

1208

buffer_size_kb printk_formats trace_pipe

1209

buffer_total_size_kb saved_cmdlines tracing_cpumask

1210

current_tracer set_event tracing_enabled

1211

dyn_ftrace_total_info set_ftrace_filter tracing_on

1212

enabled_functions set_ftrace_notrace tracing_thresh

1213

events set_ftrace_pid

1214

free_buffer set_graph_function

1215

1216

The files listed above are used for various purposes

1217

- some relate directly to the tracers themselves, others are used to set

1218

tracing options, and yet others actually contain the tracing output when

1219

a tracer is in effect. Some of the functions can be guessed from their

1220

names, others need explanation; in any case, we'll cover some of the

1221

files we see here below but for an explanation of the others, please see

1222

the ftrace documentation.

1223

1224

We'll start by looking at some of the available built-in tracers.

1225

1226

cat'ing the 'available_tracers' file lists the set of available tracers: ::

1227

1228

root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers

1229

blk function_graph function nop

1230

1231

The 'current_tracer' file contains the tracer currently in effect: ::

1232

1233

root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer

1234

nop

1235

1236

The above listing of current_tracer shows that the

1237

'nop' tracer is in effect, which is just another way of saying that

1238

there's actually no tracer currently in effect.

1239

1240

echo'ing one of the available_tracers into current_tracer makes the

1241

specified tracer the current tracer: ::

1242

1243

root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer

1244

root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer

1245

function

1246

1247

The above sets the current tracer to be the 'function tracer'. This tracer

1248

traces every function call in the kernel and makes it available as the

1249

contents of the 'trace' file. Reading the 'trace' file lists the

1250

currently buffered function calls that have been traced by the function

1251

tracer: ::

1252

1253

root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

# tracer: function

#

# entries-in-buffer/entries-written: 310629/766471 #P:8

1258

#

1259

# _-----=> irqs-off

1260

# / _----=> need-resched

1261

# | / _---=> hardirq/softirq

1262

# || / _--=> preempt-depth

1263

# ||| / delay

1264

# TASK-PID CPU# |||| TIMESTAMP FUNCTION

1265

# | | | |||| | |

1266

<idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle

1267

<idle>-0 [004] d..1 470.867170: getnstimeofday <-ktime_get_real

1268

<idle>-0 [004] d..1 470.867171: ns_to_timeval <-intel_idle

1269

<idle>-0 [004] d..1 470.867171: ns_to_timespec <-ns_to_timeval

1270

<idle>-0 [004] d..1 470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt

1271

<idle>-0 [004] d..1 470.867172: native_apic_mem_write <-smp_apic_timer_interrupt

1272

<idle>-0 [004] d..1 470.867172: irq_enter <-smp_apic_timer_interrupt

1273

<idle>-0 [004] d..1 470.867172: rcu_irq_enter <-irq_enter

1274

<idle>-0 [004] d..1 470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter

1275

<idle>-0 [004] d..1 470.867173: local_bh_disable <-irq_enter

1276

<idle>-0 [004] d..1 470.867173: add_preempt_count <-local_bh_disable

1277

<idle>-0 [004] d.s1 470.867174: tick_check_idle <-irq_enter

1278

<idle>-0 [004] d.s1 470.867174: tick_check_oneshot_broadcast <-tick_check_idle

1279

<idle>-0 [004] d.s1 470.867174: ktime_get <-tick_check_idle

1280

<idle>-0 [004] d.s1 470.867174: tick_nohz_stop_idle <-tick_check_idle

1281

<idle>-0 [004] d.s1 470.867175: update_ts_time_stats <-tick_nohz_stop_idle

1282

<idle>-0 [004] d.s1 470.867175: nr_iowait_cpu <-update_ts_time_stats

1283

<idle>-0 [004] d.s1 470.867175: tick_do_update_jiffies64 <-tick_check_idle

1284

<idle>-0 [004] d.s1 470.867175: _raw_spin_lock <-tick_do_update_jiffies64

1285

<idle>-0 [004] d.s1 470.867176: add_preempt_count <-_raw_spin_lock

1286

<idle>-0 [004] d.s2 470.867176: do_timer <-tick_do_update_jiffies64

1287

<idle>-0 [004] d.s2 470.867176: _raw_spin_lock <-do_timer

1288

<idle>-0 [004] d.s2 470.867176: add_preempt_count <-_raw_spin_lock

1289

<idle>-0 [004] d.s3 470.867177: ntp_tick_length <-do_timer

1290

<idle>-0 [004] d.s3 470.867177: _raw_spin_lock_irqsave <-ntp_tick_length

.

.

.

Each line in the trace above shows what was happening in the kernel on a given

1296

cpu, to the level of detail of function calls. Each entry shows the function

1297

called, followed by its caller (after the arrow).

1298

1299

The function tracer gives you an extremely detailed idea of what the

1300

kernel was doing at the point in time the trace was taken, and is a

1301

great way to learn about how the kernel code works in a dynamic sense.

1302

1303

.. admonition:: Tying it Together

1304

1305

The ftrace function tracer is also available from within perf, as the

1306

ftrace:function tracepoint.

1307

1308

It is a little more difficult to follow the call chains than it needs to

1309

be - luckily there's a variant of the function tracer that displays the

1310

callchains explicitly, called the 'function_graph' tracer: ::

1311

1312

root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer

1313

root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

1314

1315

tracer: function_graph

1316

1317

CPU DURATION FUNCTION CALLS

1318

| | | | | | |

1319

7) 0.046 us | pick_next_task_fair();

1320

7) 0.043 us | pick_next_task_stop();

1321

7) 0.042 us | pick_next_task_rt();

1322

7) 0.032 us | pick_next_task_fair();

1323

7) 0.030 us | pick_next_task_idle();

1324

7) | _raw_spin_unlock_irq() {

1325

7) 0.033 us | sub_preempt_count();

1326

7) 0.258 us | }

1327

7) 0.032 us | sub_preempt_count();

1328

7) + 13.341 us | } /* __schedule */

1329

7) 0.095 us | } /* sub_preempt_count */

1330

7) | schedule() {

1331

7) | __schedule() {

1332

7) 0.060 us | add_preempt_count();

1333

7) 0.044 us | rcu_note_context_switch();

1334

7) | _raw_spin_lock_irq() {

1335

7) 0.033 us | add_preempt_count();

1336

7) 0.247 us | }

1337

7) | idle_balance() {

1338

7) | _raw_spin_unlock() {

1339

7) 0.031 us | sub_preempt_count();

1340

7) 0.246 us | }

1341

7) | update_shares() {

1342

7) 0.030 us | __rcu_read_lock();

1343

7) 0.029 us | __rcu_read_unlock();

1344

7) 0.484 us | }

1345

7) 0.030 us | __rcu_read_lock();

1346

7) | load_balance() {

1347

7) | find_busiest_group() {

1348

7) 0.031 us | idle_cpu();

1349

7) 0.029 us | idle_cpu();

1350

7) 0.035 us | idle_cpu();

1351

7) 0.906 us | }

1352

7) 1.141 us | }

1353

7) 0.022 us | msecs_to_jiffies();

1354

7) | load_balance() {

1355

7) | find_busiest_group() {

1356

7) 0.031 us | idle_cpu();

.

.

.

4) 0.062 us | msecs_to_jiffies();

1361

4) 0.062 us | __rcu_read_unlock();

1362

4) | _raw_spin_lock() {

1363

4) 0.073 us | add_preempt_count();

1364

4) 0.562 us | }

1365

4) + 17.452 us | }

1366

4) 0.108 us | put_prev_task_fair();

1367

4) 0.102 us | pick_next_task_fair();

1368

4) 0.084 us | pick_next_task_stop();

1369

4) 0.075 us | pick_next_task_rt();

1370

4) 0.062 us | pick_next_task_fair();

1371

4) 0.066 us | pick_next_task_idle();

1372

------------------------------------------

1373

4) kworker-74 => <idle>-0

1374

------------------------------------------

1375

1376

4) | finish_task_switch() {

1377

4) | _raw_spin_unlock_irq() {

1378

4) 0.100 us | sub_preempt_count();

1379

4) 0.582 us | }

1380

4) 1.105 us | }

1381

4) 0.088 us | sub_preempt_count();

4) ! 100.066 us | }

.

.

.

3) | sys_ioctl() {

3) 0.083 us | fget_light();

1388

3) | security_file_ioctl() {

1389

3) 0.066 us | cap_file_ioctl();

1390

3) 0.562 us | }

1391

3) | do_vfs_ioctl() {

1392

3) | drm_ioctl() {

1393

3) 0.075 us | drm_ut_debug_printk();

1394

3) | i915_gem_pwrite_ioctl() {

1395

3) | i915_mutex_lock_interruptible() {

1396

3) 0.070 us | mutex_lock_interruptible();

1397

3) 0.570 us | }

1398

3) | drm_gem_object_lookup() {

1399

3) | _raw_spin_lock() {

1400

3) 0.080 us | add_preempt_count();

1401

3) 0.620 us | }

1402

3) | _raw_spin_unlock() {

1403

3) 0.085 us | sub_preempt_count();

1404

3) 0.562 us | }

1405

3) 2.149 us | }

1406

3) 0.133 us | i915_gem_object_pin();

1407

3) | i915_gem_object_set_to_gtt_domain() {

1408

3) 0.065 us | i915_gem_object_flush_gpu_write_domain();

1409

3) 0.065 us | i915_gem_object_wait_rendering();

1410

3) 0.062 us | i915_gem_object_flush_cpu_write_domain();

1411

3) 1.612 us | }

1412

3) | i915_gem_object_put_fence() {

1413

3) 0.097 us | i915_gem_object_flush_fence.constprop.36();

1414

3) 0.645 us | }

1415

3) 0.070 us | add_preempt_count();

1416

3) 0.070 us | sub_preempt_count();

1417

3) 0.073 us | i915_gem_object_unpin();

1418

3) 0.068 us | mutex_unlock();

3) 9.924 us | }

3) + 11.236 us | }

3) + 11.770 us | }

3) + 13.784 us | }

3) | sys_ioctl() {

As you can see, the function_graph display is much easier

1426

to follow. Also note that in addition to the function calls and

1427

associated braces, other events such as scheduler events are displayed

1428

in context. In fact, you can freely include any tracepoint available in

1429

the trace events subsystem described in the next section by simply

1430

enabling those events, and they'll appear in context in the function

1431

graph display. Quite a powerful tool for understanding kernel dynamics.

1432

1433

Also notice that there are various annotations on the left hand side of

1434

the display. For example if the total time it took for a given function

1435

to execute is above a certain threshold, an exclamation point or plus

1436

sign appears on the left hand side. Please see the ftrace documentation

1437

for details on all these fields.

1438

1439

The 'trace events' Subsystem

1440

----------------------------

1441

1442

One especially important directory contained within the

1443

/sys/kernel/debug/tracing directory is the 'events' subdirectory, which

1444

contains representations of every tracepoint in the system. Listing out

1445

the contents of the 'events' subdirectory, we see mainly another set of

1446

subdirectories: ::

1447

1448

root@sugarbay:/sys/kernel/debug/tracing# cd events

1449

root@sugarbay:/sys/kernel/debug/tracing/events# ls -al

1450

drwxr-xr-x 38 root root 0 Nov 14 23:19 .

1451

drwxr-xr-x 5 root root 0 Nov 14 23:19 ..

1452

drwxr-xr-x 19 root root 0 Nov 14 23:19 block

1453

drwxr-xr-x 32 root root 0 Nov 14 23:19 btrfs

1454

drwxr-xr-x 5 root root 0 Nov 14 23:19 drm

1455

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1456

drwxr-xr-x 40 root root 0 Nov 14 23:19 ext3

1457

drwxr-xr-x 79 root root 0 Nov 14 23:19 ext4

1458

drwxr-xr-x 14 root root 0 Nov 14 23:19 ftrace

1459

drwxr-xr-x 8 root root 0 Nov 14 23:19 hda

1460

-r--r--r-- 1 root root 0 Nov 14 23:19 header_event

1461

-r--r--r-- 1 root root 0 Nov 14 23:19 header_page

1462

drwxr-xr-x 25 root root 0 Nov 14 23:19 i915

1463

drwxr-xr-x 7 root root 0 Nov 14 23:19 irq

1464

drwxr-xr-x 12 root root 0 Nov 14 23:19 jbd

1465

drwxr-xr-x 14 root root 0 Nov 14 23:19 jbd2

1466

drwxr-xr-x 14 root root 0 Nov 14 23:19 kmem

1467

drwxr-xr-x 7 root root 0 Nov 14 23:19 module

1468

drwxr-xr-x 3 root root 0 Nov 14 23:19 napi

1469

drwxr-xr-x 6 root root 0 Nov 14 23:19 net

1470

drwxr-xr-x 3 root root 0 Nov 14 23:19 oom

1471

drwxr-xr-x 12 root root 0 Nov 14 23:19 power

1472

drwxr-xr-x 3 root root 0 Nov 14 23:19 printk

1473

drwxr-xr-x 8 root root 0 Nov 14 23:19 random

1474

drwxr-xr-x 4 root root 0 Nov 14 23:19 raw_syscalls

1475

drwxr-xr-x 3 root root 0 Nov 14 23:19 rcu

1476

drwxr-xr-x 6 root root 0 Nov 14 23:19 rpm

1477

drwxr-xr-x 20 root root 0 Nov 14 23:19 sched

1478

drwxr-xr-x 7 root root 0 Nov 14 23:19 scsi

1479

drwxr-xr-x 4 root root 0 Nov 14 23:19 signal

1480

drwxr-xr-x 5 root root 0 Nov 14 23:19 skb

1481

drwxr-xr-x 4 root root 0 Nov 14 23:19 sock

1482

drwxr-xr-x 10 root root 0 Nov 14 23:19 sunrpc

1483

drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls

1484

drwxr-xr-x 4 root root 0 Nov 14 23:19 task

1485

drwxr-xr-x 14 root root 0 Nov 14 23:19 timer

1486

drwxr-xr-x 3 root root 0 Nov 14 23:19 udp

1487

drwxr-xr-x 21 root root 0 Nov 14 23:19 vmscan

1488

drwxr-xr-x 3 root root 0 Nov 14 23:19 vsyscall

1489

drwxr-xr-x 6 root root 0 Nov 14 23:19 workqueue

1490

drwxr-xr-x 26 root root 0 Nov 14 23:19 writeback

1491

1492

Each one of these subdirectories

1493

corresponds to a 'subsystem' and contains yet again more subdirectories,

1494

each one of those finally corresponding to a tracepoint. For example,

1495

here are the contents of the 'kmem' subsystem: ::

1496

1497

root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem

1498

root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al

1499

drwxr-xr-x 14 root root 0 Nov 14 23:19 .

1500

drwxr-xr-x 38 root root 0 Nov 14 23:19 ..

1501

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1502

-rw-r--r-- 1 root root 0 Nov 14 23:19 filter

1503

drwxr-xr-x 2 root root 0 Nov 14 23:19 kfree

1504

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc

1505

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc_node

1506

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc

1507

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc_node

1508

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_free

1509

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc

1510

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag

1511

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked

1512

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free

1513

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free_batched

1514

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_pcpu_drain

1515

1516

Let's see what's inside the subdirectory for a

1517

specific tracepoint, in this case the one for kmalloc: ::

1518

1519

root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc

1520

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al

1521

drwxr-xr-x 2 root root 0 Nov 14 23:19 .

1522

drwxr-xr-x 14 root root 0 Nov 14 23:19 ..

1523

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1524

-rw-r--r-- 1 root root 0 Nov 14 23:19 filter

1525

-r--r--r-- 1 root root 0 Nov 14 23:19 format

1526

-r--r--r-- 1 root root 0 Nov 14 23:19 id

1527

1528

The 'format' file for the

1529

tracepoint describes the event in memory, which is used by the various

1530

tracing tools that now make use of these tracepoint to parse the event

1531

and make sense of it, along with a 'print fmt' field that allows tools

1532

like ftrace to display the event as text. Here's what the format of the

1533

kmalloc event looks like: ::

1534

1535

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format

name: kmalloc

ID: 313

format:

field:unsigned short common_type; offset:0; size:2; signed:0;

1540

field:unsigned char common_flags; offset:2; size:1; signed:0;

1541

field:unsigned char common_preempt_count; offset:3; size:1; signed:0;

1542

field:int common_pid; offset:4; size:4; signed:1;

1543

field:int common_padding; offset:8; size:4; signed:1;

1544

1545

field:unsigned long call_site; offset:16; size:8; signed:0;

1546

field:const void * ptr; offset:24; size:8; signed:0;

1547

field:size_t bytes_req; offset:32; size:8; signed:0;

1548

field:size_t bytes_alloc; offset:40; size:8; signed:0;

1549

field:gfp_t gfp_flags; offset:48; size:4; signed:0;

1550

1551

print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,

1552

(REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((

1553

1554

gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((

1555

gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((

1556

gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((

1557

gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), GFP_TEMPORARY"},

1558

{(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),

1559

"GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((

1560

gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((

1561

gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned

1562

long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},

1563

{(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),

1564

"GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned

1565

long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},

1566

{(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"

1567

1568

The 'enable' file

1569

in the tracepoint directory is what allows the user (or tools such as

1570

trace-cmd) to actually turn the tracepoint on and off. When enabled, the

1571

corresponding tracepoint will start appearing in the ftrace 'trace' file

1572

described previously. For example, this turns on the kmalloc tracepoint: ::

1573

1574

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable

1575

1576

At the moment, we're not interested in the function tracer or

1577

some other tracer that might be in effect, so we first turn it off, but

1578

if we do that, we still need to turn tracing on in order to see the

1579

events in the output buffer: ::

1580

1581

root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer

1582

root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on

1583

1584

Now, if we look at the the 'trace' file, we see nothing

1585

but the kmalloc events we just turned on: ::

1586

1587

root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

1588

# tracer: nop

1589

#

1590

# entries-in-buffer/entries-written: 1897/1897 #P:8

1591

#

1592

# _-----=> irqs-off

1593

# / _----=> need-resched

1594

# | / _---=> hardirq/softirq

1595

# || / _--=> preempt-depth

1596

# ||| / delay

1597

# TASK-PID CPU# |||| TIMESTAMP FUNCTION

1598

# | | | |||| | |

1599

dropbear-1465 [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1600

<idle>-0 [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1601

<idle>-0 [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1602

matchbox-termin-1361 [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT

1603

Xorg-1264 [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY

1604

Xorg-1264 [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

1605

Xorg-1264 [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO

1606

matchbox-termin-1361 [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT

1607

Xorg-1264 [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY

1608

Xorg-1264 [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

1609

Xorg-1264 [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO

1610

<idle>-0 [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1611

dropbear-1465 [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1612

<idle>-0 [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1613

<idle>-0 [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1614

dropbear-1465 [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1615

<idle>-0 [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1616

<idle>-0 [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1617

dropbear-1465 [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1618

<idle>-0 [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1619

matchbox-termin-1361 [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT

1620

Xorg-1264 [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY

1621

Xorg-1264 [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

1622

Xorg-1264 [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO

1623

<idle>-0 [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1624

dropbear-1465 [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1625

<idle>-0 [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1626

<idle>-0 [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1627

dropbear-1465 [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_http://rostedt.homelinux.com/kernelshark/req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1628

<idle>-0 [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1629

matchbox-termin-1361 [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT

1630

1631

To again disable the kmalloc event, we need to send 0 to the enable file: ::

1632

1633

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable

1634

1635

You can enable any number of events or complete subsystems (by

1636

using the 'enable' file in the subsystem directory) and get an

1637

arbitrarily fine-grained idea of what's going on in the system by

1638

enabling as many of the appropriate tracepoints as applicable.

1639

1640

A number of the tools described in this HOWTO do just that, including

1641

trace-cmd and kernelshark in the next section.

1642

1643

.. admonition:: Tying it Together

1644

1645

These tracepoints and their representation are used not only by

1646

ftrace, but by many of the other tools covered in this document and

1647

they form a central point of integration for the various tracers

1648

available in Linux. They form a central part of the instrumentation

1649

for the following tools: perf, lttng, ftrace, blktrace and SystemTap

1650

1651

.. admonition:: Tying it Together

1652

1653

Eventually all the special-purpose tracers currently available in

1654

/sys/kernel/debug/tracing will be removed and replaced with

1655

equivalent tracers based on the 'trace events' subsystem.

1656

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1657

trace-cmd/kernelshark

1658

---------------------

1659

1660

trace-cmd is essentially an extensive command-line 'wrapper' interface

1661

that hides the details of all the individual files in

1662

/sys/kernel/debug/tracing, allowing users to specify specific particular

1663

events within the /sys/kernel/debug/tracing/events/ subdirectory and to

1664

collect traces and avoid having to deal with those details directly.

1665

1666

As yet another layer on top of that, kernelshark provides a GUI that

1667

allows users to start and stop traces and specify sets of events using

1668

an intuitive interface, and view the output as both trace events and as

1669

a per-CPU graphical display. It directly uses 'trace-cmd' as the

1670

plumbing that accomplishes all that underneath the covers (and actually

1671

displays the trace-cmd command it uses, as we'll see).

1672

1673

To start a trace using kernelshark, first start kernelshark: ::

1674

1675

root@sugarbay:~# kernelshark

1676

1677

Then bring up the 'Capture' dialog by

1678

choosing from the kernelshark menu: ::

Capture | Record

That will display the following dialog, which allows you to choose one or more

1683

events (or even one or more complete subsystems) to trace:

1684

1685

.. image:: figures/kernelshark-choose-events.png

1686

:align: center

1687

1688

Note that these are exactly the same sets of events described in the

1689

previous trace events subsystem section, and in fact is where trace-cmd

1690

gets them for kernelshark.

1691

1692

In the above screenshot, we've decided to explore the graphics subsystem

1693

a bit and so have chosen to trace all the tracepoints contained within

1694

the 'i915' and 'drm' subsystems.

1695

1696

After doing that, we can start and stop the trace using the 'Run' and

1697

'Stop' button on the lower right corner of the dialog (the same button

1698

will turn into the 'Stop' button after the trace has started):

1699

1700

.. image:: figures/kernelshark-output-display.png

1701

:align: center

1702

1703

Notice that the right-hand pane shows the exact trace-cmd command-line

1704

that's used to run the trace, along with the results of the trace-cmd

1705

run.

1706

1707

Once the 'Stop' button is pressed, the graphical view magically fills up

1708

with a colorful per-cpu display of the trace data, along with the

1709

detailed event listing below that:

1710

1711

.. image:: figures/kernelshark-i915-display.png

1712

:align: center

1713

1714

Here's another example, this time a display resulting from tracing 'all

1715

events':

1716

1717

.. image:: figures/kernelshark-all.png

1718

:align: center

1719

1720

The tool is pretty self-explanatory, but for more detailed information

1721

on navigating through the data, see the `kernelshark

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1722

website <http://rostedt.homelinux.com/kernelshark/>`__.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1723

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

ftrace Documentation

--------------------

The documentation for ftrace can be found in the kernel Documentation

1728

directory: ::

1729

1730

Documentation/trace/ftrace.txt

1731

1732

The documentation for the trace event subsystem can also be found in the kernel

1733

Documentation directory: ::

1734

1735

Documentation/trace/events.txt

1736

1737

There is a nice series of articles on using ftrace and trace-cmd at LWN:

1738

1739

- `Debugging the kernel using Ftrace - part

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1740

1 <http://lwn.net/Articles/365835/>`__

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1741

1742

- `Debugging the kernel using Ftrace - part

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1743

2 <http://lwn.net/Articles/366796/>`__

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1744

1745

- `Secrets of the Ftrace function

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1746

tracer <http://lwn.net/Articles/370423/>`__

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1747

1748

- `trace-cmd: A front-end for

1749

Ftrace <https://lwn.net/Articles/410200/>`__

1750

1751

There's more detailed documentation kernelshark usage here:

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1752

`KernelShark <http://rostedt.homelinux.com/kernelshark/>`__

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1753

1754

An amusing yet useful README (a tracing mini-HOWTO) can be found in

1755

``/sys/kernel/debug/tracing/README``.

1756

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

systemtap

=========

SystemTap is a system-wide script-based tracing and profiling tool.

1761

1762

SystemTap scripts are C-like programs that are executed in the kernel to

1763

gather/print/aggregate data extracted from the context they end up being

1764

invoked under.

1765

1766

For example, this probe from the `SystemTap

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1767

tutorial <http://sourceware.org/systemtap/tutorial/>`__ simply prints a

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1768

line every time any process on the system open()s a file. For each line,

1769

it prints the executable name of the program that opened the file, along

1770

with its PID, and the name of the file it opened (or tried to open),

1771

which it extracts from the open syscall's argstr.

.. code-block:: none

probe syscall.open

{

printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)

1778

}

1779

1780

probe timer.ms(4000) # after 4 seconds

{

exit ()

}

Normally, to execute this

1786

probe, you'd simply install systemtap on the system you want to probe,

1787

and directly run the probe on that system e.g. assuming the name of the

1788

file containing the above text is trace_open.stp: ::

1789

1790

# stap trace_open.stp

1791

1792

What systemtap does under the covers to run this probe is 1) parse and

1793

convert the probe to an equivalent 'C' form, 2) compile the 'C' form

1794

into a kernel module, 3) insert the module into the kernel, which arms

1795

it, and 4) collect the data generated by the probe and display it to the

1796

user.

1797

1798

In order to accomplish steps 1 and 2, the 'stap' program needs access to

1799

the kernel build system that produced the kernel that the probed system

1800

is running. In the case of a typical embedded system (the 'target'), the

1801

kernel build system unfortunately isn't typically part of the image

1802

running on the target. It is normally available on the 'host' system

1803

that produced the target image however; in such cases, steps 1 and 2 are

1804

executed on the host system, and steps 3 and 4 are executed on the

1805

target system, using only the systemtap 'runtime'.

1806

1807

The systemtap support in Yocto assumes that only steps 3 and 4 are run

1808

on the target; it is possible to do everything on the target, but this

1809

section assumes only the typical embedded use-case.

1810

1811

So basically what you need to do in order to run a systemtap script on

1812

the target is to 1) on the host system, compile the probe into a kernel

1813

module that makes sense to the target, 2) copy the module onto the

1814

target system and 3) insert the module into the target kernel, which

1815

arms it, and 4) collect the data generated by the probe and display it

1816

to the user.

1817

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

systemtap Setup

---------------

Those are a lot of steps and a lot of details, but fortunately Yocto

1822

includes a script called 'crosstap' that will take care of those

1823

details, allowing you to simply execute a systemtap script on the remote

1824

target, with arguments if necessary.

1825

1826

In order to do this from a remote host, however, you need to have access

1827

to the build for the image you booted. The 'crosstap' script provides

1828

details on how to do this if you run the script on the host without

1829

having done a build: ::

1830

1831

$ crosstap root@192.168.1.88 trace_open.stp

1832

1833

Error: No target kernel build found.

1834

Did you forget to create a local build of your image?

1835

1836

'crosstap' requires a local sdk build of the target system

1837

(or a build that includes 'tools-profile') in order to build

1838

kernel modules that can probe the target system.

1839

1840

Practically speaking, that means you need to do the following:

1841

- If you're running a pre-built image, download the release

1842

and/or BSP tarballs used to build the image.

1843

- If you're working from git sources, just clone the metadata

1844

and BSP layers needed to build the image you'll be booting.

1845

- Make sure you're properly set up to build a new image (see

1846

the BSP README and/or the widely available basic documentation

1847

that discusses how to build images).

1848

- Build an -sdk version of the image e.g.:

1849

$ bitbake core-image-sato-sdk

1850

OR

1851

- Build a non-sdk image but include the profiling tools:

1852

[ edit local.conf and add 'tools-profile' to the end of

1853

the EXTRA_IMAGE_FEATURES variable ]

1854

$ bitbake core-image-sato

1855

1856

Once you've build the image on the host system, you're ready to

1857

boot it (or the equivalent pre-built image) and use 'crosstap'

1858

to probe it (you need to source the environment as usual first):

1859

1860

$ source oe-init-build-env

1861

$ cd ~/my/systemtap/scripts

1862

$ crosstap root@192.168.1.xxx myscript.stp

.. note::

SystemTap, which uses 'crosstap', assumes you can establish an ssh

1867

connection to the remote target. Please refer to the crosstap wiki

1868

page for details on verifying ssh connections at

1869

. Also, the ability to ssh into the target system is not enabled by

1870

default in \*-minimal images.

1871

1872

So essentially what you need to

1873

do is build an SDK image or image with 'tools-profile' as detailed in

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

1874

the ":ref:`profile-manual/intro:General Setup`" section of this

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1875

manual, and boot the resulting target image.

.. note::

If you have a build directory containing multiple machines, you need

1880

to have the MACHINE you're connecting to selected in local.conf, and

1881

the kernel in that machine's build directory must match the kernel on

1882

the booted system exactly, or you'll get the above 'crosstap' message

1883

when you try to invoke a script.

1884

1885

Running a Script on a Target

1886

----------------------------

1887

1888

Once you've done that, you should be able to run a systemtap script on

the target: ::

$ cd /path/to/yocto

$ source oe-init-build-env

1893

1894

### Shell environment set up for builds. ###

1895

1896

You can now run 'bitbake <target>'

Common targets are:

core-image-minimal

core-image-sato

meta-toolchain

meta-ide-support

You can also run generated qemu images with a command like 'runqemu qemux86-64'

1905

1906

Once you've done that, you can cd to whatever

1907

directory contains your scripts and use 'crosstap' to run the script: ::

1908

1909

$ cd /path/to/my/systemap/script

1910

$ crosstap root@192.168.7.2 trace_open.stp

1911

1912

If you get an error connecting to the target e.g.: ::

1913

1914

$ crosstap root@192.168.7.2 trace_open.stp

1915

error establishing ssh connection on remote 'root@192.168.7.2'

1916

1917

Try ssh'ing to the target and see what happens: ::

1918

1919

$ ssh root@192.168.7.2

1920

1921

A lot of the time, connection

1922

problems are due specifying a wrong IP address or having a 'host key

1923

verification error'.

1924

1925

If everything worked as planned, you should see something like this

1926

(enter the password when prompted, or press enter if it's set up to use

no password):

.. code-block:: none

$ crosstap root@192.168.7.2 trace_open.stp

1932

root@192.168.7.2's password:

1933

matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)

1934

matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)

1935

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1936

systemtap Documentation

1937

-----------------------

1938

1939

The SystemTap language reference can be found here: `SystemTap Language

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1940

Reference <http://sourceware.org/systemtap/langref/>`__

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1941

1942

Links to other SystemTap documents, tutorials, and examples can be found

1943

here: `SystemTap documentation

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

1944

page <http://sourceware.org/systemtap/documentation.html>`__

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1945

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Sysprof

=======

Sysprof is a very easy to use system-wide profiler that consists of a

1950

single window with three panes and a few buttons which allow you to

1951

start, stop, and view the profile from one place.

1952

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Sysprof Setup

-------------

For this section, we'll assume you've already performed the basic setup

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

1957

outlined in the ":ref:`profile-manual/intro:General Setup`" section.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

1958

1959

Sysprof is a GUI-based application that runs on the target system. For

1960

the rest of this document we assume you've ssh'ed to the host and will

1961

be running Sysprof on the target (you can use the '-X' option to ssh and

1962

have the Sysprof GUI run on the target but display remotely on the host

1963

if you want).

1964

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Basic Sysprof Usage

-------------------

To start profiling the system, you simply press the 'Start' button. To

1969

stop profiling and to start viewing the profile data in one easy step,

1970

press the 'Profile' button.

1971

1972

Once you've pressed the profile button, the three panes will fill up

1973

with profiling data:

1974

1975

.. image:: figures/sysprof-copy-to-user.png

1976

:align: center

1977

1978

The left pane shows a list of functions and processes. Selecting one of

1979

those expands that function in the right pane, showing all its callees.

1980

Note that this caller-oriented display is essentially the inverse of

1981

perf's default callee-oriented callchain display.

1982

1983

In the screenshot above, we're focusing on ``__copy_to_user_ll()`` and

1984

looking up the callchain we can see that one of the callers of

1985

``__copy_to_user_ll`` is sys_read() and the complete callpath between them.

1986

Notice that this is essentially a portion of the same information we saw

1987

in the perf display shown in the perf section of this page.

1988

1989

.. image:: figures/sysprof-copy-from-user.png

1990

:align: center

1991

1992

Similarly, the above is a snapshot of the Sysprof display of a

1993

copy-from-user callchain.

1994

1995

Finally, looking at the third Sysprof pane in the lower left, we can see

1996

a list of all the callers of a particular function selected in the top

1997

left pane. In this case, the lower pane is showing all the callers of

1998

``__mark_inode_dirty``:

1999

2000

.. image:: figures/sysprof-callers.png

2001

:align: center

2002

2003

Double-clicking on one of those functions will in turn change the focus

2004

to the selected function, and so on.

2005

2006

.. admonition:: Tying it Together

2007

2008

If you like sysprof's 'caller-oriented' display, you may be able to

2009

approximate it in other tools as well. For example, 'perf report' has

2010

the -g (--call-graph) option that you can experiment with; one of the

2011

options is 'caller' for an inverted caller-based callgraph display.

2012

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2013

Sysprof Documentation

2014

---------------------

2015

2016

There doesn't seem to be any documentation for Sysprof, but maybe that's

2017

because it's pretty self-explanatory. The Sysprof website, however, is

2018

here: `Sysprof, System-wide Performance Profiler for

2019

Linux <http://sysprof.com/>`__

2020

2021

LTTng (Linux Trace Toolkit, next generation)

2022

============================================

2023

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

LTTng Setup

-----------

For this section, we'll assume you've already performed the basic setup

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

2028

outlined in the ":ref:`profile-manual/intro:General Setup`" section.

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2029

LTTng is run on the target system by ssh'ing to it.

2030

2031

Collecting and Viewing Traces

2032

-----------------------------

2033

2034

Once you've applied the above commits and built and booted your image

2035

(you need to build the core-image-sato-sdk image or use one of the other

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

2036

methods described in the ":ref:`profile-manual/intro:General Setup`" section), you're ready to start

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2037

tracing.

2038

2039

Collecting and viewing a trace on the target (inside a shell)

2040

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2041

2042

First, from the host, ssh to the target: ::

2043

2044

$ ssh -l root 192.168.1.47

2045

The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.

2046

RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.

2047

Are you sure you want to continue connecting (yes/no)? yes

2048

Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.

2049

root@192.168.1.47's password:

2050

2051

Once on the target, use these steps to create a trace: ::

2052

2053

root@crownbay:~# lttng create

2054

Spawning a session daemon

2055

Session auto-20121015-232120 created.

2056

Traces will be written in /home/root/lttng-traces/auto-20121015-232120

2057

2058

Enable the events you want to trace (in this case all kernel events): ::

2059

2060

root@crownbay:~# lttng enable-event --kernel --all

2061

All kernel events are enabled in channel channel0

Start the trace: ::

root@crownbay:~# lttng start

2066

Tracing started for session auto-20121015-232120

2067

2068

And then stop the trace after awhile or after running a particular workload that

2069

you want to trace: ::

2070

2071

root@crownbay:~# lttng stop

2072

Tracing stopped for session auto-20121015-232120

2073

2074

You can now view the trace in text form on the target: ::

2075

2076

root@crownbay:~# lttng view

2077

[23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }

2078

[23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }

2079

[23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }

2080

[23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }

2081

[23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 }

2082

[23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }

2083

[23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }

2084

[23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 }

2085

[23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }

2086

[23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 }

2087

[23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 }

2088

[23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }

2089

[23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }

2090

[23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }

2091

[23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }

2092

[23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }

2093

[23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }

2094

[23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 }

2095

[23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 }

2096

[23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 }

2097

[23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }

2098

[23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }

2099

[23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }

2100

[23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 }

2101

[23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 }

2102

[23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }

2103

[23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 }

2104

[23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 }

2105

[23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 }

2106

[23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 }

2107

[23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }

2108

[23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 }

2109

[23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 }

2110

[23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 }

2111

[23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }

2112

[23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 }

2113

[23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 }

.

.

.

You can now safely destroy the trace

2119

session (note that this doesn't delete the trace - it's still there in

2120

~/lttng-traces): ::

2121

2122

root@crownbay:~# lttng destroy

2123

Session auto-20121015-232120 destroyed at /home/root

2124

2125

Note that the trace is saved in a directory of the same name as returned by

2126

'lttng create', under the ~/lttng-traces directory (note that you can change this by

2127

supplying your own name to 'lttng create'): ::

2128

2129

root@crownbay:~# ls -al ~/lttng-traces

2130

drwxrwx--- 3 root root 1024 Oct 15 23:21 .

2131

drwxr-xr-x 5 root root 1024 Oct 15 23:57 ..

2132

drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120

2133

2134

Collecting and viewing a userspace trace on the target (inside a shell)

2135

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2136

2137

For LTTng userspace tracing, you need to have a properly instrumented

2138

userspace program. For this example, we'll use the 'hello' test program

2139

generated by the lttng-ust build.

2140

2141

The 'hello' test program isn't installed on the rootfs by the lttng-ust

2142

build, so we need to copy it over manually. First cd into the build

2143

directory that contains the hello executable: ::

2144

2145

$ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs

2146

2147

Copy that over to the target machine: ::

2148

2149

$ scp hello root@192.168.1.20:

2150

2151

You now have the instrumented lttng 'hello world' test program on the

2152

target, ready to test.

2153

2154

First, from the host, ssh to the target: ::

2155

2156

$ ssh -l root 192.168.1.47

2157

The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.

2158

RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.

2159

Are you sure you want to continue connecting (yes/no)? yes

2160

Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.

2161

root@192.168.1.47's password:

2162

2163

Once on the target, use these steps to create a trace: ::

2164

2165

root@crownbay:~# lttng create

2166

Session auto-20190303-021943 created.

2167

Traces will be written in /home/root/lttng-traces/auto-20190303-021943

2168

2169

Enable the events you want to trace (in this case all userspace events): ::

2170

2171

root@crownbay:~# lttng enable-event --userspace --all

2172

All UST events are enabled in channel channel0

Start the trace: ::

root@crownbay:~# lttng start

2177

Tracing started for session auto-20190303-021943

2178

2179

Run the instrumented hello world program: ::

2180

2181

root@crownbay:~# ./hello

Hello, World!

Tracing... done.

And then stop the trace after awhile or after running a particular workload

2186

that you want to trace: ::

2187

2188

root@crownbay:~# lttng stop

2189

Tracing stopped for session auto-20190303-021943

2190

2191

You can now view the trace in text form on the target: ::

2192

2193

root@crownbay:~# lttng view

2194

[02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }

2195

[02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }

2196

[02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }

2197

[02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }

.

.

.

You can now safely destroy the trace session (note that this doesn't delete the

2203

trace - it's still there in ~/lttng-traces): ::

2204

2205

root@crownbay:~# lttng destroy

2206

Session auto-20190303-021943 destroyed at /home/root

2207

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

LTTng Documentation

-------------------

You can find the primary LTTng Documentation on the `LTTng

2212

Documentation <https://lttng.org/docs/>`__ site. The documentation on

2213

this site is appropriate for intermediate to advanced software

2214

developers who are working in a Linux environment and are interested in

2215

efficient software tracing.

2216

2217

For information on LTTng in general, visit the `LTTng

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

2218

Project <http://lttng.org/lttng2.0>`__ site. You can find a "Getting

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2219

Started" link on this site that takes you to an LTTng Quick Start.

2220

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

blktrace

========

blktrace is a tool for tracing and reporting low-level disk I/O.

2225

blktrace provides the tracing half of the equation; its output can be

2226

piped into the blkparse program, which renders the data in a

2227

human-readable form and does some basic analysis:

2228

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

blktrace Setup

--------------

For this section, we'll assume you've already performed the basic setup

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

2233

outlined in the ":ref:`profile-manual/intro:General Setup`"

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2234

section.

2235

2236

blktrace is an application that runs on the target system. You can run

2237

the entire blktrace and blkparse pipeline on the target, or you can run

2238

blktrace in 'listen' mode on the target and have blktrace and blkparse

2239

collect and analyze the data on the host (see the

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

2240

":ref:`profile-manual/usage:Using blktrace Remotely`" section

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2241

below). For the rest of this section we assume you've ssh'ed to the host and

2242

will be running blkrace on the target.

2243

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Basic blktrace Usage

--------------------

To record a trace, simply run the 'blktrace' command, giving it the name

2248

of the block device you want to trace activity on: ::

2249

2250

root@crownbay:~# blktrace /dev/sdc

2251

2252

In another shell, execute a workload you want to trace. ::

2253

2254

root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2; sync

2255

Connecting to downloads.yoctoproject.org (140.211.169.59:80)

2256

linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA

2257

2258

Press Ctrl-C in the blktrace shell to stop the trace. It

2259

will display how many events were logged, along with the per-cpu file

2260

sizes (blktrace records traces in per-cpu kernel buffers and simply

2261

dumps them to userspace for blkparse to merge and sort later). ::

2262

2263

^C=== sdc ===

2264

CPU 0: 7082 events, 332 KiB data

2265

CPU 1: 1578 events, 74 KiB data

2266

Total: 8660 events (dropped 0), 406 KiB data

2267

2268

If you examine the files saved to disk, you see multiple files, one per CPU and

2269

with the device name as the first part of the filename: ::

2270

2271

root@crownbay:~# ls -al

2272

drwxr-xr-x 6 root root 1024 Oct 27 22:39 .

2273

drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..

2274

-rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0

2275

-rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1

2276

2277

To view the trace events, simply invoke 'blkparse' in the directory

2278

containing the trace files, giving it the device name that forms the

2279

first part of the filenames: ::

2280

2281

root@crownbay:~# blkparse sdc

2282

2283

8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8]

2284

8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8]

2285

8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]

2286

8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]

2287

8,32 1 0 0.000057270 0 m N cfq1225 insert_request

2288

8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr

2289

8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1

2290

8,32 1 0 0.000088559 0 m N cfq workload slice:150

2291

8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1

2292

8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1

2293

8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null)

2294

8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert

2295

8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request

2296

8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1

2297

8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8]

2298

8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]

2299

8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8]

2300

8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8]

2301

8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8]

2302

8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8]

2303

8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8]

2304

8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8]

2305

8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8]

2306

8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8]

2307

8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8]

2308

8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8]

2309

8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8]

2310

8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8]

2311

8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8]

2312

8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8]

2313

8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8]

2314

8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8]

2315

8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]

2316

8,32 1 0 0.000483022 0 m N cfq1225 insert_request

2317

8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1

2318

8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1

2319

8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert

2320

8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request

2321

8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2

.

.

.

8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1

2326

8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]

2327

8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1

2328

8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0

2329

8,32 0 0 58.516636933 0 m N cfq schedule dispatch

2330

8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0

2331

8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80

2332

8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr

2333

8,32 1 0 58.516990819 0 m N cfq3551 put_queue

2334

2335

CPU0 (sdc):

2336

Reads Queued: 0, 0KiB Writes Queued: 331, 26,284KiB

2337

Read Dispatches: 0, 0KiB Write Dispatches: 485, 40,484KiB

2338

Reads Requeued: 0 Writes Requeued: 0

2339

Reads Completed: 0, 0KiB Writes Completed: 511, 41,000KiB

2340

Read Merges: 0, 0KiB Write Merges: 13, 160KiB

2341

Read depth: 0 Write depth: 2

2342

IO unplugs: 23 Timer unplugs: 0

2343

CPU1 (sdc):

2344

Reads Queued: 0, 0KiB Writes Queued: 249, 15,800KiB

2345

Read Dispatches: 0, 0KiB Write Dispatches: 42, 1,600KiB

2346

Reads Requeued: 0 Writes Requeued: 0

2347

Reads Completed: 0, 0KiB Writes Completed: 16, 1,084KiB

2348

Read Merges: 0, 0KiB Write Merges: 40, 276KiB

2349

Read depth: 0 Write depth: 2

2350

IO unplugs: 30 Timer unplugs: 1

2351

2352

Total (sdc):

2353

Reads Queued: 0, 0KiB Writes Queued: 580, 42,084KiB

2354

Read Dispatches: 0, 0KiB Write Dispatches: 527, 42,084KiB

2355

Reads Requeued: 0 Writes Requeued: 0

2356

Reads Completed: 0, 0KiB Writes Completed: 527, 42,084KiB

2357

Read Merges: 0, 0KiB Write Merges: 53, 436KiB

2358

IO unplugs: 53 Timer unplugs: 1

2359

2360

Throughput (R/W): 0KiB/s / 719KiB/s

2361

Events (sdc): 6,592 entries

2362

Skips: 0 forward (0 - 0.0%)

2363

Input file sdc.blktrace.0 added

2364

Input file sdc.blktrace.1 added

2365

2366

The report shows each event that was

2367

found in the blktrace data, along with a summary of the overall block

2368

I/O traffic during the run. You can look at the

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

2369

`blkparse <http://linux.die.net/man/1/blkparse>`__ manpage to learn the

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2370

meaning of each field displayed in the trace listing.

2371

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

Live Mode

~~~~~~~~~

blktrace and blkparse are designed from the ground up to be able to

2376

operate together in a 'pipe mode' where the stdout of blktrace can be

2377

fed directly into the stdin of blkparse: ::

2378

2379

root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i -

2380

2381

This enables long-lived tracing sessions

2382

to run without writing anything to disk, and allows the user to look for

2383

certain conditions in the trace data in 'real-time' by viewing the trace

2384

output as it scrolls by on the screen or by passing it along to yet

2385

another program in the pipeline such as grep which can be used to

2386

identify and capture conditions of interest.

2387

2388

There's actually another blktrace command that implements the above

2389

pipeline as a single command, so the user doesn't have to bother typing

2390

in the above command sequence: ::

2391

2392

root@crownbay:~# btrace /dev/sdc

2393

2394

Using blktrace Remotely

2395

~~~~~~~~~~~~~~~~~~~~~~~

2396

2397

Because blktrace traces block I/O and at the same time normally writes

2398

its trace data to a block device, and in general because it's not really

2399

a great idea to make the device being traced the same as the device the

2400

tracer writes to, blktrace provides a way to trace without perturbing

2401

the traced device at all by providing native support for sending all

2402

trace data over the network.

2403

2404

To have blktrace operate in this mode, start blktrace on the target

2405

system being traced with the -l option, along with the device to trace: ::

2406

2407

root@crownbay:~# blktrace -l /dev/sdc

2408

server: waiting for connections...

2409

2410

On the host system, use the -h option to connect to the target system,

2411

also passing it the device to trace: ::

2412

2413

$ blktrace -d /dev/sdc -h 192.168.1.43

2414

blktrace: connecting to 192.168.1.43

2415

blktrace: connected!

2416

2417

On the target system, you should see this: ::

2418

2419

server: connection from 192.168.1.43

2420

2421

In another shell, execute a workload you want to trace. ::

2422

2423

root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2; sync

2424

Connecting to downloads.yoctoproject.org (140.211.169.59:80)

2425

linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA

2426

2427

When it's done, do a Ctrl-C on the host system to stop the

trace: ::

^C=== sdc ===

CPU 0: 7691 events, 361 KiB data

2432

CPU 1: 4109 events, 193 KiB data

2433

Total: 11800 events (dropped 0), 554 KiB data

2434

2435

On the target system, you should also see a trace summary for the trace

2436

just ended: ::

2437

2438

server: end of run for 192.168.1.43:sdc

2439

=== sdc ===

2440

CPU 0: 7691 events, 361 KiB data

2441

CPU 1: 4109 events, 193 KiB data

2442

Total: 11800 events (dropped 0), 554 KiB data

2443

2444

The blktrace instance on the host will

2445

save the target output inside a hostname-timestamp directory: ::

2446

2447

$ ls -al

2448

drwxr-xr-x 10 root root 1024 Oct 28 02:40 .

2449

drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..

2450

drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56

2451

2452

cd into that directory to see the output files: ::

2453

2454

$ ls -l

2455

-rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0

2456

-rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1

2457

2458

And run blkparse on the host system using the device name: ::

$ blkparse sdc

8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls]

2463

8,32 1 0 0.000036038 0 m N cfq1263 alloced

2464

8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls]

2465

8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls]

2466

8,32 1 0 0.000056152 0 m N cfq1263 insert_request

2467

8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr

2468

8,32 1 0 0.000075498 0 m N cfq workload slice:300

.

.

.

8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0

2473

8,32 0 0 177.266388140 0 m N cfq schedule dispatch

2474

8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0

2475

8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56

2476

8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr

2477

8,32 1 0 177.266696560 0 m N cfq1267 put_queue

2478

2479

CPU0 (sdc):

2480

Reads Queued: 0, 0KiB Writes Queued: 270, 21,708KiB

2481

Read Dispatches: 59, 2,628KiB Write Dispatches: 495, 39,964KiB

2482

Reads Requeued: 0 Writes Requeued: 0

2483

Reads Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB

2484

Read Merges: 0, 0KiB Write Merges: 9, 344KiB

2485

Read depth: 2 Write depth: 2

2486

IO unplugs: 20 Timer unplugs: 1

2487

CPU1 (sdc):

2488

Reads Queued: 688, 2,752KiB Writes Queued: 381, 20,652KiB

2489

Read Dispatches: 31, 124KiB Write Dispatches: 59, 2,396KiB

2490

Reads Requeued: 0 Writes Requeued: 0

2491

Reads Completed: 0, 0KiB Writes Completed: 11, 764KiB

2492

Read Merges: 598, 2,392KiB Write Merges: 88, 448KiB

2493

Read depth: 2 Write depth: 2

2494

IO unplugs: 52 Timer unplugs: 0

2495

2496

Total (sdc):

2497

Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB

2498

Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB

2499

Reads Requeued: 0 Writes Requeued: 0

2500

Reads Completed: 90, 2,752KiB Writes Completed: 554, 42,360KiB

2501

Read Merges: 598, 2,392KiB Write Merges: 97, 792KiB

2502

IO unplugs: 72 Timer unplugs: 1

2503

2504

Throughput (R/W): 15KiB/s / 238KiB/s

2505

Events (sdc): 9,301 entries

2506

Skips: 0 forward (0 - 0.0%)

2507

2508

You should see the trace events and summary just as you would have if you'd run

2509

the same command on the target.

2510

2511

Tracing Block I/O via 'ftrace'

2512

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2513

2514

It's also possible to trace block I/O using only

Andrew Geissler

2020-12-13 08:44:15 -0600

[diff] [blame]

2515

:ref:`profile-manual/usage:The 'trace events' Subsystem`, which

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2516

can be useful for casual tracing if you don't want to bother dealing with the

2517

userspace tools.

2518

2519

To enable tracing for a given device, use /sys/block/xxx/trace/enable,

2520

where xxx is the device name. This for example enables tracing for

2521

/dev/sdc: ::

2522

2523

root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable

2524

2525

Once you've selected the device(s) you want

2526

to trace, selecting the 'blk' tracer will turn the blk tracer on: ::

2527

2528

root@crownbay:/sys/kernel/debug/tracing# cat available_tracers

2529

blk function_graph function nop

2530

2531

root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer

2532

2533

Execute the workload you're interested in: ::

2534

2535

root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt

2536

2537

And look at the output (note here that we're using 'trace_pipe' instead of

2538

trace to capture this trace - this allows us to wait around on the pipe

2539

for data to appear): ::

2540

2541

root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe

2542

cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat]

2543

cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced

2544

cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat]

2545

cat-3587 [001] d..1 3023.276424: 8,32 P N [cat]

2546

cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat]

2547

cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request

2548

cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr

2549

cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1

2550

cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150

2551

cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2

2552

cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null)

2553

cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert

2554

cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request

2555

cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1

2556

cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat]

2557

2558

And this turns off tracing for the specified device: ::

2559

2560

root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable

2561

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2562

blktrace Documentation

2563

----------------------

2564

2565

Online versions of the man pages for the commands discussed in this

2566

section can be found here:

2567

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

2568

- http://linux.die.net/man/8/blktrace

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2569

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

2570

- http://linux.die.net/man/1/blkparse

Andrew Geissler

2020-09-18 14:11:35 -0500

[diff] [blame]

2571

Andrew Geissler

2021-02-12 15:55:30 -0600

[diff] [blame^]

2572

- http://linux.die.net/man/8/btrace

Andrew Geissler