
wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>


</literallayout>

The quickest and easiest way to get some basic overall data about
what's going on for a particular workload is to profile it using
'perf stat'. 'perf stat' profiles using a few default
counters and displays the summed counts at the end of the run:
<literallayout class='monospaced'>
     root@crownbay:~# perf stat wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA

     Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

         4597.223902 task-clock # 0.077 CPUs utilized
               23568 context-switches # 0.005 M/sec
                  68 CPU-migrations # 0.015 K/sec
                 241 page-faults # 0.052 K/sec
          3045817293 cycles # 0.663 GHz
     &lt;not supported&gt; stalled-cycles-frontend
     &lt;not supported&gt; stalled-cycles-backend
           858909167 instructions # 0.28 insns per cycle
           165441165 branches # 35.987 M/sec
            19550329 branch-misses # 11.82% of all branches

        59.836627620 seconds time elapsed


</literallayout>

Many times such a simple-minded test doesn't yield much of
interest, but sometimes it does (see Real-world Yocto bug
(slow loop-mounted write speed)).

</para>
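<para>
The derived figures that 'perf stat' prints after each '#' can be
reproduced by hand from the raw counts, which is a useful sanity
check when reading such output. The following is a minimal Python
sketch, not part of perf itself: the variable names are
illustrative, and the values are copied from the run above.
</para>

```python
# Raw counts copied from the 'perf stat' run above.
task_clock_ms = 4597.223902      # task-clock is reported in milliseconds
elapsed_s = 59.836627620
cycles = 3045817293
instructions = 858909167
branches = 165441165
branch_misses = 19550329

# CPUs utilized: time spent on-CPU divided by wall-clock time.
cpus_utilized = (task_clock_ms / 1000.0) / elapsed_s

# Effective clock rate: cycles per second of on-CPU time, in GHz.
ghz = cycles / (task_clock_ms / 1000.0) / 1e9

# Instructions per cycle and branch miss rate.
ipc = instructions / cycles
branch_miss_pct = 100.0 * branch_misses / branches

print("%.3f CPUs utilized" % cpus_utilized)
print("%.3f GHz" % ghz)
print("%.2f insns per cycle" % ipc)
print("%.2f%% of all branches missed" % branch_miss_pct)
```

<para>
A low CPU utilization figure such as this one suggests the workload
spends most of its wall-clock time waiting rather than computing.
</para>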

<para>

Also, note that 'perf stat' isn't restricted to a fixed set of
counters - basically any event listed in the output of 'perf list'
can be tallied by 'perf stat'. For example, suppose we wanted to
see a summary of all the events related to kernel memory
allocation/freeing along with cache hits and misses:
<literallayout class='monospaced'>
     root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA

     Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

               5566 kmem:kmalloc
             125517 kmem:kmem_cache_alloc
                  0 kmem:kmalloc_node
                  0 kmem:kmem_cache_alloc_node
              34401 kmem:kfree
              69920 kmem:kmem_cache_free
                133 kmem:mm_page_free
                 41 kmem:mm_page_free_batched
              11502 kmem:mm_page_alloc
              11375 kmem:mm_page_alloc_zone_locked
                  0 kmem:mm_page_pcpu_drain
                  0 kmem:mm_page_alloc_extfrag
           66848602 cache-references
            2917740 cache-misses # 4.365 % of all cache refs

       44.831023415 seconds time elapsed


</literallayout>

So 'perf stat' gives us a nice easy way to get a quick overview of
what might be happening for a set of events, but normally we'd
need a little more detail in order to understand what's going on
well enough to act on it.

</para>
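<para>
When many events are tallied at once, it can help to post-process
the 'perf stat' text rather than read it by eye. The following is
a rough Python sketch under the assumption that each line of
interest starts with a count followed by an event name, as in the
output above. The helper name parse_counts is hypothetical and not
part of perf.
</para>

```python
import re

# A few lines copied from the 'perf stat -e kmem:* ...' output above.
SAMPLE = """\
      5566 kmem:kmalloc
    125517 kmem:kmem_cache_alloc
     34401 kmem:kfree
     69920 kmem:kmem_cache_free
  66848602 cache-references
   2917740 cache-misses              #    4.365 % of all cache refs
"""

# A count, whitespace, then the event name; anything after is ignored.
LINE_RE = re.compile(r"^\s*([\d,]+)\s+(\S+)")


def parse_counts(text):
    """Return a dict mapping event name to its integer count."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_counts(SAMPLE)
allocs = counts["kmem:kmalloc"] + counts["kmem:kmem_cache_alloc"]
frees = counts["kmem:kfree"] + counts["kmem:kmem_cache_free"]
miss_pct = 100.0 * counts["cache-misses"] / counts["cache-references"]

print("slab allocations: %d, frees: %d" % (allocs, frees))
print("cache miss rate: %.3f%%" % miss_pct)
```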

<para>

To dive down into the next level of detail, we can use 'perf
record'/'perf report', which will collect profiling data and
present it to us using an interactive text-based UI (or
simply as text if we specify --stdio to 'perf report').

</para>

<para>

As our first attempt at profiling this workload, we'll simply
run 'perf record', handing it the workload we want to profile
(everything after 'perf record' and any perf options we hand
it - here none - will be executed in a new shell). perf collects
samples until the process exits and records them in a file named
'perf.data' in the current working directory.
<literallayout class='monospaced'>
     root@crownbay:~# perf record wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>

     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA
     [ perf record: Woken up 1 times to write data ]
     [ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ]

</literallayout>

To see the results in a 'text-based UI' (tui), simply run
'perf report', which will read the perf.data file in the current
working directory and display the results in an interactive UI:
<literallayout class='monospaced'>
     root@crownbay:~# perf report

</literallayout>

</para>

<para>

</para>

<para>

The above screenshot displays a 'flat' profile, one entry for
each 'bucket' corresponding to the functions that were profiled
during the profiling run, ordered from the most popular to the
least (perf has options to sort in various orders and keys as
well as display entries only above a certain threshold and so
on - see the perf documentation for details). Note that this
includes both userspace functions (entries containing a [.]) and
kernel functions accounted to the process (entries containing
a [k]). perf also has command-line modifiers that can be used to
restrict the profiling to kernel or userspace, among others.

</para>

<para>

Notice also that the above report shows an entry for 'busybox',
which is the executable that implements 'wget' in Yocto, but that
instead of a useful function name, that entry displays
a not-so-friendly hex value. The steps below will show
how to fix that problem.

</para>

<para>

Before we do that, however, let's try running a different profile,
one which shows something a little more interesting. The only
difference between the new profile and the previous one is that
we'll add the -g option, which will record not just the address
of a sampled function, but the entire callchain to the sampled
function as well:
<literallayout class='monospaced'>
     root@crownbay:~# perf record -g wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA
     [ perf record: Woken up 3 times to write data ]
     [ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ]


     root@crownbay:~# perf report

</literallayout>

</para>

<para>

</para>

<para>

Using the callgraph view, we can see not only which
functions took the most time, but also a summary of
how those functions were called, and learn something about how the
program interacts with the kernel in the process.

</para>

<para>

Notice that each entry in the above screenshot now contains a '+'
on the left-hand side. This means that we can expand the entry and
drill down into the callchains that feed into that entry.
Pressing 'enter' on any one of them will expand the callchain
(you can also press 'E' to expand them all at the same time or 'C'
to collapse them all).

</para>

<para>

In the screenshot above, we've toggled the __copy_to_user_ll()
entry and several subnodes all the way down. This lets us see
which callchains contributed to the profiled __copy_to_user_ll()
function, which contributed 1.77% to the total profile.

</para>

<para>

As a bit of background explanation for these callchains, think
about what happens at a high level when you run wget to get a file
out on the network. Basically what happens is that the data comes
into the kernel via the network connection (socket) and is passed
to the userspace program 'wget' (which is actually a part of
busybox, but that's not important for now), which takes the buffers
the kernel passes to it and writes them to a disk file.

</para>

<para>

The part of this process that we're looking at in the above call
stacks is the part where the kernel passes the data it's read from
the socket down to wget, i.e. a copy-to-user.

</para>

<para>

Notice also that there is a case here where a hex value
is displayed in the callstack, in the expanded
sys_clock_gettime() function. Later we'll see it resolve to a
userspace function call in busybox.

</para>

<para>

</para>

<para>

The above screenshot shows the other half of the journey for the
data - from the wget program's userspace buffers to disk. To get
the buffers to disk, the wget program issues a write(2), which
does a copy-from-user to the kernel, which then takes care, via
some circuitous path (probably also present somewhere in the
profile data), of getting it safely to disk.

</para>

<para>

Now that we've seen the basic layout of the profile data and the
basics of how to extract useful information out of it, let's get
back to the task at hand and see if we can get some basic idea
about where the time is spent in the program we're profiling,
wget. Remember that wget is actually implemented as an applet
in busybox, so while the process name is 'wget', the executable
we're actually interested in is busybox. So let's expand the
first entry containing busybox:

</para>

<para>

</para>

<para>

Again, before we expanded the entry we saw that the function was labeled
with a hex value instead of a symbol, unlike most of the kernel
entries. Expanding the busybox entry doesn't make it any better.

</para>

<para>

The problem is that perf can't find the symbol information for the
busybox binary, which is actually stripped out by the Yocto build
system.

</para>

<para>

One way around that is to put the following in your
<filename>local.conf</filename> file when you build the image:
<literallayout class='monospaced'>
     <ulink url='&YOCTO_DOCS_REF_URL;#var-INHIBIT_PACKAGE_STRIP'>INHIBIT_PACKAGE_STRIP</ulink> = "1"

</literallayout>

However, we already have an image with the binaries stripped,
so what can we do to get perf to resolve the symbols? Basically
we need to install the debuginfo for the busybox package.

</para>

<para>

To generate the debug info for the packages in the image, we can
add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example:
<literallayout class='monospaced'>
     EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"

</literallayout>

Additionally, in order to generate the type of debuginfo that
perf understands, we also need to set
<ulink url='&YOCTO_DOCS_REF_URL;#var-PACKAGE_DEBUG_SPLIT_STYLE'><filename>PACKAGE_DEBUG_SPLIT_STYLE</filename></ulink>
in the <filename>local.conf</filename> file:
<literallayout class='monospaced'>
     PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory'

</literallayout>

Once we've done that, we can install the debuginfo for busybox.
Once built, the debug packages can be found in
build/tmp/deploy/rpm/* on the host system. Find the
busybox-dbg-...rpm file and copy it to the target. For example:
<literallayout class='monospaced'>
     [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm root@192.168.1.31:
     root@192.168.1.31's password:
     busybox-dbg-1.20.2-r2.core2_32.rpm 100% 1826KB 1.8MB/s 00:01

</literallayout>

Now install the debug rpm on the target:
<literallayout class='monospaced'>
     root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2_32.rpm

</literallayout>

Now that the debuginfo is installed, we see that the busybox
entries display their functions symbolically:

</para>

<para>

</para>

<para>

If we expand one of the entries and press 'enter' on a leaf node,
we're presented with a menu of actions we can take to get more
information related to that entry:

</para>

<para>

</para>

<para>

One of these actions displays a
busybox-centric view of the profiled functions (in this case we've
also expanded all the nodes using the 'E' key):

</para>

<para>

</para>

<para>

Finally, we can see that now that the busybox debuginfo is
installed, the unresolved symbol in the
sys_clock_gettime() entry mentioned previously is resolved,
and shows that the sys_clock_gettime system call that was the
source of 6.75% of the copy-to-user overhead was initiated by
the handle_input() busybox function:

</para>

<para>

</para>

<para>

At the lowest level of detail, we can dive down to the assembly
level and see which instructions caused the most overhead in a
function. Pressing 'enter' on the 'udhcpc_main' function, we're
again presented with a menu:

</para>

<para>

</para>

<para>

Selecting 'Annotate udhcpc_main', we get a detailed listing of
percentages by instruction for the udhcpc_main function. From the
display, we can see that over 50% of the time spent in this
function is taken up by a couple of tests and the move of a
constant (1) to a register:

</para>

<para>

</para>

<para>

As a segue into tracing, let's try another profile using a
different counter, something other than the default 'cycles'.

</para>

<para>

The tracing and profiling infrastructure in Linux has become
unified in a way that allows us to use the same tool with a
completely different set of counters, not just the standard
hardware counters that traditional tools have had to restrict
themselves to (of course the traditional tools can also make use
of the expanded possibilities now available to them, and in some
cases have, as mentioned previously).

</para>

<para>

We can get a list of the available events that can be used to
profile a workload via 'perf list':
<literallayout class='monospaced'>
     root@crownbay:~# perf list

     List of pre-defined events (to be used in -e):
      cpu-cycles OR cycles [Hardware event]
      stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
      stalled-cycles-backend OR idle-cycles-backend [Hardware event]
      instructions [Hardware event]
      cache-references [Hardware event]
      cache-misses [Hardware event]
      branch-instructions OR branches [Hardware event]
      branch-misses [Hardware event]
      bus-cycles [Hardware event]
      ref-cycles [Hardware event]

      cpu-clock [Software event]
      task-clock [Software event]
      page-faults OR faults [Software event]
      minor-faults [Software event]
      major-faults [Software event]
      context-switches OR cs [Software event]
      cpu-migrations OR migrations [Software event]
      alignment-faults [Software event]
      emulation-faults [Software event]

      L1-dcache-loads [Hardware cache event]
      L1-dcache-load-misses [Hardware cache event]
      L1-dcache-prefetch-misses [Hardware cache event]
      L1-icache-loads [Hardware cache event]
      L1-icache-load-misses [Hardware cache event]
      .
      .
      .
      rNNN [Raw hardware event descriptor]
      cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
       (see 'perf list --help' on how to encode it)

      mem:&lt;addr&gt;[:access] [Hardware breakpoint]

      sunrpc:rpc_call_status [Tracepoint event]
      sunrpc:rpc_bind_status [Tracepoint event]
      sunrpc:rpc_connect_status [Tracepoint event]
      sunrpc:rpc_task_begin [Tracepoint event]
      skb:kfree_skb [Tracepoint event]
      skb:consume_skb [Tracepoint event]
      skb:skb_copy_datagram_iovec [Tracepoint event]
      net:net_dev_xmit [Tracepoint event]
      net:net_dev_queue [Tracepoint event]
      net:netif_receive_skb [Tracepoint event]
      net:netif_rx [Tracepoint event]
      napi:napi_poll [Tracepoint event]
      sock:sock_rcvqueue_full [Tracepoint event]
      sock:sock_exceed_buf_limit [Tracepoint event]
      udp:udp_fail_queue_rcv_skb [Tracepoint event]
      hda:hda_send_cmd [Tracepoint event]
      hda:hda_get_response [Tracepoint event]
      hda:hda_bus_reset [Tracepoint event]
      scsi:scsi_dispatch_cmd_start [Tracepoint event]
      scsi:scsi_dispatch_cmd_error [Tracepoint event]
      scsi:scsi_eh_wakeup [Tracepoint event]
      drm:drm_vblank_event [Tracepoint event]
      drm:drm_vblank_event_queued [Tracepoint event]
      drm:drm_vblank_event_delivered [Tracepoint event]
      random:mix_pool_bytes [Tracepoint event]
      random:mix_pool_bytes_nolock [Tracepoint event]
      random:credit_entropy_bits [Tracepoint event]
      gpio:gpio_direction [Tracepoint event]
      gpio:gpio_value [Tracepoint event]
      block:block_rq_abort [Tracepoint event]
      block:block_rq_requeue [Tracepoint event]
      block:block_rq_issue [Tracepoint event]
      block:block_bio_bounce [Tracepoint event]
      block:block_bio_complete [Tracepoint event]
      block:block_bio_backmerge [Tracepoint event]
      .
      .
      writeback:writeback_wake_thread [Tracepoint event]
      writeback:writeback_wake_forker_thread [Tracepoint event]
      writeback:writeback_bdi_register [Tracepoint event]
      .
      .
      writeback:writeback_single_inode_requeue [Tracepoint event]
      writeback:writeback_single_inode [Tracepoint event]
      kmem:kmalloc [Tracepoint event]
      kmem:kmem_cache_alloc [Tracepoint event]
      kmem:mm_page_alloc [Tracepoint event]
      kmem:mm_page_alloc_zone_locked [Tracepoint event]
      kmem:mm_page_pcpu_drain [Tracepoint event]
      kmem:mm_page_alloc_extfrag [Tracepoint event]
      vmscan:mm_vmscan_kswapd_sleep [Tracepoint event]
      vmscan:mm_vmscan_kswapd_wake [Tracepoint event]
      vmscan:mm_vmscan_wakeup_kswapd [Tracepoint event]
      vmscan:mm_vmscan_direct_reclaim_begin [Tracepoint event]
      .
      .
      module:module_get [Tracepoint event]
      module:module_put [Tracepoint event]
      module:module_request [Tracepoint event]
      sched:sched_kthread_stop [Tracepoint event]
      sched:sched_wakeup [Tracepoint event]
      sched:sched_wakeup_new [Tracepoint event]
      sched:sched_process_fork [Tracepoint event]
      sched:sched_process_exec [Tracepoint event]
      sched:sched_stat_runtime [Tracepoint event]
      rcu:rcu_utilization [Tracepoint event]
      workqueue:workqueue_queue_work [Tracepoint event]
      workqueue:workqueue_execute_end [Tracepoint event]
      signal:signal_generate [Tracepoint event]
      signal:signal_deliver [Tracepoint event]
      timer:timer_init [Tracepoint event]
      timer:timer_start [Tracepoint event]
      timer:hrtimer_cancel [Tracepoint event]
      timer:itimer_state [Tracepoint event]
      timer:itimer_expire [Tracepoint event]
      irq:irq_handler_entry [Tracepoint event]
      irq:irq_handler_exit [Tracepoint event]
      irq:softirq_entry [Tracepoint event]
      irq:softirq_exit [Tracepoint event]
      irq:softirq_raise [Tracepoint event]
      printk:console [Tracepoint event]
      task:task_newtask [Tracepoint event]
      task:task_rename [Tracepoint event]
      syscalls:sys_enter_socketcall [Tracepoint event]
      syscalls:sys_exit_socketcall [Tracepoint event]
      .
      .
      .
      syscalls:sys_enter_unshare [Tracepoint event]
      syscalls:sys_exit_unshare [Tracepoint event]
      raw_syscalls:sys_enter [Tracepoint event]
      raw_syscalls:sys_exit [Tracepoint event]

</literallayout>

</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> This is exactly the same set of events defined
by the trace event subsystem and exposed by
ftrace/tracecmd/kernelshark as files in
/sys/kernel/debug/tracing/events, by SystemTap as
kernel.trace("tracepoint_name") and (partially) accessed by LTTng.

</informalexample>
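<para>
Because the tracepoints are exposed as a directory tree of the
form events/subsystem/event, the 'subsystem:event' names that
'perf list' prints can be reconstructed by walking that tree. The
following is a minimal Python sketch of that idea, not a perf
feature. The function name list_tracepoints is hypothetical, and
reading the path named above normally requires root and a mounted
debugfs.
</para>

```python
import os


def list_tracepoints(events_dir):
    """Return sorted 'subsystem:event' names found under events_dir."""
    names = []
    for subsystem in sorted(os.listdir(events_dir)):
        subsystem_path = os.path.join(events_dir, subsystem)
        if not os.path.isdir(subsystem_path):
            continue  # skip control files such as 'enable'
        for event in sorted(os.listdir(subsystem_path)):
            if os.path.isdir(os.path.join(subsystem_path, event)):
                names.append("%s:%s" % (subsystem, event))
    return names


if __name__ == "__main__":
    # Path taken from the text above.
    events_dir = "/sys/kernel/debug/tracing/events"
    if os.path.isdir(events_dir) and os.access(events_dir, os.R_OK):
        for name in list_tracepoints(events_dir):
            print(name)
    else:
        print("tracing events directory is not readable here")
```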

<para>

Only a subset of these would be of interest to us when looking at
this workload, so let's choose the most likely subsystems
(identified by the string before the colon in the Tracepoint events)
and do a 'perf stat' run using only those wildcarded subsystems:
<literallayout class='monospaced'>
     root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

             23323 skb:kfree_skb
                 0 skb:consume_skb
             49897 skb:skb_copy_datagram_iovec
              6217 net:net_dev_xmit
              6217 net:net_dev_queue
              7962 net:netif_receive_skb
                 2 net:netif_rx
              8340 napi:napi_poll
                 0 sched:sched_kthread_stop
                 0 sched:sched_kthread_stop_ret
              3749 sched:sched_wakeup
                 0 sched:sched_wakeup_new
                 0 sched:sched_switch
                29 sched:sched_migrate_task
                 0 sched:sched_process_free
                 1 sched:sched_process_exit
                 0 sched:sched_wait_task
                 0 sched:sched_process_wait
                 0 sched:sched_process_fork
                 1 sched:sched_process_exec
                 0 sched:sched_stat_wait
     2106519415641 sched:sched_stat_sleep
                 0 sched:sched_stat_iowait
         147453613 sched:sched_stat_blocked
       12903026955 sched:sched_stat_runtime
                 0 sched:sched_pi_setprio
              3574 workqueue:workqueue_queue_work
              3574 workqueue:workqueue_activate_work
                 0 workqueue:workqueue_execute_start
                 0 workqueue:workqueue_execute_end
             16631 irq:irq_handler_entry
             16631 irq:irq_handler_exit
             28521 irq:softirq_entry
             28521 irq:softirq_exit
             28728 irq:softirq_raise
                 1 syscalls:sys_enter_sendmmsg
                 1 syscalls:sys_exit_sendmmsg
                 0 syscalls:sys_enter_recvmmsg
                 0 syscalls:sys_exit_recvmmsg
                14 syscalls:sys_enter_socketcall
                14 syscalls:sys_exit_socketcall
                   .
                   .
                   .
             16965 syscalls:sys_enter_read
             16965 syscalls:sys_exit_read
             12854 syscalls:sys_enter_write
             12854 syscalls:sys_exit_write
                   .
                   .
                   .

      58.029710972 seconds time elapsed


</literallayout>

Let's pick one of these tracepoints and tell perf to do a profile
using it as the sampling event:
<literallayout class='monospaced'>
     root@crownbay:~# perf record -g -e sched:sched_wakeup wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>

</literallayout>

</para>
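<para>
The wildcarded subsystem selection used for the 'perf stat' run
above follows a simple pattern: one '-e subsystem:*' option per
subsystem, followed by the workload. The following Python sketch
builds such a command line. The helper name perf_stat_argv is
hypothetical, and the quoting step is an assumption about how one
might protect the '*' from shell globbing when pasting the result
into a shell.
</para>

```python
import shlex

# Subsystems chosen in the text above; each becomes a '-e subsystem:*' option.
SUBSYSTEMS = ["skb", "net", "napi", "sched", "workqueue", "irq", "syscalls"]
WORKLOAD = [
    "wget",
    "http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2",
]


def perf_stat_argv(subsystems, workload):
    """Build the argument vector for a wildcarded 'perf stat' run."""
    argv = ["perf", "stat"]
    for subsystem in subsystems:
        argv += ["-e", "%s:*" % subsystem]
    return argv + list(workload)


argv = perf_stat_argv(SUBSYSTEMS, WORKLOAD)
# shlex.join (Python 3.8+) quotes each 'subsystem:*' argument.
print(shlex.join(argv))
```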

<para>

</para>

<para>

The screenshot above shows the results of running a profile using
the sched:sched_wakeup tracepoint, which shows the relative costs of
various paths to sched_wakeup (note that sched_wakeup is the
name of the tracepoint - it's actually defined just inside
ttwu_do_wakeup(), which accounts for the function name actually
displayed in the profile):
<literallayout class='monospaced'>
     /*
      * Mark the task runnable and perform wakeup-preemption.
      */
     static void
     ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
     {
          trace_sched_wakeup(p, true);
          .
          .
          .
     }

</literallayout>

A couple of the more interesting callchains are expanded and
displayed above, basically some network receive paths that
presumably end up waking up wget (busybox) when network data is
ready.

</para>

<para>

Note that because tracepoints are normally used for tracing,
the default sampling period for tracepoints is 1, i.e. for
tracepoints perf will sample on every event occurrence (this
can be changed using the -c option). This is in contrast to
hardware counters such as the default 'cycles'
hardware counter used for normal profiling, where sampling
periods are much higher (in the thousands) because profiling should
have as low an overhead as possible and sampling on every cycle
would be prohibitively expensive.

</para>
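<para>
The effect of the sampling period on the amount of data collected
is simple arithmetic: roughly one sample is recorded per 'period'
occurrences of the event. The following Python sketch illustrates
this with counts from the 'perf stat' runs shown earlier. The
function name and the non-default periods are hypothetical values
chosen only for illustration.
</para>

```python
def expected_samples(event_count, period):
    """Approximate samples recorded when sampling once per `period` events."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return event_count // period


# Counts taken from the 'perf stat' runs shown earlier.
sched_wakeups = 3749
cycles = 3045817293

# Tracepoint default: a period of 1 records every occurrence.
print(expected_samples(sched_wakeups, 1))

# Hypothetical period of 100, as might be requested with '-c 100'.
print(expected_samples(sched_wakeups, 100))

# Hypothetical fixed period of 1,000,000 cycles, for comparison.
print(expected_samples(cycles, 1000000))
```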

</section>

<title>Using perf to do Basic Tracing</title>

<para>
Profiling is a great tool for solving many problems or for
getting a high-level view of what's going on with a workload or
across the system. It is, however, by definition an approximation,
as suggested by the most prominent word associated with it,
'sampling'. On the one hand, it allows a representative picture of
what's going on in the system to be cheaply taken, but on the other
hand, that cheapness limits its utility when that data suggests a
need to 'dive down' more deeply to discover what's really going
on. In such cases, the only way to see what's really going on is
to be able to look at (or summarize more intelligently) the
individual steps that go into the higher-level behavior exposed
by the coarse-grained profiling data.

</para>

<para>

As a concrete example, we can trace all the events we think might
be applicable to our workload:
<literallayout class='monospaced'>
     root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* \
      -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write \
      wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>

</literallayout>

We can look at the raw trace output using 'perf script' with no
arguments:
<literallayout class='monospaced'>
     root@crownbay:~# perf script

     perf 1262 [000] 11624.857082: sys_exit_read: 0x0
     perf 1262 [000] 11624.857193: sched_wakeup: comm=migration/0 pid=6 prio=0 success=1 target_cpu=000
     wget 1262 [001] 11624.858021: softirq_raise: vec=1 [action=TIMER]
     wget 1262 [001] 11624.858074: softirq_entry: vec=1 [action=TIMER]
     wget 1262 [001] 11624.858081: softirq_exit: vec=1 [action=TIMER]
     wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
     wget 1262 [001] 11624.858177: sys_exit_read: 0x200
     wget 1262 [001] 11624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0 location=0xc15a5308
     wget 1262 [001] 11624.858945: kfree_skb: skbaddr=0xeb248000 protocol=0 location=0xc15a5308
     wget 1262 [001] 11624.859020: softirq_raise: vec=1 [action=TIMER]
     wget 1262 [001] 11624.859076: softirq_entry: vec=1 [action=TIMER]
     wget 1262 [001] 11624.859083: softirq_exit: vec=1 [action=TIMER]
     wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
     wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7
     wget 1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
     wget 1262 [001] 11624.859233: sys_exit_read: 0x0
     wget 1262 [001] 11624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200
     wget 1262 [001] 11624.859584: sys_exit_read: 0x200
     wget 1262 [001] 11624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
     wget 1262 [001] 11624.859888: sys_exit_read: 0x400
     wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
     wget 1262 [001] 11624.859944: sys_exit_read: 0x400

</literallayout>

This gives us a detailed, timestamped sequence of the recorded
events as they occurred within the workload.

</para>
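<para>
Because each line of this output carries a timestamp, simple
questions such as "how long did each read() take?" can be answered
by pairing entry and exit events. The following Python sketch does
this for a few lines copied from the output above. It assumes the
line layout shown there (comm, pid, cpu, timestamp, event name) and
a process name without spaces; the function name is hypothetical.
</para>

```python
import re

# Lines copied from the 'perf script' output above.
SAMPLE = """\
wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
wget 1262 [001] 11624.858177: sys_exit_read: 0x200
wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7
wget 1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859233: sys_exit_read: 0x0
"""

# comm, pid, [cpu], seconds.microseconds, event name, remaining payload
LINE_RE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+\[(\d+)\]\s+(\d+)\.(\d{6}):\s+(\w+):\s*(.*)$")


def read_durations_us(text):
    """Pair each sys_enter_read with the next sys_exit_read of the same pid
    and return the elapsed time of each call in microseconds."""
    entered = {}    # pid -> timestamp (us) of the outstanding sys_enter_read
    durations = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        _comm, pid, _cpu, secs, usecs, event, _payload = match.groups()
        # Integer microseconds avoid floating-point rounding of timestamps.
        stamp = int(secs) * 1000000 + int(usecs)
        if event == "sys_enter_read":
            entered[pid] = stamp
        elif event == "sys_exit_read" and pid in entered:
            durations.append(stamp - entered.pop(pid))
    return durations


print(read_durations_us(SAMPLE))
```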

<para>

In many ways, profiling can be viewed as a subset of tracing -
theoretically, if you have a set of trace events that's sufficient
to capture all the important aspects of a workload, you can derive
any of the results or views that a profiling run can.

</para>

<para>

Another aspect of traditional profiling is that while powerful in
many ways, it's limited by the granularity of the underlying data.
Profiling tools offer various ways of sorting and presenting the
sample data, which make it much more useful and amenable to user
experimentation, but in the end it can't be used in an open-ended
way to extract data that just isn't present, because
conceptually most of it has been thrown away.

</para>

<para>

Full-blown detailed tracing data does, however, offer the opportunity
to manipulate and present the information collected during a
tracing run in an almost unlimited variety of ways.

</para>

<para>

Another way to look at it is that there are only so many ways that
the 'primitive' counters can be used on their own to generate
interesting output; to get anything more complicated than simple
counts requires some amount of additional logic, which is typically
very specific to the problem at hand. For example, if we wanted to
make use of a 'counter' that maps to the value of the time
difference between when a process was scheduled to run on a
processor and the time it actually ran, we wouldn't expect such
a counter to exist on its own, but we could derive one called, say,
'wakeup_latency' and use it to extract a useful view of that metric
from trace data. Likewise, we really can't figure out from standard
profiling tools how much data every process on the system reads and
writes, along with how many of those reads and writes fail
completely. If we have sufficient trace data, however, we could
with the right tools easily extract and present that information,
but we'd need something other than pre-canned profiling tools to
do that.

</para>
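<para>
As a rough illustration of the kind of problem-specific logic
meant here, the following Python sketch totals the bytes read per
process from sys_exit_read return values. The return values are
copied from the 'perf script' output shown earlier. The function
name and the tuple layout are hypothetical, and a negative return
value is treated as a failed read.
</para>

```python
from collections import defaultdict

# (comm, ret) pairs for sys_exit_read events, copied from the
# 'perf script' output shown earlier. A negative ret would be an errno.
EXIT_READ_EVENTS = [
    ("perf", 0x0),
    ("wget", 0x200),
    ("wget", 0x1d7),
    ("wget", 0x0),
    ("wget", 0x200),
    ("wget", 0x400),
    ("wget", 0x400),
]


def summarize_reads(events):
    """Return {comm: {"calls", "bytes", "failed"}} totals."""
    totals = defaultdict(lambda: {"calls": 0, "bytes": 0, "failed": 0})
    for comm, ret in events:
        entry = totals[comm]
        entry["calls"] += 1
        if ret < 0:
            entry["failed"] += 1
        else:
            entry["bytes"] += ret
    return dict(totals)


for comm, entry in sorted(summarize_reads(EXIT_READ_EVENTS).items()):
    print("%-8s calls=%d bytes=%d failed=%d"
          % (comm, entry["calls"], entry["bytes"], entry["failed"]))
```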

<para>

Luckily, there is a general-purpose way to handle such needs,
called 'programming languages'. Making a programming language
easily available to apply to such problems, given the specific
format of the data, is called a 'programming language binding' for
that data and language. Perf supports two programming language
bindings, one for Python and one for Perl.

</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> Language bindings for manipulating and
aggregating trace data are of course not a new
idea. One of the first projects to do this was IBM's DProbes
dpcc compiler, an ANSI C compiler which targeted a low-level
assembly language running on an in-kernel interpreter on the
target system. This is exactly analogous to what Sun's DTrace
did, except that DTrace invented its own language for the purpose.
SystemTap, heavily inspired by DTrace, also created its own
one-off language, but rather than running the product on an
in-kernel interpreter, created an elaborate compiler-based
machinery to translate its language into kernel modules written
in C.

</informalexample>

<para>

Now that we have the trace data in perf.data, we can use
'perf script -g' to generate a skeleton script with handlers
for the events we recorded, including the read/write entry/exit events:
<literallayout class='monospaced'>
     root@crownbay:~# perf script -g python
     generated Python script: perf-script.py

</literallayout>

The skeleton script simply creates a Python function for each
event type in the perf.data file. The body of each function simply
prints the event name along with its parameters. For example:
<literallayout class='monospaced'>
     def net__netif_rx(event_name, context, common_cpu,
            common_secs, common_nsecs, common_pid, common_comm,
            skbaddr, len, name):
                    print_header(event_name, common_cpu, common_secs, common_nsecs,
                            common_pid, common_comm)

                    print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name),

</literallayout>

879

We can run that script directly to print all of the events

880

contained in the perf.data file:

root@crownbay:~# perf script -s perf-script.py

in trace_begin
syscalls__sys_exit_read 0 11624.857082795 1262 perf nr=3, ret=0
sched__sched_wakeup 0 11624.857193498 1262 perf comm=migration/0, pid=6, prio=0, success=1, target_cpu=0
irq__softirq_raise 1 11624.858021635 1262 wget vec=TIMER
irq__softirq_entry 1 11624.858074075 1262 wget vec=TIMER
irq__softirq_exit 1 11624.858081389 1262 wget vec=TIMER
syscalls__sys_enter_read 1 11624.858166434 1262 wget nr=3, fd=3, buf=3213019456, count=512
syscalls__sys_exit_read 1 11624.858177924 1262 wget nr=3, ret=512
skb__kfree_skb 1 11624.858878188 1262 wget skbaddr=3945041280, location=3243922184, protocol=0
skb__kfree_skb 1 11624.858945608 1262 wget skbaddr=3945037824, location=3243922184, protocol=0
irq__softirq_raise 1 11624.859020942 1262 wget vec=TIMER
irq__softirq_entry 1 11624.859076935 1262 wget vec=TIMER
irq__softirq_exit 1 11624.859083469 1262 wget vec=TIMER
syscalls__sys_enter_read 1 11624.859167565 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859192533 1262 wget nr=3, ret=471
syscalls__sys_enter_read 1 11624.859228072 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859233707 1262 wget nr=3, ret=0
syscalls__sys_enter_read 1 11624.859573008 1262 wget nr=3, fd=3, buf=3213018496, count=512
syscalls__sys_exit_read 1 11624.859584818 1262 wget nr=3, ret=512
syscalls__sys_enter_read 1 11624.859864562 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024
syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024
syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024
</literallayout>
That in itself isn't very useful; after all, we can accomplish
pretty much the same thing by simply running 'perf script'
without arguments in the same directory as the perf.data file.

</para>

<para>

We can, however, replace the print statements in the generated
function bodies with whatever we want, and thereby make the
script far more useful.

</para>

<para>

As a simple example, let's replace the print statements in
the function bodies with a simple function that does nothing but
increment a per-event count. When the program is run against a
perf.data file, each time a particular event is encountered,
a tally is incremented for that event. For example:

def net__netif_rx(event_name, context, common_cpu,
        common_secs, common_nsecs, common_pid, common_comm,
        skbaddr, len, name):
    inc_counts(event_name)
</literallayout>
Each event handler function in the generated code is modified
to do this. For convenience, we define a common function called
inc_counts() that each handler calls; inc_counts() simply tallies
a count for each event using the 'counts' hash, which is a
specialized hash that does Perl-like autovivification, a
capability that's extremely useful for the kinds of multi-level
aggregation commonly used in processing traces (see perf's
documentation on the Python language binding for details):

counts = autodict()

def inc_counts(event_name):
    try:
        counts[event_name] += 1
    except TypeError:
        counts[event_name] = 1
</literallayout>
Finally, at the end of the trace processing run, we want to
print the result of all the per-event tallies. For that, we
use the special 'trace_end()' function:

def trace_end():
    for event_name, count in counts.iteritems():
        print "%-40s %10s\n" % (event_name, count)
</literallayout>
The end result is a summary of all the events recorded in the
trace:

skb__skb_copy_datagram_iovec   13148
irq__softirq_entry              4796
irq__irq_handler_exit           3805
irq__softirq_exit               4795
syscalls__sys_enter_write       8990
net__net_dev_xmit                652
skb__kfree_skb                  4047
sched__sched_wakeup             1155
irq__irq_handler_entry          3804
irq__softirq_raise              4799
net__net_dev_queue               652
syscalls__sys_enter_read       17599
net__netif_receive_skb          1743
syscalls__sys_exit_read        17598
net__netif_rx                      2
napi__napi_poll                 1877
syscalls__sys_exit_write        8990
</literallayout>
Note that this is pretty much exactly the same information we get
from 'perf stat', which goes a little way to support the idea
mentioned previously that given the right kind of trace data,
higher-level profiling-type summaries can be derived from it.

</para>
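<para>
The snippets above use Python 2 syntax and rely on helpers that
'perf script' provides. For readers who want to try the counting
logic on its own, here is a minimal Python 3 sketch. It assumes
nothing from perf: the autodict() below is a stand-in written for
this example (perf ships its own helper of that name), and the
handlers are driven by hand rather than by 'perf script'.
</para>

```python
from collections import defaultdict


def autodict():
    # Stand-in for perf's helper: a dict whose missing keys spring into
    # existence as nested autodicts (Perl-style autovivification).
    return defaultdict(autodict)


counts = autodict()


def inc_counts(event_name):
    try:
        counts[event_name] += 1
    except TypeError:
        # First sighting: the autovivified value is a dict, not a number,
        # so the addition fails and we start the tally at 1.
        counts[event_name] = 1


def trace_end():
    # Python 3 spelling of the summary loop (items() instead of iteritems()).
    for event_name, count in sorted(counts.items()):
        print("%-40s %10s" % (event_name, count))


if __name__ == "__main__":
    for name in ["net__netif_rx", "irq__softirq_entry", "net__netif_rx"]:
        inc_counts(name)
    trace_end()
```

<para>
The try/except dance is only needed because an autovivified entry
starts life as an empty dictionary; a plain collections.Counter would
also do for a single-level tally like this one.
</para>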

<para>

For more information, see the documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.

</para>

</section>

<title>System-Wide Tracing and Profiling</title>

<para>
The examples so far have focused on tracing a particular program or
workload - in other words, every profiling run has specified the
program to profile on the command line, e.g. 'perf record wget ...'.

</para>

<para>

It's also possible, and more interesting in many cases, to run a
system-wide profile or trace while running the workload in a
separate shell.

</para>

<para>

To do system-wide profiling or tracing, you typically use
the -a flag to 'perf record'.

</para>

<para>

To demonstrate this, open up one window and start the profile
using the -a flag (press Ctrl-C to stop tracing):

root@crownbay:~# perf record -g -a
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]
</literallayout>
In another window, run the wget test:

root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>
Here we see entries not only for our wget load, but for other
processes running on the system as well:

</para>

<para>

</para>

<para>

In the snapshot above, we can see callchains that originate in
libc, and a callchain from Xorg that demonstrates that we're
using a proprietary X driver in userspace (notice the presence
of 'PVR' and some other unresolvable symbols in the expanded
Xorg callchain).

</para>

<para>

Note also that we have both kernel and userspace entries in the
above snapshot. We can also tell perf to focus on userspace by
providing a modifier, in this case 'u', to the 'cycles' hardware
counter when we record a profile:

root@crownbay:~# perf record -g -a -e cycles:u
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]

</literallayout>

</para>

<para>

</para>

<para>

Notice that in the screenshot above, we see only userspace entries ([.]).

</para>

<para>

Finally, we can press 'enter' on a leaf node and select the 'Zoom
into DSO' menu item to show only entries associated with a
specific DSO. In the screenshot below, we've zoomed into the
'libc' DSO, which shows all the entries associated with the
libc-xxx.so DSO.

</para>

<para>

</para>

<para>

We can also use the system-wide -a switch to do system-wide
tracing. Here we'll trace a couple of scheduler events:

root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]
</literallayout>
We can look at the raw output using 'perf script' with no
arguments:

root@crownbay:~# perf script

perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

</literallayout>

</para>
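<para>
Text output like this is easy to post-process with a few lines of
Python when a full language binding isn't needed. The sketch below
splits one 'perf script' line into its parts; the regular expression
and the name parse_line() are our own, written against the sample
lines shown above, and will not cope with every format perf can
emit (for instance, task names containing spaces).
</para>

```python
import re
from collections import Counter

# comm, pid, [cpu], timestamp:, event:, then the event's own fields
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<time>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<rest>.*)$"
)


def parse_line(line):
    """Split one 'perf script' text line into a dict, or return None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # header lines and anything else we don't recognise
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "time": float(m.group("time")),
        "event": m.group("event"),
        # key=value pairs; the '==>' separator is skipped automatically
        "fields": dict(re.findall(r"(\w+)=(\S+)", m.group("rest"))),
    }


if __name__ == "__main__":
    sample = [
        "perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 "
        "prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120",
        "kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 "
        "prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120",
    ]
    events = [ev for ev in map(parse_line, sample) if ev]
    # Count how often each task was switched in
    print(Counter(ev["fields"]["next_comm"] for ev in events
                  if ev["event"] == "sched_switch"))
```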

<title>Filtering</title>

<para>
Notice that there are a lot of events that don't really have
anything to do with what we're interested in, namely events
that schedule 'perf' itself in and out or that wake perf up.
We can get rid of those by using the '--filter' option -
for each event we specify using -e, we can add a --filter
after that to filter out trace events that contain fields
with specific values:

root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]


root@crownbay:~# perf script

swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

</literallayout>

In this case, we've filtered out all events that have 'perf'
in their 'comm', 'prev_comm', or 'next_comm' fields. Notice
that there are still events recorded for perf, but
that those events don't have values of 'perf' for the filtered
fields. Completely filtering out anything from perf would
require a bit more work, but for the purpose of demonstrating
how to use filters, it's close enough.

</para>
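<para>
To see why events attributed to perf still show up, it can help to
mimic the two filter expressions in ordinary Python. The sketch below
is not how the kernel evaluates filters; it simply applies the same
predicates to hand-written event dictionaries, whose layout ('task',
'name', 'fields') is an assumption made for this example. The point
it illustrates is that a filter tests only the event's own fields,
not the task that happened to be running when the event fired.
</para>

```python
def passes_filter(event):
    """Mimic the two --filter expressions used above on a decoded event."""
    fields = event["fields"]
    if event["name"] == "sched_switch":
        return fields["next_comm"] != "perf" and fields["prev_comm"] != "perf"
    if event["name"] == "sched_wakeup":
        return fields["comm"] != "perf"
    return True  # no filter was attached to any other event


events = [
    # perf is the running task, but the wakeup target is a kworker: kept
    {"task": "perf", "name": "sched_wakeup",
     "fields": {"comm": "kworker/1:1"}},
    # perf is being switched out, so prev_comm is 'perf': dropped
    {"task": "perf", "name": "sched_switch",
     "fields": {"prev_comm": "perf", "next_comm": "kworker/1:1"}},
    # nothing to do with perf at all: kept
    {"task": "swapper", "name": "sched_switch",
     "fields": {"prev_comm": "swapper/0", "next_comm": "kworker/0:3"}},
]

kept = [e for e in events if passes_filter(e)]

if __name__ == "__main__":
    for e in kept:
        print(e["task"], e["name"], e["fields"])
```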

<emphasis>Tying it Together:</emphasis> These are exactly the same set of event
filters defined by the trace event subsystem. See the
ftrace/tracecmd/kernelshark section for more discussion about
these event filters.

</informalexample>

<emphasis>Tying it Together:</emphasis> These event filters are implemented by a
special-purpose pseudo-interpreter in the kernel and are an
integral and indispensable part of the perf design as it
relates to tracing. Kernel-based event filters provide a
mechanism to precisely throttle the event stream that appears
in user space, where it makes sense to provide bindings to real
programming languages for postprocessing the event stream.
This architecture allows for the intelligent and flexible
partitioning of processing between the kernel and user space.
Contrast this with other tools such as SystemTap, which does
all of its processing in the kernel and as such requires a
special project-defined language in order to accommodate that
design, or LTTng, where everything is sent to userspace and
as such requires a super-efficient kernel-to-userspace
transport mechanism in order to function properly. While
perf certainly can benefit from, for instance, advances in
the design of the transport, it doesn't fundamentally depend
on them. Basically, if you find that your perf tracing
application is causing buffer I/O overruns, it probably
means that you aren't taking enough advantage of the
kernel filtering engine.

</informalexample>

</section>

</section>

<title>Using Dynamic Tracepoints</title>

<para>
perf isn't restricted to the fixed set of static tracepoints
listed by 'perf list'. Users can also add their own 'dynamic'
tracepoints anywhere in the kernel. For instance, suppose we
want to define our own tracepoint on do_fork(). We can do that
using the 'perf probe' perf subcommand:

root@crownbay:~# perf probe do_fork
Added new event:
  probe:do_fork (on do_fork)

You can now use it in all perf tools, such as:

  perf record -e probe:do_fork -aR sleep 1
</literallayout>
Adding a new tracepoint via 'perf probe' results in an event
with all the expected files and format in
/sys/kernel/debug/tracing/events, just the same as for static
tracepoints (as discussed in more detail in the trace events
subsystem section):


root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al
drwxr-xr-x 2 root root 0 Oct 28 11:42 .
drwxr-xr-x 3 root root 0 Oct 28 11:42 ..
-rw-r--r-- 1 root root 0 Oct 28 11:42 enable
-rw-r--r-- 1 root root 0 Oct 28 11:42 filter
-r--r--r-- 1 root root 0 Oct 28 11:42 format
-r--r--r-- 1 root root 0 Oct 28 11:42 id

root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format
name: do_fork
ID: 944
format:
        field:unsigned short common_type; offset:0; size:2; signed:0;
        field:unsigned char common_flags; offset:2; size:1; signed:0;
        field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
        field:int common_pid; offset:4; size:4; signed:1;
        field:int common_padding; offset:8; size:4; signed:1;

        field:unsigned long __probe_ip; offset:12; size:4; signed:0;

print fmt: "(%lx)", REC->__probe_ip

</literallayout>

We can list all dynamic tracepoints currently in existence:

root@crownbay:~# perf probe -l
probe:do_fork (on do_fork)
probe:schedule (on schedule)
</literallayout>
Let's record system-wide ('sleep 30' is a trick: perf records
system-wide for as long as the command runs, and sleep simply
does nothing for 30 seconds):

root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]
</literallayout>
Using 'perf script' we can see each do_fork event that fired:


root@crownbay:~# perf script

# ========
# captured on: Sun Oct 28 11:55:18 2012
# hostname : crownbay
# os release : 3.4.11-yocto-standard
# perf version : 3.4.11
# arch : i686
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz
# cpuid : GenuineIntel,6,38,1
# total memory : 1017184 kB
# cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30
# event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 5, 6 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# ========
#
matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)
matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460)
pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)
pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)
matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460)
gthumb 1300 [001] 34217.697451: do_fork: (c1028460)
gthumb 1300 [001] 34219.085734: do_fork: (c1028460)
gthumb 1300 [000] 34219.121351: do_fork: (c1028460)
gthumb 1300 [001] 34219.264551: do_fork: (c1028460)
pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460)
matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460)
matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460)
matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460)
matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460)
matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460)
gaku 1312 [000] 34237.202388: do_fork: (c1028460)
</literallayout>
And using 'perf report' on the same file, we can see the
callgraphs from starting a few programs during those 30 seconds:

</para>

<para>

</para>
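<para>
The 'format' file shown earlier in this section is what lets tools
decode a dynamic tracepoint's raw records, and it's simple enough to
read by hand. The following sketch pulls the field descriptions out
of such a file; the regular expression and the name parse_format()
are our own and were written against the listing above, so treat
them as illustrative rather than as a complete parser (array fields
such as 'char comm[16]', for example, keep the brackets in the name).
</para>

```python
import re

# One "field:<C declaration>; offset:N; size:N; signed:0|1;" entry
FIELD_RE = re.compile(
    r"field:(?P<decl>.+?);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Return a list of field descriptions from a tracepoint 'format' file."""
    fields = []
    for m in FIELD_RE.finditer(text):
        # The last word of the declaration is the field name,
        # everything before it is the C type.
        ctype, _, name = m.group("decl").rpartition(" ")
        fields.append({
            "name": name,
            "type": ctype,
            "offset": int(m.group("offset")),
            "size": int(m.group("size")),
            "signed": m.group("signed") == "1",
        })
    return fields


if __name__ == "__main__":
    sample = """\
name: do_fork
ID: 944
format:
        field:unsigned short common_type; offset:0; size:2; signed:0;
        field:int common_pid; offset:4; size:4; signed:1;

        field:unsigned long __probe_ip; offset:12; size:4; signed:0;

print fmt: "(%lx)", REC->__probe_ip
"""
    for field in parse_format(sample):
        print(field)
```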

<emphasis>Tying it Together:</emphasis> The trace events subsystem accommodates static
and dynamic tracepoints in exactly the same way - there's no
difference as far as the infrastructure is concerned. See the
ftrace section for more details on the trace event subsystem.

</informalexample>

<emphasis>Tying it Together:</emphasis> Dynamic tracepoints are implemented under the
covers by kprobes and uprobes. kprobes and uprobes are also used
by, and in fact are the main focus of, SystemTap.

</informalexample>

</section>

</section>

<title>Documentation</title>

<para>
Online versions of the man pages for the commands discussed in this
section can be found here:

<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-stat'>'perf stat' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-record'>'perf record' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-report'>'perf report' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-probe'>'perf probe' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-script'>'perf script' manpage</ulink>.
</para></listitem>
<listitem><para>Documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.
</para></listitem>
<listitem><para>The top-level
<ulink url='http://linux.die.net/man/1/perf'>perf(1) manpage</ulink>.

</para></listitem>

</itemizedlist>

</para>

<para>

Normally, you should be able to invoke the man pages via perf
itself, e.g. 'perf help' or 'perf help record'.

</para>

<para>

However, by default Yocto doesn't install man pages, but perf
invokes the man pages for most help functionality. This is a bug
and is being tracked in the Yocto bug database:
<ulink url='https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388'>Bug 3388 - perf: enable man pages for basic 'help' functionality</ulink>.

</para>

<para>

The man pages in text form, along with some other files, such as
a set of examples, can be found in the 'perf' directory of the
kernel tree:

tools/perf/Documentation
</literallayout>
There's also a nice perf tutorial on the perf wiki that goes
into more detail than we do here in certain areas:
<ulink url='https://perf.wiki.kernel.org/index.php/Tutorial'>Perf Tutorial</ulink>

</para>

</section>

</section>

<title>ftrace</title>

<para>
'ftrace' literally refers to the 'ftrace function tracer', but in
reality it encompasses a number of related tracers along with
the infrastructure that they all make use of.

</para>

<title>Setup</title>

<para>

For this section, we'll assume you've already performed the basic
setup outlined in the General Setup section.

</para>

<para>

ftrace, trace-cmd, and kernelshark run on the target system,
and are ready to go out-of-the-box - no additional setup is
necessary. For the rest of this section we assume you've ssh'ed
from the host to the target and will be running ftrace on the
target. kernelshark is a GUI application; if you use the '-X'
option to ssh, you can have the kernelshark GUI run on the target
but display remotely on the host.

</para>

</section>

<title>Basic ftrace usage</title>

<para>
'ftrace' essentially refers to everything included in
the /tracing directory of the mounted debugfs filesystem
(Yocto follows the standard convention and mounts it
at /sys/kernel/debug). Here's a listing of all the files
found in /sys/kernel/debug/tracing on a Yocto system:


root@sugarbay:/sys/kernel/debug/tracing# ls
README                      kprobe_events       trace
available_events            kprobe_profile      trace_clock
available_filter_functions  options             trace_marker
available_tracers           per_cpu             trace_options
buffer_size_kb              printk_formats      trace_pipe
buffer_total_size_kb        saved_cmdlines      tracing_cpumask
current_tracer              set_event           tracing_enabled
dyn_ftrace_total_info       set_ftrace_filter   tracing_on
enabled_functions           set_ftrace_notrace  tracing_thresh
events                      set_ftrace_pid
free_buffer                 set_graph_function

</literallayout>

The files listed above are used for various purposes -
some relate directly to the tracers themselves, others are
used to set tracing options, and yet others actually contain
the tracing output when a tracer is in effect. Some of their
functions can be guessed from their names, others need
explanation; in any case, we'll cover some of the files we
see here below. For an explanation of the others, please
see the ftrace documentation.

</para>

<para>

We'll start by looking at some of the available built-in

tracers.

</para>

<para>

cat'ing the 'available_tracers' file lists the set of
available tracers:

root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers
blk function_graph function nop
</literallayout>
The 'current_tracer' file contains the tracer currently in
effect:

root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
nop
</literallayout>
The above listing of current_tracer shows that
the 'nop' tracer is in effect, which is just another
way of saying that there's actually no tracer
currently in effect.

</para>
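<para>
Because these are ordinary files, the same information is available
to any program, not just to cat. The sketch below wraps the two
reads in small Python helpers; the function names are our own, and
the demonstration at the bottom runs against a scratch directory
holding the file contents shown above, so it can be tried without
root access or a mounted debugfs.
</para>

```python
import os
import tempfile

TRACING_DIR = "/sys/kernel/debug/tracing"  # location used throughout this section


def available_tracers(tracing_dir=TRACING_DIR):
    """Return the tracer names listed in 'available_tracers'."""
    with open(os.path.join(tracing_dir, "available_tracers")) as f:
        return f.read().split()


def current_tracer(tracing_dir=TRACING_DIR):
    """Return the tracer name stored in 'current_tracer'."""
    with open(os.path.join(tracing_dir, "current_tracer")) as f:
        return f.read().strip()


def tracing_active(tracing_dir=TRACING_DIR):
    """True when something other than the 'nop' tracer is selected."""
    return current_tracer(tracing_dir) != "nop"


if __name__ == "__main__":
    # Stand-in directory mimicking the two files, for demonstration only.
    with tempfile.TemporaryDirectory() as fake_dir:
        with open(os.path.join(fake_dir, "available_tracers"), "w") as f:
            f.write("blk function_graph function nop\n")
        with open(os.path.join(fake_dir, "current_tracer"), "w") as f:
            f.write("nop\n")
        print(available_tracers(fake_dir))
        print(current_tracer(fake_dir), tracing_active(fake_dir))
```

<para>
On a real target, calling the helpers with no argument reads the
files under /sys/kernel/debug/tracing, which normally requires root.
</para>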

<para>

echo'ing one of the available_tracers into current_tracer
makes the specified tracer the current tracer:

root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer
root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
function
</literallayout>
The above sets the current tracer to be the
'function tracer'. This tracer traces every function
call in the kernel and makes it available as the
contents of the 'trace' file. Reading the 'trace' file
lists the currently buffered function calls that have been
traced by the function tracer:


root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

# tracer: function
#
# entries-in-buffer/entries-written: 310629/766471   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [004] d..1   470.867169: ktime_get_real <-intel_idle
          <idle>-0     [004] d..1   470.867170: getnstimeofday <-ktime_get_real
          <idle>-0     [004] d..1   470.867171: ns_to_timeval <-intel_idle
          <idle>-0     [004] d..1   470.867171: ns_to_timespec <-ns_to_timeval
          <idle>-0     [004] d..1   470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [004] d..1   470.867172: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [004] d..1   470.867172: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [004] d..1   470.867172: rcu_irq_enter <-irq_enter
          <idle>-0     [004] d..1   470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter
          <idle>-0     [004] d..1   470.867173: local_bh_disable <-irq_enter
          <idle>-0     [004] d..1   470.867173: add_preempt_count <-local_bh_disable
          <idle>-0     [004] d.s1   470.867174: tick_check_idle <-irq_enter
          <idle>-0     [004] d.s1   470.867174: tick_check_oneshot_broadcast <-tick_check_idle
          <idle>-0     [004] d.s1   470.867174: ktime_get <-tick_check_idle
          <idle>-0     [004] d.s1   470.867174: tick_nohz_stop_idle <-tick_check_idle
          <idle>-0     [004] d.s1   470.867175: update_ts_time_stats <-tick_nohz_stop_idle
          <idle>-0     [004] d.s1   470.867175: nr_iowait_cpu <-update_ts_time_stats
          <idle>-0     [004] d.s1   470.867175: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [004] d.s1   470.867175: _raw_spin_lock <-tick_do_update_jiffies64
          <idle>-0     [004] d.s1   470.867176: add_preempt_count <-_raw_spin_lock
          <idle>-0     [004] d.s2   470.867176: do_timer <-tick_do_update_jiffies64
          <idle>-0     [004] d.s2   470.867176: _raw_spin_lock <-do_timer
          <idle>-0     [004] d.s2   470.867176: add_preempt_count <-_raw_spin_lock
          <idle>-0     [004] d.s3   470.867177: ntp_tick_length <-do_timer
          <idle>-0     [004] d.s3   470.867177: _raw_spin_lock_irqsave <-ntp_tick_length
          .
          .
          .

</literallayout>

Each line in the trace above shows what was happening in
the kernel on a given cpu, to the level of detail of
function calls. Each entry shows the function called,
followed by its caller (after the arrow).

</para>
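<para>
Since every entry has the same 'function &lt;-caller' shape, the trace
is straightforward to summarize with a short script. The sketch
below extracts the callee and caller from each line and counts how
often each caller appears; the regular expression and function names
are our own, written against the sample output above rather than
taken from any ftrace tooling.
</para>

```python
import re
from collections import Counter

# e.g. "<idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle"
FUNC_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<time>\d+\.\d+):\s+(?P<func>\S+)\s+<-(?P<caller>\S+)\s*$"
)


def parse_function_line(line):
    """Return (function, caller) for a function-tracer line, else None."""
    m = FUNC_RE.match(line)
    if m is None:
        return None  # '#' header lines and anything unexpected
    return m.group("func"), m.group("caller")


def caller_counts(lines):
    """Count how many traced calls were made by each caller."""
    counts = Counter()
    for line in lines:
        parsed = parse_function_line(line)
        if parsed:
            counts[parsed[1]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "# tracer: function",
        "<idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle",
        "<idle>-0 [004] d..1 470.867170: getnstimeofday <-ktime_get_real",
        "<idle>-0 [004] d..1 470.867171: ns_to_timeval <-intel_idle",
    ]
    for caller, n in caller_counts(sample).most_common():
        print("%-20s %d" % (caller, n))
```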

<para>

The function tracer gives you an extremely detailed idea
of what the kernel was doing at the point in time the trace
was taken, and is a great way to learn about how the kernel
code works in a dynamic sense.

</para>

<emphasis>Tying it Together:</emphasis> The ftrace function tracer is also
available from within perf, as the ftrace:function tracepoint.

</informalexample>

<para>

With the function tracer, it is a little more difficult to
follow the call chains than it needs to be. Luckily, there's a
variant of the function tracer that displays the callchains
explicitly, called the 'function_graph' tracer:

root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer
root@sugarbay:/sys/kernel/debug/tracing# cat trace | less


 tracer: function_graph

 CPU  DURATION                  FUNCTION CALLS
 |     |   |                     |   |   |   |
 7)   0.046 us    |      pick_next_task_fair();
 7)   0.043 us    |      pick_next_task_stop();
 7)   0.042 us    |      pick_next_task_rt();
 7)   0.032 us    |      pick_next_task_fair();
 7)   0.030 us    |      pick_next_task_idle();
 7)               |      _raw_spin_unlock_irq() {
 7)   0.033 us    |        sub_preempt_count();
 7)   0.258 us    |      }
 7)   0.032 us    |      sub_preempt_count();
 7) + 13.341 us   |    } /* __schedule */
 7)   0.095 us    |  } /* sub_preempt_count */
 7)               |  schedule() {
 7)               |    __schedule() {
 7)   0.060 us    |      add_preempt_count();
 7)   0.044 us    |      rcu_note_context_switch();
 7)               |      _raw_spin_lock_irq() {
 7)   0.033 us    |        add_preempt_count();
 7)   0.247 us    |      }
 7)               |      idle_balance() {
 7)               |        _raw_spin_unlock() {
 7)   0.031 us    |          sub_preempt_count();
 7)   0.246 us    |        }
 7)               |        update_shares() {
 7)   0.030 us    |          __rcu_read_lock();
 7)   0.029 us    |          __rcu_read_unlock();
 7)   0.484 us    |        }
 7)   0.030 us    |        __rcu_read_lock();
 7)               |        load_balance() {
 7)               |          find_busiest_group() {
 7)   0.031 us    |            idle_cpu();
 7)   0.029 us    |            idle_cpu();
 7)   0.035 us    |            idle_cpu();
 7)   0.906 us    |          }
 7)   1.141 us    |        }
 7)   0.022 us    |        msecs_to_jiffies();
 7)               |        load_balance() {
 7)               |          find_busiest_group() {
 7)   0.031 us    |            idle_cpu();
 .
 .
 .
 4)   0.062 us    |      msecs_to_jiffies();
 4)   0.062 us    |      __rcu_read_unlock();
 4)               |      _raw_spin_lock() {
 4)   0.073 us    |        add_preempt_count();
 4)   0.562 us    |      }
 4) + 17.452 us   |    }
 4)   0.108 us    |    put_prev_task_fair();
 4)   0.102 us    |    pick_next_task_fair();
 4)   0.084 us    |    pick_next_task_stop();
 4)   0.075 us    |    pick_next_task_rt();
 4)   0.062 us    |    pick_next_task_fair();
 4)   0.066 us    |    pick_next_task_idle();
 ------------------------------------------
 4)   kworker-74   =>    <idle>-0
 ------------------------------------------

 4)               |    finish_task_switch() {
 4)               |      _raw_spin_unlock_irq() {
 4)   0.100 us    |        sub_preempt_count();
 4)   0.582 us    |      }
 4)   1.105 us    |    }
 4)   0.088 us    |    sub_preempt_count();
 4) ! 100.066 us  |  }
 .
 .
 .
 3)               |  sys_ioctl() {
 3)   0.083 us    |    fget_light();
 3)               |    security_file_ioctl() {
 3)   0.066 us    |      cap_file_ioctl();
 3)   0.562 us    |    }
 3)               |    do_vfs_ioctl() {
 3)               |      drm_ioctl() {
 3)   0.075 us    |        drm_ut_debug_printk();
 3)               |        i915_gem_pwrite_ioctl() {
 3)               |          i915_mutex_lock_interruptible() {
 3)   0.070 us    |            mutex_lock_interruptible();
 3)   0.570 us    |          }
 3)               |          drm_gem_object_lookup() {
 3)               |            _raw_spin_lock() {
 3)   0.080 us    |              add_preempt_count();
 3)   0.620 us    |            }
 3)               |            _raw_spin_unlock() {
 3)   0.085 us    |              sub_preempt_count();
 3)   0.562 us    |            }
 3)   2.149 us    |          }
 3)   0.133 us    |          i915_gem_object_pin();
 3)               |          i915_gem_object_set_to_gtt_domain() {
 3)   0.065 us    |            i915_gem_object_flush_gpu_write_domain();
 3)   0.065 us    |            i915_gem_object_wait_rendering();
 3)   0.062 us    |            i915_gem_object_flush_cpu_write_domain();
 3)   1.612 us    |          }
 3)               |          i915_gem_object_put_fence() {
 3)   0.097 us    |            i915_gem_object_flush_fence.constprop.36();
 3)   0.645 us    |          }
 3)   0.070 us    |          add_preempt_count();
 3)   0.070 us    |          sub_preempt_count();
 3)   0.073 us    |          i915_gem_object_unpin();
 3)   0.068 us    |          mutex_unlock();
 3)   9.924 us    |        }
 3) + 11.236 us   |      }
 3) + 11.770 us   |    }
 3) + 13.784 us   |  }
 3)               |  sys_ioctl() {

</literallayout>

As you can see, the function_graph display is much easier to
follow. Also note that in addition to the function calls and
associated braces, other events such as scheduler events
are displayed in context. In fact, you can freely include
any tracepoint available in the trace events subsystem described
in the next section by simply enabling those events, and they'll
appear in context in the function graph display. This makes
function_graph a powerful tool for understanding kernel dynamics.

</para>
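<para>
As a rough illustration of mixing tracepoints into the graph, the
following sketch selects the function_graph tracer and enables a
single tracepoint using the same files you would otherwise 'echo'
into by hand. The TRACING_DIR variable and the graph_with_event
helper are names made up for this example; they are not part of
ftrace itself.

```shell
#!/bin/sh
# Sketch only: TRACING_DIR is an assumed override so the helper can be
# pointed somewhere else; it defaults to the usual debugfs location.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# graph_with_event SUBSYSTEM EVENT
# Select the function_graph tracer, enable one tracepoint and turn tracing on.
graph_with_event() {
    subsystem="$1"
    event="$2"
    enable_file="$TRACING_DIR/events/$subsystem/$event/enable"

    if [ ! -e "$enable_file" ]; then
        echo "no such tracepoint: $subsystem/$event" >&2
        return 1
    fi

    echo function_graph > "$TRACING_DIR/current_tracer" || return 1
    echo 1 > "$enable_file" || return 1
    echo 1 > "$TRACING_DIR/tracing_on"
}
```

For example, running 'graph_with_event sched sched_switch' as root
and then reading the 'trace' file should show sched_switch events
interleaved with the function graph.
</para>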

<para>

Also notice that there are various annotations on the left-hand
side of the display. For example, if the total time it took for
a given function to execute is above a certain threshold, a
marker appears next to the duration: a plus sign for more than
10 microseconds and an exclamation point for more than 100
microseconds. Please see the ftrace documentation for details
on all these fields.

</para>
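<para>
If you only care about the annotated entries, a small filter along
these lines can pull them out of a saved trace. This is a minimal
sketch that assumes the column layout shown above (CPU, optional
marker, duration) and only looks for the '+' and '!' markers; the
slow_calls name is invented for the example.

```shell
#!/bin/sh
# slow_calls: read function_graph output on stdin and keep only the lines
# whose duration is flagged with a '+' or '!' marker in the second column.
slow_calls() {
    awk '$2 == "+" || $2 == "!"'
}
```

A typical use would be 'slow_calls < saved-trace.txt'. Note that the
marker often sits on a closing brace, so you still need the full
trace to see which function that brace closes.
</para>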

</section>

<title>The 'trace events' Subsystem</title>

<para>
One especially important directory contained within
the /sys/kernel/debug/tracing directory is the 'events'
subdirectory, which contains representations of every
tracepoint in the system. Listing out the contents of
the 'events' subdirectory, we see mainly another set of
subdirectories:

<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing# cd events
root@sugarbay:/sys/kernel/debug/tracing/events# ls -al
drwxr-xr-x  38 root root 0 Nov 14 23:19 .
drwxr-xr-x   5 root root 0 Nov 14 23:19 ..
drwxr-xr-x  19 root root 0 Nov 14 23:19 block
drwxr-xr-x  32 root root 0 Nov 14 23:19 btrfs
drwxr-xr-x   5 root root 0 Nov 14 23:19 drm
-rw-r--r--   1 root root 0 Nov 14 23:19 enable
drwxr-xr-x  40 root root 0 Nov 14 23:19 ext3
drwxr-xr-x  79 root root 0 Nov 14 23:19 ext4
drwxr-xr-x  14 root root 0 Nov 14 23:19 ftrace
drwxr-xr-x   8 root root 0 Nov 14 23:19 hda
-r--r--r--   1 root root 0 Nov 14 23:19 header_event
-r--r--r--   1 root root 0 Nov 14 23:19 header_page
drwxr-xr-x  25 root root 0 Nov 14 23:19 i915
drwxr-xr-x   7 root root 0 Nov 14 23:19 irq
drwxr-xr-x  12 root root 0 Nov 14 23:19 jbd
drwxr-xr-x  14 root root 0 Nov 14 23:19 jbd2
drwxr-xr-x  14 root root 0 Nov 14 23:19 kmem
drwxr-xr-x   7 root root 0 Nov 14 23:19 module
drwxr-xr-x   3 root root 0 Nov 14 23:19 napi
drwxr-xr-x   6 root root 0 Nov 14 23:19 net
drwxr-xr-x   3 root root 0 Nov 14 23:19 oom
drwxr-xr-x  12 root root 0 Nov 14 23:19 power
drwxr-xr-x   3 root root 0 Nov 14 23:19 printk
drwxr-xr-x   8 root root 0 Nov 14 23:19 random
drwxr-xr-x   4 root root 0 Nov 14 23:19 raw_syscalls
drwxr-xr-x   3 root root 0 Nov 14 23:19 rcu
drwxr-xr-x   6 root root 0 Nov 14 23:19 rpm
drwxr-xr-x  20 root root 0 Nov 14 23:19 sched
drwxr-xr-x   7 root root 0 Nov 14 23:19 scsi
drwxr-xr-x   4 root root 0 Nov 14 23:19 signal
drwxr-xr-x   5 root root 0 Nov 14 23:19 skb
drwxr-xr-x   4 root root 0 Nov 14 23:19 sock
drwxr-xr-x  10 root root 0 Nov 14 23:19 sunrpc
drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls
drwxr-xr-x   4 root root 0 Nov 14 23:19 task
drwxr-xr-x  14 root root 0 Nov 14 23:19 timer
drwxr-xr-x   3 root root 0 Nov 14 23:19 udp
drwxr-xr-x  21 root root 0 Nov 14 23:19 vmscan
drwxr-xr-x   3 root root 0 Nov 14 23:19 vsyscall
drwxr-xr-x   6 root root 0 Nov 14 23:19 workqueue
drwxr-xr-x  26 root root 0 Nov 14 23:19 writeback

</literallayout>

Each one of these subdirectories corresponds to a
'subsystem' and contains yet again more subdirectories,
each one of those finally corresponding to a tracepoint.
For example, here are the contents of the 'kmem' subsystem:
<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem
root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al
drwxr-xr-x 14 root root 0 Nov 14 23:19 .
drwxr-xr-x 38 root root 0 Nov 14 23:19 ..
-rw-r--r--  1 root root 0 Nov 14 23:19 enable
-rw-r--r--  1 root root 0 Nov 14 23:19 filter
drwxr-xr-x  2 root root 0 Nov 14 23:19 kfree
drwxr-xr-x  2 root root 0 Nov 14 23:19 kmalloc
drwxr-xr-x  2 root root 0 Nov 14 23:19 kmalloc_node
drwxr-xr-x  2 root root 0 Nov 14 23:19 kmem_cache_alloc
drwxr-xr-x  2 root root 0 Nov 14 23:19 kmem_cache_alloc_node
drwxr-xr-x  2 root root 0 Nov 14 23:19 kmem_cache_free
drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_alloc
drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag
drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked
drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_free
drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_free_batched
drwxr-xr-x  2 root root 0 Nov 14 23:19 mm_page_pcpu_drain
</literallayout>
Let's see what's inside the subdirectory for a specific
tracepoint, in this case the one for kmalloc:
<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc
root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al
drwxr-xr-x  2 root root 0 Nov 14 23:19 .
drwxr-xr-x 14 root root 0 Nov 14 23:19 ..
-rw-r--r--  1 root root 0 Nov 14 23:19 enable
-rw-r--r--  1 root root 0 Nov 14 23:19 filter
-r--r--r--  1 root root 0 Nov 14 23:19 format
-r--r--r--  1 root root 0 Nov 14 23:19 id
</literallayout>
The 'format' file for the tracepoint describes the layout of
the event record in memory. The various tracing tools that
make use of these tracepoints use it to parse the event and
make sense of it. The file also contains a 'print fmt' field
that allows tools like ftrace to display the event as text.
Here's what the format of the kmalloc event looks like:

<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
name: kmalloc
ID: 313
format:
        field:unsigned short common_type; offset:0; size:2; signed:0;
        field:unsigned char common_flags; offset:2; size:1; signed:0;
        field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
        field:int common_pid; offset:4; size:4; signed:1;
        field:int common_padding; offset:8; size:4; signed:1;

        field:unsigned long call_site; offset:16; size:8; signed:0;
        field:const void * ptr; offset:24; size:8; signed:0;
        field:size_t bytes_req; offset:32; size:8; signed:0;
        field:size_t bytes_alloc; offset:40; size:8; signed:0;
        field:gfp_t gfp_flags; offset:48; size:4; signed:0;

print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,
(REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((
gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), "GFP_TEMPORARY"},
{(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),
"GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((
gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((
gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned
long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},
{(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),
"GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned
long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},
{(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"

</literallayout>
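To give a feel for how a tool might consume such a file, here is a
minimal sketch that lists each field's name, offset, and size. It
assumes the 'field:...; offset:N; size:N;' layout shown above, and
the list_fields name is made up for this example; real tools such as
trace-cmd use their own parsers.

```shell
#!/bin/sh
# list_fields FORMAT_FILE
# Print "name offset size" for every field line in a tracepoint 'format' file.
list_fields() {
    awk '
        /field:/ {
            name = ""; offset = ""; size = ""
            n = split($0, parts, ";")
            for (i = 1; i <= n; i++) {
                part = parts[i]
                sub(/^[ \t]+/, "", part)          # trim leading whitespace
                if (part ~ /^field:/) {
                    m = split(part, words, /[ \t]+/)
                    name = words[m]               # last word is the field name
                } else if (part ~ /^offset:/) {
                    offset = substr(part, 8)      # text after "offset:"
                } else if (part ~ /^size:/) {
                    size = substr(part, 6)        # text after "size:"
                }
            }
            print name, offset, size
        }
    ' "$1"
}
```

Run against the kmalloc format file, for example with
'list_fields /sys/kernel/debug/tracing/events/kmem/kmalloc/format',
this should print one line per field.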

The 'enable' file in the tracepoint directory is what allows
the user (or tools such as trace-cmd) to actually turn the
tracepoint on and off. When enabled, the corresponding
tracepoint will start appearing in the ftrace 'trace'
file described previously. For example, this turns on the
kmalloc tracepoint:
<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable
</literallayout>
At the moment, we're not interested in the function tracer or
some other tracer that might be in effect, so we first turn
it off by selecting the 'nop' tracer. Having done that, we
still need to turn tracing on in order to see the events in
the output buffer:
<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer
root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on
</literallayout>
Now, if we look at the 'trace' file, we see nothing
but the kmalloc events we just turned on:

<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
# tracer: nop
#
# entries-in-buffer/entries-written: 1897/1897   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
        dropbear-1465  [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
          <idle>-0     [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
 matchbox-termin-1361  [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
            Xorg-1264  [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
            Xorg-1264  [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
            Xorg-1264  [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
 matchbox-termin-1361  [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
            Xorg-1264  [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
            Xorg-1264  [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
            Xorg-1264  [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
          <idle>-0     [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
        dropbear-1465  [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
          <idle>-0     [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
        dropbear-1465  [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
          <idle>-0     [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
        dropbear-1465  [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
 matchbox-termin-1361  [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
            Xorg-1264  [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
            Xorg-1264  [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
            Xorg-1264  [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
          <idle>-0     [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
        dropbear-1465  [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
          <idle>-0     [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
        dropbear-1465  [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
          <idle>-0     [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
 matchbox-termin-1361  [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT

</literallayout>
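Raw event listings like this get long quickly, so it can help to
summarize them. The following is a minimal sketch that totals the
kmalloc events per task. It assumes the column layout shown above
and task names without embedded spaces; the kmalloc_by_task name is
made up for the example.

```shell
#!/bin/sh
# kmalloc_by_task: read 'trace' output on stdin and print, per TASK-PID,
# the number of kmalloc events and the total bytes requested and allocated.
kmalloc_by_task() {
    awk '
        $5 == "kmalloc:" {
            for (i = 6; i <= NF; i++) {
                split($i, kv, "=")          # e.g. bytes_req=2048
                if (kv[1] == "bytes_req")   req[$1]   += kv[2]
                if (kv[1] == "bytes_alloc") alloc[$1] += kv[2]
            }
            count[$1]++
        }
        END {
            for (task in count)
                print task, count[task], req[task], alloc[task]
        }
    ' | sort
}
```

You could feed it the live buffer with
'kmalloc_by_task < /sys/kernel/debug/tracing/trace' or a saved copy
of the trace output.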

To disable the kmalloc event again, we need to send 0 to the
enable file:
<literallayout class='monospaced'>
root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable
</literallayout>
You can enable any number of events or complete subsystems
(by using the 'enable' file in the subsystem directory) and
get an arbitrarily fine-grained idea of what's going on in the
system by enabling as many of the appropriate tracepoints
as applicable.

</para>
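<para>
The per-event and per-subsystem 'enable' files can be wrapped in a
small helper, sketched below. The TRACING_DIR variable and the
set_events function are names invented for this example; the helper
only writes to the 'enable' files described above.

```shell
#!/bin/sh
# Sketch only: TRACING_DIR is an assumed override; it defaults to the
# usual debugfs location.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# set_events STATE SUBSYSTEM [EVENT]
# STATE is 1 (on) or 0 (off). Without EVENT, the whole subsystem is
# switched through the 'enable' file in the subsystem directory.
set_events() {
    state="$1"
    target="$TRACING_DIR/events/$2${3:+/$3}/enable"
    case "$state" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac
    [ -w "$target" ] || { echo "cannot write $target" >&2; return 1; }
    echo "$state" > "$target"
}
```

For example, 'set_events 1 kmem' turns on every kmem tracepoint at
once, while 'set_events 0 kmem kmalloc' turns just the kmalloc
tracepoint back off.
</para>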

<para>

A number of the tools described in this HOWTO do just that,
including trace-cmd and kernelshark in the next section.

</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> These tracepoints and their representation
are used not only by ftrace, but by many of the other tools
covered in this document and they form a central point of
integration for the various tracers available in Linux.
They form a central part of the instrumentation for the
following tools: perf, lttng, ftrace, blktrace and SystemTap.

</informalexample>

<informalexample>
<emphasis>Tying it Together:</emphasis> Eventually all the special-purpose tracers
currently available in /sys/kernel/debug/tracing will be
removed and replaced with equivalent tracers based on the
'trace events' subsystem.

</informalexample>

</section>

<title>trace-cmd/kernelshark</title>

<para>
trace-cmd is essentially an extensive command-line 'wrapper'
interface that hides the details of all the individual files
in /sys/kernel/debug/tracing, allowing users to specify
particular events within the
/sys/kernel/debug/tracing/events/ subdirectory and to collect
traces without having to deal with those details directly.

</para>
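<para>
To make the 'wrapper' idea concrete, the sketch below builds the
kind of 'trace-cmd record' command line that selects a set of events
or subsystems, using one '-e' option per entry. It only prints the
command and runs nothing; the trace_cmd_line name is made up for the
example.

```shell
#!/bin/sh
# trace_cmd_line EVENT...
# Print a 'trace-cmd record' command line that selects the given events
# or subsystems, one -e option each. Nothing is executed.
trace_cmd_line() {
    cmd="trace-cmd record"
    for ev in "$@"; do
        cmd="$cmd -e $ev"
    done
    echo "$cmd"
}
```

For instance, 'trace_cmd_line i915 drm' prints
'trace-cmd record -e i915 -e drm'. Running the printed command as
root and stopping it with Ctrl-C leaves a trace.dat file in the
current directory, which 'trace-cmd report' reads by default.
</para>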

<para>

As yet another layer on top of that, kernelshark provides a GUI
that allows users to start and stop traces and specify sets
of events using an intuitive interface, and view the
output as both trace events and as a per-CPU graphical
display. It directly uses 'trace-cmd' as the plumbing
that accomplishes all that underneath the covers (and
actually displays the trace-cmd command it uses, as we'll see).

</para>

<para>

To start a trace using kernelshark, first start kernelshark:
<literallayout class='monospaced'>
root@sugarbay:~# kernelshark

</literallayout>

Then bring up the 'Capture' dialog by choosing from the
kernelshark menu:
<literallayout class='monospaced'>
Capture | Record

</literallayout>

That will display the following dialog, which allows you to
choose one or more events (or even one or more complete
subsystems) to trace:

</para>

<para>

</para>

<para>

Note that these are exactly the same sets of events described
in the previous trace events subsystem section, and in fact
that is where trace-cmd gets them for kernelshark.

</para>

<para>

In the above screenshot, we've decided to explore the
graphics subsystem a bit and so have chosen to trace all
the tracepoints contained within the 'i915' and 'drm'
subsystems.

</para>

<para>

After doing that, we can start and stop the trace using
the 'Run' button in the lower right corner of the dialog
(the same button turns into the 'Stop' button after the
trace has started):

</para>

<para>

</para>

<para>

Notice that the right-hand pane shows the exact trace-cmd
command-line that's used to run the trace, along with the
results of the trace-cmd run.

</para>

<para>

Once the 'Stop' button is pressed, the graphical view
fills up with a colorful per-CPU display of the trace data,
along with the detailed event listing below that:

</para>

<para>

</para>

<para>

Here's another example, this time a display resulting
from tracing 'all events':

</para>

<para>

</para>

<para>

The tool is pretty self-explanatory, but for more detailed
information on navigating through the data, see the
<ulink url='http://rostedt.homelinux.com/kernelshark/'>kernelshark website</ulink>.

</para>

</section>

<title>Documentation</title>

<para>
The documentation for ftrace can be found in the kernel
Documentation directory:
<literallayout class='monospaced'>
Documentation/trace/ftrace.txt
</literallayout>
The documentation for the trace event subsystem can also
be found in the kernel Documentation directory:
<literallayout class='monospaced'>
Documentation/trace/events.txt
</literallayout>
There is a nice series of articles on using
ftrace and trace-cmd at LWN:
<itemizedlist>
<listitem><para><ulink url='http://lwn.net/Articles/365835/'>Debugging the kernel using Ftrace - part 1</ulink>
</para></listitem>
<listitem><para><ulink url='http://lwn.net/Articles/366796/'>Debugging the kernel using Ftrace - part 2</ulink>
</para></listitem>
<listitem><para><ulink url='http://lwn.net/Articles/370423/'>Secrets of the Ftrace function tracer</ulink>
</para></listitem>
<listitem><para><ulink url='https://lwn.net/Articles/410200/'>trace-cmd: A front-end for Ftrace</ulink>

</para></listitem>

</itemizedlist>

</para>

<para>

There's more detailed documentation on kernelshark usage here:
<ulink url='http://rostedt.homelinux.com/kernelshark/'>KernelShark</ulink>

</para>

<para>

An amusing yet useful README (a tracing mini-HOWTO) can be
found in /sys/kernel/debug/tracing/README.

</para>

</section>

</section>

<title>systemtap</title>

<para>
SystemTap is a system-wide script-based tracing and profiling tool.

</para>

<para>

SystemTap scripts are C-like programs that are executed in the
kernel to gather/print/aggregate data extracted from the context
they end up being invoked under.

</para>

<para>

For example, this probe from the
<ulink url='http://sourceware.org/systemtap/tutorial/'>SystemTap tutorial</ulink>
simply prints a line every time any process on the system open()s
a file. For each line, it prints the executable name of the
program that opened the file, along with its PID, and the name
of the file it opened (or tried to open), which it extracts
from the open syscall's argstr.
<literallayout class='monospaced'>
probe syscall.open
{
        printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)
}

probe timer.ms(4000) # after 4 seconds
{
        exit ()
}

</literallayout>

Normally, to execute this probe, you'd simply install
systemtap on the system you want to probe, and directly run
the probe on that system, e.g. assuming the name of the file
containing the above text is trace_open.stp:
<literallayout class='monospaced'>
# stap trace_open.stp
</literallayout>
What systemtap does under the covers to run this probe is 1)
parse and convert the probe to an equivalent 'C' form, 2)
compile the 'C' form into a kernel module, 3) insert the
module into the kernel, which arms it, and 4) collect the data
generated by the probe and display it to the user.

</para>

<para>

In order to accomplish steps 1 and 2, the 'stap' program needs
access to the kernel build system that produced the kernel
that the probed system is running. In the case of a typical
embedded system (the 'target'), the kernel build system
unfortunately isn't typically part of the image running on
the target. It is, however, normally available on the 'host'
system that produced the target image; in such cases,
steps 1 and 2 are executed on the host system, and steps
3 and 4 are executed on the target system, using only the
systemtap 'runtime'.

</para>

<para>

The systemtap support in Yocto assumes that only steps
3 and 4 are run on the target; it is possible to do
everything on the target, but this section assumes only
the typical embedded use-case.

</para>

<para>

So basically what you need to do in order to run a systemtap
script on the target is to 1) on the host system, compile the
probe into a kernel module that makes sense to the target, 2)
copy the module onto the target system, 3) insert the
module into the target kernel, which arms it, and 4) collect
the data generated by the probe and display it to the user.

</para>
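<para>
Done by hand, those four steps might look roughly like the sketch
below. This is an illustration under several assumptions rather than
a description of any particular tool's internals: the kernel build
directory, architecture name, module name, and /tmp destination are
placeholders, and the cross-toolchain settings a real build would
need are left out. With DRY_RUN=1 (the default here) the commands
are only printed.

```shell
#!/bin/sh
# Sketch only: print (or, with DRY_RUN=0, run) the host and target commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cross_stap SCRIPT KERNEL_BUILD_DIR ARCH TARGET
cross_stap() {
    script="$1"; kbuild="$2"; arch="$3"; target="$4"
    module="$(basename "$script" .stp)"

    # 1) on the host, compile the probe into a module (stop after pass 4)
    run stap -p4 -r "$kbuild" -a "$arch" -m "$module" "$script"
    # 2) copy the module onto the target
    run scp "$module.ko" "$target:/tmp/"
    # 3) and 4) load the module on the target and show the probe output
    run ssh "$target" staprun "/tmp/$module.ko"
}
```

Calling 'cross_stap trace_open.stp /path/to/kernel-build x86_64
root@192.168.7.2' in dry-run mode prints the three commands so you
can inspect them before running anything.
</para>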

<title>Setup</title>

<para>

Those are a lot of steps and a lot of details, but
fortunately Yocto includes a script called 'crosstap'
that will take care of those details, allowing you to
simply execute a systemtap script on the remote target,
with arguments if necessary.

</para>

<para>

In order to do this from a remote host, however, you
need to have access to the build for the image you
booted. The 'crosstap' script provides details on how
to do this if you run the script on the host without having
done a build:
<note>
The 'crosstap' script assumes you can establish an
ssh connection to the remote target.
Please refer to the crosstap wiki page for details on verifying
ssh connections at
<ulink url='https://wiki.yoctoproject.org/wiki/Tracing_and_Profiling#systemtap'></ulink>.
Also, the ability to ssh into the target system is not enabled
by default in *-minimal images.

</note>

<literallayout class='monospaced'>
$ crosstap root@192.168.1.88 trace_open.stp

Error: No target kernel build found.
Did you forget to create a local build of your image?

'crosstap' requires a local sdk build of the target system
(or a build that includes 'tools-profile') in order to build
kernel modules that can probe the target system.

Practically speaking, that means you need to do the following:
 - If you're running a pre-built image, download the release
   and/or BSP tarballs used to build the image.
 - If you're working from git sources, just clone the metadata
   and BSP layers needed to build the image you'll be booting.
 - Make sure you're properly set up to build a new image (see
   the BSP README and/or the widely available basic documentation
   that discusses how to build images).
 - Build an -sdk version of the image e.g.:
     $ bitbake core-image-sato-sdk
 OR
 - Build a non-sdk image but include the profiling tools:
     [ edit local.conf and add 'tools-profile' to the end of
       the EXTRA_IMAGE_FEATURES variable ]
     $ bitbake core-image-sato

Once you've build the image on the host system, you're ready to
boot it (or the equivalent pre-built image) and use 'crosstap'
to probe it (you need to source the environment as usual first):

   $ source oe-init-build-env
   $ cd ~/my/systemtap/scripts
   $ crosstap root@192.168.1.xxx myscript.stp

</literallayout>

So essentially what you need to do is build an SDK image or
an image with 'tools-profile' as detailed in the
"<link linkend='profile-manual-general-setup'>General Setup</link>"
section of this manual, and boot the resulting target image.

</para>

<note>

If you have a build directory containing multiple machines,
you need to have the MACHINE you're connecting to selected
in local.conf, and the kernel in that machine's build
directory must match the kernel on the booted system exactly,
or you'll get the above 'crosstap' message when you try to
invoke a script.

</note>

</section>

<title>Running a Script on a Target</title>

<para>
Once you've done that, you should be able to run a systemtap
script on the target:

<literallayout class='monospaced'>
$ cd /path/to/yocto
$ source oe-init-build-env

### Shell environment set up for builds. ###

You can now run 'bitbake <target>'

Common targets are:
    core-image-minimal
    core-image-sato
    meta-toolchain
    meta-ide-support

You can also run generated qemu images with a command like 'runqemu qemux86'


</literallayout>

Once you've done that, you can cd to whatever directory
contains your scripts and use 'crosstap' to run the script:
<literallayout class='monospaced'>
$ cd /path/to/my/systemtap/script
$ crosstap root@192.168.7.2 trace_open.stp
</literallayout>
If you get an error connecting to the target, e.g.:
<literallayout class='monospaced'>
$ crosstap root@192.168.7.2 trace_open.stp
error establishing ssh connection on remote 'root@192.168.7.2'
</literallayout>
Try ssh'ing to the target and see what happens:
<literallayout class='monospaced'>
$ ssh root@192.168.7.2
</literallayout>
A lot of the time, connection problems are due to specifying a
wrong IP address or having a 'host key verification error'.

</para>
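<para>
A 'host key verification error' usually means that the key stored
for that address on the host no longer matches the target, which can
happen after booting a freshly built image. A minimal sketch of
clearing the stale entry follows; the host_of and forget_host_key
helper names are made up for the example, while 'ssh-keygen -R' is
the standard OpenSSH way to remove a host from known_hosts.

```shell
#!/bin/sh
# host_of USER@HOST -> HOST (a bare HOST is returned unchanged)
host_of() {
    printf '%s\n' "${1#*@}"
}

# forget_host_key USER@HOST
# Remove any stored host key for the target from ~/.ssh/known_hosts.
forget_host_key() {
    ssh-keygen -R "$(host_of "$1")"
}
```

After 'forget_host_key root@192.168.7.2', the next ssh to the target
asks you to accept its current key again, so only do this for a
target you trust.
</para>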

<para>

If everything worked as planned, you should see something
like this (enter the password when prompted, or press enter
if it's set up to use no password):
<literallayout class='monospaced'>
$ crosstap root@192.168.7.2 trace_open.stp
root@192.168.7.2's password:
matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)

</literallayout>

</para>

</section>

<title>Documentation</title>

<para>
The SystemTap language reference can be found here:
<ulink url='http://sourceware.org/systemtap/langref/'>SystemTap Language Reference</ulink>

</para>

<para>

Links to other SystemTap documents, tutorials, and examples can be
found here:
<ulink url='http://sourceware.org/systemtap/documentation.html'>SystemTap documentation page</ulink>

</para>

</section>

</section>


<title>Sysprof</title>

<para>
Sysprof is a very easy to use system-wide profiler that consists
of a single window with three panes and a few buttons which allow
you to start, stop, and view the profile from one place.

</para>

<title>Setup</title>

<para>

For this section, we'll assume you've already performed the
basic setup outlined in the General Setup section.

</para>

<para>

Sysprof is a GUI-based application that runs on the target
system. For the rest of this document we assume you've
ssh'ed to the target and will be running Sysprof there
(you can use the '-X' option to ssh and have the
Sysprof GUI run on the target but display remotely on the
host if you want).

</para>

</section>

<title>Basic Usage</title>

<para>
To start profiling the system, you simply press the 'Start'
button. To stop profiling and to start viewing the profile data
in one easy step, press the 'Profile' button.

</para>

<para>

Once you've pressed the profile button, the three panes will
fill up with profiling data:

</para>

<para>

</para>

<para>

The left pane shows a list of functions and processes.
Selecting one of those expands that function in the right
pane, showing all its callees. Note that this caller-oriented
display is essentially the inverse of perf's default
callee-oriented callchain display.

</para>

<para>

In the screenshot above, we're focusing on __copy_to_user_ll()
and, looking up the callchain, we can see that one of the callers
of __copy_to_user_ll() is sys_read(), along with the complete
callpath between them. Notice that this is essentially a portion
of the same information we saw in the perf display shown in the
perf section of this page.

</para>

<para>

</para>

<para>

Similarly, the above is a snapshot of the Sysprof display of a
copy-from-user callchain.

</para>

<para>

Finally, looking at the third Sysprof pane in the lower left,
we can see a list of all the callers of a particular function
selected in the top left pane. In this case, the lower pane is
showing all the callers of __mark_inode_dirty:

</para>

<para>

</para>

<para>

Double-clicking on one of those functions will in turn change the
focus to the selected function, and so on.

</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> If you like Sysprof's 'caller-oriented'
display, you may be able to approximate it in other tools as
well. For example, 'perf report' has the -g (--call-graph)
option that you can experiment with; one of the options is
'caller' for an inverted caller-based callgraph display.

</informalexample>
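<para>
The difference between a callee view and a callers view can be
illustrated with a few lines of Python. This is only a sketch of
the idea, not how Sysprof or perf are implemented. Only sys_read
and __copy_to_user_ll come from the text above; the other frames,
the sample counts, and the helper names are invented for the
example.
</para>

```python
from collections import Counter

# Hypothetical sampled call stacks, outermost caller first.
# Only sys_read and __copy_to_user_ll appear in the text; the other
# frames and the sample counts are made up for illustration.
SAMPLES = [
    (("sys_read", "vfs_read", "__copy_to_user_ll"), 5),
    (("sys_read", "vfs_read", "other_helper"), 2),
    (("sys_recv", "__copy_to_user_ll"), 1),
]


def callees(samples, func):
    """Count the functions called directly by func (descendants view)."""
    counts = Counter()
    for stack, n in samples:
        for caller, callee in zip(stack, stack[1:]):
            if caller == func:
                counts[callee] += n
    return counts


def callers(samples, func):
    """Count the functions that call func directly (callers view)."""
    counts = Counter()
    for stack, n in samples:
        for caller, callee in zip(stack, stack[1:]):
            if callee == func:
                counts[caller] += n
    return counts


print("callees of vfs_read:", dict(callees(SAMPLES, "vfs_read")))
print("callers of __copy_to_user_ll:",
      dict(callers(SAMPLES, "__copy_to_user_ll")))
```

<para>
Both views are built from the same samples; they differ only in
which side of each caller/callee pair is used as the key.
</para>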

</section>

<title>Documentation</title>

<para>
There doesn't seem to be any documentation for Sysprof, but
maybe that's because it's pretty self-explanatory.
The Sysprof website, however, is here:
<ulink url='http://sysprof.com/'>Sysprof, System-wide Performance Profiler for Linux</ulink>

</para>

</section>

</section>

<title>LTTng (Linux Trace Toolkit, next generation)</title>

<title>Setup</title>

<para>

For this section, we'll assume you've already performed the
basic setup outlined in the General Setup section.

</para>

<para>

LTTng is run on the target system by ssh'ing to it.
However, if you want to see the traces graphically,
install Eclipse as described in section
"<link linkend='manually-copying-a-trace-to-the-host-and-viewing-it-in-eclipse'>Manually copying a trace to the host and viewing it in Eclipse (i.e. using Eclipse without network support)</link>"
and follow the directions to manually copy traces to the host and
view them in Eclipse (i.e. using Eclipse without network support).

</para>

<note>

Be sure to download and install/run the 'SR1' or later Juno release
of Eclipse, e.g.:
<ulink url='http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz'>http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz</ulink>

</note>

</section>

<title>Collecting and Viewing Traces</title>

<para>
Once you've applied the above commits and built and booted your
image (you need to build the core-image-sato-sdk image or use one of the
other methods described in the General Setup section), you're
ready to start tracing.

</para>

<title>Collecting and viewing a trace on the target (inside a shell)</title>

<para>
First, from the host, ssh to the target:
<literallayout class='monospaced'>
$ ssh -l root 192.168.1.47
The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
root@192.168.1.47's password:
</literallayout>

Once on the target, use these steps to create a trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng create
Spawning a session daemon
Session auto-20121015-232120 created.
Traces will be written in /home/root/lttng-traces/auto-20121015-232120
</literallayout>
Enable the events you want to trace (in this case all
kernel events):
<literallayout class='monospaced'>
root@crownbay:~# lttng enable-event --kernel --all
All kernel events are enabled in channel channel0
</literallayout>
Start the trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng start
Tracing started for session auto-20121015-232120
</literallayout>

Then stop the trace after a while, or after running
the particular workload that you want to trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng stop
Tracing stopped for session auto-20121015-232120
</literallayout>

You can now view the trace in text form on the target:
<literallayout class='monospaced'>
root@crownbay:~# lttng view
[23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }
[23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
[23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }
[23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }
[23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 }
[23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }
[23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }
[23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 }
[23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }
[23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 }
[23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 }
[23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }
[23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }
[23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }
[23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }
[23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }
[23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }
[23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 }
[23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 }
[23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 }
[23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }
[23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
[23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
[23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 }
[23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 }
[23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }
[23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 }
[23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 }
[23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 }
[23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 }
[23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }
[23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 }
[23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 }
[23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 }
[23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }
[23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 }
[23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 }
.
.
.
</literallayout>

You can now safely destroy the trace session (note that
this doesn't delete the trace - it's still there
in ~/lttng-traces):
<literallayout class='monospaced'>
root@crownbay:~# lttng destroy
Session auto-20121015-232120 destroyed at /home/root
</literallayout>
Note that the trace is saved in a directory of the same
name as returned by 'lttng create', under the ~/lttng-traces
directory (note that you can change this by supplying your
own name to 'lttng create'):
<literallayout class='monospaced'>
root@crownbay:~# ls -al ~/lttng-traces
drwxrwx--- 3 root root 1024 Oct 15 23:21 .
drwxr-xr-x 5 root root 1024 Oct 15 23:57 ..
drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120
</literallayout>

</para>
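<para>
The text printed by 'lttng view' is regular enough to post-process.
The following Python sketch shows one possible way to count kernel
events by name. It assumes the line layout of the sample output
above (timestamp, delta, event name, CPU number, payload), which
may differ with other LTTng versions; the names LINE_RE,
parse_line, and count_events are illustrative only.
</para>

```python
import re
from collections import Counter

# Assumed layout, taken from the sample output in this section:
# [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
LINE_RE = re.compile(
    r"^\[(\d+:\d+:\d+\.\d+)\] \(\+([\d?.]+)\) (\w+): \{ (\d+) \}, \{(.*)\}$"
)


def parse_line(line):
    """Return (timestamp, delta_seconds, event, cpu, payload) or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts, delta, event, cpu, payload = m.groups()
    # The first event has no predecessor and is shown as +?.?????????
    delta_s = None if "?" in delta else float(delta)
    return ts, delta_s, event, int(cpu), payload.strip()


def count_events(lines):
    """Count how often each event name occurs, skipping unparsed lines."""
    parsed = (parse_line(line) for line in lines)
    return Counter(p[2] for p in parsed if p is not None)


SAMPLE = """\
[23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }
[23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
[23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }
[23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }
""".splitlines()

for event, n in count_events(SAMPLE).most_common():
    print(event, n)
```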

</section>

<title>Collecting and viewing a userspace trace on the target (inside a shell)</title>

<para>
For LTTng userspace tracing, you need to have a properly
instrumented userspace program. For this example, we'll use
the 'hello' test program generated by the lttng-ust build.

</para>

<para>

The 'hello' test program isn't installed on the rootfs by
the lttng-ust build, so we need to copy it over manually.
First cd into the build directory that contains the hello
executable:
<literallayout class='monospaced'>
$ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs
</literallayout>
Copy that over to the target machine:
<literallayout class='monospaced'>
$ scp hello root@192.168.1.20:
</literallayout>
You now have the instrumented lttng 'hello world' test
program on the target, ready to test.

</para>

<para>

First, from the host, ssh to the target:
<literallayout class='monospaced'>
$ ssh -l root 192.168.1.47
The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
root@192.168.1.47's password:
</literallayout>

Once on the target, use these steps to create a trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng create
Session auto-20190303-021943 created.
Traces will be written in /home/root/lttng-traces/auto-20190303-021943
</literallayout>
Enable the events you want to trace (in this case all
userspace events):
<literallayout class='monospaced'>
root@crownbay:~# lttng enable-event --userspace --all
All UST events are enabled in channel channel0
</literallayout>
Start the trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng start
Tracing started for session auto-20190303-021943
</literallayout>
Run the instrumented hello world program:
<literallayout class='monospaced'>
root@crownbay:~# ./hello
Hello, World!
Tracing... done.
</literallayout>

Then stop the trace after a while, or after running the
particular workload that you want to trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng stop
Tracing stopped for session auto-20190303-021943
</literallayout>

You can now view the trace in text form on the target:
<literallayout class='monospaced'>
root@crownbay:~# lttng view
[02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
[02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
[02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
[02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
.
.
.
</literallayout>

You can now safely destroy the trace session (note that
this doesn't delete the trace - it's still
there in ~/lttng-traces):
<literallayout class='monospaced'>
root@crownbay:~# lttng destroy
Session auto-20190303-021943 destroyed at /home/root
</literallayout>

</para>
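<para>
The kernel and userspace walkthroughs follow the same
create/enable/start/stop/destroy cycle and differ only in the
domain option passed to 'lttng enable-event'. As a rough sketch,
the sequence can be written down as data in Python. The function
name lttng_session_commands is hypothetical, and the sketch only
builds the command lines shown above; it does not run them.
</para>

```python
import shlex


def lttng_session_commands(domain, workload=None):
    """Return argv lists for one create/enable/start/stop/destroy cycle.

    domain is "kernel" or "userspace", matching the two walkthroughs.
    workload is an optional command line to run while tracing.
    """
    if domain not in ("kernel", "userspace"):
        raise ValueError("domain must be 'kernel' or 'userspace'")
    steps = [
        ["lttng", "create"],
        ["lttng", "enable-event", "--" + domain, "--all"],
        ["lttng", "start"],
    ]
    if workload:
        steps.append(shlex.split(workload))
    steps += [["lttng", "stop"], ["lttng", "destroy"]]
    return steps


for argv in lttng_session_commands("userspace", "./hello"):
    print(" ".join(argv))
```

<para>
On the target, each list could be handed to Python's
subprocess.run(argv, check=True) to execute the steps in order.
</para>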

</section>

<title>Manually copying a trace to the host and viewing it in Eclipse (i.e. using Eclipse without network support)</title>

<para>
If you already have an LTTng trace on a remote target and
would like to view it in Eclipse on the host, you can easily
copy it from the target to the host and import it into
Eclipse to view it using the LTTng Eclipse plug-in already
bundled in Eclipse (Juno SR1 or later).

</para>

<para>

Using the trace we created in the previous section, archive
it and copy it to your host system:
<literallayout class='monospaced'>
root@crownbay:~/lttng-traces# tar zcvf auto-20121015-232120.tar.gz auto-20121015-232120
auto-20121015-232120/
auto-20121015-232120/kernel/
auto-20121015-232120/kernel/metadata
auto-20121015-232120/kernel/channel0_1
auto-20121015-232120/kernel/channel0_0

$ scp root@192.168.1.47:lttng-traces/auto-20121015-232120.tar.gz .
root@192.168.1.47's password:
auto-20121015-232120.tar.gz 100% 1566KB 1.5MB/s 00:01
</literallayout>
Unarchive it on the host:
<literallayout class='monospaced'>
$ gunzip -c auto-20121015-232120.tar.gz | tar xvf -
auto-20121015-232120/
auto-20121015-232120/kernel/
auto-20121015-232120/kernel/metadata
auto-20121015-232120/kernel/channel0_1
auto-20121015-232120/kernel/channel0_0
</literallayout>

We can now import the trace into Eclipse and view it:
<orderedlist>
<listitem><para>First, start Eclipse and open the
'LTTng Kernel' perspective by selecting the following
menu item:
<literallayout class='monospaced'>
Window | Open Perspective | Other...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
'LTTng Kernel' from the list.</para></listitem>
<listitem><para>Back at the main menu, select the
following menu item:
<literallayout class='monospaced'>
File | New | Project...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
the 'Tracing | Tracing Project' wizard and press
'Next>'.</para></listitem>
<listitem><para>Give the project a name and press
'Finish'.</para></listitem>
<listitem><para>In the 'Project Explorer' pane under
the project you created, right click on the
'Traces' item.</para></listitem>
<listitem><para>Select 'Import...' and in the dialog
that's displayed:</para></listitem>
<listitem><para>Browse the filesystem and select
the 'kernel' directory containing the trace
you copied from the target
e.g. auto-20121015-232120/kernel</para></listitem>
<listitem><para>'Checkmark' the directory in the tree
that's displayed for the trace</para></listitem>
<listitem><para>Below that, select 'Common Trace Format:
Kernel Trace' for the 'Trace Type'</para></listitem>
<listitem><para>Press 'Finish' to close the dialog
</para></listitem>
<listitem><para>Back in the 'Project Explorer' pane,
double-click on the 'kernel' item for the
trace you just imported under 'Traces'
</para></listitem>
</orderedlist>
You should now see your trace data displayed graphically
in several different views in Eclipse:

</para>

<para>

</para>

<para>

You can access extensive help information on how to use
the LTTng plug-in to search and analyze captured traces via
the Eclipse help system:
<literallayout class='monospaced'>
Help | Help Contents | LTTng Plug-in User Guide
</literallayout>

</para>
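<para>
The unarchive step can also be scripted on the host. The Python
sketch below extracts a trace archive and reports which directories
to point the Eclipse import dialog at. It assumes that a trace
directory is any directory containing a 'metadata' file, as
auto-20121015-232120/kernel does in the listing above. The helper
names are hypothetical, and make_demo_archive only builds an empty
stand-in archive so that the example runs without a real trace.
</para>

```python
import tarfile
import tempfile
from pathlib import Path


def unpack_trace(archive, dest):
    """Extract a .tar.gz trace archive and return its trace directories.

    Assumption: a directory holding a 'metadata' file is a trace.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return sorted(p.parent for p in dest.rglob("metadata"))


def make_demo_archive(workdir):
    """Build a tiny stand-in archive with the layout shown in the listing."""
    root = Path(workdir) / "auto-20121015-232120"
    kernel = root / "kernel"
    kernel.mkdir(parents=True)
    for name in ("metadata", "channel0_0", "channel0_1"):
        (kernel / name).write_bytes(b"")
    archive = Path(workdir) / "auto-20121015-232120.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(root, arcname=root.name)
    return archive


with tempfile.TemporaryDirectory() as tmp:
    host_dir = Path(tmp) / "host"
    for trace in unpack_trace(make_demo_archive(tmp), host_dir):
        print(trace.relative_to(host_dir))
```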

</section>

<title>Collecting and viewing a trace in Eclipse</title>

<note>
This section on collecting traces remotely doesn't currently
work because of Eclipse 'RSE' connectivity problems. Manually
tracing on the target, copying the trace files to the host,
and viewing the trace in Eclipse on the host as outlined in
the previous steps does work, however. Please use the manual
steps outlined above to view traces in Eclipse.

</note>

<para>

In order to trace a remote target, you also need to add
a 'tracing' group on the target and connect as a user
who's part of that group, e.g.:
<literallayout class='monospaced'>
# adduser tomz
# groupadd -r tracing
# usermod -a -G tracing tomz
</literallayout>

<orderedlist>
<listitem><para>First, start Eclipse and open the
'LTTng Kernel' perspective by selecting the following
menu item:
<literallayout class='monospaced'>
Window | Open Perspective | Other...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
'LTTng Kernel' from the list.</para></listitem>
<listitem><para>Back at the main menu, select the
following menu item:
<literallayout class='monospaced'>
File | New | Project...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
the 'Tracing | Tracing Project' wizard and
press 'Next>'.</para></listitem>
<listitem><para>Give the project a name and press
'Finish'. That should result in an entry in the
'Project' subwindow.</para></listitem>
<listitem><para>In the 'Control' subwindow just below
it, press 'New Connection'.</para></listitem>
<listitem><para>Add a new connection, giving it the
hostname or IP address of the target system.
</para></listitem>
<listitem><para>Provide the username and password
of a qualified user (a member of the 'tracing' group)
or root account on the target system.
</para></listitem>
<listitem><para>Provide appropriate answers to whatever
else is asked for e.g. 'secure storage password'
can be anything you want.
If you get an 'RSE Error' it may be due to proxies.
It may be possible to get around the problem by
changing the following setting:
<literallayout class='monospaced'>
Window | Preferences | Network Connections
</literallayout>
Switch 'Active Provider' to 'Direct'
</para></listitem>
</orderedlist>

</para>

</section>

</section>

<title>Documentation</title>

<para>
You can find the primary LTTng Documentation on the
<ulink url='https://lttng.org/docs/'>LTTng Documentation</ulink>
site.
The documentation on this site is appropriate for intermediate to
advanced software developers who are working in a Linux environment
and are interested in efficient software tracing.

</para>

<para>

For information on LTTng in general, visit the
<ulink url='http://lttng.org/lttng2.0'>LTTng Project</ulink>
site.
You can find a "Getting Started" link on this site that takes
you to an LTTng Quick Start.

</para>

<para>

Finally, you can access extensive help information on how to use
the LTTng plug-in to search and analyze captured traces via the
Eclipse help system:
<literallayout class='monospaced'>
Help | Help Contents | LTTng Plug-in User Guide
</literallayout>

</para>

</section>

</section>

<title>blktrace</title>

<para>
blktrace is a tool for tracing and reporting low-level disk I/O.
blktrace provides the tracing half of the equation; its output can
be piped into the blkparse program, which renders the data in a
human-readable form and does some basic analysis:

</para>

<title>Setup</title>

<para>

For this section, we'll assume you've already performed the
basic setup outlined in the
"<link linkend='profile-manual-general-setup'>General Setup</link>"
section.

</para>

<para>

blktrace is an application that runs on the target system.
You can run the entire blktrace and blkparse pipeline on the
target, or you can run blktrace in 'listen' mode on the target
and have blktrace and blkparse collect and analyze the data on
the host (see the
"<link linkend='using-blktrace-remotely'>Using blktrace Remotely</link>"
section below).
For the rest of this section we assume you've used ssh from the
host to log in to the target and will be running blktrace on
the target.

</para>

</section>

<title>Basic Usage</title>

<para>
To record a trace, simply run the 'blktrace' command, giving it
the name of the block device you want to trace activity on:
<literallayout class='monospaced'>
root@crownbay:~# blktrace /dev/sdc
</literallayout>
In another shell, execute a workload you want to trace.
<literallayout class='monospaced'>
root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>; sync
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>

Press Ctrl-C in the blktrace shell to stop the trace. It will
display how many events were logged, along with the per-cpu file
sizes (blktrace records traces in per-cpu kernel buffers and
simply dumps them to userspace for blkparse to merge and sort
later).
<literallayout class='monospaced'>
^C=== sdc ===
CPU 0: 7082 events, 332 KiB data
CPU 1: 1578 events, 74 KiB data
Total: 8660 events (dropped 0), 406 KiB data
</literallayout>
If you examine the files saved to disk, you see multiple files,
one per CPU and with the device name as the first part of the
filename:
<literallayout class='monospaced'>
root@crownbay:~# ls -al
drwxr-xr-x 6 root root 1024 Oct 27 22:39 .
drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
-rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0
-rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1
</literallayout>

To view the trace events, simply invoke 'blkparse' in the
directory containing the trace files, giving it the device name
that forms the first part of the filenames:
<literallayout class='monospaced'>
root@crownbay:~# blkparse sdc

8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]
8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 0 0.000057270 0 m N cfq1225 insert_request
8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr
8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1
8,32 1 0 0.000088559 0 m N cfq workload slice:150
8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1
8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1
8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null)
8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert
8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request
8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1
8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]
8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8]
8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8]
8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8]
8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8]
8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8]
8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8]
8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8]
8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8]
8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8]
8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8]
8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8]
8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8]
8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8]
8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8]
8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8]
8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8]
8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]
8,32 1 0 0.000483022 0 m N cfq1225 insert_request
8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1
8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1
8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert
8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request
8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2
.
.
.
8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1
8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]
8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1
8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0
8,32 0 0 58.516636933 0 m N cfq schedule dispatch
8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0
8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80
8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr
8,32 1 0 58.516990819 0 m N cfq3551 put_queue

CPU0 (sdc):
Reads Queued: 0, 0KiB Writes Queued: 331, 26,284KiB
Read Dispatches: 0, 0KiB Write Dispatches: 485, 40,484KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 511, 41,000KiB
Read Merges: 0, 0KiB Write Merges: 13, 160KiB
Read depth: 0 Write depth: 2
IO unplugs: 23 Timer unplugs: 0
CPU1 (sdc):
Reads Queued: 0, 0KiB Writes Queued: 249, 15,800KiB
Read Dispatches: 0, 0KiB Write Dispatches: 42, 1,600KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 16, 1,084KiB
Read Merges: 0, 0KiB Write Merges: 40, 276KiB
Read depth: 0 Write depth: 2
IO unplugs: 30 Timer unplugs: 1

Total (sdc):
Reads Queued: 0, 0KiB Writes Queued: 580, 42,084KiB
Read Dispatches: 0, 0KiB Write Dispatches: 527, 42,084KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 527, 42,084KiB
Read Merges: 0, 0KiB Write Merges: 53, 436KiB
IO unplugs: 53 Timer unplugs: 1

Throughput (R/W): 0KiB/s / 719KiB/s
Events (sdc): 6,592 entries
Skips: 0 forward (0 - 0.0%)
Input file sdc.blktrace.0 added
Input file sdc.blktrace.1 added
</literallayout>

The report shows each event that was found in the blktrace data,
along with a summary of the overall block I/O traffic during
the run. You can look at the
<ulink url='http://linux.die.net/man/1/blkparse'>blkparse</ulink>
manpage to learn the
meaning of each field displayed in the trace listing.

</para>
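<para>
As an illustration of the field layout, the following Python sketch
splits event lines from the listing above into named fields and
measures the time between a request being queued (action Q) and
issued to the driver (action D). It assumes the default blkparse
output format shown above (device, CPU, sequence number, timestamp,
PID, action, RWBS flags, start sector + block count). The names
Event, parse_event, and queue_to_dispatch are made up for the
example.
</para>

```python
from collections import namedtuple

Event = namedtuple(
    "Event", "device cpu seq time pid action rwbs sector blocks"
)


def parse_event(line):
    """Parse one default-format blkparse event line.

    Returns None for lines without a 'sector + blocks' part, such as
    plug/unplug events and 'm' scheduler messages.
    """
    parts = line.split()
    if len(parts) >= 10 and parts[8] == "+":
        return Event(parts[0], int(parts[1]), int(parts[2]),
                     float(parts[3]), int(parts[4]), parts[5],
                     parts[6], int(parts[7]), int(parts[9]))
    return None


def queue_to_dispatch(lines):
    """Map start sector to seconds between its Q and D events."""
    queued, latency = {}, {}
    for ev in filter(None, map(parse_event, lines)):
        if ev.action == "Q":
            queued.setdefault(ev.sector, ev.time)
        elif ev.action == "D" and ev.sector in queued:
            latency[ev.sector] = ev.time - queued.pop(ev.sector)
    return latency


# Lines copied from the listing above.
SAMPLE = [
    "8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8]",
    "8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8]",
    "8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]",
    "8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]",
    "8,32 1 0 0.000057270 0 m N cfq1225 insert_request",
    "8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8]",
]

for sector, seconds in queue_to_dispatch(SAMPLE).items():
    print(sector, format(seconds, ".9f"))
```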

<para>

blktrace and blkparse are designed from the ground up to
be able to operate together in a 'pipe mode' where the
stdout of blktrace can be fed directly into the stdin of
blkparse:
<literallayout class='monospaced'>
root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i -
</literallayout>
This enables long-lived tracing sessions to run without
writing anything to disk. It also lets the user look for
certain conditions in the trace data in 'real-time', either
by viewing the trace output as it scrolls by on the screen or
by passing it along to yet another program in the pipeline,
such as grep, which can be used to identify and capture
conditions of interest.

</para>
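<para>
A filter at the end of such a pipeline does not have to be grep.
As a sketch, the Python generator below passes through only the
write requests of at least a given size as they are inserted into
the request queue (action I). It assumes the default blkparse line
layout shown earlier; the function name large_writes and the
64-block threshold are arbitrary choices for the example.
</para>

```python
def large_writes(lines, min_blocks=64):
    """Yield 'I' (insert) write events of at least min_blocks blocks.

    Works on any iterable of blkparse text lines, so it can consume
    a file object line by line without buffering the whole trace.
    """
    for line in lines:
        parts = line.split()
        # Only lines of the form '... sector + blocks [process]' qualify.
        if len(parts) >= 10 and parts[8] == "+":
            action, rwbs, blocks = parts[5], parts[6], int(parts[9])
            if action == "I" and "W" in rwbs and blocks >= min_blocks:
                yield line.rstrip("\n")


# Lines copied from the blkparse listing earlier in this section.
SAMPLE = [
    "8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]",
    "8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]",
    "8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]",
    "8,32 1 0 0.000483022 0 m N cfq1225 insert_request",
]

for hit in large_writes(SAMPLE):
    print(hit)
```

<para>
To use it in a live pipeline, the same generator could be fed
sys.stdin instead of the sample list, with the script appended
after 'blkparse -i -'.
</para>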

<para>

There's actually another blktrace command that implements
the above pipeline as a single command, so the user doesn't
have to type the whole command sequence:
<literallayout class='monospaced'>
root@crownbay:~# btrace /dev/sdc
</literallayout>

</para>

</section>

<title>Using blktrace Remotely</title>

<para>
blktrace traces block I/O and at the same time normally
writes its trace data to a block device. In general, it's
not a great idea to make the device being traced the same
as the device the tracer writes to. For that reason,
blktrace provides a way to trace without perturbing the
traced device at all: native support for sending all trace
data over the network.

</para>

<para>

To have blktrace operate in this mode, start blktrace on
the target system being traced with the -l option, along with
the device to trace:
<literallayout class='monospaced'>
root@crownbay:~# blktrace -l /dev/sdc
server: waiting for connections...
</literallayout>
On the host system, use the -h option to connect to the
target system, also passing it the device to trace:
<literallayout class='monospaced'>
$ blktrace -d /dev/sdc -h 192.168.1.43
blktrace: connecting to 192.168.1.43
blktrace: connected!
</literallayout>
On the target system, you should see this:
<literallayout class='monospaced'>
server: connection from 192.168.1.43
</literallayout>
In another shell, execute a workload you want to trace.
<literallayout class='monospaced'>
root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>; sync
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>

2991

When it's done, do a Ctrl-C on the host system to

stop the trace:

^C=== sdc ===

CPU 0: 7691 events, 361 KiB data

2996

CPU 1: 4109 events, 193 KiB data

2997

Total: 11800 events (dropped 0), 554 KiB data

2998

</literallayout>

2999

On the target system, you should also see a trace

3000

summary for the trace just ended:

3001

3002

server: end of run for 192.168.1.43:sdc

3003

=== sdc ===

3004

CPU 0: 7691 events, 361 KiB data

3005

CPU 1: 4109 events, 193 KiB data

3006

Total: 11800 events (dropped 0), 554 KiB data

3007

</literallayout>

The blktrace instance on the host will save the target
output inside a hostname-timestamp directory:
<literallayout class='monospaced'>
$ ls -al
drwxr-xr-x 10 root root 1024 Oct 28 02:40 .
drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56
</literallayout>

Change into that directory to see the output files:
<literallayout class='monospaced'>
$ ls -l
-rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0
-rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1
</literallayout>

Then run blkparse on the host system using the device name:
<literallayout class='monospaced'>
$ blkparse sdc
8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls]
8,32 1 0 0.000036038 0 m N cfq1263 alloced
8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls]
8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls]
8,32 1 0 0.000056152 0 m N cfq1263 insert_request
8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr
8,32 1 0 0.000075498 0 m N cfq workload slice:300
.
.
.
8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0
8,32 0 0 177.266388140 0 m N cfq schedule dispatch
8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0
8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56
8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr
8,32 1 0 177.266696560 0 m N cfq1267 put_queue

CPU0 (sdc):
Reads Queued: 0, 0KiB Writes Queued: 270, 21,708KiB
Read Dispatches: 59, 2,628KiB Write Dispatches: 495, 39,964KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB
Read Merges: 0, 0KiB Write Merges: 9, 344KiB
Read depth: 2 Write depth: 2
IO unplugs: 20 Timer unplugs: 1
CPU1 (sdc):
Reads Queued: 688, 2,752KiB Writes Queued: 381, 20,652KiB
Read Dispatches: 31, 124KiB Write Dispatches: 59, 2,396KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 11, 764KiB
Read Merges: 598, 2,392KiB Write Merges: 88, 448KiB
Read depth: 2 Write depth: 2
IO unplugs: 52 Timer unplugs: 0

Total (sdc):
Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB
Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 90, 2,752KiB Writes Completed: 554, 42,360KiB
Read Merges: 598, 2,392KiB Write Merges: 97, 792KiB
IO unplugs: 72 Timer unplugs: 1

Throughput (R/W): 15KiB/s / 238KiB/s
Events (sdc): 9,301 entries
Skips: 0 forward (0 - 0.0%)
</literallayout>

You should see the trace events and summary just as
you would have if you'd run the same command on the target.

</para>
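<para>
As a rough sketch of how the per-event lines in blkparse output
like the above might be post-processed, the following shell function
counts events by action code, which is the sixth whitespace-separated
field in the lines shown (for example Q, G, I or m).
The function name count_blk_actions is hypothetical and not part of
the blktrace tools, and the sketch assumes the default blkparse line
layout shown above. Summary lines are skipped by matching only lines
that begin with a major,minor device number.

```shell
#!/bin/sh
# Hypothetical helper (not part of blktrace): tally blkparse events by action.
# Assumes the default per-event layout shown above:
#   dev cpu seq timestamp pid action rwbs ...
count_blk_actions() {
    awk '
        # Only per-event lines start with "major,minor"; this skips the summary.
        $1 ~ /^[0-9]+,[0-9]+$/ { count[$6]++ }
        END { for (a in count) print a, count[a] }
    ' "$@" | LC_ALL=C sort
}

# Usage (assumes the sdc capture files are in the current directory):
#   blkparse sdc | count_blk_actions
```

Each output line is an action code followed by the number of events
seen with that code, which gives a quick view of how much of a trace
consists of scheduler messages (m) versus actual requests.
</para>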

</section>

<title>Tracing Block I/O via 'ftrace'</title>


<para>

It's also possible to trace block I/O using only ftrace,
which can be useful for casual tracing
if you don't want to bother dealing with the userspace tools.

</para>

<para>

To enable tracing for a given device, use
/sys/block/xxx/trace/enable, where xxx is the device name.
For example, the following enables tracing for /dev/sdc:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable
</literallayout>
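If you switch tracing on and off often, the write to the sysfs file
can be wrapped in a small shell function. The following is only a
sketch: the function name set_blk_trace and the SYSFS_BLOCK override
are hypothetical and exist purely for illustration (the override lets
you try the function against a scratch directory instead of /sys/block).

```shell
#!/bin/sh
# Hypothetical helper: switch block tracing on (1) or off (0) for one device
# by writing to the sysfs control file described above.
# SYSFS_BLOCK is an assumed override for dry runs; the default is /sys/block.
set_blk_trace() {
    dev=$1
    value=$2
    file="${SYSFS_BLOCK:-/sys/block}/$dev/trace/enable"
    if [ ! -e "$file" ]; then
        # Report the missing control file on stderr and fail.
        echo "no trace control file for $dev: $file" > /dev/stderr
        return 1
    fi
    echo "$value" > "$file"
}

# Usage (as root on the target):
#   set_blk_trace sdc 1    # enable tracing for /dev/sdc
#   set_blk_trace sdc 0    # disable it again
```

The function fails early if the control file is missing, which is
what you would expect to see for a mistyped device name.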

Once you've selected the device(s) you want to trace,
selecting the 'blk' tracer will turn the blk tracer on:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# cat available_tracers
blk function_graph function nop

root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer
</literallayout>

Execute the workload you're interested in:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt
</literallayout>

Then look at the output. Note that this example reads
'trace_pipe' rather than 'trace' to capture the trace,
which allows us to wait on the pipe for data to appear:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe
cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat]
cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced
cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat]
cat-3587 [001] d..1 3023.276424: 8,32 P N [cat]
cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat]
cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request
cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr
cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1
cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150
cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2
cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null)
cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert
cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request
cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1
cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat]
</literallayout>

Finally, the following turns off tracing for the specified device:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable
</literallayout>

</para>
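<para>
Much of the blk tracer output shown above consists of 'm' (message)
lines emitted by the I/O scheduler. As a minimal sketch, assuming the
line layout shown above (task-pid, CPU, flags, timestamp, device,
action, ...), the following filter keeps only the request events.
The function name blk_requests_only is hypothetical, and the field
positions would shift for task names that contain spaces.

```shell
#!/bin/sh
# Hypothetical filter: keep only request events from blk tracer output,
# dropping the 'm' (message) lines emitted by the I/O scheduler.
# Assumes the layout shown above: field 5 is major,minor and field 6 the action.
blk_requests_only() {
    awk '$5 ~ /^[0-9]+,[0-9]+$/ { if ($6 != "m") print }' "$@"
}

# Usage (from /sys/kernel/debug/tracing, with the blk tracer selected):
#   cat trace_pipe | blk_requests_only
```

Applied to the sample output above, this would leave just the
Q, G, P, I, U and D lines for the traced request.
</para>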

</section>

</section>

<title>Documentation</title>


<para>

Online versions of the man pages for the commands discussed
in this section can be found here:
<itemizedlist>
<listitem><para><ulink url='http://linux.die.net/man/8/blktrace'>http://linux.die.net/man/8/blktrace</ulink>
</para></listitem>
<listitem><para><ulink url='http://linux.die.net/man/1/blkparse'>http://linux.die.net/man/1/blkparse</ulink>
</para></listitem>
<listitem><para><ulink url='http://linux.die.net/man/8/btrace'>http://linux.die.net/man/8/btrace</ulink>
</para></listitem>
</itemizedlist>

</para>

<para>

The above man pages, along with man pages for the other
blktrace utilities (btt, blkiomon, etc.), can be found in the
/doc directory of the blktrace tools git repo:
<literallayout class='monospaced'>
$ git clone git://git.kernel.dk/blktrace.git
</literallayout>

</para>

</section>

</section>

</chapter>

<!--

vim: expandtab tw=80 ts=4

-->