
wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>


</literallayout>

The quickest and easiest way to get some basic overall data about
what's going on for a particular workload is to profile it using
'perf stat'. 'perf stat' profiles using a few default
counters and displays the summed counts at the end of the run:

<literallayout class='monospaced'>
     root@crownbay:~# perf stat wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |***************************************************| 41727k  0:00:00 ETA

     Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

           4597.223902 task-clock                #    0.077 CPUs utilized
                 23568 context-switches          #    0.005 M/sec
                    68 CPU-migrations            #    0.015 K/sec
                   241 page-faults               #    0.052 K/sec
            3045817293 cycles                    #    0.663 GHz
       &lt;not supported&gt; stalled-cycles-frontend
       &lt;not supported&gt; stalled-cycles-backend
             858909167 instructions              #    0.28  insns per cycle
             165441165 branches                  #   35.987 M/sec
              19550329 branch-misses             #   11.82% of all branches

          59.836627620 seconds time elapsed
</literallayout>

Many times such a simple-minded test doesn't yield much of
interest, but sometimes it does (see Real-world Yocto bug
(slow loop-mounted write speed)).

</para>
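<para>
As a rough cross-check of the 'perf stat' output above, the derived
values that perf prints after each '#' can be recomputed from the raw
counters. The following is a minimal Python sketch; the variable and
function names are illustrative only (they are not part of perf), and
the numbers are copied from the listing above:
</para>

```python
# Counter values copied from the 'perf stat' listing above.
task_clock_ms = 4597.223902   # task-clock, in milliseconds
elapsed_s = 59.836627620      # wall-clock seconds elapsed
cycles = 3045817293
instructions = 858909167
branches = 165441165
branch_misses = 19550329


def derived_metrics():
    """Recompute the ratios perf prints after the '#' on each row."""
    return {
        # CPU time consumed divided by wall-clock time.
        "cpus_utilized": (task_clock_ms / 1000.0) / elapsed_s,
        "insns_per_cycle": instructions / cycles,
        "branch_miss_pct": 100.0 * branch_misses / branches,
        # Cycles per nanosecond of CPU time is the same as GHz.
        "cycles_ghz": cycles / (task_clock_ms * 1e6),
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print("%-16s %.3f" % (name, value))
```

<para>
The recomputed values agree with the listing. In particular, the
workload kept a CPU busy for under 8% of the elapsed time, which is
consistent with a download that spends most of its time waiting on
the network.
</para>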

<para>

Also, note that 'perf stat' isn't restricted to a fixed set of
counters: essentially any event listed in the output of 'perf list'
can be tallied by 'perf stat'. For example, suppose we wanted to
see a summary of all the events related to kernel memory
allocation/freeing along with cache hits and misses:

<literallayout class='monospaced'>
     root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |***************************************************| 41727k  0:00:00 ETA

     Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

           5566 kmem:kmalloc
         125517 kmem:kmem_cache_alloc
              0 kmem:kmalloc_node
              0 kmem:kmem_cache_alloc_node
          34401 kmem:kfree
          69920 kmem:kmem_cache_free
            133 kmem:mm_page_free
             41 kmem:mm_page_free_batched
          11502 kmem:mm_page_alloc
          11375 kmem:mm_page_alloc_zone_locked
              0 kmem:mm_page_pcpu_drain
              0 kmem:mm_page_alloc_extfrag
       66848602 cache-references
        2917740 cache-misses              #    4.365 % of all cache refs

      44.831023415 seconds time elapsed
</literallayout>

So 'perf stat' gives us a nice easy way to get a quick overview of
what might be happening for a set of events, but normally we'd
need a little more detail in order to understand what's going on
in a way that we can usefully act on.

</para>
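<para>
If the counts need further processing, the 'count event' rows are easy
to load into a script. The following Python sketch is one possible
approach, not a perf feature: the helper name parse_counts is
hypothetical, the sample rows are copied from the listing above, and
it assumes counts are printed as plain digits without thousands
separators:
</para>

```python
# A few rows copied from the 'perf stat -e kmem:* ...' listing above.
SAMPLE = """\
5566 kmem:kmalloc
125517 kmem:kmem_cache_alloc
34401 kmem:kfree
69920 kmem:kmem_cache_free
66848602 cache-references
2917740 cache-misses # 4.365 % of all cache refs
"""


def parse_counts(text):
    """Map event name -> count for every 'count event' row in text."""
    counts = {}
    for line in text.splitlines():
        # Drop the trailing '# ...' annotation before splitting.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2 and fields[0].isdigit():
            counts[fields[1]] = int(fields[0])
    return counts


counts = parse_counts(SAMPLE)
miss_pct = 100.0 * counts["cache-misses"] / counts["cache-references"]
allocs = counts["kmem:kmalloc"] + counts["kmem:kmem_cache_alloc"]
frees = counts["kmem:kfree"] + counts["kmem:kmem_cache_free"]

if __name__ == "__main__":
    print("cache miss ratio: %.3f %%" % miss_pct)
    print("allocations: %d, frees: %d" % (allocs, frees))
```

<para>
The recomputed miss ratio matches the 4.365 % that perf reports. The
allocation and free totals are only a rough comparison, since they
cover just the lifetime of the profiled command.
</para>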

<para>

To dive down into the next level of detail, we can use 'perf
record'/'perf report', which will collect profiling data and
present it to us using an interactive text-based UI (or
simply as text if we specify --stdio to 'perf report').

</para>

<para>

As our first attempt at profiling this workload, we'll simply
run 'perf record', handing it the workload we want to profile
(everything after 'perf record' and any perf options we hand
it - here none - will be executed in a new shell). perf collects
samples until the process exits and records them in a file named
'perf.data' in the current working directory.

<literallayout class='monospaced'>
     root@crownbay:~# perf record wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>

     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |************************************************| 41727k  0:00:00 ETA
     [ perf record: Woken up 1 times to write data ]
     [ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ]
</literallayout>

To see the results in a 'text-based UI' (tui), simply run
'perf report', which will read the perf.data file in the current
working directory and display the results in an interactive UI:

<literallayout class='monospaced'>
     root@crownbay:~# perf report
</literallayout>

</para>

<para>

</para>

<para>

The above screenshot displays a 'flat' profile, one entry for
each 'bucket' corresponding to the functions that were profiled
during the profiling run, ordered from the most popular to the
least (perf has options to sort by various keys and orders, to
display only entries above a certain threshold, and so
on - see the perf documentation for details). Note that this
includes both userspace functions (entries containing a [.]) and
kernel functions accounted to the process (entries containing
a [k]). perf also has command-line modifiers that can be used to
restrict the profiling to kernel or userspace, among others.

</para>

<para>

Notice also that the above report shows an entry for 'busybox',
which is the executable that implements 'wget' in Yocto, but that
instead of a useful function name, that entry displays
a not-so-friendly hex value. The steps below will show
how to fix that problem.

</para>

<para>

Before we do that, however, let's try running a different profile,
one which shows something a little more interesting. The only
difference between the new profile and the previous one is that
we'll add the -g option, which will record not just the address
of a sampled function, but the entire callchain to the sampled
function as well:

<literallayout class='monospaced'>
     root@crownbay:~# perf record -g wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |************************************************| 41727k  0:00:00 ETA
     [ perf record: Woken up 3 times to write data ]
     [ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ]


     root@crownbay:~# perf report
</literallayout>

</para>

<para>

</para>

<para>

Using the callgraph view, we can see not only which
functions took the most time, but also a summary of
how those functions were called, and learn something about how the
program interacts with the kernel in the process.

</para>

<para>

Notice that each entry in the above screenshot now contains a '+'
on the left-hand side. This means that we can expand the entry and
drill down into the callchains that feed into that entry.
Pressing 'enter' on any one of them will expand the callchain
(you can also press 'E' to expand them all at the same time or 'C'
to collapse them all).

</para>

<para>

In the screenshot above, we've toggled the __copy_to_user_ll()
entry and several subnodes all the way down. This lets us see
which callchains contributed to the profiled __copy_to_user_ll()
function, which itself contributed 1.77% to the total profile.

</para>

<para>

As a bit of background explanation for these callchains, think
about what happens at a high level when you run wget to get a file
out on the network. The data comes
into the kernel via the network connection (socket) and is passed
to the userspace program 'wget' (which is actually a part of
busybox, but that's not important for now), which takes the buffers
the kernel passes to it and writes them to a disk file to save them.

</para>

<para>

The part of this process that we're looking at in the above call
stacks is the part where the kernel passes the data it's read from
the socket down to wget, i.e. a copy-to-user.

</para>

<para>

Notice also that there's a case here where a hex value
is displayed in the callstack, in the expanded
sys_clock_gettime() function. Later we'll see it resolve to a
userspace function call in busybox.

</para>

<para>

</para>

<para>

The above screenshot shows the other half of the journey for the
data - from the wget program's userspace buffers to disk. To get
the buffers to disk, the wget program issues a write(2), which
does a copy-from-user to the kernel, which then takes care, via
some circuitous path (probably also present somewhere in the
profile data), of getting it safely to disk.

</para>

<para>

Now that we've seen the basic layout of the profile data and the
basics of how to extract useful information out of it, let's get
back to the task at hand and see if we can get some basic idea
about where the time is spent in the program we're profiling,
wget. Remember that wget is actually implemented as an applet
in busybox, so while the process name is 'wget', the executable
we're actually interested in is busybox. So let's expand the
first entry containing busybox:

</para>

<para>

</para>

<para>

Again, before we expanded the entry we saw that the function was labeled
with a hex value rather than with a symbol, as most of the kernel
entries are. Expanding the busybox entry doesn't make it any better.

</para>

<para>

The problem is that perf can't find the symbol information for the
busybox binary, which is actually stripped out by the Yocto build
system.

</para>

<para>

One way around that is to put the following in your
<filename>local.conf</filename> file when you build the image:
<literallayout class='monospaced'>
     <ulink url='&YOCTO_DOCS_REF_URL;#var-INHIBIT_PACKAGE_STRIP'>INHIBIT_PACKAGE_STRIP</ulink> = "1"
</literallayout>
However, we already have an image with the binaries stripped,
so what can we do to get perf to resolve the symbols? Basically
we need to install the debuginfo for the busybox package.

</para>

<para>

To generate the debug info for the packages in the image, we can
add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example:
<literallayout class='monospaced'>
     EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"
</literallayout>
Additionally, in order to generate the type of debuginfo that
perf understands, we also need to add the following to local.conf:
<literallayout class='monospaced'>
     PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory'
</literallayout>
Once we've done that, we can install the debuginfo for busybox.
The debug packages, once built, can be found in
build/tmp/deploy/rpm/* on the host system. Find the
busybox-dbg-...rpm file and copy it to the target. For example:
<literallayout class='monospaced'>
     [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm root@192.168.1.31:
     root@192.168.1.31's password:
     busybox-dbg-1.20.2-r2.core2_32.rpm                     100% 1826KB   1.8MB/s   00:01
</literallayout>
Now install the debug rpm on the target:
<literallayout class='monospaced'>
     root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2_32.rpm
</literallayout>
Now that the debuginfo is installed, we see that the busybox
entries now display their functions symbolically:

</para>

<para>

</para>

<para>

If we expand one of the entries and press 'enter' on a leaf node,
we're presented with a menu of actions we can take to get more
information related to that entry:

</para>

<para>

</para>

<para>

One of these actions allows us to switch to a
busybox-centric view of the profiled functions (in this case we've
also expanded all the nodes using the 'E' key):

</para>

<para>

</para>

<para>

Finally, we can see that now that the busybox debuginfo is
installed, the previously unresolved symbol in the
sys_clock_gettime() entry mentioned earlier is now resolved,
and shows that the sys_clock_gettime system call that was the
source of 6.75% of the copy-to-user overhead was initiated by
the handle_input() busybox function:

</para>

<para>

</para>

<para>

At the lowest level of detail, we can dive down to the assembly
level and see which instructions caused the most overhead in a
function. Pressing 'enter' on the 'udhcpc_main' function, we're
again presented with a menu:

</para>

<para>

</para>

<para>

Selecting 'Annotate udhcpc_main', we get a detailed listing of
percentages by instruction for the udhcpc_main function. From the
display, we can see that over 50% of the time spent in this
function is taken up by a couple of tests and the move of a
constant (1) to a register:

</para>

<para>

</para>

<para>

As a segue into tracing, let's try another profile using a
different counter, something other than the default 'cycles'.

</para>

<para>

The tracing and profiling infrastructure in Linux has become
unified in a way that allows us to use the same tool with a
completely different set of counters, not just the standard
hardware counters that traditional tools have had to restrict
themselves to (of course the traditional tools can also make use
of the expanded possibilities now available to them, and in some
cases have, as mentioned previously).

</para>

<para>

We can get a list of the available events that can be used to
profile a workload via 'perf list':

<literallayout class='monospaced'>
     root@crownbay:~# perf list

     List of pre-defined events (to be used in -e):
      cpu-cycles OR cycles [Hardware event]
      stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
      stalled-cycles-backend OR idle-cycles-backend [Hardware event]
      instructions [Hardware event]
      cache-references [Hardware event]
      cache-misses [Hardware event]
      branch-instructions OR branches [Hardware event]
      branch-misses [Hardware event]
      bus-cycles [Hardware event]
      ref-cycles [Hardware event]

      cpu-clock [Software event]
      task-clock [Software event]
      page-faults OR faults [Software event]
      minor-faults [Software event]
      major-faults [Software event]
      context-switches OR cs [Software event]
      cpu-migrations OR migrations [Software event]
      alignment-faults [Software event]
      emulation-faults [Software event]

      L1-dcache-loads [Hardware cache event]
      L1-dcache-load-misses [Hardware cache event]
      L1-dcache-prefetch-misses [Hardware cache event]
      L1-icache-loads [Hardware cache event]
      L1-icache-load-misses [Hardware cache event]
      .
      .
      .
      rNNN [Raw hardware event descriptor]
      cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
       (see 'perf list --help' on how to encode it)

      mem:&lt;addr&gt;[:access] [Hardware breakpoint]

      sunrpc:rpc_call_status [Tracepoint event]
      sunrpc:rpc_bind_status [Tracepoint event]
      sunrpc:rpc_connect_status [Tracepoint event]
      sunrpc:rpc_task_begin [Tracepoint event]
      skb:kfree_skb [Tracepoint event]
      skb:consume_skb [Tracepoint event]
      skb:skb_copy_datagram_iovec [Tracepoint event]
      net:net_dev_xmit [Tracepoint event]
      net:net_dev_queue [Tracepoint event]
      net:netif_receive_skb [Tracepoint event]
      net:netif_rx [Tracepoint event]
      napi:napi_poll [Tracepoint event]
      sock:sock_rcvqueue_full [Tracepoint event]
      sock:sock_exceed_buf_limit [Tracepoint event]
      udp:udp_fail_queue_rcv_skb [Tracepoint event]
      hda:hda_send_cmd [Tracepoint event]
      hda:hda_get_response [Tracepoint event]
      hda:hda_bus_reset [Tracepoint event]
      scsi:scsi_dispatch_cmd_start [Tracepoint event]
      scsi:scsi_dispatch_cmd_error [Tracepoint event]
      scsi:scsi_eh_wakeup [Tracepoint event]
      drm:drm_vblank_event [Tracepoint event]
      drm:drm_vblank_event_queued [Tracepoint event]
      drm:drm_vblank_event_delivered [Tracepoint event]
      random:mix_pool_bytes [Tracepoint event]
      random:mix_pool_bytes_nolock [Tracepoint event]
      random:credit_entropy_bits [Tracepoint event]
      gpio:gpio_direction [Tracepoint event]
      gpio:gpio_value [Tracepoint event]
      block:block_rq_abort [Tracepoint event]
      block:block_rq_requeue [Tracepoint event]
      block:block_rq_issue [Tracepoint event]
      block:block_bio_bounce [Tracepoint event]
      block:block_bio_complete [Tracepoint event]
      block:block_bio_backmerge [Tracepoint event]
      .
      .
      writeback:writeback_wake_thread [Tracepoint event]
      writeback:writeback_wake_forker_thread [Tracepoint event]
      writeback:writeback_bdi_register [Tracepoint event]
      .
      .
      writeback:writeback_single_inode_requeue [Tracepoint event]
      writeback:writeback_single_inode [Tracepoint event]
      kmem:kmalloc [Tracepoint event]
      kmem:kmem_cache_alloc [Tracepoint event]
      kmem:mm_page_alloc [Tracepoint event]
      kmem:mm_page_alloc_zone_locked [Tracepoint event]
      kmem:mm_page_pcpu_drain [Tracepoint event]
      kmem:mm_page_alloc_extfrag [Tracepoint event]
      vmscan:mm_vmscan_kswapd_sleep [Tracepoint event]
      vmscan:mm_vmscan_kswapd_wake [Tracepoint event]
      vmscan:mm_vmscan_wakeup_kswapd [Tracepoint event]
      vmscan:mm_vmscan_direct_reclaim_begin [Tracepoint event]
      .
      .
      module:module_get [Tracepoint event]
      module:module_put [Tracepoint event]
      module:module_request [Tracepoint event]
      sched:sched_kthread_stop [Tracepoint event]
      sched:sched_wakeup [Tracepoint event]
      sched:sched_wakeup_new [Tracepoint event]
      sched:sched_process_fork [Tracepoint event]
      sched:sched_process_exec [Tracepoint event]
      sched:sched_stat_runtime [Tracepoint event]
      rcu:rcu_utilization [Tracepoint event]
      workqueue:workqueue_queue_work [Tracepoint event]
      workqueue:workqueue_execute_end [Tracepoint event]
      signal:signal_generate [Tracepoint event]
      signal:signal_deliver [Tracepoint event]
      timer:timer_init [Tracepoint event]
      timer:timer_start [Tracepoint event]
      timer:hrtimer_cancel [Tracepoint event]
      timer:itimer_state [Tracepoint event]
      timer:itimer_expire [Tracepoint event]
      irq:irq_handler_entry [Tracepoint event]
      irq:irq_handler_exit [Tracepoint event]
      irq:softirq_entry [Tracepoint event]
      irq:softirq_exit [Tracepoint event]
      irq:softirq_raise [Tracepoint event]
      printk:console [Tracepoint event]
      task:task_newtask [Tracepoint event]
      task:task_rename [Tracepoint event]
      syscalls:sys_enter_socketcall [Tracepoint event]
      syscalls:sys_exit_socketcall [Tracepoint event]
      .
      .
      .
      syscalls:sys_enter_unshare [Tracepoint event]
      syscalls:sys_exit_unshare [Tracepoint event]
      raw_syscalls:sys_enter [Tracepoint event]
      raw_syscalls:sys_exit [Tracepoint event]
</literallayout>

</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> This is exactly the same set of events defined
by the trace event subsystem and exposed by
ftrace/tracecmd/kernelshark as files in
/sys/kernel/debug/tracing/events, by SystemTap as
kernel.trace("tracepoint_name") and (partially) accessed by LTTng.
</informalexample>

<para>

Only a subset of these would be of interest to us when looking at
this workload, so let's choose the most likely subsystems
(identified by the string before the colon in the Tracepoint events)
and do a 'perf stat' run using only those wildcarded subsystems:

<literallayout class='monospaced'>
     root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
     Performance counter stats for 'wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>':

              23323 skb:kfree_skb
                  0 skb:consume_skb
              49897 skb:skb_copy_datagram_iovec
               6217 net:net_dev_xmit
               6217 net:net_dev_queue
               7962 net:netif_receive_skb
                  2 net:netif_rx
               8340 napi:napi_poll
                  0 sched:sched_kthread_stop
                  0 sched:sched_kthread_stop_ret
               3749 sched:sched_wakeup
                  0 sched:sched_wakeup_new
                  0 sched:sched_switch
                 29 sched:sched_migrate_task
                  0 sched:sched_process_free
                  1 sched:sched_process_exit
                  0 sched:sched_wait_task
                  0 sched:sched_process_wait
                  0 sched:sched_process_fork
                  1 sched:sched_process_exec
                  0 sched:sched_stat_wait
      2106519415641 sched:sched_stat_sleep
                  0 sched:sched_stat_iowait
          147453613 sched:sched_stat_blocked
        12903026955 sched:sched_stat_runtime
                  0 sched:sched_pi_setprio
               3574 workqueue:workqueue_queue_work
               3574 workqueue:workqueue_activate_work
                  0 workqueue:workqueue_execute_start
                  0 workqueue:workqueue_execute_end
              16631 irq:irq_handler_entry
              16631 irq:irq_handler_exit
              28521 irq:softirq_entry
              28521 irq:softirq_exit
              28728 irq:softirq_raise
                  1 syscalls:sys_enter_sendmmsg
                  1 syscalls:sys_exit_sendmmsg
                  0 syscalls:sys_enter_recvmmsg
                  0 syscalls:sys_exit_recvmmsg
                 14 syscalls:sys_enter_socketcall
                 14 syscalls:sys_exit_socketcall
                    .
                    .
                    .
              16965 syscalls:sys_enter_read
              16965 syscalls:sys_exit_read
              12854 syscalls:sys_enter_write
              12854 syscalls:sys_exit_write
                    .
                    .
                    .

       58.029710972 seconds time elapsed
</literallayout>

Let's pick one of these tracepoints and tell perf to do a profile
using it as the sampling event:

<literallayout class='monospaced'>
     root@crownbay:~# perf record -g -e sched:sched_wakeup wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
</literallayout>

</para>

<para>

</para>

<para>

The screenshot above shows the results of running a profile using
the sched:sched_wakeup tracepoint, which shows the relative costs of
various paths to sched_wakeup (note that sched_wakeup is the
name of the tracepoint - it's actually defined just inside
ttwu_do_wakeup(), which accounts for the function name actually
displayed in the profile):
<literallayout class='monospaced'>
     /*
      * Mark the task runnable and perform wakeup-preemption.
      */
     static void
     ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
     {
          trace_sched_wakeup(p, true);
          .
          .
          .
     }
</literallayout>
A couple of the more interesting callchains are expanded and
displayed above, basically some network receive paths that
presumably end up waking up wget (busybox) when network data is
ready.

</para>

<para>

Note that because tracepoints are normally used for tracing,
the default sampling period for tracepoints is 1, i.e. for
tracepoints perf will sample on every event occurrence (this
can be changed using the -c option). This is in contrast to
hardware counters such as the default 'cycles'
hardware counter used for normal profiling, where sampling
periods are much higher (in the thousands) because profiling should
have as low an overhead as possible and sampling on every cycle
would be prohibitively expensive.

</para>
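<para>
The effect of the sampling period on the amount of data collected is
simple arithmetic, sketched below in Python for illustration. The
function name is hypothetical, the event counts are taken from the
'perf stat' runs shown earlier, and the period of 100000 for 'cycles'
is a made-up value chosen only to show the scale of the difference:
</para>

```python
def expected_samples(event_count, period):
    """Approximate number of samples when sampling every 'period' events."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return event_count // period


# Tracepoint with the default period of 1: every occurrence is sampled.
wakeup_samples = expected_samples(3749, 1)

# Hardware counter with a hypothetical period of 100000 cycles.
cycle_samples = expected_samples(3045817293, 100000)

if __name__ == "__main__":
    print("sched:sched_wakeup samples:", wakeup_samples)
    print("cycles samples:", cycle_samples)
```

<para>
Sampling the 'cycles' counter with a period of 1 would instead mean
recording a sample for each of the roughly three billion cycles counted.
</para>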

</section>

<title>Using perf to do Basic Tracing</title>


<para>

Profiling is a great tool for solving many problems or for
getting a high-level view of what's going on with a workload or
across the system. It is however by definition an approximation,
as suggested by the most prominent word associated with it,
'sampling'. On the one hand, it allows a representative picture of
what's going on in the system to be cheaply taken, but on the other
hand, that cheapness limits its utility when that data suggests a
need to 'dive down' more deeply to discover what's really going
on. In such cases, the only way to see what's really going on is
to be able to look at (or summarize more intelligently) the
individual steps that go into the higher-level behavior exposed
by the coarse-grained profiling data.

</para>

<para>

As a concrete example, we can trace all the events we think might
be applicable to our workload:

<literallayout class='monospaced'>
     root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* \
      -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write \
      wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
</literallayout>

We can look at the raw trace output using 'perf script' with no
arguments:

<literallayout class='monospaced'>
     root@crownbay:~# perf script

           perf 1262 [000] 11624.857082: sys_exit_read: 0x0
           perf 1262 [000] 11624.857193: sched_wakeup: comm=migration/0 pid=6 prio=0 success=1 target_cpu=000
           wget 1262 [001] 11624.858021: softirq_raise: vec=1 [action=TIMER]
           wget 1262 [001] 11624.858074: softirq_entry: vec=1 [action=TIMER]
           wget 1262 [001] 11624.858081: softirq_exit: vec=1 [action=TIMER]
           wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
           wget 1262 [001] 11624.858177: sys_exit_read: 0x200
           wget 1262 [001] 11624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0 location=0xc15a5308
           wget 1262 [001] 11624.858945: kfree_skb: skbaddr=0xeb248000 protocol=0 location=0xc15a5308
           wget 1262 [001] 11624.859020: softirq_raise: vec=1 [action=TIMER]
           wget 1262 [001] 11624.859076: softirq_entry: vec=1 [action=TIMER]
           wget 1262 [001] 11624.859083: softirq_exit: vec=1 [action=TIMER]
           wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
           wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7
           wget 1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
           wget 1262 [001] 11624.859233: sys_exit_read: 0x0
           wget 1262 [001] 11624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200
           wget 1262 [001] 11624.859584: sys_exit_read: 0x200
           wget 1262 [001] 11624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
           wget 1262 [001] 11624.859888: sys_exit_read: 0x400
           wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
           wget 1262 [001] 11624.859944: sys_exit_read: 0x400
</literallayout>

This gives us a detailed timestamped sequence of the traced events
that occurred within the workload.

</para>
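<para>
Because the 'perf script' output is plain text, it can also be
post-processed directly. The Python sketch below pairs each
sys_enter_read with the following sys_exit_read for the same pid and
reports how long the call took. It is only an illustration: the regular
expression and function name are assumptions based on the layout of the
listing above (a command name without spaces and a timestamp with
microsecond resolution), and the sample lines are copied from it:
</para>

```python
import re

# comm pid [cpu] secs.usecs: event: args (layout assumed from the listing)
LINE_RE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+\[(\d+)\]\s+(\d+)\.(\d{6}):\s+(\w+):\s*(.*)$"
)

SAMPLE = """\
perf 1262 [000] 11624.857082: sys_exit_read: 0x0
wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
wget 1262 [001] 11624.858177: sys_exit_read: 0x200
wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7
"""


def read_latencies(lines):
    """Return (latency in microseconds, bytes returned) per read() call."""
    pending = {}  # pid -> timestamp of the unmatched sys_enter_read
    results = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        _comm, pid, _cpu, secs, usecs, event, args = match.groups()
        # Use integer microseconds to avoid floating-point rounding.
        stamp = int(secs) * 1000000 + int(usecs)
        if event == "sys_enter_read":
            pending[pid] = stamp
        elif event == "sys_exit_read" and pid in pending:
            results.append((stamp - pending.pop(pid), int(args, 16)))
    return results


latencies = read_latencies(SAMPLE.splitlines())

if __name__ == "__main__":
    for micros, nbytes in latencies:
        print("read returned %d bytes after %d us" % (nbytes, micros))
```

<para>
The first sys_exit_read in the sample has no matching entry event
(tracing started while the call was in flight), so the sketch skips it.
</para>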

<para>

In many ways, profiling can be viewed as a subset of tracing -
theoretically, if you have a set of trace events that's sufficient
to capture all the important aspects of a workload, you can derive
any of the results or views that a profiling run can.

</para>
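<para>
As a small illustration of deriving a profile-like view from trace
data, the Python sketch below counts how often each event occurs and
orders the result from most to least frequent. The event names are
those of the first nine records in the 'perf script' listing shown
earlier; the function name is hypothetical:
</para>

```python
from collections import Counter

# Event names of the first nine records in the 'perf script' listing.
TRACE = [
    "sys_exit_read", "sched_wakeup", "softirq_raise", "softirq_entry",
    "softirq_exit", "sys_enter_read", "sys_exit_read", "kfree_skb",
    "kfree_skb",
]


def event_histogram(events):
    """Return (event name, count) pairs, most frequent first."""
    return Counter(events).most_common()


histogram = event_histogram(TRACE)

if __name__ == "__main__":
    for name, count in histogram:
        print("%6d %s" % (count, name))
```

<para>
This is the same kind of summary that 'perf stat' produces, but computed
after the fact from the individual trace records.
</para>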

<para>

Another aspect of traditional profiling is that while powerful in
many ways, it's limited by the granularity of the underlying data.
Profiling tools offer various ways of sorting and presenting the
sample data, which make it much more useful and amenable to user
experimentation, but in the end it can't be used in an open-ended
way to extract data that just isn't present, because
conceptually most of it has been thrown away.

</para>

<para>

Full-blown detailed tracing data does, however, offer the opportunity
to manipulate and present the information collected during a
tracing run in an infinite variety of ways.

</para>

<para>

Another way to look at it is that there are only so many ways that
the 'primitive' counters can be used on their own to generate
interesting output; to get anything more complicated than simple
counts requires some amount of additional logic, which is typically
very specific to the problem at hand. For example, if we wanted to
make use of a 'counter' that maps to the value of the time
difference between when a process was scheduled to run on a
processor and the time it actually ran, we wouldn't expect such
a counter to exist on its own, but we could derive one called, say,
'wakeup_latency' and use it to extract a useful view of that metric
from trace data. Likewise, we really can't figure out from standard
profiling tools how much data every process on the system reads and
writes, along with how many of those reads and writes fail
completely. If we have sufficient trace data, however, we could,
with the right tools, easily extract and present that information,
but we'd need something other than pre-canned profiling tools to
do that.

</para>
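<para>
To make the second example concrete, here is a minimal Python sketch of
the kind of 'additional logic' involved: it totals the bytes returned by
read() per command and counts the reads that failed. The class and
method names are hypothetical. The first six (command, return value)
pairs are the sys_exit_read records from the trace of this workload;
the final negative value is an invented failed read, added only to
exercise the failure path:
</para>

```python
from collections import defaultdict


class ReadStats:
    """Per-command totals derived from sys_exit_read trace records."""

    def __init__(self):
        self.bytes_read = defaultdict(int)
        self.failed = defaultdict(int)

    def on_exit_read(self, comm, ret):
        # A negative return value is an error code, not a byte count.
        if ret < 0:
            self.failed[comm] += 1
        else:
            self.bytes_read[comm] += ret


EVENTS = [
    ("perf", 0), ("wget", 512), ("wget", 471),
    ("wget", 0), ("wget", 512), ("wget", 1024),
    ("wget", -11),  # hypothetical failed read, not from the real trace
]

stats = ReadStats()
for comm, ret in EVENTS:
    stats.on_exit_read(comm, ret)

if __name__ == "__main__":
    for comm in sorted(stats.bytes_read):
        print("%s: %d bytes read, %d failed reads"
              % (comm, stats.bytes_read[comm], stats.failed[comm]))
```

<para>
None of this information is available from sampled profile data; it
depends on seeing every read return value in the trace.
</para>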

<para>

Luckily, there is a general-purpose way to handle such needs,
called 'programming languages'. Making a programming language
easily available to apply to such problems, given the specific
format of the data, is called a 'programming language binding' for
that data and language. Perf supports two programming language
bindings, one for Python and one for Perl.

</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> Language bindings for manipulating and
aggregating trace data are of course not a new
idea. One of the first projects to do this was IBM's DProbes
dpcc compiler, an ANSI C compiler which targeted a low-level
assembly language running on an in-kernel interpreter on the
target system. This is exactly analogous to what Sun's DTrace
did, except that DTrace invented its own language for the purpose.
SystemTap, heavily inspired by DTrace, also created its own
one-off language, but rather than running the product on an
in-kernel interpreter, created an elaborate compiler-based
machinery to translate its language into kernel modules written
in C.
</informalexample>

<para>

Now that we have the trace data in perf.data, we can use
'perf script -g' to generate a skeleton script with handlers
for the read/write entry/exit events we recorded:

<literallayout class='monospaced'>
     root@crownbay:~# perf script -g python
     generated Python script: perf-script.py
</literallayout>

The skeleton script creates a Python function for each
event type in the perf.data file. The body of each function simply
prints the event name along with its parameters. For example:

<literallayout class='monospaced'>
     def net__netif_rx(event_name, context, common_cpu,
            common_secs, common_nsecs, common_pid, common_comm,
            skbaddr, len, name):
                    print_header(event_name, common_cpu, common_secs, common_nsecs,
                            common_pid, common_comm)

                    print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name),
</literallayout>

We can run that script directly to print all of the events
contained in the perf.data file:

root@crownbay:~# perf script -s perf-script.py


in trace_begin

883

syscalls__sys_exit_read 0 11624.857082795 1262 perf nr=3, ret=0

884

sched__sched_wakeup 0 11624.857193498 1262 perf comm=migration/0, pid=6, prio=0, success=1, target_cpu=0

885

irq__softirq_raise 1 11624.858021635 1262 wget vec=TIMER

886

irq__softirq_entry 1 11624.858074075 1262 wget vec=TIMER

887

irq__softirq_exit 1 11624.858081389 1262 wget vec=TIMER

888

syscalls__sys_enter_read 1 11624.858166434 1262 wget nr=3, fd=3, buf=3213019456, count=512

889

syscalls__sys_exit_read 1 11624.858177924 1262 wget nr=3, ret=512

890

skb__kfree_skb 1 11624.858878188 1262 wget skbaddr=3945041280, location=3243922184, protocol=0

891

skb__kfree_skb 1 11624.858945608 1262 wget skbaddr=3945037824, location=3243922184, protocol=0

892

irq__softirq_raise 1 11624.859020942 1262 wget vec=TIMER

893

irq__softirq_entry 1 11624.859076935 1262 wget vec=TIMER

894

irq__softirq_exit 1 11624.859083469 1262 wget vec=TIMER

895

syscalls__sys_enter_read 1 11624.859167565 1262 wget nr=3, fd=3, buf=3077701632, count=1024

896

syscalls__sys_exit_read 1 11624.859192533 1262 wget nr=3, ret=471

897

syscalls__sys_enter_read 1 11624.859228072 1262 wget nr=3, fd=3, buf=3077701632, count=1024

898

syscalls__sys_exit_read 1 11624.859233707 1262 wget nr=3, ret=0

899

syscalls__sys_enter_read 1 11624.859573008 1262 wget nr=3, fd=3, buf=3213018496, count=512

900

syscalls__sys_exit_read 1 11624.859584818 1262 wget nr=3, ret=512

901

syscalls__sys_enter_read 1 11624.859864562 1262 wget nr=3, fd=3, buf=3077701632, count=1024

902

syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024

903

syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024

904

syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024

905

</literallayout>

That in itself isn't very useful; after all, we can accomplish
pretty much the same thing by simply running 'perf script'
without arguments in the same directory as the perf.data file.

</para>

<para>

We can, however, replace the print statements in the generated
function bodies with whatever we want, and thereby make the
script far more useful.

</para>

<para>

As a simple example, let's replace the print statements in
the function bodies with a call to a function that does nothing
but increment a per-event count. When the script is run against a
perf.data file, the tally for an event is incremented each time
that event is encountered. For example:

def net__netif_rx(event_name, context, common_cpu,
    common_secs, common_nsecs, common_pid, common_comm,
    skbaddr, len, name):
        inc_counts(event_name)

</literallayout>

Each event handler function in the generated code is modified
to do this. For convenience, we define a common function called
inc_counts() that each handler calls; inc_counts() simply tallies
a count for each event using the 'counts' hash, which is a
specialized dictionary that does Perl-like autovivification, a
capability that's extremely useful for the kinds of multi-level
aggregation commonly used in processing traces (see perf's
documentation on the Python language binding for details):

counts = autodict()

def inc_counts(event_name):
    try:
        counts[event_name] += 1
    except TypeError:
        counts[event_name] = 1

</literallayout>

Finally, at the end of the trace processing run, we want to
print the result of all the per-event tallies. For that, we
use the special 'trace_end()' function:

def trace_end():
    for event_name, count in counts.items():
        print("%-40s %10s" % (event_name, count))

</literallayout>

The end result is a summary of all the events recorded in the
trace:

skb__skb_copy_datagram_iovec                  13148
irq__softirq_entry                             4796
irq__irq_handler_exit                          3805
irq__softirq_exit                              4795
syscalls__sys_enter_write                      8990
net__net_dev_xmit                               652
skb__kfree_skb                                 4047
sched__sched_wakeup                            1155
irq__irq_handler_entry                         3804
irq__softirq_raise                             4799
net__net_dev_queue                              652
syscalls__sys_enter_read                      17599
net__netif_receive_skb                         1743
syscalls__sys_exit_read                       17598
net__netif_rx                                     2
napi__napi_poll                                1877
syscalls__sys_exit_write                       8990

</literallayout>

Note that these per-event counts are essentially the same kind of
information we get from 'perf stat', which goes a little way to
support the idea mentioned previously that given the right kind of
trace data, higher-level profiling-type summaries can be derived
from it.

</para>
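<para>
As a rough, self-contained illustration of the tallying logic
just described, the following sketch can be run outside of perf.
It is not the generated script: the autodict() defined here is a
stand-in that assumes perf's helper behaves like a self-nesting
defaultdict, and format_counts() and the sample event names are
hypothetical additions for the purpose of the example.
</para>

```python
from collections import defaultdict


def autodict():
    """Stand-in for perf's autodict: missing keys create nested autodicts."""
    return defaultdict(autodict)


counts = autodict()


def inc_counts(event_name):
    # Reading a missing key yields a fresh, empty autodict. Adding 1 to
    # that object raises TypeError, which is the cue to start the tally.
    try:
        counts[event_name] += 1
    except TypeError:
        counts[event_name] = 1


def format_counts(tallies):
    """Return one aligned line per event, busiest event first."""
    rows = sorted(tallies.items(), key=lambda item: item[1], reverse=True)
    return ["%-40s %10d" % (name, count) for name, count in rows]


if __name__ == "__main__":
    # Hypothetical event stream standing in for the handlers perf calls.
    for name in ["syscalls__sys_enter_read", "net__netif_rx",
                 "syscalls__sys_enter_read"]:
        inc_counts(name)
    print("\n".join(format_counts(counts)))
```

<para>
Sorting by count is a choice made only for this sketch; the
script in the text prints the tallies in dictionary order.
</para>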

<para>

For details, see the documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.

</para>

</section>

<title>System-Wide Tracing and Profiling</title>


<para>

The examples so far have focused on tracing a particular program or
workload - in other words, every profiling run has specified the
program to profile on the command line, e.g. 'perf record wget ...'.

</para>

<para>

It's also possible, and more interesting in many cases, to run a
system-wide profile or trace while running the workload in a
separate shell.

</para>

<para>

To do system-wide profiling or tracing, you typically use
the -a flag to 'perf record'.

</para>

<para>

To demonstrate this, open up one window and start the profile
using the -a flag (press Ctrl-C to stop tracing):

root@crownbay:~# perf record -g -a
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]

</literallayout>

In another window, run the wget test:

root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA

</literallayout>

Here we see entries not only for our wget load, but for other
processes running on the system as well:

</para>

<para>

</para>

<para>

In the snapshot above, we can see callchains that originate in
libc, and a callchain from Xorg that demonstrates that we're
using a proprietary X driver in userspace (notice the presence
of 'PVR' and some other unresolvable symbols in the expanded
Xorg callchain).

</para>

<para>

Note also that we have both kernel and userspace entries in the
above snapshot. We can also tell perf to focus on userspace by
providing a modifier, in this case 'u', to the 'cycles' hardware
counter when we record a profile:

root@crownbay:~# perf record -g -a -e cycles:u
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]

</literallayout>

</para>

<para>

</para>

<para>

Notice that in the screenshot above we see only userspace entries ([.]).

</para>

<para>

Finally, we can press 'enter' on a leaf node and select the 'Zoom
into DSO' menu item to show only entries associated with a
specific DSO. In the screenshot below, we've zoomed into the
'libc' DSO which shows all the entries associated with the
libc-xxx.so DSO.

</para>

<para>

</para>

<para>

We can also use the -a switch to do system-wide
tracing. Here we'll trace a couple of scheduler events:

root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]

</literallayout>

We can look at the raw output using 'perf script' with no
arguments:

root@crownbay:~# perf script


perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1083

perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120

1084

kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120

1085

swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000

1086

swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120

1087

kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

1088

perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1089

perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120

1090

kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120

1091

perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

</literallayout>

</para>

<title>Filtering</title>


<para>

Notice that there are a lot of events that don't really have
anything to do with what we're interested in, namely events
that schedule 'perf' itself in and out or that wake perf up.
We can get rid of those by using the '--filter' option -
for each event we specify using -e, we can add a --filter
after that to filter out trace events whose fields have
specific values:

root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]


root@crownbay:~# perf script


swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120

1115

kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

1116

perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1117

perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1118

perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1119

perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1120

perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1121

perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1122

swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001

1123

swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120

1124

kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

1125

swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000

1126

swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120

1127

kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

1128

</literallayout>

In this case, we've filtered out all events that have 'perf'
in their 'comm', 'prev_comm' or 'next_comm' fields. Notice
that there are still events recorded for perf: the leading
column shows the task that was running when the event fired,
which is not one of the filtered fields, and those events don't
have values of 'perf' for the fields we did filter on.
Completely filtering out anything from perf will
require a bit more work, but for the purpose of demonstrating
how to use filters, it's close enough.

</para>
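<para>
To make the filter semantics concrete, here is a small sketch
that applies the same two predicates in user space to lines of
'perf script' output. It only mimics the kernel filter for
illustration; parse_fields() and passes_filter() are hypothetical
helpers, and the parsing assumes field values contain no spaces.
</para>

```python
import re

# key=value pairs as printed by 'perf script' for tracepoint events
FIELD_RE = re.compile(r"(\w+)=(\S+)")

# The event payload follows "<timestamp>: <event>: " on each line.
EVENT_RE = re.compile(r"\d+\.\d+:\s+(\w+):\s+(.*)$")


def parse_fields(line):
    """Return (event name, dict of key=value fields) for one output line."""
    match = EVENT_RE.search(line)
    if match is None:
        return None, {}
    event, payload = match.groups()
    return event, dict(FIELD_RE.findall(payload))


def passes_filter(line):
    """Mimic the two --filter expressions used in the perf record command."""
    event, fields = parse_fields(line)
    if event == "sched_switch":
        return (fields.get("next_comm") != "perf"
                and fields.get("prev_comm") != "perf")
    if event == "sched_wakeup":
        return fields.get("comm") != "perf"
    return True


if __name__ == "__main__":
    sample = ("perf 1407 [001] 7932.170048: sched_wakeup: "
              "comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001")
    # The leading "perf" column is the running task, not a filtered field,
    # so this wakeup of kworker/1:1 is kept.
    print(passes_filter(sample))
```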

<emphasis>Tying it Together:</emphasis> This is exactly the same set of event
filters defined by the trace event subsystem. See the
ftrace/tracecmd/kernelshark section for more discussion about
these event filters.

</informalexample>

<emphasis>Tying it Together:</emphasis> These event filters are implemented by a
special-purpose pseudo-interpreter in the kernel and are an
integral and indispensable part of the perf design as it
relates to tracing. Kernel-based event filters provide a
mechanism to precisely throttle the event stream that appears
in user space, where it makes sense to provide bindings to real
programming languages for postprocessing the event stream.
This architecture allows for the intelligent and flexible
partitioning of processing between the kernel and user space.
Contrast this with other tools such as SystemTap, which does
all of its processing in the kernel and as such requires a
special project-defined language in order to accommodate that
design, or LTTng, where everything is sent to userspace and
as such requires a super-efficient kernel-to-userspace
transport mechanism in order to function properly. While
perf can certainly benefit from, for instance, advances in
the design of the transport, it doesn't fundamentally depend
on them. Basically, if you find that your perf tracing
application is causing buffer I/O overruns, it probably
means that you aren't taking enough advantage of the
kernel filtering engine.

</informalexample>

</section>

</section>

<title>Using Dynamic Tracepoints</title>


<para>

perf isn't restricted to the fixed set of static tracepoints
listed by 'perf list'. Users can also add their own 'dynamic'
tracepoints anywhere in the kernel. For instance, suppose we
want to define our own tracepoint on do_fork(). We can do that
using the 'perf probe' subcommand:

root@crownbay:~# perf probe do_fork
Added new event:
  probe:do_fork (on do_fork)

You can now use it in all perf tools, such as:

  perf record -e probe:do_fork -aR sleep 1

</literallayout>

Adding a new tracepoint via 'perf probe' results in an event
with all the expected files and format in
/sys/kernel/debug/tracing/events, just the same as for static
tracepoints (as discussed in more detail in the trace events
subsystem section):

1194

1195

root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al

1196

drwxr-xr-x 2 root root 0 Oct 28 11:42 .

1197

drwxr-xr-x 3 root root 0 Oct 28 11:42 ..

1198

-rw-r--r-- 1 root root 0 Oct 28 11:42 enable

1199

-rw-r--r-- 1 root root 0 Oct 28 11:42 filter

1200

-r--r--r-- 1 root root 0 Oct 28 11:42 format

1201

-r--r--r-- 1 root root 0 Oct 28 11:42 id

1202

1203

root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format

name: do_fork

ID: 944

format:

field:unsigned short common_type; offset:0; size:2; signed:0;

1208

field:unsigned char common_flags; offset:2; size:1; signed:0;

1209

field:unsigned char common_preempt_count; offset:3; size:1; signed:0;

1210

field:int common_pid; offset:4; size:4; signed:1;

1211

field:int common_padding; offset:8; size:4; signed:1;

1212

1213

field:unsigned long __probe_ip; offset:12; size:4; signed:0;

1214

1215

print fmt: "(%lx)", REC->__probe_ip

1216

</literallayout>

We can list all dynamic tracepoints currently in existence:

root@crownbay:~# perf probe -l
probe:do_fork (on do_fork)
probe:schedule (on schedule)

</literallayout>

Let's record system-wide (the 'sleep 30' command is a trick: it
does nothing itself, but it gives perf a workload that exits
after 30 seconds, which bounds the system-wide recording):

root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]

</literallayout>

Using 'perf script' we can see each do_fork event that fired:

root@crownbay:~# perf script


# ========

1236

# captured on: Sun Oct 28 11:55:18 2012

1237

# hostname : crownbay

1238

# os release : 3.4.11-yocto-standard

1239

# perf version : 3.4.11

# arch : i686

# nrcpus online : 2

# nrcpus avail : 2

# cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz

1244

# cpuid : GenuineIntel,6,38,1

1245

# total memory : 1017184 kB

1246

# cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30

1247

# event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern

1248

= 0, id = { 5, 6 }

1249

# HEADER_CPU_TOPOLOGY info available, use -I to display

1250

# ========

1251

#

1252

matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)

1253

matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460)

1254

pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)

1255

pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)

1256

matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)

1257

matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460)

1258

gthumb 1300 [001] 34217.697451: do_fork: (c1028460)

1259

gthumb 1300 [001] 34219.085734: do_fork: (c1028460)

1260

gthumb 1300 [000] 34219.121351: do_fork: (c1028460)

1261

gthumb 1300 [001] 34219.264551: do_fork: (c1028460)

1262

pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460)

1263

matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460)

1264

matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460)

1265

matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460)

1266

matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460)

1267

matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460)

1268

matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460)

1269

matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460)

1270

gaku 1312 [000] 34237.202388: do_fork: (c1028460)

1271

</literallayout>

And using 'perf report' on the same file, we can see the
callgraphs from starting a few programs during those 30 seconds:

</para>

<para>

</para>
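<para>
The 'perf script' text output for the probe can also be
summarized with a few lines of code. The following sketch, which
assumes the column layout shown in the do_fork listing earlier
(command, pid, cpu in brackets, timestamp, event name), counts
how many do_fork events each command triggered. The function name
forks_per_comm() and the regular expression are illustrative, not
part of perf.
</para>

```python
import re
from collections import Counter

# "<comm> <pid> [<cpu>] <timestamp>: <event>: ..."
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+?)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<time>\d+\.\d+):\s+(?P<event>\w+):"
)


def forks_per_comm(lines, event="do_fork"):
    """Count occurrences of the given probe event per command name."""
    tally = Counter()
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # skip the header block emitted by 'perf script'
        match = LINE_RE.match(line)
        if match and match.group("event") == event:
            tally[match.group("comm")] += 1
    return tally


if __name__ == "__main__":
    sample = [
        "# hostname : crownbay",
        "matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)",
        "pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)",
        "pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)",
    ]
    for comm, count in forks_per_comm(sample).most_common():
        print("%-20s %5d" % (comm, count))
```

<para>
On a target, the same function could be fed the lines captured
from running 'perf script' in the directory holding perf.data.
</para>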

<emphasis>Tying it Together:</emphasis> The trace events subsystem accommodates static
and dynamic tracepoints in exactly the same way - there's no
difference as far as the infrastructure is concerned. See the
ftrace section for more details on the trace event subsystem.

</informalexample>

<emphasis>Tying it Together:</emphasis> Dynamic tracepoints are implemented under the
covers by kprobes and uprobes. kprobes and uprobes are also used
by and in fact are the main focus of SystemTap.

</informalexample>

</section>

</section>

<title>Documentation</title>


<para>

Online versions of the man pages for the commands discussed in this
section can be found here:
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-stat'>'perf stat' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-record'>'perf record' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-report'>'perf report' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-probe'>'perf probe' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-script'>'perf script' manpage</ulink>.
</para></listitem>
<listitem><para>Documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.
</para></listitem>
<listitem><para>The top-level
<ulink url='http://linux.die.net/man/1/perf'>perf(1) manpage</ulink>.
</para></listitem>

</itemizedlist>

</para>

<para>

Normally, you should be able to invoke the man pages via perf
itself, e.g. 'perf help' or 'perf help record'.

</para>

<para>

However, by default Yocto doesn't install man pages, and perf
relies on the man pages for most of its help functionality. This
is a known problem that is tracked in the Yocto bug database:
<ulink url='https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388'>Bug 3388 - perf: enable man pages for basic 'help' functionality</ulink>.

</para>

<para>

The man pages in text form, along with some other files, such as
a set of examples, can be found in the 'perf' directory of the
kernel tree:

tools/perf/Documentation

</literallayout>

There's also a nice perf tutorial on the perf wiki that goes
into more detail than we do here in certain areas:
<ulink url='https://perf.wiki.kernel.org/index.php/Tutorial'>Perf Tutorial</ulink>

</para>

</section>

</section>

<title>ftrace</title>


<para>

'ftrace' literally refers to the 'ftrace function tracer' but in
reality this encompasses a number of related tracers along with
the infrastructure that they all make use of.

</para>

<title>Setup</title>

<para>

For this section, we'll assume you've already performed the basic
setup outlined in the General Setup section.

</para>

<para>

ftrace, trace-cmd, and kernelshark run on the target system,
and are ready to go out-of-the-box - no additional setup is
necessary. For the rest of this section we assume you've ssh'ed
from the host to the target and will be running ftrace on the
target. kernelshark is a GUI application; if you use the '-X'
option to ssh, you can have the kernelshark GUI run on the target
but display remotely on the host.

</para>

</section>

<title>Basic ftrace usage</title>


<para>

'ftrace' essentially refers to everything included in
the /tracing directory of the mounted debugfs filesystem
(Yocto follows the standard convention and mounts it
at /sys/kernel/debug). Here's a listing of all the files
found in /sys/kernel/debug/tracing on a Yocto system:

root@sugarbay:/sys/kernel/debug/tracing# ls
README                      kprobe_events               trace
available_events            kprobe_profile              trace_clock
available_filter_functions  options                     trace_marker
available_tracers           per_cpu                     trace_options
buffer_size_kb              printk_formats              trace_pipe
buffer_total_size_kb        saved_cmdlines              tracing_cpumask
current_tracer              set_event                   tracing_enabled
dyn_ftrace_total_info       set_ftrace_filter           tracing_on
enabled_functions           set_ftrace_notrace          tracing_thresh
events                      set_ftrace_pid
free_buffer                 set_graph_function

</literallayout>

The files listed above are used for various purposes -
some relate directly to the tracers themselves, others are
used to set tracing options, and yet others actually contain
the tracing output when a tracer is in effect. The purpose of
some files can be guessed from their names, while others need
explanation. We'll cover some of these files below; for an
explanation of the others, please
see the ftrace documentation.

</para>

<para>

We'll start by looking at some of the available built-in

tracers.

</para>

<para>

cat'ing the 'available_tracers' file lists the set of
available tracers:

root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers
blk function_graph function nop

</literallayout>

The 'current_tracer' file contains the tracer currently in
effect:

root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
nop

</literallayout>

The above listing of current_tracer shows that
the 'nop' tracer is in effect, which is just another
way of saying that there's actually no tracer
currently in effect.

</para>

<para>

echo'ing one of the available_tracers into current_tracer
makes the specified tracer the current tracer:

root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer
root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
function

</literallayout>

The above sets the current tracer to be the
'function tracer'. This tracer traces every function
call in the kernel and makes it available as the
contents of the 'trace' file. Reading the 'trace' file
lists the currently buffered function calls that have been
traced by the function tracer:

root@sugarbay:/sys/kernel/debug/tracing# cat trace | less


# tracer: function

#

# entries-in-buffer/entries-written: 310629/766471 #P:8

1452

#

1453

# _-----=> irqs-off

1454

# / _----=> need-resched

1455

# | / _---=> hardirq/softirq

1456

# || / _--=> preempt-depth

1457

# ||| / delay

1458

# TASK-PID CPU# |||| TIMESTAMP FUNCTION

1459

# | | | |||| | |

1460

<idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle

1461

<idle>-0 [004] d..1 470.867170: getnstimeofday <-ktime_get_real

1462

<idle>-0 [004] d..1 470.867171: ns_to_timeval <-intel_idle

1463

<idle>-0 [004] d..1 470.867171: ns_to_timespec <-ns_to_timeval

1464

<idle>-0 [004] d..1 470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt

1465

<idle>-0 [004] d..1 470.867172: native_apic_mem_write <-smp_apic_timer_interrupt

1466

<idle>-0 [004] d..1 470.867172: irq_enter <-smp_apic_timer_interrupt

1467

<idle>-0 [004] d..1 470.867172: rcu_irq_enter <-irq_enter

1468

<idle>-0 [004] d..1 470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter

1469

<idle>-0 [004] d..1 470.867173: local_bh_disable <-irq_enter

1470

<idle>-0 [004] d..1 470.867173: add_preempt_count <-local_bh_disable

1471

<idle>-0 [004] d.s1 470.867174: tick_check_idle <-irq_enter

1472

<idle>-0 [004] d.s1 470.867174: tick_check_oneshot_broadcast <-tick_check_idle

1473

<idle>-0 [004] d.s1 470.867174: ktime_get <-tick_check_idle

1474

<idle>-0 [004] d.s1 470.867174: tick_nohz_stop_idle <-tick_check_idle

1475

<idle>-0 [004] d.s1 470.867175: update_ts_time_stats <-tick_nohz_stop_idle

1476

<idle>-0 [004] d.s1 470.867175: nr_iowait_cpu <-update_ts_time_stats

1477

<idle>-0 [004] d.s1 470.867175: tick_do_update_jiffies64 <-tick_check_idle

1478

<idle>-0 [004] d.s1 470.867175: _raw_spin_lock <-tick_do_update_jiffies64

1479

<idle>-0 [004] d.s1 470.867176: add_preempt_count <-_raw_spin_lock

1480

<idle>-0 [004] d.s2 470.867176: do_timer <-tick_do_update_jiffies64

1481

<idle>-0 [004] d.s2 470.867176: _raw_spin_lock <-do_timer

1482

<idle>-0 [004] d.s2 470.867176: add_preempt_count <-_raw_spin_lock

1483

<idle>-0 [004] d.s3 470.867177: ntp_tick_length <-do_timer

1484

<idle>-0 [004] d.s3 470.867177: _raw_spin_lock_irqsave <-ntp_tick_length

.

.

.

</literallayout>

Each line in the trace above shows what was happening in
the kernel on a given cpu, to the level of detail of
function calls. Each entry shows the function called,
followed by its caller (after the arrow).

</para>
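<para>
Because every entry follows the same 'callee &lt;-caller' layout,
the function tracer output is easy to post-process. The sketch
below parses lines in the format shown in the listing and counts
caller/callee pairs. The names parse_entry() and call_pairs() and
the regular expression are assumptions made for this example, and
the layout is taken from the sample output rather than from a
format specification.
</para>

```python
import re
from collections import Counter

# "<task>-<pid> [<cpu>] <flags> <timestamp>: <callee> <-<caller>"
ENTRY_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+(?P<time>\d+\.\d+):\s+"
    r"(?P<callee>\S+)\s+<-(?P<caller>\S+)"
)


def parse_entry(line):
    """Return the fields of one function tracer entry, or None."""
    if line.lstrip().startswith("#"):
        return None  # header line
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


def call_pairs(lines):
    """Count how often each (caller, callee) pair appears in the trace."""
    pairs = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            pairs[(entry["caller"], entry["callee"])] += 1
    return pairs


if __name__ == "__main__":
    sample = [
        "# tracer: function",
        "<idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle",
        "<idle>-0 [004] d..1 470.867170: getnstimeofday <-ktime_get_real",
    ]
    for (caller, callee), count in call_pairs(sample).items():
        print("%s -> %s: %d" % (caller, callee, count))
```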

<para>

The function tracer gives you an extremely detailed idea
of what the kernel was doing at the point in time the trace
was taken, and is a great way to learn about how the kernel
code works in a dynamic sense.

</para>

<emphasis>Tying it Together:</emphasis> The ftrace function tracer is also
available from within perf, as the ftrace:function tracepoint.

</informalexample>

<para>

With the function tracer, it is a little more difficult to follow
the call chains than it needs to be. Luckily, there's a variant of
the function tracer that displays the callchains explicitly, called
the 'function_graph' tracer:

root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer
root@sugarbay:/sys/kernel/debug/tracing# cat trace | less


1516

tracer: function_graph

1517

1518

CPU DURATION FUNCTION CALLS

1519

| | | | | | |

1520

7) 0.046 us | pick_next_task_fair();

1521

7) 0.043 us | pick_next_task_stop();

1522

7) 0.042 us | pick_next_task_rt();

1523

7) 0.032 us | pick_next_task_fair();

1524

7) 0.030 us | pick_next_task_idle();

1525

7) | _raw_spin_unlock_irq() {

1526

7) 0.033 us | sub_preempt_count();

1527

7) 0.258 us | }

1528

7) 0.032 us | sub_preempt_count();

1529

7) + 13.341 us | } /* __schedule */

1530

7) 0.095 us | } /* sub_preempt_count */

1531

7) | schedule() {

1532

7) | __schedule() {

1533

7) 0.060 us | add_preempt_count();

1534

7) 0.044 us | rcu_note_context_switch();

1535

7) | _raw_spin_lock_irq() {

1536

7) 0.033 us | add_preempt_count();

1537

7) 0.247 us | }

1538

7) | idle_balance() {

1539

7) | _raw_spin_unlock() {

1540

7) 0.031 us | sub_preempt_count();

1541

7) 0.246 us | }

1542

7) | update_shares() {

1543

7) 0.030 us | __rcu_read_lock();

1544

7) 0.029 us | __rcu_read_unlock();

1545

7) 0.484 us | }

1546

7) 0.030 us | __rcu_read_lock();

1547

7) | load_balance() {

1548

7) | find_busiest_group() {

1549

7) 0.031 us | idle_cpu();

1550

7) 0.029 us | idle_cpu();

1551

7) 0.035 us | idle_cpu();

1552

7) 0.906 us | }

1553

7) 1.141 us | }

1554

7) 0.022 us | msecs_to_jiffies();

1555

7) | load_balance() {

1556

7) | find_busiest_group() {

1557

7) 0.031 us | idle_cpu();

.

.

.

4) 0.062 us | msecs_to_jiffies();

1562

4) 0.062 us | __rcu_read_unlock();

1563

4) | _raw_spin_lock() {

1564

4) 0.073 us | add_preempt_count();

1565

4) 0.562 us | }

1566

4) + 17.452 us | }

1567

4) 0.108 us | put_prev_task_fair();

1568

4) 0.102 us | pick_next_task_fair();

1569

4) 0.084 us | pick_next_task_stop();

1570

4) 0.075 us | pick_next_task_rt();

1571

4) 0.062 us | pick_next_task_fair();

1572

4) 0.066 us | pick_next_task_idle();

1573

------------------------------------------

1574

4) kworker-74 => <idle>-0

1575

------------------------------------------

1576

1577

4) | finish_task_switch() {

1578

4) | _raw_spin_unlock_irq() {

1579

4) 0.100 us | sub_preempt_count();

1580

4) 0.582 us | }

1581

4) 1.105 us | }

1582

4) 0.088 us | sub_preempt_count();

4) ! 100.066 us | }

.

.

.

3) | sys_ioctl() {

3) 0.083 us | fget_light();

1589

3) | security_file_ioctl() {

1590

3) 0.066 us | cap_file_ioctl();

1591

3) 0.562 us | }

1592

3) | do_vfs_ioctl() {

1593

3) | drm_ioctl() {

1594

3) 0.075 us | drm_ut_debug_printk();

1595

3) | i915_gem_pwrite_ioctl() {

1596

3) | i915_mutex_lock_interruptible() {

1597

3) 0.070 us | mutex_lock_interruptible();

1598

3) 0.570 us | }

1599

3) | drm_gem_object_lookup() {

1600

3) | _raw_spin_lock() {

1601

3) 0.080 us | add_preempt_count();

1602

3) 0.620 us | }

1603

3) | _raw_spin_unlock() {

1604

3) 0.085 us | sub_preempt_count();

1605

3) 0.562 us | }

1606

3) 2.149 us | }

1607

3) 0.133 us | i915_gem_object_pin();

1608

3) | i915_gem_object_set_to_gtt_domain() {

1609

3) 0.065 us | i915_gem_object_flush_gpu_write_domain();

1610

3) 0.065 us | i915_gem_object_wait_rendering();

1611

3) 0.062 us | i915_gem_object_flush_cpu_write_domain();

1612

3) 1.612 us | }

1613

3) | i915_gem_object_put_fence() {

1614

3) 0.097 us | i915_gem_object_flush_fence.constprop.36();

1615

3) 0.645 us | }

1616

3) 0.070 us | add_preempt_count();

1617

3) 0.070 us | sub_preempt_count();

1618

3) 0.073 us | i915_gem_object_unpin();

1619

3) 0.068 us | mutex_unlock();

3) 9.924 us | }

3) + 11.236 us | }

3) + 11.770 us | }

3) + 13.784 us | }

3) | sys_ioctl() {

</literallayout>

As you can see, the function_graph display is much easier to
follow. Also note that in addition to the function calls and
associated braces, other events such as scheduler events
are displayed in context. In fact, you can freely include
any tracepoint available in the trace events subsystem described
in the next section by simply enabling those events, and they'll
appear in context in the function graph display. This makes
function_graph a powerful tool for understanding kernel dynamics.

</para>

<para>

Also notice that there are various annotations on the left-hand
side of the display. For example, if the total time it
took for a given function to execute is above a certain
threshold, a marker appears next to the duration: in the output
above, a plus sign marks a call that took longer than 10
microseconds and an exclamation point marks one that took longer
than 100 microseconds. See the ftrace documentation for
details on all these fields.

</para>
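<para>
As a minimal sketch of how those annotations can be put to use,
the hypothetical shell function below (slow_calls is not part of
ftrace) keeps only the function_graph lines whose duration carries
a '+' or '!' marker. It assumes the column layout shown above,
with the duration to the left of the '|' separator:

```shell
# Hypothetical helper: filter function_graph output read from stdin,
# keeping only lines whose duration is flagged with '+' or '!'.
slow_calls() {
    # Field 1 is everything left of the '|' separator:
    # "CPU) [marker] duration us". Lines without '|' are dropped.
    awk -F'|' 'NF >= 2 && $1 ~ /\) *[+!] /'
}
```

For example, piping the 'trace' file through slow_calls should
leave only the flagged entries, which is a quick way to spot the
expensive calls in a long trace.
</para>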

</section>

<title>The 'trace events' Subsystem</title>

1648

1649

<para>

1650

One especially important directory contained within
the /sys/kernel/debug/tracing directory is the 'events'
subdirectory, which contains representations of every
tracepoint in the system. Listing the contents of
the 'events' subdirectory, we see mainly another set of
subdirectories:

root@sugarbay:/sys/kernel/debug/tracing# cd events

1658

root@sugarbay:/sys/kernel/debug/tracing/events# ls -al

1659

drwxr-xr-x 38 root root 0 Nov 14 23:19 .

1660

drwxr-xr-x 5 root root 0 Nov 14 23:19 ..

1661

drwxr-xr-x 19 root root 0 Nov 14 23:19 block

1662

drwxr-xr-x 32 root root 0 Nov 14 23:19 btrfs

1663

drwxr-xr-x 5 root root 0 Nov 14 23:19 drm

1664

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1665

drwxr-xr-x 40 root root 0 Nov 14 23:19 ext3

1666

drwxr-xr-x 79 root root 0 Nov 14 23:19 ext4

1667

drwxr-xr-x 14 root root 0 Nov 14 23:19 ftrace

1668

drwxr-xr-x 8 root root 0 Nov 14 23:19 hda

1669

-r--r--r-- 1 root root 0 Nov 14 23:19 header_event

1670

-r--r--r-- 1 root root 0 Nov 14 23:19 header_page

1671

drwxr-xr-x 25 root root 0 Nov 14 23:19 i915

1672

drwxr-xr-x 7 root root 0 Nov 14 23:19 irq

1673

drwxr-xr-x 12 root root 0 Nov 14 23:19 jbd

1674

drwxr-xr-x 14 root root 0 Nov 14 23:19 jbd2

1675

drwxr-xr-x 14 root root 0 Nov 14 23:19 kmem

1676

drwxr-xr-x 7 root root 0 Nov 14 23:19 module

1677

drwxr-xr-x 3 root root 0 Nov 14 23:19 napi

1678

drwxr-xr-x 6 root root 0 Nov 14 23:19 net

1679

drwxr-xr-x 3 root root 0 Nov 14 23:19 oom

1680

drwxr-xr-x 12 root root 0 Nov 14 23:19 power

1681

drwxr-xr-x 3 root root 0 Nov 14 23:19 printk

1682

drwxr-xr-x 8 root root 0 Nov 14 23:19 random

1683

drwxr-xr-x 4 root root 0 Nov 14 23:19 raw_syscalls

1684

drwxr-xr-x 3 root root 0 Nov 14 23:19 rcu

1685

drwxr-xr-x 6 root root 0 Nov 14 23:19 rpm

1686

drwxr-xr-x 20 root root 0 Nov 14 23:19 sched

1687

drwxr-xr-x 7 root root 0 Nov 14 23:19 scsi

1688

drwxr-xr-x 4 root root 0 Nov 14 23:19 signal

1689

drwxr-xr-x 5 root root 0 Nov 14 23:19 skb

1690

drwxr-xr-x 4 root root 0 Nov 14 23:19 sock

1691

drwxr-xr-x 10 root root 0 Nov 14 23:19 sunrpc

1692

drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls

1693

drwxr-xr-x 4 root root 0 Nov 14 23:19 task

1694

drwxr-xr-x 14 root root 0 Nov 14 23:19 timer

1695

drwxr-xr-x 3 root root 0 Nov 14 23:19 udp

1696

drwxr-xr-x 21 root root 0 Nov 14 23:19 vmscan

1697

drwxr-xr-x 3 root root 0 Nov 14 23:19 vsyscall

1698

drwxr-xr-x 6 root root 0 Nov 14 23:19 workqueue

1699

drwxr-xr-x 26 root root 0 Nov 14 23:19 writeback

1700

</literallayout>

1701

Each of these subdirectories corresponds to a
'subsystem' and contains yet more subdirectories,
each of which finally corresponds to a tracepoint.
For example, here are the contents of the 'kmem' subsystem:

1705

1706

root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem

1707

root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al

1708

drwxr-xr-x 14 root root 0 Nov 14 23:19 .

1709

drwxr-xr-x 38 root root 0 Nov 14 23:19 ..

1710

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1711

-rw-r--r-- 1 root root 0 Nov 14 23:19 filter

1712

drwxr-xr-x 2 root root 0 Nov 14 23:19 kfree

1713

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc

1714

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc_node

1715

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc

1716

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc_node

1717

drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_free

1718

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc

1719

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag

1720

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked

1721

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free

1722

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free_batched

1723

drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_pcpu_drain

1724

</literallayout>
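The same walk over the directory tree can be scripted. As a small
sketch (list_tracepoints is a hypothetical helper name, and the
'events' path is passed in rather than hard-coded), the following
prints the tracepoints that belong to one subsystem:

```shell
# Hypothetical helper: print the tracepoints of one subsystem.
# $1 = path to the 'events' directory, $2 = subsystem name (e.g. kmem)
list_tracepoints() {
    for d in "$1/$2"/*/; do
        # Skip the unexpanded pattern if there are no subdirectories.
        [ -d "$d" ] || continue
        basename "$d"
    done
}
```

On the system shown above, running
"list_tracepoints /sys/kernel/debug/tracing/events kmem" should
print the twelve tracepoint names from kfree to mm_page_pcpu_drain,
skipping the 'enable' and 'filter' files.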

1725

Let's see what's inside the subdirectory for a specific
tracepoint, in this case the one for kmalloc:

1727

1728

root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc

1729

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al

1730

drwxr-xr-x 2 root root 0 Nov 14 23:19 .

1731

drwxr-xr-x 14 root root 0 Nov 14 23:19 ..

1732

-rw-r--r-- 1 root root 0 Nov 14 23:19 enable

1733

-rw-r--r-- 1 root root 0 Nov 14 23:19 filter

1734

-r--r--r-- 1 root root 0 Nov 14 23:19 format

1735

-r--r--r-- 1 root root 0 Nov 14 23:19 id

1736

</literallayout>

1737

The 'format' file for the tracepoint describes the layout of the
event in memory, which the various tracing tools that make use
of these tracepoints rely on to parse the event
and make sense of it, along with a 'print fmt' field that
allows tools like ftrace to display the event as text.
Here's what the format of the kmalloc event looks like:

1743

1744

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format

name: kmalloc

ID: 313

format:

field:unsigned short common_type; offset:0; size:2; signed:0;

1749

field:unsigned char common_flags; offset:2; size:1; signed:0;

1750

field:unsigned char common_preempt_count; offset:3; size:1; signed:0;

1751

field:int common_pid; offset:4; size:4; signed:1;

1752

field:int common_padding; offset:8; size:4; signed:1;

1753

1754

field:unsigned long call_site; offset:16; size:8; signed:0;

1755

field:const void * ptr; offset:24; size:8; signed:0;

1756

field:size_t bytes_req; offset:32; size:8; signed:0;

1757

field:size_t bytes_alloc; offset:40; size:8; signed:0;

1758

field:gfp_t gfp_flags; offset:48; size:4; signed:0;

1759

1760

print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,

1761

(REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((

1762

1763

gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((

1764

gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((

1765

gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((

1766

gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), "GFP_TEMPORARY"},

1767

{(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),

1768

"GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((

1769

gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((

1770

gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned

1771

long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},

1772

{(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),

1773

"GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned

1774

long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},

1775

{(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"

1776

</literallayout>
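Because the 'format' file is plain text, it is easy to process
with standard tools. The sketch below (format_fields is a
hypothetical name) prints the name, offset and size of each field.
It assumes every field line has the
'field:...; offset:...; size:...;' shape shown above:

```shell
# Hypothetical helper: summarize the fields of a tracepoint.
# $1 = path to a tracepoint 'format' file
format_fields() {
    awk -F';' '/^[ \t]*field:/ {
        n = split($1, words, " ")    # words of the C declaration
        name = words[n]              # the last word is the field name
        off = $2; sub(/^.*offset:/, "", off)
        size = $3; sub(/^.*size:/, "", size)
        print name, off, size
    }' "$1"
}
```

For the kmalloc event above, the output would include lines such
as "bytes_req 32 8" and "bytes_alloc 40 8", taken directly from
the offset and size columns.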

1777

The 'enable' file in the tracepoint directory is what allows
the user (or tools such as trace-cmd) to actually turn the
tracepoint on and off. When enabled, the corresponding
tracepoint will start appearing in the ftrace 'trace'
file described previously. For example, this turns on the
kmalloc tracepoint:

1783

1784

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable

1785

</literallayout>

1786

At the moment, we're not interested in the function tracer or
any other tracer that might be in effect, so we first turn
it off by selecting the 'nop' tracer. Having done that, we still
need to turn tracing on in order to see the events in the output buffer:

1790

1791

root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer

1792

root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on

1793

</literallayout>
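The three writes above can be collected into one step. The
following is only a sketch: enable_event is a hypothetical helper,
and the tracing directory is passed as the first argument so that
nothing is hard-coded. On a real system it must be run as root:

```shell
# Hypothetical helper: show only one trace event in the 'trace' file.
# $1 = tracing directory, $2 = subsystem, $3 = event
enable_event() {
    echo nop > "$1/current_tracer"       # turn off any active tracer
    echo 1 > "$1/events/$2/$3/enable"    # turn the tracepoint on
    echo 1 > "$1/tracing_on"             # let events reach the buffer
}
```

With this in place, "enable_event /sys/kernel/debug/tracing kmem kmalloc"
performs the same setup as the individual commands.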

1794

Now, if we look at the 'trace' file, we see nothing
but the kmalloc events we just turned on:

1796

1797

root@sugarbay:/sys/kernel/debug/tracing# cat trace | less

1798

# tracer: nop

1799

#

1800

# entries-in-buffer/entries-written: 1897/1897 #P:8

1801

#

1802

# _-----=> irqs-off

1803

# / _----=> need-resched

1804

# | / _---=> hardirq/softirq

1805

# || / _--=> preempt-depth

1806

# ||| / delay

1807

# TASK-PID CPU# |||| TIMESTAMP FUNCTION

1808

# | | | |||| | |

1809

dropbear-1465 [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1810

<idle>-0 [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1811

<idle>-0 [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1812

matchbox-termin-1361 [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT

1813

Xorg-1264 [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY

1814

Xorg-1264 [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

1815

Xorg-1264 [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO

1816

matchbox-termin-1361 [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT

1817

Xorg-1264 [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY

1818

Xorg-1264 [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

1819

Xorg-1264 [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO

1820

<idle>-0 [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1821

dropbear-1465 [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1822

<idle>-0 [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1823

<idle>-0 [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1824

dropbear-1465 [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1825

<idle>-0 [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1826

<idle>-0 [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1827

dropbear-1465 [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1828

<idle>-0 [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1829

matchbox-termin-1361 [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT

1830

Xorg-1264 [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY

1831

Xorg-1264 [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

1832

Xorg-1264 [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO

1833

<idle>-0 [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1834

dropbear-1465 [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1835

<idle>-0 [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1836

<idle>-0 [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1837

dropbear-1465 [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL

1838

<idle>-0 [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC

1839

matchbox-termin-1361 [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT

1840

</literallayout>
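Output in this form lends itself to quick summaries. As a sketch,
the hypothetical kmalloc_totals function below reads 'trace' output
on stdin and prints, for each TASK-PID, the total bytes requested
and the total bytes actually allocated. It assumes the task name
contains no spaces, so that TASK-PID is the first
whitespace-separated field:

```shell
# Hypothetical helper: per-task totals of bytes_req and bytes_alloc.
kmalloc_totals() {
    awk '/ kmalloc: / {
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")           # e.g. "bytes_req=2048"
            if (kv[1] == "bytes_req")   req[$1]   += kv[2]
            if (kv[1] == "bytes_alloc") alloc[$1] += kv[2]
        }
    }
    END { for (task in req) print task, req[task], alloc[task] }' | LC_ALL=C sort
}
```

A gap between the two totals reflects requests that were rounded
up to a larger allocation size, as in the Xorg entries above where
a 168-byte request results in a 192-byte allocation.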

1841

To disable the kmalloc event again, we need to write 0 to the
enable file:

1843

1844

root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable

1845

</literallayout>

1846

You can enable any number of events or complete subsystems
(by using the 'enable' file in the subsystem directory) and
get an arbitrarily fine-grained idea of what's going on in the
system by enabling as many of the appropriate tracepoints
as applicable.

</para>
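<para>
Enabling or disabling whole subsystems follows the same pattern as
a single event. The sketch below uses a hypothetical helper,
set_subsystems, that writes the same value to the 'enable' file of
each subsystem it is given. The tracing directory is again passed
in as an argument:

```shell
# Hypothetical helper: write 0 or 1 to the 'enable' file of each
# subsystem named on the command line.
# $1 = tracing directory, $2 = 0 or 1, remaining arguments = subsystems
set_subsystems() {
    dir=$1; value=$2
    shift 2
    for subsys in "$@"; do
        echo "$value" > "$dir/events/$subsys/enable"
    done
}
```

For example, "set_subsystems /sys/kernel/debug/tracing 1 sched irq"
turns on every tracepoint in the 'sched' and 'irq' subsystems, and
the same call with 0 turns them off again.
</para>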

<para>

A number of the tools described in this HOWTO do just that,
including trace-cmd and kernelshark, which are covered in the next section.

</para>

<emphasis>Tying it Together:</emphasis> These tracepoints and their representation
are used not only by ftrace, but by many of the other tools
covered in this document, and they form a central point of
integration for the various tracers available in Linux.
They form a central part of the instrumentation for the
following tools: perf, lttng, ftrace, blktrace and SystemTap.

</informalexample>

<emphasis>Tying it Together:</emphasis> Eventually all the special-purpose tracers
currently available in /sys/kernel/debug/tracing will be
removed and replaced with equivalent tracers based on the
'trace events' subsystem.

</informalexample>

</section>

<title>trace-cmd/kernelshark</title>

1877

1878

<para>

1879

trace-cmd is essentially an extensive command-line 'wrapper'
interface that hides the details of all the individual files
in /sys/kernel/debug/tracing, allowing users to specify
particular events within the
/sys/kernel/debug/tracing/events/ subdirectory and to collect
traces without having to deal with those details directly.

</para>

<para>

As yet another layer on top of that, kernelshark provides a GUI
that allows users to start and stop traces and specify sets
of events using an intuitive interface, and view the
output as both trace events and as a per-CPU graphical
display. It directly uses 'trace-cmd' as the plumbing
that accomplishes all that under the covers (and
actually displays the trace-cmd command it uses, as we'll see).

</para>

<para>

To start a trace using kernelshark, first start kernelshark:

1899

1900

root@sugarbay:~# kernelshark

1901

</literallayout>

1902

Then bring up the 'Capture' dialog by choosing from the

kernelshark menu:

Capture | Record

</literallayout>

That will display the following dialog, which allows you to
choose one or more events (or even one or more complete
subsystems) to trace:

</para>

<para>

</para>

<para>

Note that these are exactly the same sets of events described
in the previous trace events subsystem section, and in fact
that is where trace-cmd gets them for kernelshark.

</para>

<para>

In the above screenshot, we've decided to explore the
graphics subsystem a bit and so have chosen to trace all
the tracepoints contained within the 'i915' and 'drm'
subsystems.

</para>

<para>

After doing that, we can start and stop the trace using
the 'Run' and 'Stop' buttons in the lower right corner of
the dialog (the 'Run' button turns into the 'Stop'
button after the trace has started):

</para>

<para>

</para>

<para>

Notice that the right-hand pane shows the exact trace-cmd
command line that's used to run the trace, along with the
results of the trace-cmd run.

</para>
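<para>
A command line of roughly that shape can also be put together by
hand. The sketch below is an illustration only: build_record_cmd is
a hypothetical helper, and the exact command kernelshark displays
depends on the options chosen in the dialog. It relies only on the
'-e' option of 'trace-cmd record', which selects an event or a
whole subsystem:

```shell
# Hypothetical helper: print a 'trace-cmd record' command line that
# enables each event or subsystem given as an argument.
build_record_cmd() {
    cmd="trace-cmd record"
    for ev in "$@"; do
        cmd="$cmd -e $ev"
    done
    printf '%s\n' "$cmd"
}
```

For the 'i915' and 'drm' selection used here,
"build_record_cmd i915 drm" prints
"trace-cmd record -e i915 -e drm". Running that command as root and
then running 'trace-cmd report' shows the recorded events as text.
</para>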

<para>

Once the 'Stop' button is pressed, the graphical view
fills up with a colorful per-CPU display of the trace data,
along with the detailed event listing below that:

</para>

<para>

</para>

<para>

Here's another example, this time a display resulting

1958

from tracing 'all events':

</para>

<para>

</para>

<para>

The tool is pretty self-explanatory, but for more detailed
information on navigating through the data, see the
<ulink url='http://rostedt.homelinux.com/kernelshark/'>kernelshark website</ulink>.

</para>

</section>

<title>Documentation</title>

1974

1975

<para>

1976

The documentation for ftrace can be found in the kernel

1977

Documentation directory:

1978

1979

Documentation/trace/ftrace.txt

1980

</literallayout>

1981

The documentation for the trace event subsystem can also

1982

be found in the kernel Documentation directory:

1983

1984

Documentation/trace/events.txt

1985

</literallayout>

1986

There is a nice series of articles on using

1987

ftrace and trace-cmd at LWN:

1988

1989

<listitem><para><ulink url='http://lwn.net/Articles/365835/'>Debugging the kernel using Ftrace - part 1</ulink>

1990

</para></listitem>

1991

<listitem><para><ulink url='http://lwn.net/Articles/366796/'>Debugging the kernel using Ftrace - part 2</ulink>

1992

</para></listitem>

1993

<listitem><para><ulink url='http://lwn.net/Articles/370423/'>Secrets of the Ftrace function tracer</ulink>

1994

</para></listitem>

1995

<listitem><para><ulink url='https://lwn.net/Articles/410200/'>trace-cmd: A front-end for Ftrace</ulink>

</para></listitem>

</itemizedlist>

</para>

<para>

There's more detailed documentation on kernelshark usage here:
<ulink url='http://rostedt.homelinux.com/kernelshark/'>KernelShark</ulink>

</para>

<para>

An amusing yet useful README (a tracing mini-HOWTO) can be
found in /sys/kernel/debug/tracing/README.

</para>

</section>

</section>

<title>systemtap</title>

2014

2015

<para>

2016

SystemTap is a system-wide script-based tracing and profiling tool.

</para>

<para>

SystemTap scripts are C-like programs that are executed in the
kernel to gather, print, and aggregate data extracted from the
context in which they are invoked.

</para>

<para>

For example, this probe from the
<ulink url='http://sourceware.org/systemtap/tutorial/'>SystemTap tutorial</ulink>
simply prints a line every time any process on the system open()s
a file. For each line, it prints the executable name of the
program that opened the file, along with its PID, and the name
of the file it opened (or tried to open), which it extracts
from the open syscall's argstr.

probe syscall.open
{
    printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)
}

probe timer.ms(4000) # after 4 seconds
{
    exit ()
}

</literallayout>

Normally, to execute this probe, you'd simply install
systemtap on the system you want to probe, and directly run
the probe on that system, e.g. assuming the name of the file
containing the above text is trace_open.stp:

2048

2049

# stap trace_open.stp

2050

</literallayout>

2051

What systemtap does under the covers to run this probe is 1)
parse and convert the probe to an equivalent 'C' form, 2)
compile the 'C' form into a kernel module, 3) insert the
module into the kernel, which arms it, and 4) collect the data
generated by the probe and display it to the user.

</para>

<para>

In order to accomplish steps 1 and 2, the 'stap' program needs
access to the kernel build system that produced the kernel
that the probed system is running. In the case of a typical
embedded system (the 'target'), the kernel build system
unfortunately isn't typically part of the image running on
the target. It is normally available, however, on the 'host'
system that produced the target image; in such cases,
steps 1 and 2 are executed on the host system, and steps
3 and 4 are executed on the target system, using only the
systemtap 'runtime'.

</para>

<para>

The systemtap support in Yocto assumes that only steps
3 and 4 are run on the target; it is possible to do
everything on the target, but this section assumes only
the typical embedded use-case.

</para>

<para>

So basically what you need to do in order to run a systemtap
script on the target is to 1) on the host system, compile the
probe into a kernel module that makes sense to the target, 2)
copy the module onto the target system, 3) insert the
module into the target kernel, which arms it, and 4) collect
the data generated by the probe and display it to the user.

</para>
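<para>
To make the host/target split concrete, the sketch below prints,
without executing anything, one way those four steps could map
onto commands. It is not how 'crosstap' itself is implemented:
print_stap_plan is a hypothetical helper, the /tmp destination is
an arbitrary choice, and a real cross build also needs stap options
that point it at the target's kernel build and architecture, which
are omitted here. It assumes only that 'stap -p4 -m NAME' stops
after compiling the module and that 'staprun' loads a compiled
module:

```shell
# Hypothetical dry run: print, but do not execute, one way the four
# steps could be split between host and target.
# $1 = SystemTap script, $2 = user@target
print_stap_plan() {
    mod=$(basename "$1" .stp)
    echo "host:   stap -p4 -m $mod $1    # step 1: compile to $mod.ko"
    echo "host:   scp $mod.ko $2:/tmp/   # step 2: copy to the target"
    echo "target: staprun /tmp/$mod.ko   # steps 3-4: load and collect"
}
```

Calling "print_stap_plan trace_open.stp root@192.168.7.2" prints
three lines, one per command, which can serve as a checklist when
debugging a failed run.
</para>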

<title>Setup</title>

<para>

That is a lot of steps and a lot of details, but
fortunately Yocto includes a script called 'crosstap'
that will take care of those details, allowing you to
simply execute a systemtap script on the remote target,
with arguments if necessary.

</para>

<para>

In order to do this from a remote host, however, you
need to have access to the build for the image you
booted. The 'crosstap' script provides details on how
to do this if you run the script on the host without having
done a build:

2104

<note>

2105

'crosstap', which wraps SystemTap, assumes you can establish an
ssh connection to the remote target.
Please refer to the crosstap wiki page for details on verifying
ssh connections at
<ulink url='https://wiki.yoctoproject.org/wiki/Tracing_and_Profiling#systemtap'></ulink>.
Also, the ability to ssh into the target system is not enabled
by default in *-minimal images.

2112

</note>

2113

2114

$ crosstap root@192.168.1.88 trace_open.stp

2115

2116

Error: No target kernel build found.

2117

Did you forget to create a local build of your image?

2118

2119

'crosstap' requires a local sdk build of the target system

2120

(or a build that includes 'tools-profile') in order to build

2121

kernel modules that can probe the target system.

2122

2123

Practically speaking, that means you need to do the following:

2124

- If you're running a pre-built image, download the release

2125

and/or BSP tarballs used to build the image.

2126

- If you're working from git sources, just clone the metadata

2127

and BSP layers needed to build the image you'll be booting.

2128

- Make sure you're properly set up to build a new image (see

2129

the BSP README and/or the widely available basic documentation

2130

that discusses how to build images).

2131

- Build an -sdk version of the image e.g.:

2132

$ bitbake core-image-sato-sdk

2133

OR

2134

- Build a non-sdk image but include the profiling tools:

2135

[ edit local.conf and add 'tools-profile' to the end of

2136

the EXTRA_IMAGE_FEATURES variable ]

2137

$ bitbake core-image-sato

2138

2139

Once you've build the image on the host system, you're ready to

2140

boot it (or the equivalent pre-built image) and use 'crosstap'

2141

to probe it (you need to source the environment as usual first):

2142

2143

$ source oe-init-build-env

2144

$ cd ~/my/systemtap/scripts

2145

$ crosstap root@192.168.1.xxx myscript.stp

2146

</literallayout>

2147

So essentially what you need to do is build an SDK image or
an image with 'tools-profile' as detailed in the
"<link linkend='profile-manual-general-setup'>General Setup</link>"
section of this manual, and boot the resulting target image.

</para>

<note>

If you have a build directory containing multiple machines,
you need to have the MACHINE you're connecting to selected
in local.conf, and the kernel in that machine's build
directory must match the kernel on the booted system exactly,
or you'll get the above 'crosstap' message when you try to
invoke a script.

</note>

</section>

<title>Running a Script on a Target</title>

2165

2166

<para>

2167

Once you've done that, you should be able to run a systemtap

2168

script on the target:

2169

2170

$ cd /path/to/yocto
$ source oe-init-build-env

### Shell environment set up for builds. ###

You can now run 'bitbake <target>'

Common targets are:
    core-image-minimal
    core-image-sato
    meta-toolchain
    meta-ide-support

You can also run generated qemu images with a command like 'runqemu qemux86'

</literallayout>

2186

Once you've done that, you can cd to whatever directory
contains your scripts and use 'crosstap' to run the script:

2188

2189

$ cd /path/to/my/systemtap/script

2190

$ crosstap root@192.168.7.2 trace_open.stp

2191

</literallayout>

2192

If you get an error connecting to the target e.g.:

2193

2194

$ crosstap root@192.168.7.2 trace_open.stp

2195

error establishing ssh connection on remote 'root@192.168.7.2'

2196

</literallayout>

2197

Try ssh'ing to the target and see what happens:

2198

2199

$ ssh root@192.168.7.2

2200

</literallayout>

2201

A lot of the time, connection problems are due to specifying a
wrong IP address or having a 'host key verification error'.

</para>
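<para>
That check can be wrapped in a small function and run before
invoking 'crosstap'. This is only a sketch: check_target is a
hypothetical helper, the five-second timeout is an arbitrary
choice, and the SSH variable exists purely so that a different
command can be substituted for ssh:

```shell
# Hypothetical helper: verify that the target accepts ssh logins.
# $1 = user@target; set SSH to substitute another command for ssh.
check_target() {
    if ${SSH:-ssh} -o ConnectTimeout=5 "$1" true; then
        echo "ok: $1 is reachable over ssh"
    else
        echo "failed: cannot ssh to $1" >&2
        return 1
    fi
}
```

If the failure turns out to be a host key verification error, for
instance because the target image was rebuilt, one common remedy is
to remove the stale key with 'ssh-keygen -R 192.168.7.2' and then
connect again.
</para>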

<para>

If everything worked as planned, you should see something
like this (enter the password when prompted, or press enter
if it's set up to use no password):

2209

2210

$ crosstap root@192.168.7.2 trace_open.stp

2211

root@192.168.7.2's password:

2212

matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)

2213

matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)

</literallayout>

</para>

</section>

<title>Documentation</title>

2220

2221

<para>

2222

The SystemTap language reference can be found here:

2223

<ulink url='http://sourceware.org/systemtap/langref/'>SystemTap Language Reference</ulink>

</para>

<para>

Links to other SystemTap documents, tutorials, and examples can be

2228

found here:

2229

<ulink url='http://sourceware.org/systemtap/documentation.html'>SystemTap documentation page</ulink>

</para>

</section>

</section>


<title>Sysprof</title>

2236

2237

<para>

2238

Sysprof is a very easy-to-use system-wide profiler that consists
of a single window with three panes and a few buttons that allow
you to start, stop, and view the profile from one place.

</para>

<title>Setup</title>

<para>

For this section, we'll assume you've already performed the

2248

basic setup outlined in the General Setup section.

</para>

<para>

Sysprof is a GUI-based application that runs on the target
system. For the rest of this document we assume you've
ssh'ed to the target and will be running Sysprof there
(you can use the '-X' option to ssh to have the
Sysprof GUI run on the target but display remotely on the
host if you want).

</para>

</section>

<title>Basic Usage</title>

2263

2264

<para>

2265

To start profiling the system, you simply press the 'Start'
button. To stop profiling and to start viewing the profile data
in one easy step, press the 'Profile' button.

</para>

<para>

Once you've pressed the profile button, the three panes will

2272

fill up with profiling data:

</para>

<para>

</para>

<para>

The left pane shows a list of functions and processes.
Selecting one of those expands that function in the right
pane, showing all its callees. Note that this caller-oriented
display is essentially the inverse of perf's default
callee-oriented callchain display.

</para>

<para>

In the screenshot above, we're focusing on __copy_to_user_ll().
Looking up the callchain, we can see that one of the callers
of __copy_to_user_ll() is sys_read(), along with the complete
callpath between them. Notice that this is essentially a portion
of the same information we saw in the perf display shown in the
perf section of this page.

</para>

<para>

</para>

<para>
Similarly, the above is a snapshot of the Sysprof display of a
copy-from-user callchain.
</para>

<para>
Finally, looking at the third Sysprof pane in the lower left,
we can see a list of all the callers of a particular function
selected in the top left pane. In this case, the lower pane is
showing all the callers of __mark_inode_dirty:
</para>

<para>
</para>

<para>
Double-clicking on one of those functions will in turn change the
focus to the selected function, and so on.
</para>

<informalexample>
<emphasis>Tying it Together:</emphasis> If you like Sysprof's 'caller-oriented'
display, you may be able to approximate it in other tools as
well. For example, 'perf report' has the -g (--call-graph)
option that you can experiment with; one of the options is
'caller' for an inverted caller-based callgraph display.
</informalexample>

</section>

<title>Documentation</title>

<para>
There doesn't seem to be any documentation for Sysprof, but
maybe that's because it's pretty self-explanatory.
The Sysprof website, however, is here:
<ulink url='http://sysprof.com/'>Sysprof, System-wide Performance Profiler for Linux</ulink>
</para>

</section>

</section>

<title>LTTng (Linux Trace Toolkit, next generation)</title>

<title>Setup</title>

<para>
For this section, we'll assume you've already performed the
basic setup outlined in the General Setup section.
</para>

<para>
LTTng is run on the target system by ssh'ing to it.
However, if you want to see the traces graphically,
install Eclipse as described in section
"<link linkend='manually-copying-a-trace-to-the-host-and-viewing-it-in-eclipse'>Manually copying a trace to the host and viewing it in Eclipse (i.e. using Eclipse without network support)</link>"
and follow the directions there to manually copy traces to the
host and view them in Eclipse.
</para>

<note>
Be sure to download and install/run the 'SR1' or later Juno release
of Eclipse, e.g.:
<ulink url='http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz'>http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz</ulink>
</note>

</section>

<title>Collecting and Viewing Traces</title>

<para>
Once you've applied the above commits and built and booted your
image (you need to build the core-image-sato-sdk image or use one of the
other methods described in the General Setup section), you're
ready to start tracing.
</para>

<title>Collecting and viewing a trace on the target (inside a shell)</title>

<para>
First, from the host, ssh to the target:
<literallayout class='monospaced'>
$ ssh -l root 192.168.1.47
The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
root@192.168.1.47's password:
</literallayout>
Once on the target, use these steps to create a trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng create
Spawning a session daemon
Session auto-20121015-232120 created.
Traces will be written in /home/root/lttng-traces/auto-20121015-232120
</literallayout>
Enable the events you want to trace (in this case all
kernel events):
<literallayout class='monospaced'>
root@crownbay:~# lttng enable-event --kernel --all
All kernel events are enabled in channel channel0
</literallayout>
Start the trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng start
Tracing started for session auto-20121015-232120
</literallayout>
And then stop the trace after a while or after running
a particular workload that you want to trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng stop
Tracing stopped for session auto-20121015-232120
</literallayout>

You can now view the trace in text form on the target:
<literallayout class='monospaced'>
root@crownbay:~# lttng view
[23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }
[23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
[23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }
[23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }
[23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 }
[23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }
[23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }
[23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 }
[23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }
[23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 }
[23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 }
[23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }
[23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }
[23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }
[23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }
[23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }
[23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }
[23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 }
[23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 }
[23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 }
[23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }
[23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
[23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
[23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 }
[23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 }
[23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }
[23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 }
[23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 }
[23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 }
[23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 }
[23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }
[23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 }
[23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 }
[23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 }
[23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }
[23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 }
[23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 }
.
.
.
</literallayout>

You can now safely destroy the trace session (note that
this doesn't delete the trace - it's still there
in ~/lttng-traces):
<literallayout class='monospaced'>
root@crownbay:~# lttng destroy
Session auto-20121015-232120 destroyed at /home/root
</literallayout>
The trace is saved under the ~/lttng-traces directory, in a
directory named after the session reported by 'lttng create'
(you can change this by supplying your own name to
'lttng create'):
<literallayout class='monospaced'>
root@crownbay:~# ls -al ~/lttng-traces
drwxrwx--- 3 root root 1024 Oct 15 23:21 .
drwxr-xr-x 5 root root 1024 Oct 15 23:57 ..
drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120
</literallayout>

</para>
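
<para>
The text form of a trace gets long quickly, so it can help to
summarize it before reading it line by line. The following is a
minimal sketch, not part of LTTng, that counts how often each
event name appears in 'lttng view' output. The function name
count_events is made up for illustration, and the script assumes
the line layout shown in the listing above (timestamp, delta,
then the event name followed by a colon).
</para>

```shell
#!/bin/sh
# Count how often each event name occurs in 'lttng view' text output.
# Assumed line layout (taken from the listing above):
#   [timestamp] (+delta) event_name: { cpu }, { fields }
# count_events is a hypothetical helper name, not an LTTng command.
count_events() {
    awk '$1 ~ /^\[/ {
             name = $3
             sub(/:$/, "", name)   # drop the trailing colon
             count[name]++
         }
         END { for (n in count) printf "%d %s\n", count[n], n }' |
        sort -k1,1nr -k2,2         # most frequent first, ties by name
}
```

<para>
While the session still exists, this could be used on the target
as 'lttng view | count_events', which prints the most frequent
events first.
</para>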

</section>

<title>Collecting and viewing a userspace trace on the target (inside a shell)</title>

<para>
For LTTng userspace tracing, you need to have a properly
instrumented userspace program. For this example, we'll use
the 'hello' test program generated by the lttng-ust build.
</para>

<para>
The 'hello' test program isn't installed on the rootfs by
the lttng-ust build, so we need to copy it over manually.
First cd into the build directory that contains the hello
executable:
<literallayout class='monospaced'>
$ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs
</literallayout>
Copy that over to the target machine:
<literallayout class='monospaced'>
$ scp hello root@192.168.1.20:
</literallayout>
You now have the instrumented LTTng 'hello world' test
program on the target, ready to test.
</para>

<para>
First, from the host, ssh to the target:
<literallayout class='monospaced'>
$ ssh -l root 192.168.1.47
The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
root@192.168.1.47's password:
</literallayout>
Once on the target, use these steps to create a trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng create
Session auto-20190303-021943 created.
Traces will be written in /home/root/lttng-traces/auto-20190303-021943
</literallayout>
Enable the events you want to trace (in this case all
userspace events):
<literallayout class='monospaced'>
root@crownbay:~# lttng enable-event --userspace --all
All UST events are enabled in channel channel0
</literallayout>
Start the trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng start
Tracing started for session auto-20190303-021943
</literallayout>
Run the instrumented hello world program:
<literallayout class='monospaced'>
root@crownbay:~# ./hello
Hello, World!
Tracing... done.
</literallayout>
And then stop the trace after a while or after running a
particular workload that you want to trace:
<literallayout class='monospaced'>
root@crownbay:~# lttng stop
Tracing stopped for session auto-20190303-021943
</literallayout>

You can now view the trace in text form on the target:
<literallayout class='monospaced'>
root@crownbay:~# lttng view
[02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
[02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
[02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
[02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
.
.
.
</literallayout>

You can now safely destroy the trace session (note that
this doesn't delete the trace - it's still
there in ~/lttng-traces):
<literallayout class='monospaced'>
root@crownbay:~# lttng destroy
Session auto-20190303-021943 destroyed at /home/root
</literallayout>

</para>
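
<para>
Both walkthroughs above follow the same sequence of lttng
commands and differ only in the tracing domain. As a sketch,
assuming you want to script that sequence, the hypothetical
helper below wraps it around a workload. The names run,
lttng_trace, and DRY_RUN are illustrative and not part of LTTng;
only the lttng subcommands shown earlier are used. With DRY_RUN
left at 1, the commands are printed rather than executed.
</para>

```shell
#!/bin/sh
# Sketch: run one LTTng tracing session around a workload.
# DRY_RUN=1 (the default here) only prints the commands, so the
# sequence can be reviewed on a machine without LTTng installed.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: lttng_trace DOMAIN WORKLOAD [ARGS...]
#   DOMAIN is 'kernel' or 'userspace', matching the two walkthroughs.
lttng_trace() {
    domain=$1
    shift
    case $domain in
        kernel|userspace) ;;
        *) echo "unknown domain: $domain"; return 1 ;;
    esac
    run lttng create || return 1
    run lttng enable-event "--$domain" --all || return 1
    run lttng start || return 1
    run "$@"                 # the workload; tracing stops even if it fails
    run lttng stop || return 1
    run lttng destroy        # the trace files stay in ~/lttng-traces
}
```

<para>
For example, 'lttng_trace userspace ./hello' prints the six
commands of the userspace walkthrough; with DRY_RUN set to 0
beforehand, it would run them on the target.
</para>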

</section>

<title>Manually copying a trace to the host and viewing it in Eclipse (i.e. using Eclipse without network support)</title>

<para>
If you already have an LTTng trace on a remote target and
would like to view it in Eclipse on the host, you can easily
copy it from the target to the host and import it into
Eclipse to view it using the LTTng Eclipse plug-in already
bundled in Eclipse (Juno SR1 or later).
</para>

<para>
Using the kernel trace we created earlier, archive
it and copy it to your host system:
<literallayout class='monospaced'>
root@crownbay:~/lttng-traces# tar zcvf auto-20121015-232120.tar.gz auto-20121015-232120
auto-20121015-232120/
auto-20121015-232120/kernel/
auto-20121015-232120/kernel/metadata
auto-20121015-232120/kernel/channel0_1
auto-20121015-232120/kernel/channel0_0

$ scp root@192.168.1.47:lttng-traces/auto-20121015-232120.tar.gz .
root@192.168.1.47's password:
auto-20121015-232120.tar.gz 100% 1566KB 1.5MB/s 00:01
</literallayout>
Unarchive it on the host:
<literallayout class='monospaced'>
$ gunzip -c auto-20121015-232120.tar.gz | tar xvf -
auto-20121015-232120/
auto-20121015-232120/kernel/
auto-20121015-232120/kernel/metadata
auto-20121015-232120/kernel/channel0_1
auto-20121015-232120/kernel/channel0_0
</literallayout>

We can now import the trace into Eclipse and view it:
<orderedlist>
<listitem><para>First, start Eclipse and open the
'LTTng Kernel' perspective by selecting the following
menu item:
<literallayout class='monospaced'>
Window | Open Perspective | Other...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
'LTTng Kernel' from the list.</para></listitem>
<listitem><para>Back at the main menu, select the
following menu item:
<literallayout class='monospaced'>
File | New | Project...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
the 'Tracing | Tracing Project' wizard and press
'Next>'.</para></listitem>
<listitem><para>Give the project a name and press
'Finish'.</para></listitem>
<listitem><para>In the 'Project Explorer' pane under
the project you created, right click on the
'Traces' item.</para></listitem>
<listitem><para>Select 'Import...' and in the dialog
that's displayed:</para></listitem>
<listitem><para>Browse the filesystem and select
the 'kernel' directory containing the trace
you copied from the target,
e.g. auto-20121015-232120/kernel</para></listitem>
<listitem><para>'Checkmark' the directory in the tree
that's displayed for the trace</para></listitem>
<listitem><para>Below that, select 'Common Trace Format:
Kernel Trace' for the 'Trace Type'</para></listitem>
<listitem><para>Press 'Finish' to close the dialog
</para></listitem>
<listitem><para>Back in the 'Project Explorer' pane,
double-click on the 'kernel' item for the
trace you just imported under 'Traces'
</para></listitem>
</orderedlist>
You should now see your trace data displayed graphically
in several different views in Eclipse:

</para>

<para>

</para>

<para>
You can access extensive help information on how to use
the LTTng plug-in to search and analyze captured traces via
the Eclipse help system:
<literallayout class='monospaced'>
Help | Help Contents | LTTng Plug-in User Guide
</literallayout>

</para>
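
<para>
The archive-and-copy step earlier in this section can be made
slightly more defensive by checking the archive before it leaves
the target. The following is a sketch under stated assumptions:
pack_trace is a hypothetical helper name, and it assumes a tar
that supports the -C and -z options. It packs one session
directory and confirms that the archive contains a 'metadata'
file, as the tar listings above do.
</para>

```shell
#!/bin/sh
# Sketch: archive a trace directory and confirm that the archive
# lists a 'metadata' file before copying it off the target.
# pack_trace is a hypothetical helper, not an LTTng command.
# usage: pack_trace TRACES_DIR SESSION_NAME
pack_trace() {
    traces_dir=$1
    session=$2
    archive="$traces_dir/$session.tar.gz"
    # -C keeps paths in the archive relative to the traces directory.
    tar -C "$traces_dir" -czf "$archive" "$session" || return 1
    # The trace cannot be decoded without its metadata file.
    if tar -tzf "$archive" | grep -q '/metadata$'; then
        echo "$archive"      # print the archive path on success
    else
        return 1
    fi
}
```

<para>
On the target this might be called as
'pack_trace ~/lttng-traces auto-20121015-232120', after which the
scp command shown above copies the resulting file to the host.
</para>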

</section>

<title>Collecting and viewing a trace in Eclipse</title>

<note>
This section on collecting traces remotely doesn't currently
work because of Eclipse 'RSE' connectivity problems. Manually
tracing on the target, copying the trace files to the host,
and viewing the trace in Eclipse on the host, as outlined in the
previous sections, does work, so please use those manual
steps to view traces in Eclipse.
</note>

<para>
In order to trace a remote target, you also need to add
a 'tracing' group on the target and connect as a user
who's part of that group, e.g.:
<literallayout class='monospaced'>
# adduser tomz
# groupadd -r tracing
# usermod -a -G tracing tomz
</literallayout>

<orderedlist>
<listitem><para>First, start Eclipse and open the
'LTTng Kernel' perspective by selecting the following
menu item:
<literallayout class='monospaced'>
Window | Open Perspective | Other...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
'LTTng Kernel' from the list.</para></listitem>
<listitem><para>Back at the main menu, select the
following menu item:
<literallayout class='monospaced'>
File | New | Project...
</literallayout></para></listitem>
<listitem><para>In the dialog box that opens, select
the 'Tracing | Tracing Project' wizard and
press 'Next>'.</para></listitem>
<listitem><para>Give the project a name and press
'Finish'. That should result in an entry in the
'Project' subwindow.</para></listitem>
<listitem><para>In the 'Control' subwindow just below
it, press 'New Connection'.</para></listitem>
<listitem><para>Add a new connection, giving it the
hostname or IP address of the target system.
</para></listitem>
<listitem><para>Provide the username and password
of a qualified user (a member of the 'tracing' group)
or root account on the target system.
</para></listitem>
<listitem><para>Provide appropriate answers to whatever
else is asked for, e.g. 'secure storage password'
can be anything you want.
If you get an 'RSE Error' it may be due to proxies.
It may be possible to get around the problem by
changing the following setting:
<literallayout class='monospaced'>
Window | Preferences | Network Connections
</literallayout>
Switch 'Active Provider' to 'Direct'
</para></listitem>
</orderedlist>

</para>

</section>

</section>

<title>Documentation</title>

<para>
You can find the primary LTTng Documentation on the
<ulink url='https://lttng.org/docs/'>LTTng Documentation</ulink>
site.
The documentation on this site is appropriate for intermediate to
advanced software developers who are working in a Linux environment
and are interested in efficient software tracing.
</para>

<para>
For information on LTTng in general, visit the
<ulink url='http://lttng.org/lttng2.0'>LTTng Project</ulink>
site.
You can find a "Getting Started" link on this site that takes
you to an LTTng Quick Start.
</para>

<para>
Finally, you can access extensive help information on how to use
the LTTng plug-in to search and analyze captured traces via the
Eclipse help system:
<literallayout class='monospaced'>
Help | Help Contents | LTTng Plug-in User Guide
</literallayout>
</para>

</section>

</section>

<title>blktrace</title>

<para>
blktrace is a tool for tracing and reporting low-level disk I/O.
blktrace provides the tracing half of the equation; its output can
be piped into the blkparse program, which renders the data in a
human-readable form and does some basic analysis.
</para>

<title>Setup</title>

<para>
For this section, we'll assume you've already performed the
basic setup outlined in the
"<link linkend='profile-manual-general-setup'>General Setup</link>"
section.
</para>

<para>
blktrace is an application that runs on the target system.
You can run the entire blktrace and blkparse pipeline on the
target, or you can run blktrace in 'listen' mode on the target
and have blktrace and blkparse collect and analyze the data on
the host (see the
"<link linkend='using-blktrace-remotely'>Using blktrace Remotely</link>"
section below).
For the rest of this section we assume you've ssh'ed to the
target and will be running blktrace there.
</para>

</section>

<title>Basic Usage</title>

<para>
To record a trace, simply run the 'blktrace' command, giving it
the name of the block device you want to trace activity on:
<literallayout class='monospaced'>
root@crownbay:~# blktrace /dev/sdc
</literallayout>
In another shell, execute a workload you want to trace.
<literallayout class='monospaced'>
root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>; sync
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>
Press Ctrl-C in the blktrace shell to stop the trace. It will
display how many events were logged, along with the per-cpu file
sizes (blktrace records traces in per-cpu kernel buffers and
simply dumps them to userspace for blkparse to merge and sort
later).
<literallayout class='monospaced'>
^C=== sdc ===
CPU 0: 7082 events, 332 KiB data
CPU 1: 1578 events, 74 KiB data
Total: 8660 events (dropped 0), 406 KiB data
</literallayout>
If you examine the files saved to disk, you see multiple files,
one per CPU and with the device name as the first part of the
filename:
<literallayout class='monospaced'>
root@crownbay:~# ls -al
drwxr-xr-x 6 root root 1024 Oct 27 22:39 .
drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
-rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0
-rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1
</literallayout>

To view the trace events, simply invoke 'blkparse' in the
directory containing the trace files, giving it the device name
that forms the first part of the filenames:
<literallayout class='monospaced'>
root@crownbay:~# blkparse sdc

8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]
8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 0 0.000057270 0 m N cfq1225 insert_request
8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr
8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1
8,32 1 0 0.000088559 0 m N cfq workload slice:150
8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1
8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1
8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null)
8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert
8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request
8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1
8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8]
8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]
8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8]
8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8]
8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8]
8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8]
8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8]
8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8]
8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8]
8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8]
8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8]
8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8]
8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8]
8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8]
8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8]
8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8]
8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8]
8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8]
8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]
8,32 1 0 0.000483022 0 m N cfq1225 insert_request
8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1
8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1
8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert
8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request
8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2
.
.
.
8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1
8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]
8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1
8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0
8,32 0 0 58.516636933 0 m N cfq schedule dispatch
8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0
8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80
8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr
8,32 1 0 58.516990819 0 m N cfq3551 put_queue

CPU0 (sdc):
 Reads Queued:          0,        0KiB  Writes Queued:       331,   26,284KiB
 Read Dispatches:       0,        0KiB  Write Dispatches:    485,   40,484KiB
 Reads Requeued:        0               Writes Requeued:       0
 Reads Completed:       0,        0KiB  Writes Completed:    511,   41,000KiB
 Read Merges:           0,        0KiB  Write Merges:         13,      160KiB
 Read depth:            0               Write depth:           2
 IO unplugs:           23               Timer unplugs:         0
CPU1 (sdc):
 Reads Queued:          0,        0KiB  Writes Queued:       249,   15,800KiB
 Read Dispatches:       0,        0KiB  Write Dispatches:     42,    1,600KiB
 Reads Requeued:        0               Writes Requeued:       0
 Reads Completed:       0,        0KiB  Writes Completed:     16,    1,084KiB
 Read Merges:           0,        0KiB  Write Merges:         40,      276KiB
 Read depth:            0               Write depth:           2
 IO unplugs:           30               Timer unplugs:         1

Total (sdc):
 Reads Queued:          0,        0KiB  Writes Queued:       580,   42,084KiB
 Read Dispatches:       0,        0KiB  Write Dispatches:    527,   42,084KiB
 Reads Requeued:        0               Writes Requeued:       0
 Reads Completed:       0,        0KiB  Writes Completed:    527,   42,084KiB
 Read Merges:           0,        0KiB  Write Merges:         53,      436KiB
 IO unplugs:           53               Timer unplugs:         1

Throughput (R/W): 0KiB/s / 719KiB/s
Events (sdc): 6,592 entries
Skips: 0 forward (0 - 0.0%)
Input file sdc.blktrace.0 added
Input file sdc.blktrace.1 added
</literallayout>

The report shows each event that was found in the blktrace data,
along with a summary of the overall block I/O traffic during
the run. You can look at the
<ulink url='http://linux.die.net/man/1/blkparse'>blkparse</ulink>
manpage to learn the
meaning of each field displayed in the trace listing.

</para>
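
<para>
The 'Throughput (R/W)' line in the summary can be sanity-checked
by hand: divide the total KiB of completed writes by the elapsed
trace time. The sketch below does that with awk, using the
42,084KiB total and the roughly 58.517 second timestamp of the
last event shown above. That blkparse derives its figure in
exactly this way is an assumption, and the helper name
throughput_kib_s is made up for illustration.
</para>

```shell
#!/bin/sh
# Rough cross-check of the 'Throughput (R/W)' summary line.
# throughput_kib_s is a hypothetical helper, not part of blktrace.
# usage: throughput_kib_s TOTAL_KIB ELAPSED_SECONDS
throughput_kib_s() {
    # %d truncates the quotient to whole KiB per second.
    awk -v kib="$1" -v secs="$2" 'BEGIN { printf "%d\n", kib / secs }'
}

# 42,084 KiB of completed writes; last event shown at about 58.517 s.
throughput_kib_s 42084 58.517    # prints 719
```

<para>
The result agrees with the 719KiB/s write throughput reported in
the summary.
</para>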

<para>

blktrace and blkparse are designed from the ground up to
be able to operate together in a 'pipe mode' where the
stdout of blktrace can be fed directly into the stdin of
blkparse:
<literallayout class='monospaced'>
root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i -
</literallayout>
This enables long-lived tracing sessions to run without
writing anything to disk. It also allows the user to look for
certain conditions in the trace data in 'real-time', either by
viewing the trace output as it scrolls by on the screen or
by passing it along to yet another program in the pipeline,
such as grep, which can be used to identify and capture
conditions of interest.

</para>
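
<para>
As an illustration of adding another stage to that pipeline, the
sketch below keeps only the lines for one action type. It assumes
blkparse's default line layout as seen in the listing earlier in
this section, where the sixth column is the action (for example
'D' for a request issued to the driver and 'C' for a completed
request). The function name filter_action is made up for
illustration.
</para>

```shell
#!/bin/sh
# Keep only blkparse lines whose action column matches ACTION.
# Assumed default line layout (from the listing earlier in this section):
#   dev cpu seq time pid action rwbs sector + nblocks [process]
# filter_action is a hypothetical helper, not part of blktrace.
# usage: ... | blkparse -i - | filter_action ACTION
filter_action() {
    awk -v want="$1" '$6 == want'
}
```

<para>
On the target, 'blktrace /dev/sdc -o - | blkparse -i - | filter_action C'
would then show only completion events as they occur.
</para>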

<para>

There's also a wrapper command, btrace, that implements
the above pipeline as a single command, so you don't
have to type the full command sequence:
<literallayout class='monospaced'>
root@crownbay:~# btrace /dev/sdc
</literallayout>

</para>

</section>

<title>Using blktrace Remotely</title>

<para>
blktrace traces block I/O, yet it normally writes its own
trace data to a block device. Because it's generally not a
great idea to make the device being traced the same as the
device the tracer writes to, blktrace provides native
support for sending all trace data over the network, which
lets you trace without perturbing the traced device at all.
</para>

<para>
To have blktrace operate in this mode, start blktrace on
the target system being traced with the -l option, along with
the device to trace:
<literallayout class='monospaced'>
root@crownbay:~# blktrace -l /dev/sdc
server: waiting for connections...
</literallayout>
On the host system, use the -h option to connect to the
target system, also passing it the device to trace:
<literallayout class='monospaced'>
$ blktrace -d /dev/sdc -h 192.168.1.43
blktrace: connecting to 192.168.1.43
blktrace: connected!
</literallayout>
On the target system, you should see this:
<literallayout class='monospaced'>
server: connection from 192.168.1.43
</literallayout>
In another shell, execute a workload you want to trace.
<literallayout class='monospaced'>
root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>; sync
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>

2989

When it's done, do a Ctrl-C on the host system to

stop the trace:

^C=== sdc ===

CPU 0: 7691 events, 361 KiB data

2994

CPU 1: 4109 events, 193 KiB data

2995

Total: 11800 events (dropped 0), 554 KiB data

2996

</literallayout>

2997

On the target system, you should also see a trace

2998

summary for the trace just ended:

2999

3000

server: end of run for 192.168.1.43:sdc

3001

=== sdc ===

3002

CPU 0: 7691 events, 361 KiB data

3003

CPU 1: 4109 events, 193 KiB data

3004

Total: 11800 events (dropped 0), 554 KiB data

3005

</literallayout>

The blktrace instance on the host will save the target
output inside a hostname-timestamp directory:
<literallayout class='monospaced'>
$ ls -al
drwxr-xr-x 10 root root 1024 Oct 28 02:40 .
drwxr-sr-x 4 root root 1024 Oct 26 18:24 ..
drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56
</literallayout>

Change into that directory to see the output files, one per traced CPU:
<literallayout class='monospaced'>
$ ls -l
-rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0
-rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1
</literallayout>
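If several traces accumulate on the host, picking the newest output
directory can be scripted. The following is a minimal sketch, assuming
the hostname-timestamp naming shown above; latest_trace_dir is a
hypothetical helper name and is not part of the blktrace tools:

```shell
latest_trace_dir() {
    # Print the last directory under $1 (default: current directory) whose
    # name ends in a YYYY-MM-DD-HH:MM:SS timestamp. Glob results are sorted,
    # so for a single target address the last match is the newest trace.
    base="${1:-.}"
    latest=""
    for d in "$base"/*-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/; do
        [ -d "$d" ] || continue      # skip the unexpanded pattern if nothing matched
        latest="${d%/}"              # drop the trailing slash added by the glob
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    else
        return 1
    fi
}
```

For example, cd "$(latest_trace_dir .)" would change into the most
recent trace directory, assuming at least one exists.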

Then run blkparse on the host system, passing the device name:
<literallayout class='monospaced'>
$ blkparse sdc
8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls]
8,32 1 0 0.000036038 0 m N cfq1263 alloced
8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls]
8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls]
8,32 1 0 0.000056152 0 m N cfq1263 insert_request
8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr
8,32 1 0 0.000075498 0 m N cfq workload slice:300
.
.
.
8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0
8,32 0 0 177.266388140 0 m N cfq schedule dispatch
8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0
8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56
8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr
8,32 1 0 177.266696560 0 m N cfq1267 put_queue

CPU0 (sdc):
Reads Queued: 0, 0KiB Writes Queued: 270, 21,708KiB
Read Dispatches: 59, 2,628KiB Write Dispatches: 495, 39,964KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB
Read Merges: 0, 0KiB Write Merges: 9, 344KiB
Read depth: 2 Write depth: 2
IO unplugs: 20 Timer unplugs: 1
CPU1 (sdc):
Reads Queued: 688, 2,752KiB Writes Queued: 381, 20,652KiB
Read Dispatches: 31, 124KiB Write Dispatches: 59, 2,396KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 11, 764KiB
Read Merges: 598, 2,392KiB Write Merges: 88, 448KiB
Read depth: 2 Write depth: 2
IO unplugs: 52 Timer unplugs: 0

Total (sdc):
Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB
Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 90, 2,752KiB Writes Completed: 554, 42,360KiB
Read Merges: 598, 2,392KiB Write Merges: 97, 792KiB
IO unplugs: 72 Timer unplugs: 1

Throughput (R/W): 15KiB/s / 238KiB/s
Events (sdc): 9,301 entries
Skips: 0 forward (0 - 0.0%)
</literallayout>
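As a sanity check, the Throughput line can be reproduced from the
totals. The sketch below assumes that the trace duration is roughly the
last event timestamp (about 177.27 seconds) and that the reported figure
is truncated to a whole number; throughput is a hypothetical helper, not
a blkparse feature:

```shell
# Values copied from the blkparse summary above.
duration=177.266696560   # last event timestamp in seconds (first event is at 0)
read_kib=2752            # total "Reads Completed", in KiB
write_kib=42360          # total "Writes Completed", in KiB

throughput() {
    # Print KiB per second, truncated to an integer by awk's %d conversion.
    awk -v kib="$1" -v secs="$2" 'BEGIN { printf "%d\n", kib / secs }'
}

echo "R: $(throughput "$read_kib" "$duration")KiB/s  W: $(throughput "$write_kib" "$duration")KiB/s"
```

Under those assumptions the result agrees with the 15KiB/s and 238KiB/s
figures in the summary.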

You should see the trace events and summary just as
you would have if you'd run the same command on the target.
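The per-event lines also lend themselves to quick post-processing.
The following is a minimal sketch, assuming blkparse's default output
layout as shown above, in which the sixth field is the action code;
count_actions is a hypothetical helper name:

```shell
count_actions() {
    # Count events per action code (field 6: Q, G, I, D, C, m, ...).
    # Only lines whose first field looks like "major,minor" are counted,
    # which skips the per-CPU and total summary lines.
    awk '$1 ~ /^[0-9]+,[0-9]+$/ { n[$6]++ }
         END { for (a in n) print a, n[a] }' "$@" | LC_ALL=C sort
}
```

It reads blkparse output from standard input or from saved files, for
example blkparse sdc | count_actions.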

</para>

</section>

<title>Tracing Block I/O via 'ftrace'</title>


<para>

It's also possible to trace block I/O using only ftrace,
which can be useful for casual tracing
if you don't want to bother dealing with the userspace tools.

</para>

<para>

To enable tracing for a given device, write 1 to
/sys/block/xxx/trace/enable, where xxx is the device name.
For example, this enables tracing for /dev/sdc:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable
</literallayout>

Once you've enabled the device(s) you want to trace,
select the 'blk' tracer to turn block tracing on:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# cat available_tracers
blk function_graph function nop

root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer
</literallayout>

Execute the workload you're interested in:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt
</literallayout>

And look at the output (note that we're using
'trace_pipe' instead of 'trace' to capture this trace,
which allows us to wait on the pipe for data to
appear):
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe
cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat]
cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced
cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat]
cat-3587 [001] d..1 3023.276424: 8,32 P N [cat]
cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat]
cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request
cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr
cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1
cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150
cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2
cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null)
cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert
cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request
cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1
cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat]
</literallayout>
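Output in this form can be reduced to per-request timings. The
following is a minimal sketch, assuming the field layout of the lines
shown above (field 4 is the timestamp, field 6 the action, field 8 the
starting sector); q_to_d_latency is a hypothetical helper name that
reports the time from queue (Q) to dispatch (D) in microseconds:

```shell
q_to_d_latency() {
    # Remember when each sector was queued (Q) and, when the same sector
    # is dispatched (D), print "sector microseconds".
    awk '
        $6 == "Q" || $6 == "D" { t = $4; sub(/:$/, "", t) }   # strip trailing colon
        $6 == "Q" { queued[$8] = t }
        $6 == "D" {
            if ($8 in queued) {
                printf "%s %.0f\n", $8, (t - queued[$8]) * 1000000
                delete queued[$8]
            }
        }
    ' "$@"
}
```

Applied to the trace above, the request for sector 1699848 is queued at
3023.276361 and dispatched at 3023.276500, a delay of 139 microseconds.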

And this turns off tracing for the specified device:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable
</literallayout>
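The enable and disable steps can be wrapped in a pair of helpers. This
is a minimal sketch, not part of any tool: blk_ftrace_start,
blk_ftrace_stop, SYSBLOCK and TRACING are hypothetical names, and
switching back to the 'nop' tracer on stop is an added assumption based
on the available_tracers list shown earlier:

```shell
# Locations used in this section; both can be overridden, for example to
# point the helpers at a scratch directory for a dry run.
SYSBLOCK="${SYSBLOCK:-/sys/block}"
TRACING="${TRACING:-/sys/kernel/debug/tracing}"

blk_ftrace_start() {
    # $1 is the device name without /dev/, for example sdc.
    echo 1 > "$SYSBLOCK/$1/trace/enable"      # enable tracing for the device
    echo blk > "$TRACING/current_tracer"      # select the blk tracer
}

blk_ftrace_stop() {
    echo nop > "$TRACING/current_tracer"      # assumption: return to the no-op tracer
    echo 0 > "$SYSBLOCK/$1/trace/enable"      # disable tracing for the device
}
```

A session would then be blk_ftrace_start sdc, the workload, a read of
trace_pipe, and finally blk_ftrace_stop sdc, all run as root.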

</para>

</section>

</section>

<title>Documentation</title>


<para>

Online versions of the man pages for the commands discussed
in this section can be found here:
<itemizedlist>
<listitem><para><ulink url='http://linux.die.net/man/8/blktrace'>http://linux.die.net/man/8/blktrace</ulink>
</para></listitem>
<listitem><para><ulink url='http://linux.die.net/man/1/blkparse'>http://linux.die.net/man/1/blkparse</ulink>
</para></listitem>
<listitem><para><ulink url='http://linux.die.net/man/8/btrace'>http://linux.die.net/man/8/btrace</ulink>
</para></listitem>
</itemizedlist>

</para>

<para>

The above manpages, along with manpages for the other
blktrace utilities (btt, blkiomon, etc.), can be found in the
/doc directory of the blktrace tools git repo:
<literallayout class='monospaced'>
$ git clone git://git.kernel.dk/blktrace.git
</literallayout>

</para>

</section>

</section>

</chapter>

<!--

vim: expandtab tw=80 ts=4

-->