Perf is an interface to access the performance monitoring unit (PMU) of a processor and to record and display software events such as page faults. It supports system-wide, per-thread, and KVM virtualization guest monitoring.
You can store resulting information in a report. This report contains information about, for example, instruction pointers or what code a thread was executing.
Perf consists of two parts:
Code integrated into the Linux kernel that instructs the hardware.
The perf user space utility that allows you to use the kernel code and helps you analyze gathered data.
Performance monitoring means collecting information related to how an application or system performs. This information can be obtained either through software-based means or from the CPU or chipset. Perf integrates both of these methods.
Many modern processors contain a performance monitoring unit (PMU). The design and functionality of a PMU are CPU-specific. For example, the number of registers, counters, and supported features varies by CPU implementation.
Each PMU model consists of a set of registers: the performance monitor configuration (PMC) and the performance monitor data (PMD). Both can be read, but only PMCs are writable. These registers store configuration information and data.
Perf supports several profiling modes:
Counting. Count the number of occurrences of an event.
Event-based sampling. A less exact way of counting: A sample is recorded whenever a certain threshold number of events has occurred.
Time-based sampling. A less exact way of counting: A sample is recorded at a defined frequency.
Instruction-based sampling (AMD64 only). The processor follows instructions appearing in a given interval and samples which events they produce. This allows following up on individual instructions and seeing which of them is critical to performance.
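The two sampling modes correspond to different options of perf record: -c sets the number of events between two samples, and -F sets the sampling frequency. The following is a minimal sketch rather than a recommended setup. The function names sample_by_event and sample_by_time are hypothetical, and the values 100000 and 99 are arbitrary illustration values.

```shell
# Event-based sampling: record one sample every 100000 cpu-cycles events.
# (Hypothetical helper name; the period is an arbitrary example value.)
sample_by_event() {
    perf record -e cpu-cycles -c 100000 -- "$@"
}

# Time-based sampling: record about 99 samples per second.
# (Hypothetical helper name; the frequency is an arbitrary example value.)
sample_by_time() {
    perf record -e cpu-cycles -F 99 -- "$@"
}
```

For example, sample_by_time COMMAND would start COMMAND under perf record and write the samples to perf.data in the current directory.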
The Perf kernel code is already included with the default kernel. To be able to use the user space utility, install the package perf.
To gather the required information, the perf tool has several subcommands. This section gives an overview of the most often used commands.
To see help in the form of a man page for any of the subcommands, use either perf help SUBCOMMAND or man perf-SUBCOMMAND.
perf stat
Start a program and create a statistical overview that is displayed after the program quits. perf stat is used to count events.
perf record
Start a program and create a report with performance counter information. The report is stored as perf.data in the current directory. perf record is used to sample events.
perf report
Display a report that was previously created with perf record.
perf annotate
Display a report file and an annotated version of the executed code. If debug symbols are installed, the source code is also displayed.
perf list
List event types that Perf can report with the current kernel and with your CPU. You can filter event types by category. For example, to see hardware events only, use perf list hw.
The man page for perf_event_open has short descriptions for the most important events. For example, to find a description of the event branch-misses, search for BRANCH_MISSES (note the spelling differences):
> man perf_event_open | grep -A5 BRANCH_MISSES
Sometimes, events may be ambiguous. The lowercase hardware event names are not the names of raw hardware events but instead the names of aliases created by Perf. These aliases map to differently named but similarly defined hardware events on each supported processor.
For example, the cpu-cycles event is mapped to the hardware event UNHALTED_CORE_CYCLES on Intel processors. On AMD processors, however, it is mapped to the hardware event CPU_CLK_UNHALTED.
Perf also allows measuring raw events specific to your hardware. To look up their descriptions, see the Architecture Software Developer's Manual of your CPU vendor. The relevant documents for AMD64/Intel 64 processors are linked to in Section 6.7, “More information”.
perf top
Display system activity as it happens.
perf trace
This command behaves similarly to strace. With this subcommand, you can see which system calls are executed by a particular thread or process and which signals it receives.
To count the number of occurrences of an event, such as those displayed by perf list, use:
# perf stat -e EVENT -a
To count multiple types of events at once, list them separated by commas. For example, to count cpu-cycles and instructions, use:
# perf stat -e cpu-cycles,instructions -a
To stop the session, press Ctrl–C.
You can also count the number of occurrences of an event within a particular time:
# perf stat -e EVENT -a -- sleep TIME
Replace TIME by a value in seconds.
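Counts of cpu-cycles and instructions are often combined into a single instructions-per-cycle figure. The sketch below assumes the machine-readable output of perf stat -x, where each line has the counter value in the first field and the event name in the third. The function name ipc_from_csv is hypothetical, and the field layout can differ, for example on hybrid CPUs or when per-CPU output is requested, so treat it as a starting point only.

```shell
# Hypothetical helper: read "perf stat -x," CSV lines on standard input and
# print instructions per cycle. Assumes field 1 is the counter value and
# field 3 is the event name.
ipc_from_csv() {
    awk -F, '
        $3 ~ /^cpu-cycles/   { cycles = $1 }
        $3 ~ /^instructions/ { instr = $1 }
        END {
            if (cycles > 0) {
                printf "%.2f\n", instr / cycles
            } else {
                print "no cpu-cycles count found" > "/dev/stderr"
                exit 1
            }
        }'
}

# Usage (perf stat writes its results to standard error, hence 2>&1):
#   perf stat -x, -e cpu-cycles,instructions -a -- sleep 5 2>&1 | ipc_from_csv
```

Reading from standard input keeps the helper independent of how the counts were collected, so it can also be applied to saved output.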
There are several ways to sample events specific to a particular command:
To create a report for a newly invoked command, use:
# perf record COMMAND
Then, use the started process normally. When you quit the process, the Perf session also stops.
To create a report for the entire system while a newly invoked command is running, use:
# perf record -a COMMAND
Then, use the started process normally. When you quit the process, the Perf session also stops.
To create a report for an already running process, use:
# perf record -p PID
Replace PID with a process ID. To stop the session, press Ctrl–C.
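As an alternative to stopping manually, the recording of a running process can be limited to a fixed duration by combining -p with a sleep command, in the same way as shown for perf stat. This is a minimal sketch. The function name record_pid_for is hypothetical, and it assumes that perf record ends the session when the given command exits.

```shell
# Hypothetical helper: sample an already running process for a fixed time.
#   $1 = process ID to sample
#   $2 = duration in seconds
record_pid_for() {
    perf record -p "$1" -- sleep "$2"
}
```

For example, record_pid_for PID 10 would sample the process for about ten seconds and store the result as perf.data in the current directory.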
Now you can view the gathered data (perf.data) using:
> perf report
This opens a pseudo-graphical interface. To receive help, press H. To quit, press Q.
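For scripts, or when no interactive interface is wanted, perf report can print a plain-text listing with the --stdio option. The sketch below filters such a listing down to the entries at or above a given percentage. The function name hot_entries and the default threshold of 5 are hypothetical. It assumes that each entry line starts with a percentage column and that header lines start with #, which may not hold for every sort order or for call graph output.

```shell
# Hypothetical helper: keep only entries of a "perf report --stdio" listing
# whose first column is a percentage at or above the threshold ($1, default 5).
hot_entries() {
    min=${1:-5}
    awk -v min="$min" '
        /^#/ || NF == 0 { next }      # skip header comments and blank lines
        $1 ~ /%$/ {
            pct = $1
            sub(/%$/, "", pct)        # strip the trailing percent sign
            if (pct + 0 >= min + 0) print
        }'
}

# Usage:
#   perf report --stdio | hot_entries 10
```

Matching lines are printed unchanged, so the output keeps the column layout of the original listing.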
If you prefer a graphical interface, try the GTK+ interface of Perf:
> perf report --gtk
However, the GTK+ interface is limited in functionality.
This chapter only provides a short overview. Refer to the following links for more information:
The project home page. It also features a tutorial on using perf.
Unofficial page with many one-line examples of how to use perf.
Unofficial page with several resources, primarily relating to the Linux kernel code of Perf and its API. This page includes, for example, a CPU compatibility table and a programming guide.
The Intel Architectures Software Developer's Manual, Volume 3B.
The AMD Architecture Programmer's Manual, Volume 2.
Consult this chapter for other performance optimizations.