Applies to openSUSE Leap 15

7 OProfile—System-Wide Profiler

Abstract

OProfile is a profiler for dynamic program analysis. It investigates the behavior of a running program and gathers information. This information can be viewed and gives hints for further optimization.

It is not necessary to recompile or use wrapper libraries to use OProfile. Not even a kernel patch is needed. Usually, when profiling an application, a small overhead is expected, depending on the workload and sampling frequency.

7.1 Conceptual Overview

OProfile consists of a kernel driver and a daemon for collecting data. It uses the hardware performance counters provided on many processors. OProfile is capable of profiling all code including the kernel, kernel modules, kernel interrupt handlers, system shared libraries, and other applications.

Modern processors support profiling in hardware through performance counters. Depending on the processor, there can be many counters, and each of them can be programmed with an event to count. Each counter has a value which determines how often a sample is taken. The lower the value, the more often a sample is taken.
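To get a feel for how a counter value translates into sampling frequency, the following sketch divides an assumed event rate by the counter value. The function name samples_per_second and the 2 GHz clock rate are illustrative assumptions and not part of OProfile.

```shell
#!/bin/sh
# samples_per_second EVENTS_PER_SECOND COUNT
# One sample is taken every COUNT events, so the approximate number of
# samples per second is the event rate divided by COUNT (integer division).
samples_per_second() {
    events_per_second=$1
    count=$2
    if [ "$count" -le 0 ]; then
        echo "count must be positive" >&2
        return 1
    fi
    echo $(( events_per_second / count ))
}

# Assumption: a core clocked at 2 GHz that is never halted produces about
# 2000000000 clock cycle events per second.
samples_per_second 2000000000 100000   # prints 20000
samples_per_second 2000000000 6000     # prints 333333
```

The second call shows why low counter values are expensive: the same workload triggers far more samples per second.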

During the post-processing step, all information is collected and instruction addresses are mapped to a function name.

7.2 Installation and Requirements

To use OProfile, install the oprofile package.

It is useful to install the *-debuginfo package for the respective application you want to profile. If you want to profile the kernel, you need the debuginfo package as well.

7.3 Available OProfile Utilities

OProfile contains several utilities to handle the profiling process and its profiled data. The following list is a short summary of programs used in this chapter:

opannotate

Outputs annotated source or assembly listings mixed with profile information. An annotated report can be used in combination with addr2line to identify the source file and line where hotspots potentially exist. See man addr2line for more information.

opcontrol

Controls the profiling sessions (start or stop), dumps profile data, and sets up parameters.

ophelp

Lists available events with short descriptions.

opimport

Converts sample database files from a foreign binary format to the native format.

opreport

Generates reports from profiled data.

7.4 Using OProfile

With OProfile, you can profile both the kernel and applications. When profiling the kernel, tell OProfile where to find the uncompressed kernel image. Use the --vmlinux option and point it to the uncompressed vmlinux* file (a compressed copy is usually available in /boot). Kernel modules are profiled by default. However, make sure you read http://oprofile.sourceforge.net/doc/kernel-profiling.html.

When profiling applications, you usually do not need to profile the kernel. In this case, use the --no-vmlinux option to reduce the amount of information.

7.4.1 Creating a Report

The following procedure starts the daemon, collects data, stops the daemon, and creates a report.

  1. Open a shell and log in as root.

  2. Decide if you want to profile with or without the Linux kernel:

    1. Profile With the Linux Kernel.  opcontrol can only work with uncompressed kernel images, so uncompress the image first and then pass it to opcontrol:

      root # cp /boot/vmlinux-`uname -r`.gz /tmp
      root # gunzip /tmp/vmlinux*.gz
      root # opcontrol --vmlinux=/tmp/vmlinux*
    2. Profile Without the Linux Kernel.  Use the following command:

      root # opcontrol --no-vmlinux

      To see which functions call other functions in the output, additionally use the --callgraph option and set a maximum DEPTH:

      root # opcontrol --no-vmlinux --callgraph DEPTH
  3. Start the OProfile daemon:

    root # opcontrol --start
    Using 2.6+ OProfile kernel interface.
    Using log file /var/lib/oprofile/samples/oprofiled.log
    Daemon started.
    Profiler running.
  4. Now start the application you want to profile.

  5. Stop the OProfile daemon:

    root # opcontrol --stop
  6. Dump the collected data to /var/lib/oprofile/samples:

    root # opcontrol --dump
  7. Create a report:

    root # opreport
    Overflow stats not available
    CPU: CPU with timer interrupt, speed 0 MHz (estimated)
    Profiling through timer interrupt
              TIMER:0|
      samples|      %|
    ------------------
        84877 98.3226 no-vmlinux
    ...
  8. Shut down the oprofile daemon:

    root # opcontrol --shutdown
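The steps above can be collected in a small wrapper script. The following is a minimal sketch that profiles without the kernel. By default it only prints the commands (dry run). The function names, the OPROFILE_RUN variable, and the ./myapp command are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# run COMMAND [ARGS...]
# Prints the command, or executes it if OPROFILE_RUN=1 is set
# (executing the opcontrol commands requires root).
run() {
    if [ "${OPROFILE_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# profile_session COMMAND [ARGS...]
# Follows the procedure: configure, start the daemon, run the application,
# stop, dump, report, and shut down the daemon.
profile_session() {
    run opcontrol --no-vmlinux || return 1
    run opcontrol --start || return 1
    run "$@"
    status=$?
    run opcontrol --stop
    run opcontrol --dump
    run opreport
    run opcontrol --shutdown
    return "$status"
}

# ./myapp is a placeholder for the application you want to profile.
profile_session ./myapp --some-option
```

Keeping the dry run as the default makes it possible to review the command sequence before running it as root.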

7.4.2 Getting Event Configurations

The general procedure for event configuration is as follows:

  1. First use the events CPU_CLK_UNHALTED and INST_RETIRED to find optimization opportunities.

  2. Use specific events to find bottlenecks. To list them, use the command opcontrol --list-events.

If you need to profile certain events, first check the available events supported by your processor with the ophelp command (example output generated from Intel Core i5 CPU):

root # ophelp
oprofile: available events for CPU type "Intel Architectural Perfmon"

See Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3B (Document 253669) Chapter 18 for architectural perfmon events
This is a limited set of fallback events because oprofile does not know your CPU
CPU_CLK_UNHALTED: (counter: all))
        Clock cycles when not halted (min count: 6000)
INST_RETIRED: (counter: all))
        number of instructions retired (min count: 6000)
LLC_MISSES: (counter: all))
        Last level cache demand requests from this core that missed the LLC (min count: 6000)
        Unit masks (default 0x41)
        ----------
        0x41: No unit mask
LLC_REFS: (counter: all))
        Last level cache demand requests from this core (min count: 6000)
        Unit masks (default 0x4f)
        ----------
        0x4f: No unit mask
BR_MISS_PRED_RETIRED: (counter: all))
        number of mispredicted branches retired (precise) (min count: 500)

You can get the same output from opcontrol --list-events.

Specify the performance counter events with the option --event. The option can be given multiple times. It needs an event name (from ophelp) and a sample rate, given as the number of events counted between two samples, for example:

root # opcontrol --event=CPU_CLK_UNHALTED:100000
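The min count values that ophelp prints can serve as a sanity check before a count is passed to --event. The following sketch builds the option string and refuses counts below the minimum. The helper name event_option is hypothetical. The event names and minimum counts are taken from the ophelp output above.

```shell
#!/bin/sh
# event_option NAME COUNT MIN_COUNT
# Prints an --event option for opcontrol, or fails if COUNT is lower than
# the "min count" that ophelp lists for the event.
event_option() {
    name=$1
    count=$2
    min_count=$3
    if [ "$count" -lt "$min_count" ]; then
        echo "count $count is below the minimum $min_count for $name" >&2
        return 1
    fi
    echo "--event=${name}:${count}"
}

event_option CPU_CLK_UNHALTED 100000 6000   # prints --event=CPU_CLK_UNHALTED:100000
event_option BR_MISS_PRED_RETIRED 100 500 || echo "rejected"
```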
Warning: Setting Sampling Rates with CPU_CLK_UNHALTED

The number after the event name is the count of events between two samples, so a low value means frequent sampling. Sampling too frequently can seriously impair system performance and can disrupt the system to such a degree that the collected data is useless. Measure the performance metric you are interested in with and without OProfile, and determine experimentally the sampling rate that disturbs it the least.
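One way to compare a metric with and without OProfile is to express the difference as a percentage of the unprofiled value. The following sketch does this with integer arithmetic. The function name overhead_percent and the measurements are made up for illustration.

```shell
#!/bin/sh
# overhead_percent BASELINE PROFILED
# Both arguments are integer measurements of the same metric (for example,
# run time in milliseconds) without and with OProfile running.
overhead_percent() {
    baseline=$1
    profiled=$2
    if [ "$baseline" -le 0 ]; then
        echo "baseline must be positive" >&2
        return 1
    fi
    echo $(( (profiled - baseline) * 100 / baseline ))
}

# Hypothetical measurements: 4000 ms without profiling, 4200 ms with it.
overhead_percent 4000 4200   # prints 5
```

Repeating the measurement for several counter values shows which setting disturbs the workload the least.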

7.5 Using OProfile's GUI

Start the GUI for OProfile as root with oprof_start (see Figure 7.1, “GUI for OProfile”). Select your events and change the counter, if necessary. Every green line is added to the list of checked events. Hover the mouse over a line to see a help text in the status line below. Use the Configuration tab to set the buffer size, the CPU buffer size, the verbose option, and other settings. Click Start to execute OProfile.

Figure 7.1: GUI for OProfile

7.6 Generating Reports

Before generating a report, make sure OProfile has dumped your data to the /var/lib/oprofile/samples directory using the command opcontrol --dump. A report can be generated with the commands opreport or opannotate.

Calling opreport without any options gives a complete summary. With an executable as an argument, opreport retrieves profile data only for this executable. If you analyze applications written in C++, use the --demangle smart option.
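A summary like the one in Section 7.4.1 can be narrowed down to the entries that matter. The following sketch filters opreport output by percentage with awk. The function name hot_images is hypothetical, and the libfoo.so line in the sample input is made up for illustration.

```shell
#!/bin/sh
# hot_images THRESHOLD
# Reads an opreport summary on standard input and prints name and percentage
# of the entries whose share of samples is at least THRESHOLD percent.
hot_images() {
    awk -v min="$1" '
        # Data lines start with a sample count followed by a percentage;
        # header and separator lines do not match and are skipped.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 >= min + 0 {
            print $3, $2
        }
    '
}

# The first data line is taken from the report in Section 7.4.1,
# the second one is invented.
printf '%s\n' \
    '      samples|      %|' \
    '    84877 98.3226 no-vmlinux' \
    '     1448  1.6774 libfoo.so' | hot_images 5   # prints no-vmlinux 98.3226
```

On a system with collected samples, the function would be used as opreport | hot_images 5.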

The opannotate command generates output with annotations from source code. Run it with the following options:

root # opannotate --source \
   --base-dirs=BASEDIR \
   --search-dirs= \
   --output-dir=annotated/ \
   /lib/libfoo.so

The option --base-dirs contains a comma-separated list of paths which are stripped from debug source files. This happens before the files are looked up in --search-dirs. The --search-dirs option is also a comma-separated list of directories to search for source files.
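The effect of stripping a base directory can be illustrated with a few lines of shell. This is only a conceptual sketch of the prefix removal and not how opannotate is implemented. The function name strip_base_dir and all paths are hypothetical.

```shell
#!/bin/sh
# strip_base_dir PATH BASEDIRS
# Removes the first matching directory of the comma-separated list BASEDIRS
# from the beginning of PATH and prints the remaining relative path.
strip_base_dir() {
    path=$1
    old_ifs=$IFS
    IFS=,
    for base in $2; do
        case $path in
            "$base"/*)
                path=${path#"$base"/}
                break
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$path"
}

# Hypothetical build path recorded in the debug information.
strip_base_dir /usr/src/packages/BUILD/foo-1.0/src/foo.c \
    /home/tux/build,/usr/src/packages/BUILD   # prints foo-1.0/src/foo.c
```

The remaining relative path is what would then be looked up below each directory given in --search-dirs.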

Note: Inaccuracies in Annotated Source

Because of compiler optimization, code can disappear and appear in a different place. Use the information in http://oprofile.sourceforge.net/doc/debug-info.html to fully understand its implications.

7.7 For More Information

This chapter only provides a short overview. Refer to the following links for more information:

http://oprofile.sourceforge.net

The project home page.

Manpages

Detailed descriptions of the options of the different tools.

/usr/share/doc/packages/oprofile/oprofile.html

Contains the OProfile manual.

http://developer.intel.com/

Architecture reference for Intel processors.

http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/

Architecture reference for PowerPC64 processors in IBM iSeries, pSeries, and Blade server systems.
