DCPI documentation

The HP DCPI system permits low-overhead continuous profiling of all executables, including the kernel. It is based on periodic sampling using the Alpha performance counter hardware. Profiles containing samples for each executable image (including shared libraries) are stored in a user-specified directory.

Tools are provided to analyze profiles and produce a breakdown of all CPU time by image, and by procedure within images. In addition, detailed information can be produced showing the total time spent executing each individual instruction in a procedure. Additional analysis tools also determine the average number of cycles taken by each instruction, the approximate number of times the instruction was executed, and the possible reasons for any cycles during which the processor stalled instead of executing instructions (e.g., waiting for data to be fetched from memory).
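As a rough illustration of the arithmetic behind these per-instruction estimates, the sketch below assumes that every sample stands for a fixed number of cycles (a hypothetical sampling period, `SAMPLE_PERIOD`) and that an execution count for the instruction is already known. DCPI's own estimation is considerably more involved; this is only a back-of-the-envelope model, and the function names are illustrative.

```python
# Hypothetical sampling period: assume one sample is taken every SAMPLE_PERIOD cycles.
SAMPLE_PERIOD = 65536


def estimated_cycles(samples: int, period: int = SAMPLE_PERIOD) -> int:
    """Each sample is taken to represent roughly `period` cycles spent at that PC."""
    return samples * period


def average_cycles_per_execution(samples: int, executions: int,
                                 period: int = SAMPLE_PERIOD) -> float:
    """Average number of cycles the instruction took each time it executed."""
    if executions <= 0:
        raise ValueError("execution count must be positive")
    return estimated_cycles(samples, period) / executions


# 30 samples on an instruction believed to have executed 983,040 times:
print(average_cycles_per_execution(30, 983_040))  # 2.0
```

An average well above the instruction's minimum latency is what prompts the stall analysis described above.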

The material below provides an overview of the system, including examples of its use. Detailed man pages for Tru64 UNIX and Windows NT are also available; they give more information about each program in the system, including all command-line options, limitations, and known bugs. The overview below will help you get started using the system, but we recommend that you read all the man pages as well.

System Structure

At a high level, the Continuous Profiling Infrastructure consists of four pieces:

  • A device driver that services interrupts from the CPU performance event counters, recording program-counter samples.
  • Several sources of information about what executable images are loaded in each running process and where they are loaded. This makes it possible to associate program-counter samples for a process with instructions in image files on disk.
  • A daemon process and tools to control it. The daemon extracts samples from the driver and builds an on-disk database organized by executable image.
  • A collection of tools that analyze profiles and their associated executable images, producing summary output in several different forms and at several different levels of detail.

System Components

Device Driver

The pcount device driver must be installed prior to data collection. The device driver acts as an interface between the Alpha performance counter hardware and the daemon process. It services interrupts from the performance counters, and on each interrupt records the process ID and program counter value for the interrupted program. These program-counter samples are buffered in the kernel until they are extracted by dcpid.
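The record-then-extract flow can be pictured with a toy model. This is a hypothetical sketch, not the pcount driver's actual data structure: it assumes a fixed-capacity buffer that counts (rather than stores) samples arriving while it is full, and the class and method names are illustrative.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    pid: int  # process ID of the interrupted program
    pc: int   # program-counter value at the time of the interrupt


class SampleBuffer:
    """Toy stand-in for the driver's in-kernel sample buffer (hypothetical design)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._samples: deque[Sample] = deque()
        self.dropped = 0  # samples lost because the buffer was full

    def record(self, pid: int, pc: int) -> None:
        """Called once per performance-counter interrupt."""
        if len(self._samples) >= self.capacity:
            self.dropped += 1
            return
        self._samples.append(Sample(pid, pc))

    def extract(self) -> list[Sample]:
        """Hand every buffered sample to the daemon and empty the buffer."""
        out = list(self._samples)
        self._samples.clear()
        return out
```

Keeping the interrupt-time work to a single append is the point of the design: the expensive work of mapping samples to images happens later, in the daemon.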

Determining Loadmap Information

The profiling system uses different sources of information to determine which executable images are loaded in each process and where they are loaded. On Tru64 UNIX, it uses a modified system dynamic loader, a hook in the kernel exec path, system process tables, and dcpiscan. The loader and the exec hook are used continuously; the system process tables are examined each time the daemon starts running; and dcpiscan is typically run once at setup and then infrequently when the images on disk change. All four sources of information provide pathnames for images; these are stored with profiles so that the analysis tools can quickly find the images associated with each profile.

  • A modified version of the system dynamic loader informs the daemon whenever a dynamically linked program starts or loads a shared library. The information provided by the loader to the daemon includes the pathname of the image being loaded, the process it is loaded into, and the address at which it is loaded.
  • The pcount driver installs a hook in the kernel exec path that captures information about all statically linked images.
  • When the daemon process starts up, it scans all active processes and their mapped regions to identify the images loaded in processes that started before the daemon was started.
  • The program dcpiscan is used to scan filesystem directories for executables and build a mapping from pathnames to executables. A default mapping for common Tru64 UNIX executables is compiled into the system; dcpiscan is usually run when the profiling system is installed to provide more accurate identification of site-specific binaries. Since the modified dynamic loader includes pathnames in the information it provides to the daemon, dcpiscan is mostly useful for obtaining pathnames of statically linked images.

The modified loader and the exec hook together ensure that dcpid knows about virtually all images loaded into each process.
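Once the loader, exec hook, and process-table scan have reported where each image sits, attributing a (process ID, PC) sample to an image is an interval lookup. The sketch below is illustrative only: `Loadmap`, `Mapping`, and the example addresses and pathnames are hypothetical, and dcpid's real bookkeeping is not described here.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    base: int  # virtual address at which the image was loaded
    size: int  # number of bytes mapped
    path: str  # pathname reported by the loader or exec hook


class Loadmap:
    """Per-process record of which images are loaded where (illustrative only)."""

    def __init__(self) -> None:
        self._by_pid: dict[int, list[Mapping]] = {}

    def add(self, pid: int, base: int, size: int, path: str) -> None:
        maps = self._by_pid.setdefault(pid, [])
        maps.append(Mapping(base, size, path))
        maps.sort(key=lambda m: m.base)  # keep each process's mappings sorted by address

    def resolve(self, pid: int, pc: int) -> tuple[str, int] | None:
        """Return (image pathname, offset within image) for a PC sample, or None."""
        maps = self._by_pid.get(pid, [])
        bases = [m.base for m in maps]
        i = bisect.bisect_right(bases, pc) - 1  # last mapping starting at or below pc
        if i >= 0 and pc < maps[i].base + maps[i].size:
            return maps[i].path, pc - maps[i].base
        return None


lm = Loadmap()
lm.add(42, 0x120000000, 0x10000, "/usr/bin/app")
print(lm.resolve(42, 0x120000100))  # ('/usr/bin/app', 256)
```

Storing offsets within the image, rather than raw addresses, is what lets samples from different processes (and different load addresses) accumulate in one per-image profile.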

Building a Profile Database

The dcpid daemon extracts program-counter samples from the device driver and stores them in an on-disk database. The database resides in a user-specified directory, and may be shared by multiple machines. All samples are organized into non-overlapping epochs, each of which contains samples for some time interval. A new epoch is started (and the previous epoch terminated) using the dcpiepoch command.

Each epoch occupies a separate directory; each epoch directory contains subdirectories for each platform sharing the database. (A platform typically corresponds to an individual host, but the hosts file in the top-level database directory can be used to make a platform correspond to a user-specified collection of machines.) Each platform directory contains files with profile information, typically one file per image. See the dcpiformat(4) man page for details of the file format.
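The resulting layout is database, then epoch, then platform, then profile file. The sketch below only shows how such a path is composed; the epoch name, platform names, and profile file names are hypothetical, and the `host_groups` dictionary is an assumed stand-in for whatever the hosts file specifies (its real syntax is documented in the man pages, not here).

```python
from __future__ import annotations

from pathlib import PurePosixPath


def platform_for(host: str, host_groups: dict[str, str]) -> str:
    """Map a host to its platform; hosts not listed form a platform of their own."""
    return host_groups.get(host, host)


def profile_path(db_root: str, epoch: str, host: str, image_file: str,
                 host_groups: dict[str, str]) -> PurePosixPath:
    """<database>/<epoch>/<platform>/<profile file>, typically one profile file per image."""
    return PurePosixPath(db_root) / epoch / platform_for(host, host_groups) / image_file


print(profile_path("/dcpidb", "epoch-1", "alpha1", "vmunix", {"alpha1": "farm"}))
# /dcpidb/epoch-1/farm/vmunix
```

Grouping several hosts under one platform means their samples for the same image land in the same profile file, which is useful when identical machines run the same workload.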

Samples are buffered in the device driver and in dcpid. Buffered samples are flushed out periodically and also when an epoch is terminated. To ensure consistent results, the analysis tools should be run only on a completed epoch.
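Why only completed epochs give consistent results can be seen in a small model of daemon-side buffering. This is a hypothetical sketch: it flushes after a fixed number of samples as a stand-in for the daemon's periodic (time-based) flush, and `DaemonBuffer` and its fields are illustrative names, not dcpid internals.

```python
from collections import Counter


class DaemonBuffer:
    """Hypothetical model of dcpid-side buffering of resolved samples."""

    def __init__(self, flush_every: int) -> None:
        self.flush_every = flush_every  # stand-in for a periodic, time-based flush
        self.pending = Counter()        # (image, offset) -> counts not yet written
        self.on_disk = Counter()        # what a tool reading the current epoch would see
        self._since_flush = 0

    def add(self, image: str, offset: int) -> None:
        self.pending[(image, offset)] += 1
        self._since_flush += 1
        if self._since_flush >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        self.on_disk.update(self.pending)  # add pending counts to the on-disk totals
        self.pending.clear()
        self._since_flush = 0

    def end_epoch(self) -> Counter:
        """Flush everything, close the epoch, and start a fresh one."""
        self.flush()
        completed, self.on_disk = self.on_disk, Counter()
        return completed
```

Between flushes, `on_disk` lags behind the samples actually collected; only the counter returned by `end_epoch` is guaranteed to include them all.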

The dcpictl utility can be used to control dcpid. It provides commands to terminate an epoch and begin a new one; to shut down monitoring; to flush all buffered samples to the on-disk database; and to inform dcpid manually about an image loaded into a process. (The latter is useful only in unusual circumstances; see the man page for dcpictl for details.)

Analysis Tools

During an epoch, samples are collected for all running images, including all applications, shared libraries, and the kernel. There are several ways to analyze the profile data for an epoch, from a coarse-grained accounting for each image to a fine-grained analysis of each instruction. Output from the tools ranges from a simple prof-style listing of time spent in each image to basic-block flowgraphs of each procedure annotated with information such as sample counts, the average number of cycles taken by each instruction, and the possible causes of stall cycles.

  • At a coarse level of detail, dcpiprof shows the time spent in any set of images active during an epoch. This time can be broken down either by image or by procedure within each image.
  • At a fine level of detail, a basic-block flowgraph can be produced for one or more procedures, showing a control-flow graph of the machine instructions in each procedure annotated with sample counts for each instruction, the source code associated with each basic block, and an analysis of the number of stall cycles for each instruction and the reasons for each stall.
    • dcpicalc produces a control-flow graph with the execution frequency of each basic block, the average number of cycles taken by each instruction, the possible reasons for each stalled cycle, and a summary of how time was spent in the procedure.
    • dcpisource annotates a control-flow graph (produced by dcpicalc) with source code for each basic block.
    • dcpi2ps takes a control-flow graph from either of the tools above and produces a PostScript file for viewing or printing.

    These tools are typically run in a pipeline, e.g.:

    dcpicalc | dcpisource | dcpi2ps

    (with appropriate flags and arguments -- see the man pages and examples below for details).

  • dcpilist lists the contents of a procedure annotated with the samples collected during profiling and with the average number of cycles required to execute each line of code. The listing can contain machine instructions, source lines, or both.
  • dcpiwhatcg produces a program-level (i.e., entire image, not just a single procedure) summary breakdown of where time has been spent (percent of cycles spent in, e.g., memory delays, static stalls, branch mispredicts, and useful execution).
  • dcpidiff and dcpistats compare sets of profile data. dcpidiff compares two sets of profiles for a single procedure, highlighting basic blocks or source lines with the largest differences. dcpistats (currently available only for Tru64 UNIX) compares multiple sets of raw sample counts and prints various statistics about them; it is useful for comparing variations across multiple runs of the same program or for comparing differences between slightly different versions of a program.
  • dcpicat prints the contents of one or more profile files in an ASCII format. This is useful mostly to people debugging the Continuous Profiling Infrastructure.
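To make the coarse-grained view concrete, here is a minimal sketch of a dcpiprof-style breakdown. It assumes samples have already been resolved to (image, offset) pairs and that a hypothetical per-image symbol table lists procedure start offsets; it is not dcpiprof's implementation, and the output is plain dictionaries rather than dcpiprof's report format.

```python
from __future__ import annotations

import bisect
from collections import Counter


def procedure_for(offset: int, symbols: list[tuple[int, str]]) -> str:
    """symbols: (start offset, procedure name) pairs sorted by start offset."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, offset) - 1
    return symbols[i][1] if i >= 0 else "<unknown>"


def breakdown(samples: list[tuple[str, int]],
              symtabs: dict[str, list[tuple[int, str]]]) -> tuple[dict, dict]:
    """Percentage of all samples per image and per (image, procedure)."""
    by_image: Counter = Counter()
    by_proc: Counter = Counter()
    for image, offset in samples:
        by_image[image] += 1
        by_proc[(image, procedure_for(offset, symtabs.get(image, [])))] += 1
    total = sum(by_image.values())
    if total == 0:
        return {}, {}

    def percent(counts: Counter) -> dict:
        return {key: 100.0 * n / total for key, n in counts.items()}

    return percent(by_image), percent(by_proc)
```

Because time is proportional to sample counts, the percentages approximate the share of CPU time without converting samples to cycles at all.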

Miscellaneous Utilities

Several other utilities are provided. They are currently available only for Tru64 UNIX.

  • dcpi2pix produces pixie-format output from the profile database, thus enabling existing tools that take pixie-format input to be driven from the profile database.
  • dcpix instruments an executable to measure execution frequencies for basic blocks and control flow edges directly. The output can be used by dcpicalc instead of estimating frequencies from sample counts. (Note: the typical mode of operation for this profiling system does not require instrumenting executables; using dcpix to instrument executables can be useful in rare circumstances when dcpicalc produces poor estimates, or when evaluating the quality of the estimates produced by dcpicalc.)
  • dcpisumxct aggregates execution counts measured using dcpix from multiple runs of an instrumented program.
  • dcpicc compiles C programs to produce object code that helps dcpisource in identifying which source token each instruction corresponds to.
  • dcpikdiff creates a new image based on both vmunix and kmem(7) that captures the true running kernel image after Tru64 UNIX dynamically patches itself for the particular system it is running on.
  • dcpiversion prints the version number of the installed release. This is useful when reporting bugs or other problems so that the developers of the system know what version you are using.
  • dcpiuninstall uninstalls the profiling system, removing all binaries and man pages, and replacing the system dynamic loader with the original version (which was saved during the installation process). Note: dcpiuninstall does not remove profile databases, nor does it remove the device driver from the kernel.
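Aggregating instrumented execution counts across runs, as dcpisumxct does, amounts to summing counts per basic block (or edge). The sketch below assumes each run's counts are available as a hypothetical dictionary keyed by basic-block address; it does not read dcpix's actual output format.

```python
from collections import Counter


def sum_execution_counts(runs):
    """Sum per-basic-block execution counts from several runs of an instrumented program.

    Each run is assumed (hypothetically) to be a dict mapping a basic-block address
    to the number of times that block executed in that run.
    """
    total = Counter()
    for run in runs:
        total.update(run)  # Counter.update adds counts instead of replacing them
    return dict(total)


print(sum_execution_counts([{0x100: 5, 0x140: 2}, {0x100: 3, 0x180: 1}]))
# {256: 8, 320: 2, 384: 1}
```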

Man Pages

Detailed man pages for Tru64 UNIX and Windows NT are available for all programs in the Continuous Profiling Infrastructure.

Examples

Examples illustrating how to use the various tools on Tru64 UNIX and Windows NT are available. In addition, the man pages for many of the tools have detailed examples showing how to interpret their output.

Limitations

  • The profiling system works on HP Alpha processors running Tru64 UNIX or Windows NT. OpenVMS is not yet supported.
  • On Tru64 UNIX, for processes that use the exec(2) system call (or its variants), samples gathered prior to an exec() call may be charged to an image that is running after exec() returns. This is usually not a serious problem: in the common case, a process will call exec() soon after being forked, and will not call exec() again. Since only a few samples are gathered prior to the call to exec(), only a few samples can be charged to the wrong image.
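One plausible way such misattribution can arise is that samples are identified only by process ID and are resolved to images after the fact. The toy model below illustrates that effect under this assumption; it is not a description of dcpid's actual internals, and the event names are hypothetical.

```python
from collections import Counter


def charge_samples(events):
    """Replay a hypothetical event stream and report which image each sample is charged to.

    ("exec", pid, image): the process now runs `image` (loadmap updated immediately)
    ("sample", pid):      a PC sample from `pid`, buffered but not yet resolved
    ("flush",):           buffered samples are resolved against the *current* loadmap
    """
    current_image = {}
    buffered = []
    charged = Counter()
    for event in events:
        kind = event[0]
        if kind == "exec":
            current_image[event[1]] = event[2]
        elif kind == "sample":
            buffered.append(event[1])
        elif kind == "flush":
            for pid in buffered:
                charged[current_image.get(pid, "<unknown>")] += 1
            buffered.clear()
    return charged


events = [("exec", 1, "/bin/sh"), ("sample", 1), ("sample", 1),
          ("exec", 1, "/usr/bin/app"), ("sample", 1), ("flush",)]
print(charge_samples(events))  # Counter({'/usr/bin/app': 3})
```

In this model, only the samples taken between the last flush and the exec are misattributed, which matches the observation above that the error is small when a process calls exec() shortly after fork().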

Bugs

  • A kernel bug exists in Tru64 UNIX that can occasionally cause dcpid to crash, and can even crash the kernel in rare cases. Running dcpid with the -b flag prevents dcpid from doing its initial scan of the system process tables and hence from triggering the bug. Note that this will also prevent dcpid from determining what images are loaded in processes that are already running when it starts up.

    Unlike earlier versions of dcpid, which performed frequent scans of system tables to identify statically linked executables, the current version only performs a single scan during initialization. Thus, it is extremely unlikely that this problem will be encountered. It is generally worth the risk of performing the initial scan in order to obtain useful information about the processes that were already executing when dcpid was started.