dcpid(1)
NAME
dcpid - DCPI daemon.
SYNOPSIS
dcpid [options] database
DESCRIPTION
Dcpid continuously extracts raw samples from the specified performance
counter device, associates them with their corresponding images, and updates
disk-based image profiles in the specified profile database. A new profile
database can be created by specifying an empty directory.
Dcpid also supports an alternate operating mode which allows "Dynamic Access
to DCPI Data (DADD)". When run in this mode, the daemon does not write data
to a profile database, but delivers it interactively to a registered set
of client programs through the use of shared memory.
Dcpid is normally terminated using dcpiquit(1).
Dcpid shuts down gracefully in response to termination signals, flushing
all unsaved samples to their corresponding profiles before terminating.
Dcpid must be executed with root privileges. If desired, dcpid can
be installed as a setuid-root program.
Dcpid can be configured to automatically delete old epochs and profiles
(via the -gc option). Deletion occurs once a day, sometime between 3am
and 5am. Every epoch that was started more than seven days ago and is not
among the three latest epochs is deleted. Therefore, the three most recent
epochs are always preserved, and all epochs are preserved for at least seven
days after their creation.
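The retention rule above can be sketched as follows. This is only an illustration of the rule as stated on this page, not dcpid's own code; the function name and its parameters are hypothetical, and epochs are represented simply by their start times.

```python
from datetime import datetime, timedelta

def epochs_to_delete(start_times, now, keep_latest=3, min_age_days=7):
    """Return the epoch start times that the stated retention rule would delete."""
    # Newest first, so the first keep_latest entries are always protected.
    ordered = sorted(start_times, reverse=True)
    protected = set(ordered[:keep_latest])
    # Only epochs started more than min_age_days ago are candidates.
    cutoff = now - timedelta(days=min_age_days)
    return [t for t in ordered if t not in protected and t < cutoff]
```

An epoch is deleted only when both conditions hold, which is why the three latest epochs survive regardless of age and every epoch survives for at least seven days.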
EVENT OPTIONS
- -slot type[:period]+...+type[:period]
-
- -event type[:period]
-
- -t is shorthand for -event
- The -event option is deprecated. To avoid confusion with earlier versions
of dcpid, it may be used to specify only a single event.
The -slot option selects a set of event types to monitor simultaneously
using the set of available hardware performance counters. Each event
includes a type and an optional period specification. The -slot option
may be repeated to specify a sequence of slots which are time-multiplexed
onto the hardware counters, allowing a larger set of events to be monitored.
If no -slot arguments are specified on the command line, the
default is to monitor cycles and imiss events on Alpha
21064/EV4 and 21164/EV5 processors, and to multiplex between cycles and
ProfileMe statistics on 21264a/EV67 and later processors. Default sampling
periods are used in either case.
Event Types
See dcpiprofileme(1) for information
on ProfileMe event types supported on Alpha 21264a/EV67 and later
processors. Non-ProfileMe event types supported on Alpha 21264a/EV67
and later processors are:
- cycles = processor cycles (c1)
- retires = retired instructions (c0)
- replaytrap = mbox replay traps (c1)
- bmiss = bcache misses or long-latency probes (c1)
Event types supported on Alpha 21264/EV6 processors:
- cycles = processor cycles (c0 or c1)
- retires = retired instructions (c0)
- replaytrap = mbox replay traps (c1)
- itbmiss = retired itb misses (c1)
- dtbmiss = retired dtb single misses (c1)
- dtbdblmiss = retired dtb double misses (c1)
- retcondbr = retired conditional branches (c1)
- retunalign = retired unaligned traps (c1)
Event types supported on both 21064/EV4 and 21164/EV5 Alpha processors:
- cycles = processor cycles
- issues = instruction issues
- nonissue = non-issue cycles
- imiss = instruction cache misses
- dmiss = data cache misses
- branchmp = branch mispredicts
- flow = flow control changes (see Caveats below)
- pipelinedry = pipeline dry cycles (no valid I-stream data)
- issue2 = cycles with 2 issues
- intop = integer operations (excluding loads/stores)
- fpop = floating point operations (excluding loads/stores/br)
- load = load instructions
- store = store instructions
Additional event types supported on Alpha 21164/EV5 processors:
- itbmiss = instruction translation buffer misses
- dtbmiss = data translation buffer misses
- pcmp = PC mispredicts
- iaccess = instruction cache accesses
- daccess = data cache accesses
- smiss = on-chip secondary cache misses
- srmiss = on-chip secondary cache read misses
- swmiss = on-chip secondary cache write misses
- saccess = on-chip secondary cache accesses
- sread = on-chip secondary cache reads
- swrite = on-chip secondary cache writes
- svictim = on-chip secondary cache victims
- sshwrite = on-chip secondary cache shared writes
- bmiss = board-level cache misses
- bhit = board-level cache hits
- bvictim = board-level cache victims
- bref = board-level cache references
- sysinv = system invalidates
- sysread = system read requests
- sysreq = system requests
- splitissue = split issue cycles
- replaytrap = replay traps
- issue1 = cycles with 1 issue
- issue3 = cycles with 3 issues
- issue4 = cycles with 4 issues
- mb = memory barriers
- loadmerged = loads merged (in MAF)
- ldureplay = load/use (ldu) replays
- wbmafreplay = write buffer or maf full replays
- loadlocked = LDx_L instructions
- longstall = stall longer than 12 cycles
- external = external event (system-specific or unused)
Additional event types supported on Alpha 21064/EV4 processors:
- palmode = cycles executing palcode
- pipefrozen = pipeline frozen due to resource conflict
The optional event period follows the event type, and has the format :Mperiod,
where M is a period modifier, and period is the sampling
period. If the event period is omitted, reasonable defaults are automatically
chosen based on the particular event type and the processor hardware.
The period modifier must be R, denoting a random sampling
interval with a mean equal to period events, or F, denoting
a fixed sampling interval equal to period events. If omitted,
the default is to use a random sampling interval on hardware that supports
it, or a fixed sampling interval otherwise.
The sampling period specifies how often the event should be
sampled, expressed as a decimal number. The suffix K can be used
to scale the specified period by 1024.
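As an illustration of the syntax described above, the following sketch splits a -slot argument into its events and periods. It is not dcpid's parser. The function names are hypothetical, and the sketch assumes that the modifier may be omitted while a period is still given, that the K suffix is upper case only, and that event type names consist of lower-case letters and digits.

```python
import re

# type, then an optional ":[R|F]digits[K]" period specification
_EVENT_RE = re.compile(r"(?P<type>[a-z0-9]+)(?::(?P<mod>[RF])?(?P<num>[0-9]+)(?P<k>K)?)?")

def parse_event(spec):
    """Split 'type[:Mperiod]' into (type, modifier, period); None means 'use the default'."""
    m = _EVENT_RE.fullmatch(spec)
    if m is None:
        raise ValueError(f"malformed event specification: {spec!r}")
    period = None
    if m.group("num") is not None:
        # The K suffix scales the period by 1024.
        period = int(m.group("num")) * (1024 if m.group("k") else 1)
    return m.group("type"), m.group("mod"), period

def parse_slot(slot):
    """Split a -slot argument 'type[:period]+...+type[:period]' into parsed events."""
    return [parse_event(part) for part in slot.split("+")]
```

For example, parse_slot("cycles:R63488+imiss") yields a randomized cycles event with period 63488 and an imiss event with no explicit modifier or period.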
The period modifier and period specifications are limited
on the Alpha 21064/EV4 processor, which uses a fixed sampling period
(65536 for cycles, issues, and flow, and 4096
for the other events). Later Alpha processors such as the 21164/EV5 have
hardware support for modifying the sampling period and can support arbitrary
fixed periods, as well as randomized periods. Randomization of the sampling
interval helps avoid undesirable synchronization effects with periodic
code execution. Caveat: The current driver implementation restricts
the set of valid randomized periods. For the cycles event, a
valid randomized period must have the form (65536 - 2^n). Future versions
of the driver may allow more flexibility.
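The form (65536 - 2^n) can be checked with a short sketch such as the one below. The function name is hypothetical, and the range of n is an assumption (0 through 15, so that the period stays positive); this page does not state which values of n the driver accepts.

```python
def is_valid_random_cycles_period(period):
    """True if period == 65536 - 2**n for some integer n, assuming 0 <= n <= 15."""
    gap = 65536 - period
    # A positive power of two has exactly one bit set.
    return 0 < gap < 65536 and gap & (gap - 1) == 0
```

Under this check, 63488 is accepted because 65536 - 63488 = 2048 = 2^11, while 60000 is rejected.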
Examples
- -slot cycles:R63488+imiss
- Monitor cycle counter events, with a randomized sampling period
whose mean is one sample every 63488 cycles. In addition, simultaneously
monitor imiss events using the default period.
- -slot cycles+imiss:F4096 -slot cycles+dmiss -slot cycles+branchmp
- Always monitor cycle counter events with the default sampling
period. In addition, rotate among gathering imiss, dmiss,
and branchmp events, using a fixed 4K sampling period for imiss,
and the default sampling rates for dmiss and branchmp.
- -slot cycles+imiss -slot cycles+imiss -slot cycles+dmiss
- Always sample cycles with the default sampling period.
Switch between sampling imiss events 2/3 of the time and dmiss events
1/3 of the time, using the default sampling rate for those events.
In this example, one event is repeated across multiple -slot options,
in order to sample it more frequently than the other kinds of events.
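The proportions in the last example can be computed with the following sketch. It assumes that every slot is active for an equal share of time, which follows from a single multiplexing interval applying to all slots. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def event_shares(slots):
    """Map each event type to the fraction of slots in which it is monitored."""
    counts = Counter()
    for slot in slots:
        # Strip any ':period' suffix and count each type once per slot.
        for name in {part.split(":")[0] for part in slot.split("+")}:
            counts[name] += 1
    return {name: Fraction(n, len(slots)) for name, n in counts.items()}
```

Applied to the three slots cycles+imiss, cycles+imiss, and cycles+dmiss, this gives a share of 1 for cycles, 2/3 for imiss, and 1/3 for dmiss.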
Caveats
Alpha aggregate performance counter interrupts are not precise for
events other than cycles and dtbmiss, so a sample for
some other event may not be correctly attributed to the instruction which
generated the event. On 21264/EV6 processors, none of the events are
guaranteed to be precise, although the replaytrap event usually appears to be.
ProfileMe performance counter interrupts are also not precise, but the PC of
the profiled instruction is latched, so the attributions are always correct.
There are only a limited number of hardware performance counters (3
on Alpha 21164/EV5 processors and 2 on all other Alpha implementations),
and each counter can only count a subset of all events. Thus, certain
combinations of events cannot be simultaneously monitored. Consult the
Alpha AXP Architecture Reference Manual by Sites & Witek, Appendix
D, for detailed information about legal event combinations. dcpid uses
a simplistic algorithm to select a counter for each event on the command
line, so the order of the events on the command line can affect whether dcpid finds
a counter for each event. It is better to list events that can be counted
only on a single counter before other events.
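The following sketch shows why event order can matter under a simple assignment strategy. It uses a hypothetical greedy first-fit strategy, which is an assumption made for illustration; this page does not describe dcpid's actual algorithm. The counter table is taken from the 21264/EV6 event list above.

```python
# Counters each event may use on the 21264/EV6, from the event list above.
EV6_COUNTERS = {
    "cycles": ("c0", "c1"),
    "retires": ("c0",),
    "replaytrap": ("c1",),
}

def assign_counters(events, allowed=EV6_COUNTERS):
    """Greedy first-fit: give each event, in order, the first free counter it can use."""
    assignment = {}
    used = set()
    for event in events:
        free = [c for c in allowed[event] if c not in used]
        if not free:
            return None  # no counter left for this event
        assignment[event] = free[0]
        used.add(free[0])
    return assignment
```

With this strategy, the order cycles, retires fails because cycles takes c0 first, whereas the order retires, cycles succeeds. Listing the single-counter event first avoids the conflict.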
When multiplexing events, the cycles event type must always
be monitored, since cycle sample interrupts are used to decide when to
switch to the next multiplexed event type. This switching interval is
controlled by the -mux option (see below).
On the Alpha 21064/EV4 processor, issues counts the total
number of instruction issues divided by 2, and nonissue counts
the total number of non-issue cycles divided by 2.
On the Alpha 21164/EV5, the meaning of the "flow" event depends on
whether the "branchmp" or "pcmp" event is sampled at the same time
as the "flow" event. With "branchmp" sampling, "flow" events happen only
at conditional branches. With "pcmp" sampling, "flow" events happen
only at jsr and ret instructions. (Simultaneous sampling of "branchmp" and "pcmp" events
is not possible, though multiplexed sampling of these events is possible.)
- -mux interval
-
- -I is shorthand for -mux
- For slot multiplexing, switch the events being monitored every interval units
of 64K cycles. The default multiplexing interval is 10; i.e.
the monitored events will be switched about every 640K cycles.
Note: the default multiplexing interval is 100 for Alpha
21064/EV4-based machines. On the 21064/EV4, counter values cannot
be read and restored. During event multiplexing, this means that
the counter values are reset to zero whenever a multiplexing interval expires.
With frequent time-multiplexed switching, this can result in
distortion in the sampling of events. For this reason, it is
recommended that the multiplexing interval not be set
below about 20 for this processor.
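The arithmetic behind the multiplexing interval can be sketched as below. The function names are hypothetical, and 64K is taken to mean 65536, consistent with the K suffix used for sampling periods. The clock frequency is an input supplied by the caller, not a value taken from this page.

```python
CYCLES_PER_UNIT = 64 * 1024  # one -mux unit is 64K cycles

def mux_switch_cycles(interval=10):
    """Cycles between event switches for a given -mux interval."""
    return interval * CYCLES_PER_UNIT

def mux_switch_seconds(interval, clock_hz):
    """Approximate wall-clock time between switches at an assumed clock frequency."""
    return mux_switch_cycles(interval) / clock_hz
```

The default interval of 10 corresponds to 655360 cycles (640K). On a hypothetical 500 MHz processor, that is roughly 1.3 milliseconds between switches.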
IMAGE ASSOCIATION OPTIONS
- -bypid image
-
- -i is shorthand for -bypid
- Store separate profiles for each process that loads the specified executable
image. By default, the profile associated with an executable image contains
aggregated samples for all processes that execute that image. This
option allows samples to be identified by process as well as by image.
The filenames for per-process profiles have the suffix "_PID", where PID
is a local process identifier. This option can be repeated to specify per-process
profiling for multiple executable images.
- -map mapfile
-
- -m is shorthand for -map
- Use specified map file generated by dcpiscan(1) for
associating processes with named images. This option can be repeated, allowing
several map files to be specified; information from all of the supplied
map files is merged.
Dcpiscan is automatically run at installation time to create the
default local map of images found in the usual places (e.g. /usr/bin,
/usr/shlib, et cetera). Note: The default map does not include any
images found in /usr/local. If there are significant images in /usr/local
or other non-standard local directories, you should use dcpiscan to
create a map for those files.
- -forkid seconds
- A hook in the kernel exec path provides information to dcpid about
image loadmaps for statically-loaded processes. The system loader provides
information to dcpid about image loadmaps for dynamically-loaded
processes (unless the user's environment variable RLD_DCPI_DISABLE is
set). Unfortunately, there is no convenient hook for capturing information
about processes that are created via fork(2) which do not subsequently
invoke exec(2).
To obtain loadmap information for such forked processes that are relatively
long-lived, periodic scans of system tables are performed to match unknown
forked processes with information known about their parents. By default,
a scan is performed every 30 seconds. This feature can be disabled by
specifying a scan period of 0 seconds.
- -unknown
-
- -u is shorthand for -unknown
- Store separate per-process profiles for samples that cannot be associated
with any image. Unknown samples will be stored in profiles associated
with 1MB regions of each process address space; these "anonymous" profiles
are given names of the form hostPID@address. If this option is
not specified, a count of all unknown samples is stored in a single profile
named unknown@host.
PROFILE DATABASE OPTIONS
- -epoch
-
- -e is shorthand for -epoch
- Use the most recent existing epoch for storing new profiles. By default,
a new epoch is created each time dcpid is restarted. New epochs
can also be started using dcpiepoch(1).
- -create_epoch
-
- -ce is shorthand for -create_epoch
- Create a new profiling epoch within the named database. Specifying -create_epoch
does not result in a full invocation of the daemon. Instead, the daemon
shuts down immediately once the new epoch has been created. This option
is intended for use in a cluster environment, where it should precede the
launch of daemons on multiple nodes. The daemon invocations which follow
should all specify the -epoch option, with the result being that all profile
data will be written to the same epoch within a centralized database.
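The cluster sequence described above can be sketched as a pair of command lines. The helper below only builds argument lists and does not run anything. Its name is hypothetical, and it assumes that no options other than -create_epoch or -epoch are needed. How the commands are dispatched to each node is left to the caller.

```python
def cluster_commands(database, nodes):
    """Return (create_cmd, {node: daemon_cmd}) for sharing one epoch across nodes."""
    # Run once, before any daemon starts: creates the epoch and exits.
    create_cmd = ["dcpid", "-create_epoch", database]
    # Run on every node afterwards: reuse the most recent existing epoch.
    daemon_cmds = {node: ["dcpid", "-epoch", database] for node in nodes}
    return create_cmd, daemon_cmds
```

The ordering matters: the -create_epoch invocation must complete before the per-node daemons start, so that each daemon finds the same most recent epoch.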
- -gc
- By default, dcpid never deletes profile data. If this option
is supplied, dcpid will delete old epochs. It will still keep
at least the three latest epochs, as well as any epoch that was created
within the last seven days.
- -merge seconds
-
- -M is shorthand for -merge
- Merge buffered profile samples from dcpid to the non-volatile
profile database every seconds seconds. Defaults to every 600
seconds (10 minutes).
DRIVER OPTIONS
- -flush seconds
-
- -F is shorthand for -flush
- Flush samples from the performance counter device driver to dcpid every seconds seconds.
Defaults to every 300 seconds (5 minutes). Samples are also automatically
flushed from the driver to dcpid whenever remaining driver buffer
space is low.
- -hash bytes
-
- -H is shorthand for -hash
- Specifies the desired size of the driver hash table data structure in
bytes. The default is 524288 (512K bytes). The driver treats the specified
size as a hint, and may impose additional constraints, such as forcing
the actual size to be a power of two.
- -chunk bytes
-
- -C is shorthand for -chunk
- Specifies the desired chunk size to use when flushing the driver hash
table data structure. The default is 16384 (16K bytes). The driver treats
the specified size as a hint, and may impose additional constraints,
such as forcing the actual size to be a power of two.
LOGGING OPTIONS
- -log logfile
-
- -l is shorthand for -log
- Use specified file for logging warnings, errors, debugging information,
and other messages. Defaults to dcpid-host.log in the specified
profile database directory, where host is the local hostname. The log file
is written using append mode, so it is safe to reuse the same log file
across dcpid invocations.
Note: the Unix command tail -f is useful for watching the log
as it is written.
- -quiet
-
- -q is shorthand for -quiet
- Operate in quiet mode, disabling most message logging. By default, dcpid logs
errors, debugging information, and other messages to the specified log
file.
- -verbose
-
- -v is shorthand for -verbose
- Operate in verbose mode, enabling extra message logging.
- -status seconds
-
- -L is shorthand for -status
- Log dcpid status information to the log file every seconds seconds.
The default period is 0 (i.e. disabled).
- -logmaps
-
- -x is shorthand for -logmaps
- Log image loadmap information as it becomes available.
VALUE-PROFILING OPTIONS
- -vprof
- Enables value-profiling, if supported by the driver. Value profiling
is an extension to DCPI data collection that captures register values for
profiled user-mode instructions. Value profiling imposes additional overhead,
with a typical slowdown of approximately 10%.
- -vreplay
- Enables value-profiling for detecting potential replay traps, if supported
by the driver. This value profiling mode captures the PC values for recent
memory operations accessing either (1) the same effective address as
the profiled instruction, or (2) an effective address that would map
to the same 64-byte cache line in a 32-direct mapped cache. Case (1)
identifies instructions that might trigger order and wrong-size replay
traps on the 21264/EV6. Case (2) identifies instructions that might trigger
what is sometimes called a "troll trap" or "cache synonym trap" on the
21264/EV6. For more details about replay traps, see the Compiler
Writer's Guide for the Alpha 21264 (Document part # EC-RJ66A-TE)
available from http://h18000.www1.hp.com/products/software/alpha-tools/documentation/current/chip-docs.html
- -vtrace library
-
- -vtrace 'library arg'
- Enables trace value-profiling, if supported by the driver. (Note that library must
be specified as an absolute pathname.) This value-profiling mode allows
you to specify what values are captured, how they are processed before
being merged into the profile files, and how the values are formatted
for printing. library should be a shared library implementing
an interface to perform these functions. See dcpivfilter(1) for
more details on the interface that library is expected to implement.
An optional argument, arg, may be specified for use by the library initialization
routine. However, note that both the library and arg must
occur together as part of the same option string, which can be specified
by using shell quoting conventions, e.g. 'library arg'.
CAVEAT: When this or other forms of value profiling are enabled, dcpid must
use considerable memory to maintain the lists of most
frequent values for all profiled instructions. In the current implementation,
this can occasionally cause dcpid to run out of memory.
If this occurs, you may want to enable only one value profiling mode
at a time.
- -vkprof
- Enables value-profiling for kernel-mode instructions.
- -vcontext
- Captures additional context information with each value sample.
Currently, the values of the ra register and memory location 0(sp) are
captured to identify the call site associated with each sample.
- -vinterp n
- Specifies that n instructions should be interpreted for
each sample. This is a low-level option that may not be supported
in future releases.
- -vfraction n
- Specifies that interpretation should happen once every n sampling
interrupts. n will be rounded down to the nearest
power of 2.
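Rounding down to the nearest power of 2 can be illustrated with the sketch below. The function name is hypothetical, and the sketch shows only the arithmetic, not how dcpid or the driver implements it.

```python
def round_down_pow2(n):
    """Largest power of two that is less than or equal to n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # bit_length() - 1 is the position of the highest set bit.
    return 1 << (n.bit_length() - 1)
```

For example, a requested value of 100 would become 64, and a value that is already a power of 2, such as 1024, is unchanged.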
DYNAMIC ACCESS TO DCPI DATA (DADD) OPTIONS
- -dyn
- Specifies that the daemon is to run in a mode which allows "Dynamic
Access to DCPI Data (DADD)". In this mode, the daemon collects and delivers
performance data for specifically identified processes. The collected data
is not written to a profile database, but is provided interactively (via
shared memory) to client programs which have registered interest in such
data.
- -DF milliseconds
- Flush samples from the performance counter driver to dcpid and write
updated data to the appropriate shared memory regions every milliseconds milliseconds.
Defaults to every 10 milliseconds. Update periods shorter than the default
should only be used when dictated by specific user requirements. In general,
shorter periods are not recommended because the resulting increase in system
overhead can be significant.
For complete information regarding DADD, please see "DCPI/DADD: User's
and Administrator's Guide".
OTHER OPTIONS
- -help
- Print dcpid usage message and then terminate.
- -version
-
- -V is shorthand for -version
- Print dcpid version string.
- -nodaemon
-
- -D is shorthand for -nodaemon
- Do not run dcpid as a daemon. By default, dcpid places
itself in the background, detaches from its terminal, and redirects
all output to its log file.
- -nice priority
- Adjusts the scheduling priority for dcpid. Positive priority values
result in lower scheduling priority; negative priority values
result in higher scheduling priority. See nice(1) for
more information about Unix priority scheduling.
- -socket socket
-
- -s is shorthand for -socket
- Use specified local Unix socket pathname
for incoming messages from client applications
such as dcpiflush(1), dcpiepoch(1),
and dcpiquit(1).
Defaults to /dev/.dcpid0, the default
path used by these client applications and
the loader. This is a low-level option that
may not be supported in future releases.
LIMITATIONS
When collecting data using multiple -slot options, each slot must include at least
one event that occurs at least once every 2^30 cycles (for example, retires, cycles,
issues, or ProfileMe mode). Dcpid does not check this requirement.
SEE ALSO
dcpi(1), dcpi2bb(1), dcpi2pix(1), dcpi2ps(1), dcpicalc(1), dcpicat(1), dcpicc(1), dcpicoverage(1), dcpictl(1), dcpidiff(1), dcpidis(1), dcpiepoch(1), dcpiflow(1), dcpiflush(1), dcpikdiff(1), dcpilabel(1), dcpildlatency(1), dcpilist(1), dcpiprof(1), dcpiprofileme(1), dcpiquit(1), dcpiscan(1), dcpisource(1), dcpistats(1), dcpisumxct(1), dcpitar(1), dcpitopcounts(1), dcpitopstalls(1), dcpiuninstall(1), dcpiupcalls(1), dcpivarg(1), dcpivcat(1), dcpiversion(1), dcpivlst(1), dcpivprofiler(1), dcpiwhatcg(1), dcpix(1), dcpiformat(4), dcpiexclusions(4)
For more information, see the DCPI project home page http://h30097.www3.hp.com/dcpi.
COPYRIGHT
Copyright 1996-2004, Hewlett-Packard Company.
All rights reserved.