Description
The Collect utility is a system monitoring tool that
records or displays specific operating system and process
data for a set of subsystems. Any set of the subsystems,
such as File systems, message Queue, ttY, or Header,
can be included in or excluded from data collection.
Data can be displayed on the terminal or stored in
a compressed or uncompressed data file. Data files
can be read and manipulated from the
command line or through use of command scripts.
To ensure that the Collect utility delivers reliable
statistics, it locks itself into memory using the page
locking function plock(), and by default cannot be
swapped out by the system. It also raises its priority
using the priority function nice(). However, these
measures should not have any impact on a system under
normal load, and they should have only a minimal impact
on a system under extremely high load. If required,
page locking can be disabled using the -ol command
option and the Collect utility's priority setting can
be disabled using the -on command option.
Some Collect operations use kernel data that is
accessible only to root. Because system administration
practice should avoid lengthy operations as root,
Collect is installed with permissions set to 04750.
This setting allows members of the owning group (typically
system) to run Collect with the owner's setuid permissions.
If this is inappropriate in your environment, you can
reset the permissions to fit your needs.
Automatic starting on reboot
You can configure Collect to automatically start
when the system is rebooted. This is particularly useful
for continuous monitoring. To do this, use the rcmgr command
with the set operation to configure the following
values in /etc/rc.config_:
cariad >rcmgr set COLLECT_AUTORUN 1
A value of 1 sets Collect to automatically start
on reboot. A value of 0 (the default) causes Collect
to not start on reboot.
cariad >rcmgr set COLLECT_ARGS ""
A null value causes Collect to start with the default
values (command options) of:
-i60,120 -f /var/adm/collect.dated/collect -H d0:5,1w
You may select other values.
cariad >rcmgr set COLLECT_COMPRESSION 1
A value of 1 sets compression on. A value of 0 sets
compression off.
See the rcmgr(8) reference page for more
information.
Playing back multiple data files
The Collect utility can read multiple binary data
files using the -p option and play them back
as one stream, with monotonically increasing sample
numbers. It is also possible to combine multiple binary
input files into one binary output file, by using the -p option
with the input files and the -f option with
the output file. The Collect utility combines
input files in the order you specify on the command
line, so you must list the input files in strict
chronological order if you want to do further
processing of the combined output file. You
can also combine binary input files from different
systems, made at different times, with differing subsets
of subsystems for which data has been collected. Filtering
options such as -e, -s, -P,
and -D can be used with this function.
Normalization of data
Where appropriate, data is presented in units per
second. For example, disk data such as kilobytes transferred,
or the number of transfers, is always normalized for
one second. This happens no matter what time interval
is chosen. The same is true for the following data
items:
- CPU interrupts, system calls and context
switches
- memory pages out, pages in, pages zeroed,
pages reactivated, and pages copied on write
- network packets in, packets out, and collisions
- process user and system time consumed
Other data is recorded as a snapshot value. Examples
of this are: free memory pages, CPU states, disk queue
lengths, and process memory.
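The per-second normalization described above amounts to dividing the change in a value by the length of the interval. The following is a minimal sketch of that arithmetic, not Collect's own code. The function name per_second is hypothetical, and the sketch assumes the raw values are cumulative counters read at the start and end of an interval, which this page does not state.

```shell
# per_second PREV CURR INTERVAL
# Hypothetical helper: converts two cumulative counter readings taken
# INTERVAL seconds apart into a per-second rate with two decimals.
per_second() {
    awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
        if (secs <= 0) exit 1      # reject a zero or negative interval
        printf "%.2f\n", (curr - prev) / secs
    }'
}

# 1200 KB transferred during a 60-second interval
per_second 0 1200 60      # prints 20.00
# The same transfer rate sampled every 10 seconds
per_second 0 200 10       # prints 20.00
```

Both calls print the same rate, which is why normalized values from runs with different intervals can be compared directly.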
Data collection interval
A collection interval can be specified using the -i option
followed by an integer, optionally followed (without
spaces) by a comma or colon and another integer. If
the optional second integer is given, this is a separate
time interval which applies only to the process subsystem.
The process interval must be a multiple of the regular
interval. Collecting process information is more taxing
on system resources than collecting data for the other
subsystems, and it is not generally needed at the same
frequency. Process data also takes up the most space in
the binary data file.
Generally, specifying a process interval greater than
1 will significantly decrease the load the collector
places on the system being monitored.
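The rules for the -i argument can be checked in a script before Collect is started. The following is a sketch under stated assumptions: the helper names is_positive_int and check_interval are hypothetical, and the sketch rejects zero and leading zeros as a simplification of its own, not as a documented Collect restriction.

```shell
# is_positive_int VALUE
# Succeeds only for digit strings that do not start with 0.
is_positive_int() {
    case $1 in
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# check_interval SPEC
# Hypothetical helper: validates the text that follows -i, for example
# "60", "60,120" or "60:120", and prints the parsed intervals.
check_interval() {
    spec=$1
    case $spec in
        *[,:]*)
            regular=${spec%%[,:]*}     # text before the first separator
            process=${spec#*[,:]}      # text after the first separator
            is_positive_int "$regular" && is_positive_int "$process" || {
                echo "invalid interval: $spec" >&2
                return 1
            }
            # The process interval must be a multiple of the regular one.
            if [ $((process % regular)) -ne 0 ]; then
                echo "process interval $process is not a multiple of $regular" >&2
                return 1
            fi
            echo "regular=$regular process=$process"
            ;;
        *)
            is_positive_int "$spec" || {
                echo "invalid interval: $spec" >&2
                return 1
            }
            echo "regular=$spec"
            ;;
    esac
}

check_interval 60,120     # prints regular=60 process=120
```

A value such as 60,90 is rejected because 90 is not a multiple of 60.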
Specifying what data to collect
Use the -S (sort) and -nX (number)
options to sort data by percentage of CPU usage and
to save only X processes. Target specific
processes using the -Plist option,
where list is a list of process identifiers,
comma-separated without blanks.
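Because the list must be comma-separated without blanks, a script that gathers process identifiers one at a time needs to join them before passing them to -P. The following is a small sketch. The helper name pid_list is hypothetical and is not part of Collect.

```shell
# pid_list PID...
# Hypothetical helper: joins process identifiers with commas and no
# blanks, the form described for the -P option.
pid_list() {
    list=
    for pid in "$@"; do
        case $pid in
            ''|*[!0-9]*)
                echo "not a process identifier: $pid" >&2
                return 1
                ;;
        esac
        list=${list:+$list,}$pid   # add a comma only between entries
    done
    printf '%s\n' "$list"
}

pid_list 1 423 1877       # prints 1,423,1877
```

The result can then be appended directly to the option, for example as -P1,423,1877.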
If there are many (greater than 100) disks connected
to the system being monitored, use the -D option
to monitor a particular set of disks.
Data compression
The Collect utility reads and writes compressed data
files in gzip (GNU zip) format. Compressed output is enabled
by default but can be disabled using the -oz command
option. The extension .cgz is appended to
the output filename, unless the -oz option
is specified. Older, uncompressed datafiles can be
compressed using gzip, and the resulting
files can be read by Collect in their compressed form.
Compression during collection should not generate
any additional CPU load. Because compression uses buffers
and therefore does not write to disk after every sample,
it makes fewer system calls and its overall impact
is negligible. However, because the output is buffered
there is one possible drawback. If Collect terminates
abnormally (perhaps due to a system crash) more data
samples will be lost than if compression is not used.
This should not be an important consideration for most
users, as you can specify how often data should be
written to disk.
Specifying a time range from a playback file
You can select samples from the total period of the
time that data collection ran. Use the -C option
to specify a start time, and optionally, an end time.
The format is as follows:
[+]Year:Month:Day:Hour:Minute:Second.
The plus sign (+) indicates that the time should
be interpreted as relative to the beginning of the
collection period. If any of the fields are excluded
from the string, the corresponding values from the
start time are used in their place as the time-value
is parsed from right to left. Thus, field one is interpreted
as Second, field two (if there is one), as Minute,
and so on. For example, if the collection period is
from October 21, 1999, 16:44:03 to October 21, 1999,
16:54:55, all but minutes and seconds can be omitted
from the command option: -C46:00,47:00 (from
16:46:00 to 16:47:00). However, if the collection ran
overnight, it is necessary to specify the day as well.
For example, if the period were Oct 21 16:44 to Oct
22 9:30, to specify a period from 23:00 to 1:00, you
must enter the following:
# -C 21:23:00:00,22:1:00:00
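The right-to-left rule can be illustrated by expanding a partial time value into all six fields. The following is a sketch of the rule as described above, not Collect's parser. The helper name expand_time is hypothetical, and the sketch does not handle the relative form that begins with a plus sign.

```shell
# expand_time START SPEC
# START is the full collection start time in the form
# Year:Month:Day:Hour:Minute:Second. SPEC may omit leading fields,
# which are then taken from START.
expand_time() {
    start=$1
    spec=$2
    case $spec in
        ''|+*) echo "unsupported time value: $spec" >&2; return 1 ;;
    esac
    # Count the fields in SPEC by counting its colons.
    colons=$(printf '%s' "$spec" | tr -cd ':' | wc -c)
    nfields=$((colons + 1))
    if [ "$nfields" -gt 6 ]; then
        echo "too many fields: $spec" >&2
        return 1
    fi
    missing=$((6 - nfields))
    if [ "$missing" -eq 0 ]; then
        printf '%s\n' "$spec"
    else
        # Take the missing leading fields from the start time.
        prefix=$(printf '%s' "$start" | cut -d: -f1-"$missing")
        printf '%s:%s\n' "$prefix" "$spec"
    fi
}

expand_time 1999:10:21:16:44:03 46:00         # prints 1999:10:21:16:46:00
expand_time 1999:10:21:16:44:03 22:1:00:00    # prints 1999:10:22:1:00:00
```

The second call shows why the day must be given for an overnight run: without it, the day would be taken from the start time.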
General command options
The following command options are useful:
If you want simultaneous text (ASCII) output to the
screen while collecting to a file, use the -a option.
The -t option prefixes each data line with
a unique tag. This makes it easier for your scripts
to find and extract data. Tags are superfluous if
you use the Perl script cfilt.
The -T option shuts off collection for
all subsystems except disk, and only displays a total
MB/sec across all disks in the system. Use the -s option
with the -T option to override this behavior
and collect data for other subsystems.
The -R option causes Collect to terminate
after a specified amount of time.
All flags that can reasonably be applied to both
collection and play-back will work. The -Plist filter
option used during collection will collect data only
for the processes you specify. During playback it will
only display data for the corresponding processes.
To save space in the binary data file, you can limit
your collection to specific processes, specific disks,
or specific subsystems. However, if you want to look
at volumes of data and select different chunks at a
time, you should collect everything and later use the
filter options to select data items during playback.
Disk statistics
Under certain circumstances, the disk statistics
may be only approximate. Provided that you use the latest
Collect version and operating system patches, data
is presented for all statistics except %BSY, which
is zero. In this release, ACTQ and WTQ are
accurate. In older releases of Collect, some data
fields were zero and data in some fields could be inaccurate
under certain circumstances.
Data conversion and filtering
In this release, Collect automatically reads older
datafile versions when playing back files.
You can convert an older Collect version datafile
to the current version using the -p collect_datafile option
with the -f file option. During conversion you can
use most command options to extract specific data from
the input collect_datafile. For example:
- Use the -s and -e options to
select data only from particular subsystems.
- Use the -nX and -S options
to take only X processes and sort them by CPU usage.
- Use the -D option to select disks and
the -L option to select LSM volumes.
- Use the -P, -PC, -PU, and -PP options
to select processes based on their identifiers.
- Use the -C option to extract data according
to specified start and stop times.