Man Page collect.1
NAME
collect - command used for performance data collection
SYNOPSIS
collect collect-arguments target target-arguments
DESCRIPTION
The collect command runs the target process and records per-
formance data, using profiling or tracing techniques, and
global data for the process. The data can be examined with
a GUI program, analyzer, or a command-line program,
er_print. The data collection software run by the collect
command is referred to here as the Collector.
Note: You must have a license to run the GUI program.
The data from a single run of the collect command is called
an experiment. It is represented in the file system as a
directory, with various files inside that directory.
target is the path name of the executable or Java[tm] .jar
or .class file for which you want to collect performance
data. (For more information about Java profiling, see JAVA
PROFILING, below.) Executables that are targets for the
collect command can be compiled with any level of optimiza-
tion, but must use dynamic linking. If a program is stati-
cally linked, the collect command prints an error message.
In order to see annotated source using analyzer or er_print,
targets should be compiled with the -g flag, and should not
be stripped.
The collect command uses the following strategy to find its
target:
o If there is a file with the name of the target that is
marked executable, it is verified as an ELF executable that
can run on the target machine. If the file is not such a
valid ELF executable, the collect command fails.
o If there is a file with the name of the target, and it is
not executable, it is checked to see if it is a Java jar or
class file. If it is, a target name for java is inserted,
with any necessary flags, and data is collected on the
JVM[tm] machine. (The terms "Java virtual machine" and
"JVM" mean a virtual machine for the Java[tm] platform.)
o If there is no file with the name of the target, the
user's path is searched to find an executable; if one is
found, it is verified as described above.
o If no file with the target name is found, the command
looks for a file with that name and the string .class
appended; if found, the target name of java is inserted,
with the appropriate flags, as above.
o If none of these procedures can find the target, the com-
mand fails.
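The search order above can be sketched as a small shell function. This is only an illustration, not the Collector's actual code. The function names (is_elf, classify_target) are hypothetical, the ELF check is reduced to a test for the 4-byte ELF magic number, and jar or class files are recognized by file-name suffix only, which is an assumption.

```shell
#!/bin/sh
# Illustrative sketch of the target search order; all names are hypothetical.

# is_elf FILE: succeed if FILE starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -c | tr -d ' \n')
    [ "$magic" = '177ELF' ]
}

# classify_target NAME: print "elf PATH" or "java PATH", or "fail" with status 1.
classify_target() {
    target=$1
    if [ -f "$target" ] && [ -x "$target" ]; then
        # Step 1: an executable file must be a valid ELF executable.
        if is_elf "$target"; then echo "elf $target"; return 0; fi
        echo "fail"; return 1
    fi
    if [ -f "$target" ]; then
        # Step 2: a non-executable file may be a jar or class file
        # (assumption: decided by suffix; anything else is treated as failure).
        case $target in
            *.jar|*.class) echo "java $target"; return 0 ;;
        esac
        echo "fail"; return 1
    fi
    # Step 3: no such file, so search the user's path.
    found=$(command -v "$target" 2>/dev/null) || found=""
    case $found in
        /*) if is_elf "$found"; then echo "elf $found"; return 0; fi
            echo "fail"; return 1 ;;
    esac
    # Step 4: try the name with .class appended.
    if [ -f "$target.class" ]; then echo "java $target.class"; return 0; fi
    # Step 5: nothing matched.
    echo "fail"; return 1
}
```

For example, classify_target MyApp reports a Java target when only MyApp.class exists in the current directory.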
ARGUMENTS
If it is invoked with no arguments, collect prints a usage
message, including the default experiment configuration and
the names of any hardware counters available for profiling.
Data Specifications
-p option
Collect clock-based profiles. The allowed values of
option are:
Value     Meaning
off       turn off clock-based profiling
on        turn on clock-based profiling with the default
          profiling interval of 10 milliseconds
lo[w]     turn on clock-based profiling with the low-resolution
          profiling interval of 100 milliseconds
hi[gh]    turn on clock-based profiling with the high-resolution
          profiling interval of 1 millisecond
          NOTE: to do high-resolution profiling you must enable
          the high-resolution system clock. See Clock-Based
          Profiling, below.
n         turn on clock-based profiling with a profiling
          interval of n milliseconds.
          If the value is smaller than the system clock
          resolution it is set to the system clock resolution;
          if it is larger than the system clock resolution it
          is rounded down to the nearest multiple of the system
          clock resolution.
If no explicit -p off argument is given, and no
hardware counter profiling is specified, clock-based
profiling is turned on.
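The adjustment of a custom interval n to the system clock resolution can be illustrated with a little shell arithmetic. This is a sketch only: the function name effective_interval is hypothetical, and both values are assumed to be whole milliseconds.

```shell
#!/bin/sh
# effective_interval REQUESTED RESOLUTION
# Both arguments are whole milliseconds (an assumption of this sketch).
effective_interval() {
    requested=$1
    resolution=$2
    if [ "$requested" -lt "$resolution" ]; then
        # Smaller than the clock resolution: use the resolution itself.
        echo "$resolution"
    else
        # Otherwise round down to the nearest multiple of the resolution.
        echo $(( requested - requested % resolution ))
    fi
}

effective_interval 25 10   # prints 20
effective_interval 3 10    # prints 10
```

With a 10 millisecond clock, a request such as -p 25 would therefore be expected to profile every 20 milliseconds.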
-h counter[,value[,counter2[,value2]]]
Collect hardware counter overflow profiles. The
counter name can be either a standard counter name, or
the internal name, as used by cputrack(1). To list both
the standard and the internal names, type collect at
the command line. If the name is an internal name, and
it is trailed by a slash and a single digit (0 or 1),
the event register specified by that digit is used; if
not, whichever event register supports that counter is
used.
The overflow value is the number of events counted at
which the counter overflows and the overflow event is
recorded. It can be set to one of the following:
Value     Meaning
0 or a null string
          Use the default overflow value
hi[gh]    Use the high resolution overflow value. The value h
          for high resolution is also supported for
          compatibility.
lo[w]     Use the low resolution overflow value
n         Use an overflow value of n
If you specify the optional second counter and second
overflow value, the overflow value for the first
counter must be specified as one of the values listed
above. The two named counters specified must be on dif-
ferent event registers.
An experiment can specify both hardware counter profil-
ing and clock-based profiling. If hardware counter
profiling is specified, but clock-based profiling is
not explicitly specified, clock-based profiling is
turned off.
-s option
Collect synchronization tracing data. The minimum delay
threshold for tracing events is set using option. The
allowed values of option are:
Value       Meaning
on          turn on synchronization delay tracing and set the
            threshold value by calibration at runtime
calibrate   same as on
off         turn off synchronization delay tracing
n           turn on synchronization delay tracing with a
            threshold value of n microseconds. If n is zero,
            all events are traced.
all         turn on synchronization delay tracing and trace
            all synchronization events.
Synchronization delay tracing is turned off by default.
Synchronization events are not recorded for Java moni-
tors.
-H option
Collect heap trace data. The allowed values of option
are:
Value     Meaning
on        turn on tracing of memory allocation requests
off       turn off tracing of memory allocation requests
Heap tracing is turned off by default.
Heap tracing events are not recorded for Java memory
allocations.
-m option
Collect MPI tracing data. The allowed values of option
are:
Value     Meaning
on        turn on tracing of MPI calls
off       turn off tracing of MPI calls
MPI tracing is turned off by default.
-S interval
Collect periodic samples at the interval specified (in
seconds). If interval is zero, no periodic samples are
collected. By default, periodic sampling at 1 second
intervals is enabled. The data recorded in the samples
is data for the process, and includes a timestamp and
execution statistics from the kernel, among other
things.
If no data specification arguments are supplied, clock-based
profiling data, with the default resolution, is collected.
If clock-based profiling is explicitly disabled, and neither
hardware-counter overflow profiling nor any kind of tracing
is enabled, the collect command warns that no function-level
data is being collected. The target is still executed and
global data is recorded.
Experiment Controls
-L size
Limit the amount of profiling and tracing data recorded
to size megabytes. The limit applies to the sum of all
profiling data and tracing data, but not to sample
points. The limit is only approximate, and can be
exceeded. When the limit is reached, no more profiling
or tracing data is recorded, but the experiment remains
open and samples are recorded until the target process
terminates.
The default limit on the amount of data recorded is
2000 Mbytes. To remove the limit, set size to zero.
-F option
Control whether or not descendant processes should have
their data recorded. The allowed values of option are:
Value     Meaning
on        record experiments on all descendant processes
off       do not record experiments on descendant processes
Following descendant processes is off by default. See
the section "FOLLOWING DESCENDANT PROCESSES", below.
-j option
Control Java profiling when the target is a JVM
machine. The allowed values of option are:
Value     Meaning
on        Record profiling data for the JVM machine, and
          recognize methods compiled by the Java HotSpot[tm]
          virtual machine.
off       Do not record Java profiling data.
See the section "JAVA PROFILING", below.
You must use -j to obtain profiling data if the target
is a JVM machine. The version of the JVM machine must
be no earlier than 1.4. Java profiling does not work
for earlier versions of the JVM machine. The -j option
is not needed if the target is a class or jar file. If
you are using a 64-bit JVM machine, you must specify its
path explicitly as the target; do not use the -d64
option for a 32-bit JVM machine. If the -j option is
set, but the target is not a JVM machine, no data is
recorded for the target. The collect command validates
the version of the JVM machine specified for Java pro-
filing.
-l signal
Record a sample point whenever the given signal is
delivered to the process.
-y signal[,r]
Control recording of data with signal. Whenever the
given signal is delivered to the process, switch
between paused (no data is recorded) and resumed (data
is recorded) states. The Collector is started in the
resumed state if the optional ,r flag is given, other-
wise it is started in the paused state. This option
does not affect the recording of sample points.
Output Controls
-o experiment_name
Use experiment_name as the name of the experiment to be
recorded. The experiment_name string must end in the
string .er; if not, an error message is printed, and no
experiment is run.
If -o is not specified, a name of the form stem.n.er is
chosen, where stem is a string, and n is a number. If
a group name has been specified with -g, stem is taken
from the group name by removing the .erg suffix. If no
group name has been specified, stem is set to the
string test.
If the collect command is launched from one of the com-
mands used to run MPI jobs and -o is not specified, the
value of n used in the name is taken from the environ-
ment variable used to define the MPI rank of that pro-
cess. Otherwise, n is set to the lowest integer not in
use.
If the name is not specified in the form stem.n.er, and
the given name is in use, an error message is printed,
and no experiment is run. If the name is of the form
stem.n.er and the name is in use, the experiment is
recorded under a name corresponding to the first
available value of n that is not in use. A warning is
printed if the name is changed.
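The choice of n as the lowest integer not in use can be sketched as follows. The function name next_experiment_name is hypothetical, and starting the count at 1 is an assumption based on the example names used elsewhere in this page (test.1.er, group.1.er).

```shell
#!/bin/sh
# next_experiment_name DIRECTORY STEM
# Print the first name of the form STEM.n.er that does not yet exist in
# DIRECTORY. Counting from 1 is an assumption of this sketch.
next_experiment_name() {
    dir=$1
    stem=$2
    n=1
    while [ -e "$dir/$stem.$n.er" ]; do
        n=$((n + 1))
    done
    echo "$stem.$n.er"
}
```

For example, if test.1.er and test.2.er already exist in the current directory, next_experiment_name . test gives test.3.er.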
-d directory_name
Place the experiment in directory directory_name. If
no directory is given, the experiment is placed in the
current working directory.
-g group_name
Add the experiment to the experiment group group_name.
The group_name string must end in the string .erg; if
not, an error is reported, and no experiment is run.
Other Arguments
-n Dry run: the target is not run, but all the details of
the experiment that would be run are printed. Turns on
-v.
-R Display the text version of the performance tools
README in the terminal window. If the README is not
found, a warning is printed. No further arguments are
examined, and no further processing is done.
-V Print the current version. No further arguments are
examined, and no further processing is done.
-v Print the current version and further detailed informa-
tion about the experiment being run.
-x Leave the target process stopped on the exit from the
exec system call, in order to allow a debugger to
attach to it.
To attach a debugger to the target once it is stopped
by collect, you must determine the PID of the process,
start the debugger, configure it to ignore SIGPROF (and
SIGEMT, if you chose to collect hardware counter data),
and then attach to the process using the PID. As the
process runs under the control of the debugger, the
Collector records an experiment.
Obsolete Arguments
-a Address space data is no longer supported. If -a is
specified, an error message is printed and the command
exits.
FOLLOWING DESCENDANT PROCESSES
Processes can create descendant processes by calling a sys-
tem library function. The Collector can collect data for
descendant processes initiated by calls to fork(2),
fork1(2), fork(3F), vfork(2), and exec(2) and its variants.
The call to vfork is replaced internally by a call to fork1.
The Collector ignores calls to system(3C), system(3F),
sh(3F), popen(3C), and similar functions, and their associ-
ated descendant processes. If the -F on argument is used,
the Collector opens a new experiment for each descendant
process inside the parent experiment. These new experiments
are named with their lineage as follows. To form the exper-
iment name for a descendant process, an underscore, a code
letter and a number are added to its creator's experiment
name. The code letter is "f" for a fork and "x" for an exec.
The number is the index of the fork or exec (whether suc-
cessful or not). For example, if the experiment name for the
initial process is "test.1.er", the experiment for the child
process created by its third fork is "test.1.er/_f3.er". If
that child process execs a new image, the corresponding
experiment name is "test.1.er/_f3_x1.er".
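The lineage naming rule can be sketched as a shell function. The name descendant_name is hypothetical, and the sketch assumes that a descendant experiment can be told apart from the founder by the "/_" in its path.

```shell
#!/bin/sh
# descendant_name CREATOR CODE INDEX
# CREATOR is the creator's experiment name, CODE is f (fork) or x (exec),
# and INDEX is the index of that fork or exec.
descendant_name() {
    creator=$1
    code=$2
    index=$3
    case $creator in
        */_*)
            # Creator is itself a descendant: extend its lineage string.
            echo "${creator%.er}_${code}${index}.er" ;;
        *)
            # Creator is the founder: the new experiment goes inside it.
            echo "${creator}/_${code}${index}.er" ;;
    esac
}

descendant_name test.1.er f 3           # prints test.1.er/_f3.er
descendant_name test.1.er/_f3.er x 1    # prints test.1.er/_f3_x1.er
```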
The Analyzer and er_print do not automatically read
experiments on descendant processes. You can refer to them
explicitly by name, as described above, or you can build a
group file that contains them. To do so, first run collect
-F on -g group.erg on the target. If this is the first
run, the original (founder) experiment is recorded as
group.1.er, and a group file containing only the founder
experiment is created. To add all descendant process
experiments to the group, type ls -1d group.1.er/*.er >>
group.erg at the command prompt. You can edit the group file
to select a subset of descendant experiments. The
experiments in the group are loaded by the Analyzer or by
er_print as for any other group file.
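The group-file step can be wrapped in a small helper. This is a sketch under stated assumptions: the function name add_descendants is hypothetical, and it simply appends to the group file in the way the ls command above does.

```shell
#!/bin/sh
# add_descendants FOUNDER GROUPFILE
# Append every descendant experiment found inside FOUNDER to GROUPFILE,
# one name per line.
add_descendants() {
    founder=$1
    groupfile=$2
    ls -1d "$founder"/*.er >> "$groupfile"
}

# Intended use, after a run such as: collect -F on -g group.erg <target>
#   add_descendants group.1.er group.erg
```

Editing group.erg afterwards, as noted above, selects a subset of the descendant experiments.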
JAVA PROFILING
In this release, Java profiling consists of running an
experiment on the JVM machine as it runs the user's class or
jar files. For all interpreted methods, data is reported for
the JVM machine itself, but for methods compiled with the
Java HotSpot virtual machine, data is reported for named
methods. Data is reported in the usual way for any C, C++,
or Fortran code called by a Java target. Such code
corresponds to Java native methods.
Clock-based profiling and hardware-counter profiling are
supported. Synchronization tracing and heap tracing collect
data only on native calls to the various synchronization
primitives and on native calls for memory allocation,
respectively. They do not record data for Java monitor
functions or for Java allocations.
When collect inserts a target name of java into the argument
list, it examines environment variables for a path to the
java target, in the order JDK_1_4_HOME, then JDK_HOME, then
JAVA_PATH, and finally the user's PATH. For the first of
these that is set, the resultant target is verified as an
ELF executable. If it is not, collect fails with an error
indicating which environment variable was used, and the full
path name that was tried.
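The lookup order for the java target can be sketched in shell. The function name java_path_source is hypothetical, and the sketch treats an empty variable as unset, which is an assumption. It reports only which setting would be consulted, not the full path that collect builds from it.

```shell
#!/bin/sh
# java_path_source
# Print the name of the first setting that would be consulted for the
# java target: JDK_1_4_HOME, then JDK_HOME, then JAVA_PATH, then PATH.
java_path_source() {
    for var in JDK_1_4_HOME JDK_HOME JAVA_PATH; do
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            echo "$var"
            return 0
        fi
    done
    # None of the three variables is set: fall back to the user's PATH.
    echo PATH
}
```

Running java_path_source before a Java experiment shows which variable an error message from collect would be expected to name.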
Java profiling is not supported for versions of the
Java[tm] 2 SDK earlier than 1.4. Attempts to profile Java code
using earlier versions fail because the java executable for
these versions is a shell script, not an ELF executable, and
the necessary functionality to support Java profiling is not
available.
USING COLLECT WITH MPI
collect can be used with MPI by simply prefacing the target
and its arguments with collect and its arguments in the com-
mand line that starts the MPI job. For example,
% mprun -np 16 a.out 3 5
can be replaced by
% mprun -np 16 collect -m on -d /tmp/mydirectory -g run1.erg a.out 3 5
to run an MPI tracing experiment on each of the 16 MPI
processes, collecting them all in a specific directory, and
collecting them as a group. The individual experiments are
named by the MPI rank, as described above. The experiments,
as specified above, contain clock-based profiling data,
which is turned on by default, and MPI Trace data.
DATA COLLECTED
Three types of data are collected: profiling data, tracing
data and sampling data. The data packets recorded in profil-
ing and tracing include the callstack of each LWP, the LWP,
thread and CPU IDs, and some event-specific data. The data
packets recorded in sampling contain global data such as
execution statistics, but no program-specific or event-
specific data. All data packets include a timestamp.
Clock-based Profiling
The event-specific data recorded in clock-based profil-
ing is an array of counts for each accounting micro-
state. The microstate array is incremented by the sys-
tem at a prescribed frequency, and is recorded by the
Collector when a profiling signal is processed.
Clock-based profiling can run at normal frequency (10
ms.), high-resolution frequency (1 ms.), or a custom
frequency, specified in milliseconds. For high-
resolution profiling, the operating system on the
machine must be running with a high-resolution clock
routine, which can be done by putting the line:
set hires_tick=1
in the file /etc/system and rebooting. High-resolution
profiles record ten times as much data for a given run
as normal profiles. If you try to set high-resolution
profiling on a machine whose operating system does not
support it, the Collector prints a warning message and
uses the highest resolution supported. Similarly, a
custom setting that is not a multiple of the resolution
supported by the system is rounded down to the nearest
non-zero multiple of that resolution, and a warning
message is printed.
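Before requesting high-resolution profiling, it can be useful to check whether the hires_tick setting is present. The helper below is a sketch: the name has_hires_tick is hypothetical, it takes the file to inspect as an argument, and it only looks for an active "set hires_tick=1" line. It cannot tell whether the machine has been rebooted since the line was added.

```shell
#!/bin/sh
# has_hires_tick FILE
# Succeed if FILE contains an active "set hires_tick=1" line. Lines that
# begin with any other character, such as a comment marker, do not match.
has_hires_tick() {
    grep -Eq '^[[:space:]]*set[[:space:]]+hires_tick[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Intended use:
#   has_hires_tick /etc/system || echo "hires_tick is not set in /etc/system"
```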
Clock-based profiling data is converted into the fol-
lowing metrics:
User CPU Time
Wall Time
Total LWP Time
System CPU Time
Wait CPU Time
User Lock Time
Text Page Fault Time
Data Page Fault Time
Other Wait Time
For multiprocessor experiments, all of the times are
summed across all LWPs in the process. Total time adds
up to the wall-clock time, multiplied by the average
number of LWPs in the process.
Hardware Counter Overflow Profiling
Hardware counter overflow profiling records the number
of events counted by the hardware counter at the time
the overflow signal was processed.
Hardware counter overflow profiling can be done on sys-
tems that support overflow profiling and that include
the hardware counter shared library, libcpc.so(3). You
must use a version of the Solaris[tm] Operating
Environment that is no earlier than the Solaris 8
Operating Environment. On UltraSPARC[tm] computers, you
must use a version of the hardware no earlier than the
UltraSPARC III hardware. On computers that do not sup-
port overflow profiling, an attempt to select hardware
counter overflow profiling generates an error.
The counters available depend on the specific CPU chip
and operating environment. The list of counters can be
determined by running the collect command with no argu-
ments, which prints out a usage message that contains
the names of the counters. The counters that have
aliases are displayed first in the list, followed by a
list of all counters.
The lines of output for an aliased counter are format-
ted as follows:
CPU Cycles (cycles = Cycle_cnt/0) 9999991 hi=1000003, lo=100000007
In this line, the first field, "CPU Cycles", is the
metric name. The second field, "cycles", gives the
alias name that can be used in the -h counter... argu-
ment. The third field, "Cycle_cnt/0", gives the inter-
nal name as used by cputrack(1) and the register number
on which that counter can be used. The next field is
the default overflow interval, the following field is
the default high-resolution overflow interval, and the
last field is the default low-resolution overflow
interval.
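The fields of an aliased-counter line can be pulled apart with sed. This is a sketch that assumes the fields are separated exactly as in the sample line above; the real usage output may be spaced differently. The function name parse_aliased_counter and the key names in its output are hypothetical.

```shell
#!/bin/sh
# parse_aliased_counter LINE
# Split one aliased-counter line into labeled fields. Prints nothing if
# the line does not have the assumed layout.
parse_aliased_counter() {
    printf '%s\n' "$1" | sed -n \
        's|^\(.*\) (\([^ ]*\) = \([^/]*\)/\([0-9]\)) \([0-9]*\) hi=\([0-9]*\), lo=\([0-9]*\)$|metric=\1 alias=\2 internal=\3 register=\4 default=\5 hi=\6 lo=\7|p'
}

line='CPU Cycles (cycles = Cycle_cnt/0) 9999991 hi=1000003, lo=100000007'
parse_aliased_counter "$line"
# prints: metric=CPU Cycles alias=cycles internal=Cycle_cnt register=0 default=9999991 hi=1000003 lo=100000007
```

The alias field is the name that can be passed to -h, and the default, hi, and lo fields are the overflow values selected by a null string, hi, and lo respectively.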
Lines of output for the non-aliased counters are for-
matted as follows:
Cycle_cnt/0 events 1000003 hi=100003, lo=9999991
In this line, the first field, "Cycle_cnt/0", gives the
internal name as used by cputrack(1) and the register
number on which that counter can be used. The string
"Cycle_cnt/0 events" is the metric name for this
counter. The next field is the default overflow inter-
val, the following field is the default high-resolution
overflow interval, and the last field is the default
low-resolution overflow interval.
For counters that count in cycles, the metrics reported
are converted by default to inclusive and exclusive
times, but can optionally be shown as event counts.
For counters that do not count in cycles, the metrics
reported are inclusive and exclusive event counts.
Synchronization Delay Tracing
Synchronization delay tracing records all calls to the
various thread synchronization routines where the
real-time delay in the call exceeds a specified thres-
hold. The data packet contains timestamps for entry and
exit to the synchronization routines, the thread ID and
the LWP ID at the time the request is initiated. (Syn-
chronization requests from a thread can be initiated on
one LWP, but complete on another.)
Synchronization delay tracing data is converted into
the following metrics:
Synchronization Delay Events
Synchronization Wait Time
Heap Tracing
Heap tracing records all calls to malloc, free, real-
loc, and memalign, with the size of the block
requested, its address, and for realloc, the previous
address.
Heap tracing data is converted into the following
metrics:
Leaks
Bytes Leaked
Allocations
Bytes Allocated
Leaks are defined as allocations that are not freed.
If a zero-length block is allocated, it counts as an
allocation with zero bytes allocated. If a zero-length
block is not freed, it counts as a leak with zero bytes
leaked.
MPI Tracing
MPI tracing records calls to the MPI library for func-
tions that can take a significant amount of time to
complete. The following functions from the MPI library
are traced: MPI_Send, MPI_Bsend, MPI_Rsend, MPI_Ssend,
MPI_Recv, MPI_Sendrecv, MPI_Sendrecv_replace, MPI_Wait,
MPI_Waitall, MPI_Waitany, MPI_Waitsome, MPI_Win_fence,
MPI_Win_lock, MPI_Allgather, MPI_Allgatherv,
MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv,
MPI_Barrier, MPI_Bcast, MPI_Gather, MPI_Gatherv,
MPI_Reduce, MPI_Reduce_scatter, MPI_Scan, MPI_Scatter,
MPI_Scatterv.
MPI tracing data is converted into the following
metrics:
MPI Time
MPI Sends
MPI Bytes Sent
MPI Receives
MPI Bytes Received
Other MPI Calls
The MPI Bytes Received metric uses the actual number of
bytes for blocking calls, but uses the buffer size for
non-blocking calls. Metrics that are computed for col-
lective operations such as gather, scatter and reduce
have the maximum possible values for these operations.
No reduction in the values is made to account for
optimization of the collective operations.
Sampling and Global Data
Sampling refers to the process of generating markers
along the time line of execution. At each sample point,
execution statistics are recorded. All of the data
recorded at sample points is global to the program, and
does not map to function-level metrics.
Samples are always taken at the start of the process,
and at its termination. By default or if a non-zero -S
argument is specified, samples are taken periodically
at the specified interval. In addition, samples can be
taken by using the libcollector(3) API.
The data recorded at each sample point consists of
microstate accounting information from the kernel,
along with various other statistics maintained within
the kernel.
RESTRICTIONS
The Collector interposes on some signal-handling routines to
ensure that its use of SIGPROF signals for clock-based pro-
filing and SIGEMT for hardware-counter overflow profiling is
not disrupted by the target program. The Collector library
re-installs its own signal handler if the target program
installs a signal handler. The Collector's signal handler
sets a flag that ensures that system calls are not inter-
rupted to deliver signals. This setting could change the
behavior of the target program.
The Collector interposes on setitimer(2) to ensure that the
profiling timer is not available to the target program if
clock-based profiling is enabled.
The Collector interposes on functions in the hardware
counter library, libcpc.so, so that an application cannot
use hardware counters while the Collector is collecting per-
formance data. The interposed functions return a value of
-1.
Hardware counter profiling cannot be run on a system where
cpustat is running, because cpustat takes control of the
counters, and does not let a user process use them.
Java profiling is not supported with versions of the Java 2
SDK earlier than 1.4.
Data is not collected on descendant processes that are
created to use the setuid attribute.
Applications that call vfork(2) have these calls replaced by
a call to fork1(2).
SEE ALSO
collector(1), dbx(1), er_archive(1), er_cp(1), er_export(1),
er_mv(1), er_print(1), er_rm(1), libcollector(3), and the
manual
Program Performance Analysis Tools