Updated 2002/03/28
Forte[tm] Developer 7: Program Performance Analysis Tools Readme
Contents
- Introduction
- About the Forte Developer 7 Program Performance Analysis Tools
- New and Changed Features
- Software Corrections
- Problems and Workarounds
- Limitations and Incompatibilities
- Documentation Errors
- Required Patches
A. Introduction
This document contains information about the Forte[tm] Developer 7 program performance analysis tools. These tools include commands for collection, manipulation and analysis of program performance data, and a graphical user interface, the Performance Analyzer, for display of performance data. The term Collector is used in this document for the tools that collect performance data and their underlying libraries. These tools are the collect command, the dbx collector subcommands, and the performance data collection features in the IDE.
Note: The Performance Analyzer GUI and the IDE are part of the Forte[tm] for Java[tm] 4, Enterprise Edition for the Solaris[tm] operating environment (SPARC[tm] Platform Edition) versions 8 and 9.
This document describes the new features and software corrections that are introduced in this release and lists known problems, limitations, and incompatibilities. Information in this document updates and extends information in the software manuals.
Information in the release notes updates and extends information in all readme files. To access the release notes and the complete Forte Developer documentation set, go to the documentation index at file:/opt/SUNWspro/docs/index.html.
To view the text version of this readme, type one of the following at a command prompt:
more /opt/SUNWspro/READMEs/analyzer
collect -R
To view the HTML version of this readme, go to file:/opt/SUNWspro/docs/index.html.
Note - If your Forte Developer 7 software is not installed in the /opt directory, ask your system administrator for the equivalent path on your system.
Note - In this document the terms "Java[tm] virtual machine" and "JVM[tm]" mean a virtual machine for the Java[tm] platform.
B. About the Program Performance Analysis Tools
This release of the Forte Developer 7 program performance analysis tools is available on the Solaris operating environment (SPARC Platform Edition) versions 7, 8 and 9, with the exception that the graphical user interfaces are not available on version 7. The Performance Analyzer GUI and the IDE are part of the Forte for Java 4, Enterprise Edition for the Solaris operating environment (SPARC Platform Edition) versions 8 and 9.
The Forte Developer 7 program performance analysis tools collect statistical profiles of a program's performance and trace calls to critical library routines, and display the data in tabular and graphical form. The data that is collected is converted into performance metrics. Metrics can be viewed in tabular form at the load object, function, source line or instruction level. The tools provide a means of navigating program structure that is useful for identifying functions and paths within the code that are responsible for resource usage, inefficiencies or time delays. The Performance Analyzer GUI can also display the performance data in a timeline display.
C. New and Changed Features
This section describes the new and changed features of the performance tools since the Forte Developer 6 update 2 release. For information about other Forte Developer components, see the What's New manual. To access this manual on your local system or network, go to file:/opt/SUNWspro/docs/index.html. You can also access this manual by going to http://docs.sun.com.
Data collection features
The following list describes new or changed data collection capabilities.
Clock-based profiling and hardware-counter profiling can now be done simultaneously.
Low-resolution profiling options have been added to reduce the amount of data collected on executables that run for a long time.
The hardware counter overflow values have been adjusted.
Periodic sampling is available with the collect command, using the -S on option.
An approximate limit to the amount of data collected can be set using collect -L, dbx collector limit, or the Collector tab.
Profiling of Java programs can be done with the collect command. The Collector collects information on the Java[tm] virtual machine and any methods compiled by the Java HotSpot[tm] virtual machine. Java profiling is not available for versions of the Java[tm] 2 Software Development Kit earlier than 1.4.
Experiments on descendant processes can be recorded using the -F on option of the collect command. Performance data can be collected on descendant processes created by calls to fork(2), fork1(2), vfork(2), fork(3F), and exec(2) and its variants. Calls to vfork are replaced internally by calls to fork1. Calls to system(3C), system(3F), sh(3F) and popen(3C) are ignored.
Tracing of memory allocations and deallocations ("heap tracing") can be recorded using the -H on option of the collect command or the dbx collector heaptrace command. Java memory allocations are not traced. Two new er_print commands, allocs and leaks, have been added to support memory allocation metrics.
Tracing of MPI calls has been separated from synchronization delay tracing, and now collects more data on the MPI calls that are traced. MPI tracing data can be recorded using the -m on option of the collect command or the dbx collector mpitrace command.
An API for providing information to the Collector about dynamically-compiled functions has been provided.
A Fortran API to the Collector library has been implemented.
The meaning of the -n option of the collect command has changed. The option no longer runs the target program; instead, it prints details of the experiment that would have been run.
Address space data collection is no longer supported.
The dbx collector command includes subcommands to record a sample and to control whether dbx records a sample when it stops the target process.
Experiments are no longer kept in a hidden directory. The experiment name is now the name of an ordinary Unix directory. Older versions of the tools cannot read experiments recorded with this release.
The robustness of data collection has been improved. When an application installs a signal handler, the Collector re-installs its signal handler as the primary handler and passes signals on to the application's handler, so that profiling signals are not lost. The Collector also prevents an application from using the hardware counters if hardware-counter overflow profiling is enabled, and ensures that an application cannot interfere with clock-based profiling.
The text version of this readme is displayed by typing collect -R at the command prompt.
For more information, refer to the collect(1), collector(1) and libcollector(3) man pages.
Data presentation features
Note: You must have a license to use the Performance Analyzer GUI and to read the analyzer(1) man page.
The Performance Analyzer GUI has been completely redesigned and re-implemented in Java. It consists of a menu bar, a toolbar and a panel with a split pane. Each pane contains tabs for the display of data. The left pane contains the Functions, Callers-Callees, Source, Disassembly, Timeline, Statistics, LeakList and Experiments tabs. The right pane contains the Summary, Event and Legend tabs. For a description of the new GUI, see the analyzer(1) man page or the What's New topic in the Performance Analyzer online help. Features that are entirely new are described below.
The Timeline tab shows a graphical representation of all the events recorded in an experiment. It also shows the sample data, which was previously shown in the Overview display. The Event tab shows details of an event that is selected in the Timeline tab, and the Legend tab shows the color coding for the functions in the call stack as displayed in the Timeline tab. The Event tab and the Legend tab are only enabled when the Timeline tab is selected.
The Find tool can be used to search for high-metric lines in the Source and Disassembly tabs, as well as to search for plain text in the Functions, Callers-Callees, Source and Disassembly tabs.
The Experiments tab shows information on each experiment and on the load objects used by the collection target.
Memory allocation and deallocation data is converted into allocation and leak metrics. The event data is shown in the Timeline tab. The data is also available as a list of aggregated events in the LeakList tab of the GUI, or with the leaks and allocs commands of er_print.
MPI tracing data is converted into various MPI metrics. The event data is shown in the Timeline tab.
Preferences for visible metrics, sort metrics and source and disassembly displays can be saved from the Set Data Presentation dialog box.
The handling of load objects in both the Performance Analyzer and er_print has changed. Instead of removing the data from the display for a load object, the data is aggregated and presented for the load object as a whole. The changed presentation applies to the function list and the callers-callees list. The selection of load objects is done in the Show/Hide Functions dialog box in the Performance Analyzer. The er_print commands objects, object_list and object_select have changed as a consequence of the new handling of load objects.
Thresholds for highlighting high-metric lines in annotated source code and annotated disassembly code can be set in the Performance Analyzer using the Set Data Presentation dialog box, and can also be set with two new er_print commands, sthresh and dthresh.
Function and callers-callees data for a single function can be printed in er_print with the fsingle and csingle commands.
Metrics for source lines can be displayed on the interleaved source code in the Disassembly tab.
For more information, refer to the analyzer(1) and er_print(1) man pages and the Performance Analyzer online help.
D. Software Corrections
The following bugs in the Forte Developer 6 update 2 release have been fixed.
Cannot Collect All Synchronization Events With dbx collector Subcommands
er_mv Corrupts Original Experiment When There Is Insufficient Space
Detaching From an Attached Process in dbx Before Closing Experiment Causes Hang
Collection Fails for Processes That Call fork, exec or system
Hardware Counter Profiling of Applications That Use libcpc.so Produces Corrupt Experiments.
Incorrect Behavior of dbx collector Subcommands
The dbx collector data collection and output subcommands address_space, hwprofile, profile, sample, synctrace and store were accepted silently and applied to the next experiment, if any, when an experiment was active. They are now ignored with a warning. (4445393)
Cannot Collect All Synchronization Events With dbx collector Subcommands
The dbx collector synctrace threshold command erroneously reported an error for a zero threshold value. A zero threshold value is now accepted. (4455260)
er_mv Corrupts Original Experiment When There Is Insufficient Space
When er_mv moved an experiment, it deleted files in the source as they were copied. If there was insufficient space in the destination, the experiment was split between the source and the destination, and the source experiment was incomplete. Now er_mv copies the experiment to the destination and only removes the original experiment if the copy is completed. (4421263)
Detaching From an Attached Process in dbx Before Closing Experiment Causes Hang
If you attached dbx to a process, enabled data collection (using collector enable), and then tried to detach from the process before disabling data collection, dbx could not complete any more commands, including the detach command. Now, data collection is disabled and the experiment closed when a detach command is issued. (4456506)
Collection Fails for Processes That Call fork, exec or system
When a process called fork, exec or system, the data collection process failed, with either a corrupted experiment or termination of the process. In this release, the collect command correctly handles calls to the routines fork(2), fork1(2), vfork(2), fork(3F), exec(2) and its variants, system(3C), system(3F), sh(3F) and popen(3C), and continues to collect data on the parent process, and can also collect data on the descendant process. Calls to vfork are replaced internally by calls to fork1. Data collection using the dbx collector commands continues to collect data on the parent process but does not collect data on descendant processes.
Data collection fails if you link statically to libc. (4367629)
Using -d Before -o in collect Produces an Error
Using the -d option before the -o option in the collect command erroneously produced the following error message.
Experiment name may only be set once
These options can now be specified in any order and produce the correct result.
Hardware Counter Profiling of Applications That Use libcpc.so Produces Corrupt Experiments.
Collecting hardware counter overflow data on an application that uses the library libcpc.so resulted in a corrupted experiment. The collector library now ensures that it has control of the hardware counters and prevents the application from using them. The control is achieved by interposing on calls to functions in libcpc.so and returning a value of -1 for calls from the application, indicating that the counters are in use. A corrupted experiment can still result if dbx is attached to a process that uses libcpc.so. (4434079)
E. Problems and Workarounds
This section discusses known software problems and possible workarounds for those problems. Some problems might be fixed in a patch. For updates, check Forte Developer Hot Product News at http://www.sun.com/forte/fcc/hotnews.html.
Some problems are the result of bugs in the Solaris operating environment and can be fixed by installing the appropriate patches. For further information, see the Required Patches section in this Readme. Some bugs that appear to be in the Analyzer might actually be Collector bugs. Bugs in the compilers and dbx can also affect the Analyzer. Some problems that are not due to bugs are also described in this section.
Bugs in the Performance Tools
Performance Analyzer Cannot Handle More Than 2 GBytes of Data.
Defective experiments are created if _32 or _64 bit variants of LD_PRELOAD and LD_AUDIT are set.
Performance Analyzer Cannot Handle More Than 2 GBytes of Data.
The Performance Analyzer cannot read and process more than about 2 GBytes of data. You can use er_print to analyze an experiment, or several experiments whose combined data size is larger than 2 GBytes. Other workarounds are to record less data by adjusting the collection parameters, to collect data on part of the program instead of the whole program, or to set a data limit. See the collect(1), collector(1), and libcollector(3) man pages and the Program Performance Analysis Tools manual for more information. (4505739)
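Because an experiment is now an ordinary directory, one way to decide in advance whether to use the Performance Analyzer or er_print is to total the sizes of the files in the experiments you plan to load. The following C sketch does that with the POSIX nftw(3C) function. The function name experiment_size is hypothetical, and the on-disk size is only an approximation of the amount of data the Analyzer reads.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <sys/stat.h>

/* Running total used by the nftw callback below. */
static long long total_bytes;

/* nftw callback: add the size of each regular file to the running total. */
static int add_file_size(const char *path, const struct stat *sb,
                         int type, struct FTW *ftw)
{
    (void)path;
    (void)ftw;
    if (type == FTW_F)
        total_bytes += (long long)sb->st_size;
    return 0; /* continue the walk */
}

/* Hypothetical helper: return the total size in bytes of all regular
   files under an experiment directory, or -1 if the walk fails. */
long long experiment_size(const char *dir)
{
    total_bytes = 0;
    /* FTW_PHYS: do not follow symbolic links; up to 16 open descriptors. */
    if (nftw(dir, add_file_size, 16, FTW_PHYS) != 0)
        return -1;
    return total_bytes;
}
```

A caller might sum experiment_size() over each experiment directory (for example test.1.er, used here only as an illustrative name) and switch to er_print when the total approaches 2LL * 1024 * 1024 * 1024 bytes.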
Cannot Print Summary Tab or Event Tab Data.
The Performance Analyzer cannot print the data in the Summary tab or the Event tab. To print summary data for a function or a load object, you can use the er_print command. (4286674)
Profiling Fails in Multithreaded Java Applications
The Performance Analyzer and er_print fail when reading experiments recorded for multithreaded Java applications. The data in the experiment is not correctly recorded. (4649137)
Defective experiments are created if _32 or _64 bit variants of LD_PRELOAD and LD_AUDIT are set.
The collect command produces empty experiments if any of the environment variables LD_PRELOAD_32, LD_PRELOAD_64, LD_AUDIT_32, or LD_AUDIT_64 is set. The workaround is to use dbx collector commands for data collection.
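If you start collect from a script or launcher program, a pre-flight check can catch this condition before an empty experiment is recorded. The following C sketch reports the first of the four variables named above that is set in the environment. The function name conflicting_linker_var is hypothetical and is not part of the Collector.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pre-flight check: return the name of the first of
   LD_PRELOAD_32, LD_PRELOAD_64, LD_AUDIT_32 or LD_AUDIT_64 that is set,
   or NULL if none of them is set. */
const char *conflicting_linker_var(void)
{
    static const char *const names[] = {
        "LD_PRELOAD_32", "LD_PRELOAD_64", "LD_AUDIT_32", "LD_AUDIT_64"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (getenv(names[i]) != NULL)
            return names[i];
    }
    return NULL;
}
```

When the function returns a name, the launcher can unset that variable before running collect, or fall back to the dbx collector commands as described above.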
Double Counting of Metrics on Parallel Directive Lines
Metrics that are reported on parallelization compiler directive lines in annotated source code are double-counted. The metrics on the source lines in the parallel do, for or section blocks of code are correct. There are also some double counting errors at the function level. (4656193)
Pause/Resume Events Are Not Shown in Timeline
When data collection is paused, no data is recorded, and none is shown in the Timeline tab. However, the Timeline tab should indicate that data collection was paused, rather than lead the user to think no events occurred. (4514519)
Problems That Can Be Fixed With Solaris Operating Environment Patches
The following bugs can be fixed by installing the appropriate patches to the Solaris operating environment. See the Required Patches section in this Readme for more information.
Lost Clock-Based Profiling Data for LWPs
Under some circumstances profiling interrupts (SIGPROF) for one or more LWPs may be lost. When this happens, data displayed does not include thread profile metrics for threads run on those LWPs. This happens most often with unbound threads in the Solaris 7 operating environment. A workaround is to use the alternate threads library in /usr/lib/lwp. (4248299)
Lost Hardware Counter Profiling Interrupts
When a program is running with unbound threads, the interrupt from a hardware counter overflow (SIGEMT) is occasionally lost and cannot be recovered. The workaround is to use bound threads, or if you are running under the Solaris 8 operating environment, to use the alternate threads library in /usr/lib/lwp. The alternate threads library uses bound threads even in support of the unbound-thread APIs. (4352643)
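In POSIX threads code, a bound thread is requested by giving the thread system contention scope. The following C sketch creates one thread that way with pthread_attr_setscope(3C) and PTHREAD_SCOPE_SYSTEM; on the Solaris operating environment such a thread is bound to its own LWP. The wrapper name create_bound_thread is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical wrapper: create a thread with system contention scope
   (a bound thread on Solaris). Returns 0 on success or the error number
   reported by the failing pthread call. */
int create_bound_thread(pthread_t *tid, void *(*start)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(tid, &attr, start, arg);
    pthread_attr_destroy(&attr); /* the attribute object is no longer needed */
    return rc;
}
```

Changing thread creation this way is the source-level form of the workaround; the alternate threads library achieves the same effect without modifying the program.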
Clock-Based Profiling Inaccuracies on Loaded Systems
Profiling an application when there is a load on the system can result in significant undercount of User CPU time, up to 20%. The missing User CPU time shows up as either System CPU time or as Wait-CPU time. (4509116)
Poor Scalability Past 32 CPUs
An application that uses more than 32 CPUs or threads can run much slower when performance data is being collected. (4273174, 4304367)
Other Problems
Altered Behavior With Applications That Install Signal Handlers
Incorrect Values for Wait CPU Metric in Statistics Display and Samples
Altered Behavior With Applications That Install Signal Handlers
Collecting performance data on an application that installs a signal handler can cause altered behavior of the collector or of the application.
When the collector library is preloaded, the collector's signal handler always re-installs itself as the primary handler, and it passes on signals that it does not use to any other handler. However, because it does not interrupt system calls, an application that installs a signal handler that does interrupt system calls can show changed behavior. In the case of the asynchronous I/O library, libaio.so, which uses SIGPROF for asynchronous cancel operations, asynchronous cancel requests arrive late. (4397578)
If you attach dbx to the application without preloading the collector library, the collector installs its signal handler as the primary handler when collection is enabled. However, any signal handler installed subsequently takes precedence over the collector's signal handler. If this signal handler does not pass on SIGPROF and SIGEMT signals to the collector's signal handler, profiling data is lost.
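An application that needs its own SIGPROF handler can avoid losing profiling data by saving the previous disposition and forwarding each signal to it. The following C sketch shows that pattern with sigaction(2). It also sets SA_RESTART so that system calls interrupted by the signal are restarted rather than failing with EINTR, which matches the "does not interrupt system calls" behavior described above. This is hypothetical application code, not the Collector's own handler; SIGEMT could be chained the same way on SPARC systems.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Disposition in effect before our handler was installed
   (for example, the Collector's handler). */
static struct sigaction previous_prof;
static volatile sig_atomic_t app_prof_count = 0;

static void app_prof_handler(int sig, siginfo_t *info, void *ctx)
{
    app_prof_count++; /* the application's own work */

    /* Forward the signal so that an earlier handler still receives it. */
    if (previous_prof.sa_flags & SA_SIGINFO) {
        if (previous_prof.sa_sigaction != NULL)
            previous_prof.sa_sigaction(sig, info, ctx);
    } else if (previous_prof.sa_handler != SIG_DFL &&
               previous_prof.sa_handler != SIG_IGN) {
        previous_prof.sa_handler(sig);
    }
}

/* Hypothetical helper: install app_prof_handler for SIGPROF and remember
   the previous disposition. Returns 0 on success, -1 on failure. */
int app_install_chained_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = app_prof_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART; /* restart interrupted system calls */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, &previous_prof);
}
```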
Data Collection Problems When dbx is Attached to a Process
If you attach dbx to a running process without preloading the collector library, libcollector.so, there are a number of errors that can occur.
You cannot collect any tracing data: synchronization wait tracing, heap tracing, or MPI tracing. Tracing data is collected by interposing on various libraries, and if libcollector.so is not preloaded, the interposition cannot be done.
If the program installs a signal handler after dbx is attached to the process, and the signal handler does not pass on the SIGPROF and SIGEMT signals, profiling data and sampling data is lost. (4397578)
If the program uses the asynchronous I/O library, libaio.so, clock-based profiling data and sampling data is lost, because libaio.so uses SIGPROF for asynchronous cancel operations.
If the program uses the hardware counter library, libcpc.so, hardware-counter overflow profiling experiments are corrupted because both the collector and the program are using the library. If the hardware counter library is loaded after dbx is attached to the process, the hardware-counter experiment can succeed, provided that references to the libcpc library functions are resolved by a general search rather than a search in libcpc.so.
If the program calls setitimer(2), clock-based profiling experiments can be corrupted because both the collector and the program are using the timer.
Incorrect Values for Wait CPU Metric in Statistics Display and Samples
Incorrect values for the Wait CPU metric are sometimes recorded in the sample packets and the global statistics. These values appear in the Statistics tab of the Performance Analyzer and affect the display of samples in the Timeline tab. (4615617)
Lost Clock Profiling Data Over a Small Time Period
Clock profiling data can appear to be lost over a period of several seconds when the system clock is being synchronized with an external source. During this time, the system clock is incremented until it is synchronized. Profile signals are delivered at the set interval, but the time stamp recorded in the profile packets includes any increment that is made between signal deliveries. (4509104)
Data Collection Aborts With a Stack Overflow.
Sometimes the Collector can fail with a stack overflow error. This happens because the Collector uses the application's stack and the stack size for the application is too small to support use by the Collector. The workaround is to increase the stack size by at least 8 Kbytes. See the limit(1) man page for details. For parallel applications that use the multitasking library, the stack size for each thread must also be set using the STACKSIZE environment variable.
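The limit(1) command is the documented way to raise the stack size. As an alternative for a launcher program that starts the target, the following C sketch raises the calling process's soft stack limit by 8 Kbytes (8 * 1024 bytes) with getrlimit(2) and setrlimit(2); programs executed afterwards from that process inherit the new limit. The function name raise_stack_limit is hypothetical, and the sketch does not cover the per-thread STACKSIZE setting.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_STACK of the calling process
   by extra_bytes. Returns 0 on success (or if the limit is already
   unlimited), and -1 if the hard limit is too low or a call fails. */
int raise_stack_limit(rlim_t extra_bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0; /* already unlimited */
    rlim_t wanted = rl.rlim_cur + extra_bytes;
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        return -1; /* only the hard limit's owner can raise it further */
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

For example, raise_stack_limit(8 * 1024) followed by fork and exec of the collect command would give the target at least the additional 8 Kbytes suggested above.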
Incomplete Experiment When Program Calls exec.
When the program on which performance data is being collected successfully calls exec(2) or any of its variants, the experiment is terminated abnormally. Although the experiment can still be read by the Performance Analyzer or er_print, you should run er_archive(1) for the experiment on the computer on which the data was collected, to ensure that the load objects used by the program were archived correctly.
False Recursion With Tail Call Optimization.
For some functions that make tail-call optimized calls from a shared object (PIC code) and require the determination of the global offset table address in order to reference a global variable, the optimized code is incorrectly reported as recursive, and the real caller is lost. (4656890)
F. Limitations and Incompatibilities
This section discusses limitations and incompatibilities with systems or other software.
Performance Analyzer Requirements
The Performance Analyzer requires the Java[tm] 2 Software Development Kit, Standard Edition, in a version no earlier than 1.4. If you use an earlier version, the Performance Analyzer runs but might fail, function incorrectly, or perform poorly.
Profiling Java Programs
To collect profiling data on a Java program you must use a version of the Java[tm] 2 Software Development Kit, Standard Edition, no earlier than 1.4.
Hardware-Counter Overflow Profiling
Hardware-counter overflow profiling is not supported on UltraSPARC[tm] processors earlier than the UltraSPARC III series.
Hardware-counter overflow profiling is not supported on versions of the operating environment that precede the Solaris 8 release.
Some early versions of UltraSPARC III hardware do not support profiling based on User DTLB or ITLB misses. They support TLB counters only in kernel mode.
The Collector cannot collect hardware-counter overflow data if cputrack is running on the system because cputrack takes control of the hardware counters.
Library Interposition
The Collector interposes on various system functions, including signal handling, fork and exec calls, the hardware counter library and some timing functions, to ensure that it can collect valid data. If a program uses any of these functions, its behavior can change. In particular, the profiling timer and the hardware counters are not available to a program when profiling is enabled, and system calls are not interrupted to deliver signals. This behavior affects the use of the asynchronous I/O library, libaio.so, which does interrupt system calls to deliver signals. These interpositions do not take place if you attach dbx to a running process without preloading the collector library, libcollector.so, and then enable data collection.
Finding Source and Object Files
When the debugger is attached to a process, the executable name that is recorded can be a relative path rather than an absolute path, or it can be an absolute path that is not accessible to the Analyzer. Similar problems can arise with object files loaded from an archive (.a).
The Performance Analyzer searches for the files in the following places, in the order given, until it finds a match.
- It searches for the file inside the experiment.
- It searches for the file using the given path.
- It extracts the basename from the given path (the name following the last "/") and searches for the file in the current working directory, that is, as ./<basename>.
If it does not find the file, it generates an error or warning, showing the path as it originally appeared in the experiment.
To enable the Performance Analyzer to find the source file, you can set up a symbolic link from the current directory that points to the actual location of the file, or you can copy the file into the experiment.
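The last two steps of the search order described above can be sketched in C as follows: try the recorded path as given, then try ./<basename> in the current working directory. The first step, looking inside the experiment, is omitted because the layout of files stored in an experiment is not described here. The function name find_source_file is hypothetical; this is an illustration of the search order, not the Analyzer's code.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch of the search order (experiment step omitted).
   On success, copies the path that was found into out and returns 0.
   Returns -1 if no candidate exists or out is too small. */
int find_source_file(const char *recorded, char *out, size_t outlen)
{
    /* Step 2: the path exactly as recorded in the experiment. */
    if (access(recorded, F_OK) == 0) {
        if (strlen(recorded) >= outlen)
            return -1;
        strcpy(out, recorded);
        return 0;
    }
    /* Step 3: the name following the last "/", in the current directory. */
    const char *slash = strrchr(recorded, '/');
    const char *base = slash ? slash + 1 : recorded;
    if (*base == '\0')
        return -1; /* recorded path ends in "/": no basename */
    int n = snprintf(out, outlen, "./%s", base);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return access(out, F_OK) == 0 ? 0 : -1;
}
```

This also shows why a symbolic link in the current directory works: it makes the ./<basename> candidate exist.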
Experiment Incompatibility
The Performance Analyzer cannot load experiments created with versions of the Collector prior to the Forte Developer 7 software release.
Use of setuid
If the process calls setuid or executes setuid files, the Collector can fail to create an experiment due to permission problems.
See the collect(1) man page for more information about restrictions on data collection.
G. Documentation Errors
There is no new information at this time.
H. Required Patches
Some of the problems with the performance analysis tools originate in bugs in the Solaris operating environment. To fix these bugs, you should install the relevant patches. To obtain a list of required patches, you can type the collect command at the command prompt with no arguments. The patches can be downloaded from http://sunsolve.sun.com. If you are using the Solaris 8 operating environment, you should install an update that is no earlier than update 5 before installing patches.
The following problems can be encountered by the Collector and Performance Analyzer when the patches are not installed:
- Programs that use libaio and invoke aio_cancel() abort during data collection with a variety of error messages, including the following:
dbx: Cannot read status for 1@1--No such file or directory
dbx: Warning: proc state race condition encountered!
- Multithreaded executables cause a SEGV during data collection. Sometimes the core dump occurs in the thread library code, and sometimes it occurs in sigacthandler() for the SIGPROF signal.
- Multithreaded executables can fail during collection with various dbx error messages, including those listed under the first bullet and messages reporting the following:
generic libthread_db.so error
- Multithreaded executables can fail during collection with a libthread panic relating to a signal fault in a critical section.
- Data for multithreaded executables can be missing, because at some point the threads library masks the profiling signal and all subsequent data is lost.
- When a multiprocessor application is running with unbound threads, the interrupt from a hardware counter overflow (SIGEMT) is occasionally lost and cannot be recovered.
- Under some circumstances profiling interrupts (SIGPROF) for one or more LWPs can be lost. When this happens, data displayed does not include thread profile metrics for threads run on those LWPs. This happens most often with unbound threads in the Solaris 7 operating environment.
- An application that uses more than 32 CPUs or threads can run much slower when performance data is being collected.
Copyright © 2002 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.