Updated 2002/03/28

Forte[tm] Developer 7: Program Performance Analysis Tools Readme

Contents

  A. Introduction
  B. About the Forte Developer 7 Program Performance Analysis Tools
  C. New and Changed Features
  D. Software Corrections
  E. Problems and Workarounds
  F. Limitations and Incompatibilities
  G. Documentation Errors
  H. Required Patches

 


A. Introduction

This document contains information about the Forte[tm] Developer 7 program performance analysis tools. These tools include commands for collection, manipulation and analysis of program performance data, and a graphical user interface, the Performance Analyzer, for display of performance data. The term Collector is used in this document for the tools that collect performance data and their underlying libraries. These tools are the collect command, the dbx collector subcommands, and the performance data collection features in the IDE.

Note: The Performance Analyzer GUI and the IDE are part of the Forte[tm] for Java[tm] 4, Enterprise Edition for the Solaris[tm] operating environment (SPARC[tm] Platform Edition) versions 8 and 9.

This document describes the new features and software corrections that are introduced in this release and lists known problems, limitations, and incompatibilities. Information in this document updates and extends information in the software manuals.

Information in the release notes updates and extends information in all readme files. To access the release notes and the complete Forte Developer documentation set, go to the documentation index at file:/opt/SUNWspro/docs/index.html.

To view the text version of this readme, type one of the following at a command prompt:

   more /opt/SUNWspro/READMEs/analyzer

   collect -R

To view the HTML version of this readme, go to file:/opt/SUNWspro/docs/index.html.

Note - If your Forte Developer 7 software is not installed in the /opt directory, ask your system administrator for the equivalent path on your system.

Note -  In this document the terms "Java[tm] virtual machine" and "JVM[tm]" mean a virtual machine for the Java[tm] platform.

 


B. About the Program Performance Analysis Tools

This release of the Forte Developer 7 program performance analysis tools is available on the Solaris operating environment (SPARC Platform Edition) versions 7, 8 and 9, with the exception that the graphical user interfaces are not available on version 7. The Performance Analyzer GUI and the IDE are part of the Forte for Java 4, Enterprise Edition for the Solaris operating environment (SPARC Platform Edition) versions 8 and 9.

The Forte Developer 7 program performance analysis tools collect statistical profiles of a program's performance and trace calls to critical library routines, and display the data in tabular and graphical form. The data that is collected is converted into performance metrics. Metrics can be viewed in tabular form at the load object, function, source line or instruction level. The tools provide a means of navigating program structure that is useful for identifying functions and paths within the code that are responsible for resource usage, inefficiencies or time delays. The Performance Analyzer GUI can also display the performance data in a timeline display.

 


C. New and Changed Features

This section describes the new and changed features of the performance tools since the Forte Developer 6 update 2 release. For information about other Forte Developer components, see the What's New manual. To access this manual on your local system or network, go to file:/opt/SUNWspro/docs/index.html. You can also access this manual by going to http://docs.sun.com.

Data collection features

The following list describes new or changed data collection capabilities.

For more information, refer to the collect(1), collector(1) and libcollector(3) man pages.

Data presentation features

Note: You must have a license to use the Performance Analyzer GUI and to read the analyzer(1) man page.

For more information, refer to the analyzer(1) and er_print(1) man pages and the Performance Analyzer online help.

 


D. Software Corrections

The following bugs in the Forte Developer 6 update 2 release have been fixed.

  1. Incorrect Behavior of dbx collector Subcommands

  2. Cannot Collect All Synchronization Events With dbx collector Subcommands

  3. er_mv Corrupts Original Experiment When There Is Insufficient Space

  4. Detaching From an Attached Process in dbx Before Closing Experiment Causes Hang

  5. Collection Fails for Processes That Call fork, exec or system

  6. Using -d Before -o in collect Produces an Error

  7. Hardware Counter Profiling of Applications That Use libcpc.so Produces Corrupt Experiments

 

  1. Incorrect Behavior of dbx collector Subcommands

    When an experiment was active, the dbx collector data collection and output subcommands address_space, hwprofile, profile, sample, synctrace and store were accepted silently and applied to the next experiment, if any. They are now ignored with a warning. (4445393)

  2. Cannot Collect All Synchronization Events With dbx collector Subcommands

    The dbx collector synctrace threshold command erroneously reported an error for a zero threshold value. A zero threshold value is now accepted. (4455260)

  3. er_mv Corrupts Original Experiment When There Is Insufficient Space

    When er_mv moved an experiment, it deleted files in the source as they were copied. If there was insufficient space in the destination, the experiment was split between the source and the destination, and the source experiment was incomplete. Now er_mv copies the experiment to the destination and only removes the original experiment if the copy is completed. (4421263)
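    The copy-then-remove order described above can be illustrated with a minimal shell sketch. This is not the er_mv implementation: safe_move is a hypothetical helper, and the use of cp -R and rm -rf is an assumption chosen only to show why the source stays intact when the copy cannot be completed.

```shell
# Hypothetical helper (not part of the tools): move a directory tree by
# copying it first and deleting the source only after the copy succeeds.
safe_move() {
    src=$1
    dest_dir=$2
    target="$dest_dir/$(basename "$src")"

    # Refuse to overwrite an existing target.
    if [ -e "$target" ]; then
        echo "safe_move: $target already exists" >&2
        return 1
    fi

    if cp -R "$src" "$target"; then
        rm -rf "$src"        # copy complete: now safe to remove the source
    else
        rm -rf "$target"     # copy failed: discard the partial copy
        return 1             # the source is left untouched
    fi
}
```

    If the copy fails for any reason, including insufficient space, the function returns a nonzero status and the original directory is still complete.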

  4. Detaching From an Attached Process in dbx Before Closing Experiment Causes Hang

    If you attached dbx to a process, enabled data collection (using collector enable), and then tried to detach from the process before disabling data collection, dbx could not complete any more commands, including the detach command. Now, data collection is disabled and the experiment closed when a detach command is issued. (4456506)

  5. Collection Fails for Processes That Call fork, exec or system

    When a process called fork, exec or system, data collection failed, resulting in either a corrupted experiment or termination of the process. In this release, the collect command correctly handles calls to the routines fork(2), fork1(2), vfork(2), fork(3F), exec(2) and its variants, system(3C), system(3F), sh(3F) and popen(3C). It continues to collect data on the parent process and can also collect data on the descendant processes. Calls to vfork are replaced internally by calls to fork1. Data collection using the dbx collector commands continues to collect data on the parent process but does not collect data on descendant processes.

    Data collection fails if you link statically to libc. (4367629)

  6. Using -d Before -o in collect Produces an Error

    Using the -d option before the -o option in the collect command erroneously produced the following error message.

    Experiment name may only be set once
    

    These options can now be specified in any order and produce the correct result.

  7. Hardware Counter Profiling of Applications That Use libcpc.so Produces Corrupt Experiments

    Collecting hardware counter overflow data on an application that uses the library libcpc.so resulted in a corrupted experiment. The collector library now ensures that it has control of the hardware counters and prevents the application from using them. The control is achieved by interposing on calls to functions in libcpc.so and returning a value of -1 for calls from the application, indicating that the counters are in use. A corrupted experiment can still result if dbx is attached to a process that uses libcpc.so. (4434079)

 


E. Problems and Workarounds

This section discusses known software problems and possible workarounds for those problems. Some problems might be fixed in a patch. For updates, check Forte Developer Hot Product News at http://www.sun.com/forte/fcc/hotnews.html.

Some problems are the result of bugs in the Solaris operating environment and can be fixed by installing the appropriate patches. For further information, see the Required Patches section in this Readme. Some bugs that appear to be in the Analyzer might actually be Collector bugs. Bugs in the compilers and dbx can also affect the Analyzer. Some problems that are not due to bugs are also described in this section.

 

 

Bugs in the Performance Tools
  1. Performance Analyzer Cannot Handle More Than 2 GBytes of Data.

  2. Cannot Print Summary Tab or Event Tab Data.

  3. Profiling Fails in Multithreaded Java Applications

  4. Defective Experiments Are Created if the _32 or _64 Variants of LD_PRELOAD and LD_AUDIT Are Set

  5. Double Counting of Metrics on Parallel Directive Lines

  6. Pause/Resume Events Are Not Shown in Timeline

 

  1. Performance Analyzer Cannot Handle More Than 2 GBytes of Data.

    The Performance Analyzer cannot read and process more than about 2 GBytes of data. You can use er_print to analyze an experiment or several experiments whose combined data size is larger than 2 GBytes. Other workarounds are to record less data by controlling the collection parameters, to collect data on a part of the program instead of the whole program, or to set a data limit. See the collect(1), collector(1), and libcollector(3) man pages and the Program Performance Analysis Tools manual for more information. (4505739)

  2. Cannot Print Summary Tab or Event Tab Data.

    The Performance Analyzer cannot print the data in the Summary tab or the Event tab. To print summary data for a function or a load object, you can use the er_print command. (4286674)

  3. Profiling Fails in Multithreaded Java Applications

    The Performance Analyzer and er_print fail when reading experiments recorded for multithreaded Java applications. The data in the experiment is not correctly recorded. (4649137)

  4. Defective Experiments Are Created if the _32 or _64 Variants of LD_PRELOAD and LD_AUDIT Are Set

    The collect command produces empty experiments if any of the environment variables LD_PRELOAD_32, LD_PRELOAD_64, LD_AUDIT_32, or LD_AUDIT_64 is set. The workaround is to use dbx collector commands for data collection.
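    Before running collect, a script can check whether any of the four variables named above is set. The following is a minimal sketch for sh-compatible shells; the function name check_ld_variants is hypothetical and is not part of the tools. It tests the variables as the shell sees them, which includes everything inherited from the environment.

```shell
# Hypothetical pre-flight check (not part of the tools).
# Prints the name of each listed variable that is set, even if empty.
# Returns 0 if none is set and 1 otherwise.
check_ld_variants() {
    found=0
    for name in LD_PRELOAD_32 LD_PRELOAD_64 LD_AUDIT_32 LD_AUDIT_64; do
        # ${VAR+x} expands to "x" only when VAR is set.
        if eval "[ -n \"\${$name+x}\" ]"; then
            echo "$name"
            found=1
        fi
    done
    return $found
}
```

    If the function prints any names, collect is affected by the problem described in this item.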

  5. Double Counting of Metrics on Parallel Directive Lines

    Metrics that are reported on parallelization compiler directive lines in annotated source code are double-counted. The metrics on the source lines in the parallel do, for or section blocks of code are correct. There are also some double counting errors at the function level. (4656193)

  6. Pause/Resume Events Are Not Shown in Timeline

    When data collection is paused, no data is recorded, and none is shown in the Timeline tab. However, the Timeline tab should indicate that data collection was paused, rather than lead the user to think no events occurred. (4514519)

 

Problems That Can Be Fixed With Solaris Operating Environment Patches

The following bugs can be fixed by installing the appropriate patches to the Solaris operating environment. See the Required Patches section in this Readme for more information.

  1. Lost Clock-Based Profiling Data for LWPs

  2. Lost Hardware Counter Profiling Interrupts

  3. Clock-Based Profiling Inaccuracies on Loaded Systems

  4. Poor Scalability Past 32 CPUs

 

  1. Lost Clock-Based Profiling Data for LWPs

    Under some circumstances profiling interrupts (SIGPROF) for one or more LWPs may be lost. When this happens, data displayed does not include thread profile metrics for threads run on those LWPs. This happens most often with unbound threads in the Solaris 7 operating environment. A workaround is to use the alternate threads library in /usr/lib/lwp. (4248299)

  2. Lost Hardware Counter Profiling Interrupts

    When a program is running with unbound threads, the interrupt from a hardware counter overflow (SIGEMT) is occasionally lost and cannot be recovered. The workaround is to use bound threads, or if you are running under the Solaris 8 operating environment, to use the alternate threads library in /usr/lib/lwp. The alternate threads library uses bound threads even in support of the unbound-thread APIs. (4352643)

  3. Clock-Based Profiling Inaccuracies on Loaded Systems

    Profiling an application when there is a load on the system can result in a significant undercount of User CPU time, by up to 20%. The missing User CPU time shows up as either System CPU time or Wait CPU time. (4509116)

  4. Poor Scalability Past 32 CPUs

    An application that uses more than 32 CPUs or threads can run much slower when performance data is being collected. (4273174, 4304367)

 

Other Problems
  1. Altered Behavior With Applications That Install Signal Handlers

  2. Data Collection Problems When dbx is Attached to a Process

  3. Incorrect Values for Wait CPU Metric in Statistics Display and Samples

  4. Lost Clock Profiling Data Over a Small Time Period

  5. Data Collection Aborts With a Stack Overflow.

  6. Incomplete Experiment When Program Calls exec.

  7. False Recursion With Tail Call Optimization.

 

  1. Altered Behavior With Applications That Install Signal Handlers

    Collecting performance data on an application that installs a signal handler can cause altered behavior of the collector or of the application.

    When the collector library is preloaded, the collector's signal handler always re-installs itself as the primary handler, and it passes on signals that it does not use to any other handler. However, because it does not interrupt system calls, an application that installs a signal handler that does interrupt system calls can show changed behavior. In the case of the asynchronous I/O library, libaio.so, which uses SIGPROF for asynchronous cancel operations, asynchronous cancel requests arrive late. (4397578)

    If you attach dbx to the application without preloading the collector library, the collector installs its signal handler as the primary handler when collection is enabled. However, any signal handler installed subsequently takes precedence over the collector's signal handler. If this signal handler does not pass on SIGPROF and SIGEMT signals to the collector's signal handler, profiling data is lost.

  2. Data Collection Problems When dbx is Attached to a Process

    If you attach dbx to a running process without preloading the collector library, libcollector.so, there are a number of errors that can occur.

    • You cannot collect any tracing data: synchronization wait tracing, heap tracing, or MPI tracing. Tracing data is collected by interposing on various libraries, and if libcollector.so is not preloaded, the interposition cannot be done.

    • If the program installs a signal handler after dbx is attached to the process, and the signal handler does not pass on the SIGPROF and SIGEMT signals, profiling data and sampling data are lost. (4397578)

    • If the program uses the asynchronous I/O library, libaio.so, clock-based profiling data and sampling data are lost, because libaio.so uses SIGPROF for asynchronous cancel operations.

    • If the program uses the hardware counter library, libcpc.so, hardware-counter overflow profiling experiments are corrupted because both the collector and the program are using the library. If the hardware counter library is loaded after dbx is attached to the process, the hardware-counter experiment can succeed provided references to the libcpc library functions are resolved by a general search rather than a search in libcpc.so.

    • If the program calls setitimer(2), clock-based profiling experiments can be corrupted because both the collector and the program are using the timer.

  3. Incorrect Values for Wait CPU Metric in Statistics Display and Samples

    Incorrect values for the Wait CPU metric are sometimes recorded in the sample packets and the global statistics. These values appear in the Statistics tab of the Performance Analyzer and affect the display of samples in the Timeline tab. (4615617)

  4. Lost Clock Profiling Data Over a Small Time Period

    Clock profiling data can appear to be lost over a period of several seconds when the system clock is being synchronized with an external source. During this time, the system clock is incremented until it is synchronized. Profile signals are delivered at the set interval, but the time stamp recorded in the profile packets includes any increment that is made between signal deliveries. (4509104)

  5. Data Collection Aborts With a Stack Overflow.

    Sometimes the Collector can fail with a stack overflow error. This happens because the Collector uses the application's stack and the stack size for the application is too small to support use by the Collector. The workaround is to increase the stack size by at least 8 Kbytes. See the limit(1) man page for details. For parallel applications that use the multitasking library, the stack size for each thread must also be set using the STACKSIZE environment variable.
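    As a minimal sketch of this workaround for sh-compatible shells, the following computes a stack limit that is a given number of Kbytes larger than the current one. The function name raised_stack_kb is hypothetical, and the sketch assumes that the shell's ulimit -s reports the soft stack limit in Kbytes; csh users would use limit stacksize instead.

```shell
# Hypothetical helper (not part of the tools).
# $1 is the current stack limit in Kbytes, or "unlimited".
# $2 is the number of Kbytes to add.
# Prints the raised limit; "unlimited" is passed through unchanged.
raised_stack_kb() {
    case $1 in
        unlimited) echo unlimited ;;
        *)         echo $(( $1 + $2 )) ;;
    esac
}

# Possible use before starting data collection (assumes ulimit -s is
# available and the hard limit permits the new value):
#   ulimit -s "$(raised_stack_kb "$(ulimit -s)" 8)"
```

    The new limit applies to the current shell and to the processes started from it, so set it in the same shell session that launches data collection.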

  6. Incomplete Experiment When Program Calls exec.

    When the program on which performance data is being collected successfully calls exec(2) or any of its variants, the experiment is terminated abnormally. Although the experiment can still be read by the Performance Analyzer or er_print, you should run er_archive(1) for the experiment on the computer on which the data was collected, to ensure that the load objects used by the program were archived correctly.

  7. False Recursion With Tail Call Optimization.

    For some functions that make tail-call optimized calls from a shared object (PIC code) and require the determination of the global offset table address in order to reference a global variable, the optimized code is incorrectly reported as recursive, and the real caller is lost. (4656890)

 


F. Limitations and Incompatibilities 

This section discusses limitations and incompatibilities with systems or other software.

  1. Performance Analyzer Requirements

  2. Profiling Java Programs

  3. Hardware-Counter Overflow Profiling

  4. Library Interposition

  5. Finding Source and Object Files

  6. Experiment Incompatibility

  7. Use of setuid

 

  1. Performance Analyzer Requirements

    The Performance Analyzer requires the Java[tm] 2 Software Development Kit, Standard Edition, in a version no earlier than 1.4. If you use an earlier version, the Performance Analyzer runs but might fail, function incorrectly, or perform poorly.

  2. Profiling Java Programs

    To collect profiling data on a Java program you must use a version of the Java[tm] 2 Software Development Kit, Standard Edition, no earlier than 1.4.

  3. Hardware-Counter Overflow Profiling

    Hardware-counter overflow profiling is not supported on UltraSPARC[tm] processors earlier than the UltraSPARC III series.

    Hardware-counter overflow profiling is not supported on versions of the operating environment that precede the Solaris 8 release.

    Some early versions of UltraSPARC III hardware do not support profiling based on User DTLB or ITLB misses. They support TLB counters for kernel mode only.

    The Collector cannot collect hardware-counter overflow data if cputrack is running on the system because cputrack takes control of the hardware counters.

  4. Library Interposition

    The Collector interposes on various system functions, including signal handling, fork and exec calls, the hardware counter library and some timing functions, to ensure that it can collect valid data. If a program uses any of these functions, its behavior can change. In particular, the profiling timer and the hardware counters are not available to a program when profiling is enabled, and system calls are not interrupted to deliver signals. This behavior affects the use of the asynchronous I/O library, libaio.so, which does interrupt system calls to deliver signals. These interpositions do not take place if you attach dbx to a running process without preloading the collector library, libcollector.so, and then enable data collection.

  5. Finding Source and Object Files

    The executable name that is recorded when the debugger is attached to a process can be a relative path rather than an absolute path. Even if the path is absolute, it might not be accessible to the Analyzer. Similar problems can arise with object files loaded from an archive (.a).

    The Performance Analyzer searches for the files in the following places, in the order given, until it finds a match.

    1. It searches for the file inside the experiment.
    2. It searches for the file using the given path.
    3. It extracts the basename from the given path (the name following the last "/") and searches for the file in the current working directory, that is, as ./<basename>.

    If it does not find the file, it generates an error or warning, showing the path as it originally appeared in the experiment.

    To enable the Performance Analyzer to find the source file, you can set up a symbolic link from the current directory that points to the actual location of the file, or you can copy the file into the experiment.
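    The search order listed above can be sketched as a shell function. This is an illustration only, not the Performance Analyzer's code: locate_file is a hypothetical name, and the sketch assumes that "inside the experiment" means a file with the same basename directly inside the experiment directory.

```shell
# Hypothetical sketch of the three-step lookup (not the Analyzer's code).
# $1 is the experiment directory, $2 is the path recorded for the file.
# Prints the first match, or returns 1 with a message if none is found.
locate_file() {
    exp_dir=$1
    given_path=$2
    base=$(basename "$given_path")

    if [ -f "$exp_dir/$base" ]; then      # 1. inside the experiment (assumed layout)
        echo "$exp_dir/$base"
    elif [ -f "$given_path" ]; then       # 2. the path as given
        echo "$given_path"
    elif [ -f "./$base" ]; then           # 3. basename in the current directory
        echo "./$base"
    else
        echo "locate_file: cannot find $given_path" >&2
        return 1
    fi
}
```

    With this order, a symbolic link created in the current directory, for example with ln -s followed by the actual location of the file, satisfies step 3, and a copy placed in the experiment satisfies step 1.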

  6. Experiment Incompatibility

    The Performance Analyzer cannot load experiments created with versions of the Collector prior to the Forte Developer 7 software release.

  7. Use of setuid

    If the process calls setuid or executes setuid files, the Collector can fail to create an experiment due to permission problems.

See the collect(1) man page for more information about restrictions on data collection.

 


G. Documentation Errors 

There is no new information at this time.

 


H. Required Patches

Some of the problems with the performance analysis tools originate in bugs in the Solaris operating environment. To fix these bugs, you should install the relevant patches. To obtain a list of required patches, you can type the collect command at the command prompt with no arguments. The patches can be downloaded from http://sunsolve.sun.com. If you are using the Solaris 8 operating environment, you should install an update that is no earlier than update 5 before installing patches.

The following problems can be encountered by the Collector and Performance Analyzer when the patches are not installed:

 


Copyright © 2002 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.