PIRL

PIRL.Conductor
Class Conductor

java.lang.Object
  extended by PIRL.Conductor.Conductor
All Implemented Interfaces:
Management

public class Conductor
extends Object
implements Management

Conductor is a queue management mechanism for the sequential processing of data files.

A Conductor processes a list of data files through a list of procedures to be invoked on each file. These lists are obtained from a Database as a pair of tables. The list of files is contained in a Sources table and the procedures are defined in a Procedures table. A pair of Sources and Procedures tables is known as a Pipeline; each Sources record is processed in the order it occurs in the table (FIFO) by each procedure specified by a Procedures record in their sequence number order. A Procedures table may be considered to be a "procedure pipeline" of operations to be applied sequentially to each source file in listed order. Each pipeline has a name that is used to find its database tables where the pipeline tables are named:

<pipeline>_Sources
<pipeline>_Procedures

A Conductor must be run with a command line argument that specifies the pipeline it is to process. Multiple Conductors may safely process the same pipeline from the same or separate host systems.
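The table naming convention above can be sketched in a few lines of Java. This is an illustration only: the class and method names are hypothetical and are not part of the Conductor API.

```java
/** Sketch: deriving the pipeline table names from a pipeline name. */
final class PipelineTables {
    private PipelineTables() {}

    /** Returns the Sources table name, "<pipeline>_Sources". */
    static String sourcesTable(String pipeline) {
        return pipeline + "_Sources";
    }

    /** Returns the Procedures table name, "<pipeline>_Procedures". */
    static String proceduresTable(String pipeline) {
        return pipeline + "_Procedures";
    }
}
```

For a pipeline named "Example" (a made-up name) the two tables would be Example_Sources and Example_Procedures.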

Database

Conductor requires access to a database containing the pipeline tables. The database is accessed using the PIRL Database package, which abstracts the particulars of database access. The database may be located on the local system or on a remote server, as specified by the Host parameter of the configuration file.

Sources Table

A Sources table must contain at least these fields:

Source_Number
Must be a non-NULL integer value unique for each record. While the order does not matter, it is easiest to make this a field with a value that is automatically assigned and incremented by the database server.
Source_ID
A string that identifies the file in some user specific manner. It is used, along with the Source_Number, to produce what is expected to be a unique log filename. If the field value is NULL or empty then the filename portion of the Source_Pathname, with any extension removed, will be used and this field will be updated.
Source_Pathname
This is the pathname of the file that is to be processed. The pathname is in the syntax of the host system. It is not required that the pathname be fully qualified, only that the file can be found using the pathname. This field value must not be NULL or empty.
Conductor_ID
This is a text field that must be NULL (not just empty) for each record to be processed. It is filled in with the name of the Conductor host system, possibly supplemented by the process ID of the Conductor (see the System Dependencies, Conductor_ID section, below), when the record is acquired for processing. This field provides an exclusive lock against other Conductor processes acquiring the record: Once a Conductor sets this field the record is no longer available for processing.

Status
An indicator of the status of each procedure invoked on the source file is recorded in this field. As each procedure is invoked on the source file this field is kept current with the status of the procedure. The format of the field value is described for the Status_Indicators method which, along with other Status_xxx methods, provides a convenient means for other Java classes to interpret and manipulate these field values. Note: This field should initially be NULL or empty. It is important that this field only be modified consistent with the operation of Conductor (i.e. change at your own risk!).
Log_Pathname
The pathname for the log file where the processing of the source file is recorded. Normally this field is left empty (or NULL) and the Log_Pathname (or Log_Directory) parameter of the Configuration file will be used. If either of these is a pathname that refers to an existing directory, then Conductor will generate an appropriate log filename (see the description of the Log_Directory and Log_Filename parameters, below). If this field is empty and both parameters are empty or not present, the log file will be written to the current working directory (as if the Log_Directory parameter were empty). However, if either this field or the Log_Pathname (or Log_Directory) parameter is not empty and does not refer to an existing directory, then the pathname is taken to refer to a regular file and the source file log will be appended to that file (a new file will be created if it does not yet exist). Both this field value and any parameter used will be reference resolved if not empty. N.B.: If the pathname of a log file is not specified by this field or a configuration parameter then any existing file at the generated pathname is overwritten. This field is always updated with whatever actual log file pathname is used.

A Sources table may contain the required fields in any order, and it may contain additional fields as desired. For example, it is strongly recommended that a timestamp field be provided that will automatically be updated with the last update time of each record.

Procedures Table

A Procedures table must contain at least these fields:

Sequence
A real number value that orders the procedure in the sequence of processing the source file. The values need not be sequential nor must they be in any particular order in the table. All of the procedure records will be sorted numerically on this field value so that processing of the source file will occur in sequence order. It is strongly recommended that the values be unique in the table, but this is not required; however there is no certainty of the order of processing for procedures with the same sequence number.
Command_Line
The command line to be submitted to the system for executing the procedure. The command line may contain embedded field and/or parameter references to be substituted with database field values and/or configuration parameter values. Obviously this field must be neither empty nor NULL.
Success_Status
An integer value that matches the exit status of the procedure when the procedure has completed successfully. The value can be text that is subject to embedded reference resolving, but must ultimately be convertible to an integer value. If this field is NULL or empty and the Success_Message field is not, then the latter field is used instead. If both fields are empty then the Empty_Success_Any parameter controls how this will be interpreted.
Success_Message
Text to match on the procedure output lines - either stdout or stderr - to determine if the procedure completed successfully. This field is only used if the Success_Status field is empty or NULL. The text in this field is reference resolved. The match against the lines of procedure output treats the resolved text as a regular expression pattern.
Time_Limit
The maximum amount of time, in seconds, to wait for the procedure to complete. The value can be text that is subject to embedded reference resolving. After resolving any references the result is treated as a mathematical expression that must produce a single integer value. A 0 (zero) or negative value indicates an unlimited wait time. An empty or NULL value is equivalent to 0. If the procedure does not complete within the specified amount of time it is killed.
On_Failure
A command line just like the Command_Line field. If the procedure defined by the Command_Line field fails to complete successfully, then the procedure defined by the On_Failure field is run. No time limit is applied to this procedure.

A Procedures table may contain the required fields in any order, and it may contain additional fields as desired. If a "Description" field is present it is used as a text description of each procedure included in the processing log. It is also recommended that a timestamp field be provided that will automatically be updated with the last update time of each record.
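As a minimal sketch of the ordering rule for the Sequence field, the following Java sorts procedure records numerically on a real-valued sequence. The Procedure class is a hypothetical stand-in for a Procedures record and the command lines used with it are placeholders. The sort used here happens to be stable; Conductor makes no such promise for records that share a sequence number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: ordering procedure records by their Sequence value. */
final class ProcedureOrder {
    /** Hypothetical stand-in for one Procedures table record. */
    static final class Procedure {
        final double sequence;     // the Sequence field (a real number)
        final String commandLine;  // the Command_Line field

        Procedure(double sequence, String commandLine) {
            this.sequence = sequence;
            this.commandLine = commandLine;
        }
    }

    /** Returns a copy of the records sorted numerically on Sequence. */
    static List<Procedure> inSequenceOrder(List<Procedure> records) {
        List<Procedure> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingDouble((Procedure p) -> p.sequence));
        return sorted;
    }
}
```

Because the values need not be consecutive, a new procedure can be placed between sequence 1 and 2 by giving it, for example, the value 1.5.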

Database Connection Resilience

A Conductor maintains a connection with the Database server while it is operating. If any access to the Database by Conductor fails due to a loss of the connection, Conductor will attempt to reconnect and, if successful, repeat the access operation (a connection failure of the repeated access operation does not result in a reconnection attempt). If the reconnection fails because the connection cannot be established with the server, the attempt will be retried after a delay period of 5 minutes (this can be overridden with the RECONNECT_DELAY_PARAMETER). Up to 16 retries (this can be overridden with the RECONNECT_TRIES_PARAMETER) will be attempted before a database access failure is deemed to have occurred. N.B.: Database connection resilience does not automatically apply to any database access operations by pipeline procedures.
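The reconnection behavior described above might be sketched as follows. This is not Conductor's implementation: the ConnectionLost exception and the Access and Reconnector interfaces are hypothetical, and the sketch assumes that the retry limit counts retries made after the first failed reconnection attempt.

```java
/** Sketch of reconnect-and-retry around a single database access. */
final class ReconnectSketch {
    /** Hypothetical signal that the database connection was lost. */
    static final class ConnectionLost extends RuntimeException {
        ConnectionLost(String message) { super(message); }
    }

    /** A database access operation; may throw ConnectionLost. */
    interface Access<T> { T run(); }

    /** Tries to re-establish the connection; returns true on success. */
    interface Reconnector { boolean reconnect(); }

    static <T> T withReconnect(Access<T> access, Reconnector reconnector,
                               int maxRetries, long delayMillis) {
        try {
            return access.run();
        } catch (ConnectionLost lost) {
            int retries = 0;
            while (!reconnector.reconnect()) {
                if (retries++ >= maxRetries) {
                    throw new ConnectionLost("database access failed: could not reconnect");
                }
                pause(delayMillis); // wait before the next reconnection retry
            }
            // Repeat the access once; a connection failure here propagates
            // without another reconnection attempt.
            return access.run();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ConnectionLost("interrupted while waiting to reconnect");
        }
    }
}
```

With the documented defaults the call would use 16 retries and a delay of 300000 milliseconds (5 minutes).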

Configuration

When a Conductor is started it first reads its Configuration file. This is "Conductor.conf" by default, but another filename may be specified on the command line. The configuration file contains parameter definitions in Parameter Value Language (PVL) format. This file contains the information needed to access the database used by Conductor as well as any other parameters that may be useful to resolve parameter references embedded in field values. Since references may contain nested references it is quite appropriate for users to provide configuration parameters with values that are database field references (perhaps with complex conditionals and multiple field combinations) so that Command_Line (for example) definitions use the user specified parameter references rather than the more complicated definitions. This also makes it easy to modify the field reference definitions, just by editing the configuration file, without necessarily needing to change the contents of a Procedures pipeline table.

N.B.: By default, when the configuration file is read during startup by the application's main method, should any parameters have the same pathname the last duplicate encountered is given preference. This is especially important to keep in mind when the configuration file includes another configuration file, such as a site-wide configuration. For example, a site-wide configuration file included in all pipeline-specific configuration files (a typical scenario) might have a Conductor group that includes default parameters such as Stop_on_Failure with a small value (e.g. 1 or 2) to prevent a bug in a pipeline procedure from generating a large number of source processing failures, while a configuration file for a specific pipeline that is expected to have failures (which may actually be branches off to some other pipeline depending on the outcome of some condition testing procedure) might have a Conductor group that includes a Stop_on_Failure with a large (or possibly zero) value. As long as the site-wide configuration file is included before the pipeline-specific Conductor/Stop_on_Failure parameter is specified the latter will take precedence over the former.
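The "last duplicate wins" rule can be illustrated with an ordinary map. The sketch below is not the Configuration class; it only models parameters as pathname and value pairs that are applied in the order in which they are read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: later parameters with the same pathname replace earlier ones. */
final class LastDuplicateWins {
    /** Each entry is {pathname, value}, listed in the order it was read. */
    static Map<String, String> merge(String[][] entriesInReadOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String[] entry : entriesInReadOrder) {
            merged.put(entry[0], entry[1]); // a later duplicate overwrites
        }
        return merged;
    }
}
```

If a site-wide file supplies Conductor/Stop_on_Failure with the value 2 and the pipeline-specific file later supplies the value 0, the merged value is 0.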

Parameters

For convenience, a set of parameters is automatically provided in the configuration by Conductor when it runs:

Configuration_Source
The name of the initial configuration file source. This may be the name of a file on the Conductor host system or a URL for a file obtained remotely.
Conductor_ID
The Conductor identification (see the System Dependencies, Conductor_ID section, below).
Database_Server_Name
The name of the database Server configuration parameters group. If no Server name could be determined, this parameter will not be included.
Hostname
The name of the host system where Conductor is running.
Database_Hostname
The hostname of the database server system.
Database_Type
The type of database server, as known to the Database access package.
Pipeline
The simple pipeline name (without any catalog prefix).
Catalog
The name of the database catalog where the pipeline tables are located.
Sources_Table
The full name of the table, including the Catalog prefix, containing the Source file records.
Procedures_Table
The full name of the table containing the procedure definitions.
Require_Stage_Manager
During construction a Conductor will attempt to establish a communication connection with a Stage_Manager. If this fails and the value of this parameter has been set to "True" or "Yes" (case insensitive) an IOException will be thrown; otherwise the Conductor will proceed without a Stage_Manager.

Dynamic source parameters

The following parameters are reset for each source record being processed:

Log_Directory
The pathname to the directory where the log file is written. If a Log_Pathname field value is present it is used to determine the Log_Directory, but only for the current source record. Otherwise the Log_Pathname configuration parameter is used. If it is not present or empty the Log_Directory parameter is used instead. The default Log_Directory is Conductor's current working directory.
Log_Filename
The filename (without the directory path) of the source log file. A value of the source record Log_Pathname field, or of the Log_Pathname or Log_Directory parameters, that is not empty and does not refer to an existing directory is taken to be the pathname of the log file. If none of these provides such a value then a default filename will be generated that has the form:
<Pipeline>-<Source_ID>_<Source_Number>.log

The Pipeline name includes the leading database catalog name separated by a period ('.') character.

The Source_ID and Source_Number are obtained from the current source record. Note, however, that there is a chance that the Source_ID will include characters that are unsafe for use as part of a filename. Assuming that the only unsafe character is the system property "file.separator" character ('/' for Unix), it will be replaced with a percent ('%') character.

Note: If either the source record field or configuration parameter value is not empty and does not refer to an existing directory, then that value will be used unconditionally to determine the log filename.

Source_Number
The Source_Number field value of the current source record.
Source_ID
The Source_ID field value of the current source record.
Source_Pathname
The Source_Pathname field value of the current source record in the filesystem's fully qualified (absolute) form.
Source_Directory
The directory path portion of the Source_Pathname value.
Source_Filename
The filename portion (without the directory pathname) of the Source_Pathname value.
Source_Filename_Extension
The portion of the Source_Filename value following the last period ('.') character in the name. This will be the empty string if there is no extension.
Source_Filename_Root
The portion of the Source_Filename value without the extension (the portion preceding the last period character). This may be the empty string.
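The derived source parameters and the default log filename described in this list can be sketched with plain string operations. The class and method names are hypothetical. The file separator is passed in explicitly so that the replacement rule can be shown independently of the host system; on a real system it would be the "file.separator" property.

```java
import java.io.File;

/** Sketch: deriving source parameters and the default log filename. */
final class SourceParameterSketch {
    /** Source_Filename: the pathname without its directory portion. */
    static String filename(String pathname) {
        return new File(pathname).getName();
    }

    /** Source_Filename_Extension: text after the last '.', or "". */
    static String extension(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1);
    }

    /** Source_Filename_Root: text before the last '.', possibly "". */
    static String root(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? filename : filename.substring(0, dot);
    }

    /** Default log name: <Catalog>.<Pipeline>-<Source_ID>_<Source_Number>.log */
    static String defaultLogFilename(String catalog, String pipeline,
            String sourceId, long sourceNumber, String separator) {
        // The separator character is unsafe in a filename; use '%' instead.
        String safeId = sourceId.replace(separator, "%");
        return catalog + "." + pipeline + "-" + safeId + "_" + sourceNumber + ".log";
    }
}
```

The catalog, pipeline and Source_ID values supplied to defaultLogFilename are whatever the current source record and configuration provide; nothing here checks for other characters that a particular filesystem may reject.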

Dynamic procedure parameters

The following parameters are reset for each procedure record being processed:

Total_Procedure_Records
The total number of procedure definition records in the procedures table. Note: This is a dynamic parameter because the procedures table is refreshed each time pipeline processing is started and the table may have been changed while the Conductor was waiting.
Procedure_Count
The procedure definition record count for the current, or last, procedure sequence. The first procedure definition record has a count of one. A Procedure_Count of zero means that processing for the current source record has not yet commenced.
Sequence
The Sequence field value of the current, or last, procedure record.
Completion_Number
The completion number for the last procedure that was executed. If the procedure ran to completion, whether it was successful or not, it will be the exit status value (a non-negative integer value) of the procedure. If the procedure did not complete for any reason it will be a negative Conductor completion code; the Status_Conductor_Code_Description static method may be used to obtain a brief, one line description of this code.

Conductor control parameters

The following configuration parameters will be used if they are present:

Unresolved_Reference
The value to use for an unresolved reference. By default an unresolved reference throws a Database_Exception. This can be specified with a value beginning with the word "throw" (case insensitive). Note: All parameters used by Conductor are reference resolved. Those with unresolved references that would throw an exception are deemed to be missing parameters. Those values that have incorrect reference syntax are left unresolved.
Empty_Success_Any
If "true", when Success_Status and Success_Message are both empty or NULL the corresponding procedure is always deemed successful when it completes. Otherwise this condition implies a zero (0) Success_Status. Default: false.
Max_Source_Records
The maximum number of source records to obtain at any one time from the database. This prevents memory exhaustion if the number of unprocessed source records is very large.
Poll_Interval
The amount of time (in seconds) to wait before trying to obtain more unprocessed source records when querying the source table found no unprocessed records. If this value is zero or negative Conductor processing will stop instead of waiting to try again. Default: 30.
Source_Available_Tries
The number of tries that will be made to confirm that the Source_Pathname is accessible - i.e. exists as a regular file that can be read - before giving up and declaring the file to be inaccessible. After each accessibility check failure, and before the next try, a ten second pause will be provided. The intention is to give filesystem directory caches time to be synchronized with newly created files on remote filesystems. A maximum of 90 retries (15 minutes wait time) is allowed to prevent Conductor from waiting indefinitely. Default: 3.
Reconnect_Tries
The maximum number of reconnection tries if the database connection is lost. Default: 16.
Reconnect_Delay
The delay, in seconds, between reconnection retry attempts. Default: 300.
Stop_on_Failure
The number of sequential source processing failures that will cause Conductor to stop further processing. Zero means source failures will never cause processing to stop. "True" or "Yes" is equivalent to 1; "False" or "No" is equivalent to 0. Default: Set by the DEFAULT_STOP_ON_FAILURE static constant (0).
Notify
A list of zero or more email addresses that will be sent a notification if Conductor processing unexpectedly halts. The reason processing halted will be in the email message.
Monitor_Width
The width of the Monitor pane (pixels). Default: 700.
Monitor_Height
The height of the Monitor pane (pixels). Default: 550.
Monitor_Location_X
The initial horizontal screen position of the Monitor window. Default: 300.
Monitor_Location_Y
The initial vertical screen position of the Monitor window. Default: 100.
Monitor_Max_Lines
The maximum number of lines retained in the scrolling text pane. These lines may not contain more characters than specified by Monitor_Max_Characters. If the value is less than or equal to zero there is no limit on the number of lines. Default: 8192.
Monitor_Max_Characters
The maximum number of characters retained in the scrolling text pane. These characters may not contain more lines than specified by Monitor_Max_Lines. If the value is less than or equal to zero there is no limit on the number of characters. Default: 2097152.
Splash_Screen
While Conductor is starting in monitor mode it will, by default, display a splash screen. To disable this feature set the Splash_Screen value to "false" or 0. The value may also be set to the minimum number of seconds that the splash screen should remain visible; other than 0, the smallest value allowed is three seconds.
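As an illustration of how the Stop_on_Failure value described in this list might be interpreted, the sketch below maps the textual forms to a count and applies the stopping rule. It is not Conductor's code. Case-insensitive matching of the words and the treatment of negative values as "never stop" are assumptions.

```java
/** Sketch: interpreting a Stop_on_Failure parameter value. */
final class StopOnFailureSketch {
    /** Documented default; mirrors the DEFAULT_STOP_ON_FAILURE constant. */
    static final int DEFAULT_STOP_ON_FAILURE = 0;

    /** Converts the parameter text to a failure count. */
    static int parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_STOP_ON_FAILURE;
        }
        String text = value.trim();
        if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("yes")) {
            return 1;
        }
        if (text.equalsIgnoreCase("false") || text.equalsIgnoreCase("no")) {
            return 0;
        }
        return Integer.parseInt(text); // throws NumberFormatException otherwise
    }

    /** True when the run of sequential failures has reached the limit. */
    static boolean shouldStop(int sequentialFailures, int limit) {
        // Assumption: a limit of zero (or less) never stops processing.
        return limit > 0 && sequentialFailures >= limit;
    }
}
```

The failure count passed to shouldStop is expected to be reset to zero whenever a source is processed successfully, since only sequential failures are counted.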

Processing

Procedure Records

After Conductor has been initialized using its configuration file and connected to the database it begins to process the pipeline. Note: If Conductor is run with a GUI monitor (using the -Monitor command line option), processing does not begin until the Start button is pressed. The records from the Procedures table are read, checked to confirm that they contain all necessary fields, and sorted into Sequence number order. In monitor mode pipeline processing may be suspended by pressing the Stop button - which will change to say Wait until the current source record processing is complete, and then it will change to say Start - and resumed by pressing the Start button again. Each time pipeline processing is started the Procedures records are read again, but otherwise changes to the Procedures records can safely be made while Conductor is running.

Source Records

When pipeline processing has been started Conductor begins processing records from its Sources table. All unprocessed records, up to a maximum (set by default to 1000 to prevent memory exhaustion) configurable with the Max_Source_Records parameter, are read into an internal cache. An unprocessed record has a NULL in its Conductor_ID field. These records are processed in first-in first-out (FIFO) order.

An exclusive lock must be acquired on a record before it can be processed. To acquire a lock on a source record an attempt is made to update the record's Conductor_ID field to the Conductor's identification value (hostname and possibly process ID) with the condition that the field value is currently NULL. The update operation by the database server is atomic; once the operation is started by the database server it will go to completion without the possibility of interruption by any other database operation. This guarantees that only one process will be able to gain access to any source record even in the context of multiple processes contending for the same record at the same time. If some other Conductor has already acquired the record, as indicated by the failure of the update operation (because the Conductor_ID field is no longer NULL as required), the record will be removed from the cache and the Conductor will try to acquire a lock on the next record in the cache. If the update succeeds then the Conductor has acquired exclusive control of the record. It will be safe to process the source without concern that some other process may interfere.
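The locking rule amounts to a conditional update: set Conductor_ID only if it is currently NULL, and treat a failed update as "already taken". The sketch below models that with an in-memory compare-and-set in place of the database. The SQL in the comment is an assumed form of the statement, not taken from Conductor, and the identification strings used with it are made up.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Sketch: acquiring a source record with a conditional update. */
final class SourceLockSketch {
    // Assumed shape of the database operation (placeholders in <>):
    //   UPDATE <pipeline>_Sources SET Conductor_ID = '<id>'
    //   WHERE Source_Number = <n> AND Conductor_ID IS NULL
    // One updated row means the lock was acquired; zero rows means
    // another Conductor got there first.

    /**
     * In-memory analogue: the reference stands in for the record's
     * Conductor_ID field, where null models SQL NULL.
     */
    static boolean tryAcquire(AtomicReference<String> conductorIdField,
                              String conductorId) {
        return conductorIdField.compareAndSet(null, conductorId);
    }
}
```

Only the first caller succeeds for a given record; a Conductor whose attempt fails drops the record from its cache and moves on to the next one.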

The first step in processing a source record is to open a log file to record the processing. If the source record Log_Pathname field is empty the user's Log_Pathname configuration parameter will be used. If that is absent or empty the Log_Directory parameter will be used. If that, too, is absent or empty a default filename will be generated - as described in the Log_Filename parameter description, above - and the log file will be written to the current working directory. If either the source record field or configuration parameter is not empty the value is resolved for any embedded references. If the resulting pathname refers to an existing directory the default filename is added. Otherwise the pathname is taken to refer to a file to which processing log output is to be appended (the file will be created if it does not exist). N.B.: In all other cases - whenever a default filename is generated - an existing file will be overwritten. The log file pathname that is used is always written to the source record Log_Pathname field. Note: A source record will not be processed without a log file; it is fatal to Conductor not to have a writable log file available for each source record.
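The choice of log file described above can be summarized as a small decision function. This sketch leaves out reference resolution and takes the directory test as a parameter so the logic can be exercised without touching a filesystem. All names are hypothetical.

```java
import java.io.File;
import java.util.function.Predicate;

/** Sketch: choosing the log file pathname and write mode. */
final class LogPathSketch {
    /** The selected pathname and whether output is appended. */
    static final class LogTarget {
        final String pathname;
        final boolean append;

        LogTarget(String pathname, boolean append) {
            this.pathname = pathname;
            this.append = append;
        }
    }

    static LogTarget resolve(String logPathnameField, String logPathnameParameter,
            String logDirectoryParameter, String defaultFilename,
            Predicate<String> isExistingDirectory) {
        String chosen = firstNonEmpty(logPathnameField, logPathnameParameter,
                logDirectoryParameter);
        if (chosen == null) {
            // Nothing specified: default name in the working directory.
            return new LogTarget(defaultFilename, false);
        }
        if (isExistingDirectory.test(chosen)) {
            // A directory: add the default name; an existing file is overwritten.
            return new LogTarget(new File(chosen, defaultFilename).getPath(), false);
        }
        // Anything else is taken as a file to be appended to.
        return new LogTarget(chosen, true);
    }

    private static String firstNonEmpty(String... values) {
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }
}
```

In real use the predicate could be path -> new File(path).isDirectory().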

The log file always begins with the Conductor class identification:

PIRL.Conductor.Conductor (2.20 2008/11/01 05:33:06)

This line is immediately followed by a SOURCE_FILE_LOG_DELIMITER line which is expected to be 70 equals ('=') characters. This is followed by a date and time stamp and the source record description including the database server type and hostname, the fully qualified name of the Sources Table (Sources_Table) and the Source_Number, Source_ID, and Source_Pathname values.

The Status field of the source record is checked for any status indicators from possible previous processing. If present they are logged. If the last status indicator is for a failure condition this is logged and any further processing of the source is skipped. Note: It is possible to (re)start processing of a source midway in its procedures pipeline. This is done by first setting its Status field to include status indicators for procedures to be skipped; e.g. by removing a failure indicator at the end of the list after correcting the cause of the failure. Then the Conductor_ID field is set to NULL (as long as this is done last it will be safe even if the pipeline is actively being processed by a Conductor). When the source record is acquired by a Conductor its processing will begin with the next procedure without a status indicator, and log output will be appended to its previous log file if present or a new file if needed. Caution: Procedure pipelines to be used in this way must, of course, ensure that any dependencies on previous procedures are taken into account.

At this point configuration parameters dependent on the current source record are updated.

The Source_Pathname file is confirmed to be a normal file (not a directory) that is readable. If the file is not accessible the Status field of the source record is updated in the database with the INACCESSIBLE_FILE Conductor completion code, the Completion_Number parameter is set to the same value, the condition is logged and further processing of the source record is canceled.
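A minimal sketch of the accessibility check, combined with the retry behavior of the Source_Available_Tries parameter, might look like the following. The retry loop takes the check as a parameter so that it can be exercised without real files. The names are hypothetical, and making at least one check even when the tries value is zero or negative is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

/** Sketch: confirming that a source file is accessible, with retries. */
final class SourceAccessSketch {
    /** Documented upper bound on the number of tries. */
    static final int MAX_TRIES = 90;

    /** True if the path is a regular file (not a directory) that can be read. */
    static boolean isAccessible(Path source) {
        return Files.isRegularFile(source) && Files.isReadable(source);
    }

    /** Repeats the check up to the given number of tries, pausing in between. */
    static boolean waitUntil(BooleanSupplier check, int tries, long pauseMillis) {
        int limit = Math.max(1, Math.min(tries, MAX_TRIES));
        for (int attempt = 1; attempt <= limit; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < limit) {
                try {
                    Thread.sleep(pauseMillis); // ten seconds in Conductor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would combine the two as waitUntil(() -> isAccessible(path), 3, 10000L), using the documented default of 3 tries and the ten second pause.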

Procedures Pipeline

The source now enters the procedures pipeline. The procedure definition records are all cached and sorted by their Sequence number when Conductor first starts. Thus changes to a Procedures table will not take effect until after a Conductor (re)starts, and it is safe to change a Procedures table while a Conductor is running.

Each procedure record is applied to the current source record in Sequence number order; the Sequence parameter is updated before each procedure is processed.

Embedded References

All of the required Procedure fields, except the Sequence number, may contain embedded references. Each embedded reference is effectively a variable that is substituted with the value from a database field or configuration parameter specified by the reference. References may be arbitrarily nested; for example, the condition for selecting a record in a database field reference may be a parameter reference supplied in a configuration parameter. Reference resolution is also recursive; the value obtained from resolving a reference may itself contain embedded references. Thus a parameter reference may resolve to a parameter value that contains references. This allows the values of database fields to contain references to user defined parameters that are set as desired in the configuration file without needing to change the contents of database tables to effect the change.

Reference resolved values in Procedure fields allow dynamic definition of procedure attributes. References that are unresolved are fatal to Conductor unless the Unresolved_Reference configuration parameter has been set to a substitute string (e.g. ""). References that have incorrect syntax (e.g. unbalanced curly brace enclosures) are always fatal to Conductor.

Procedure Execution

The Command_Line value is reference resolved and parsed into an initial command name and command arguments. An empty or NULL Command_Line is fatal. Before each procedure is run the log file is written with the PROCEDURE_LOG_DELIMITER line. This is followed by a date and time stamp, the Sequence number, the Description field value (if it is not empty), and then the command line to be executed.

The command name and arguments are passed to the Java Runtime for execution as a Process by the host operating system. Note: the command is not run in a shell. It is, however, quite appropriate to run shell, or any other interpreted language, scripts (e.g. PERL). The only restriction on the procedure to be run is that it is accessible and executable. If the procedure can not be executed for any reason the source record's Status field value is appended with Conductor's NO_PROCEDURE error status. Otherwise the Status field is updated with the host system Process ID for the executed procedure; this is always an integer value greater than 1 that uniquely identifies the executing procedure in the host operating system.

All standard output from the procedure is copied into the log file with an annotation before each line that indicates whether the source is the procedure's stdout or stderr streams. Because these are separate streams read by asynchronous threads attached to each process stream there can be no guarantee of the relative logging order of lines from the two sources; while each stream is always logged in the order in which the procedure output to it, the uncertainties of system stream buffering and thread scheduling are likely to result in lines from stdout appearing in the log before or after where they might occur relative to stderr lines appearing in a shell terminal listing.
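The logging arrangement described above, one reader thread per stream with each line tagged by its origin, can be sketched as follows. The tag text is arbitrary and the shared list stands in for the log file; neither reflects Conductor's actual log format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Sketch: copying process output streams to a shared log by thread. */
final class StreamLoggerSketch {
    /** Starts a thread that copies each line of the stream, prefixed by tag. */
    static Thread pump(InputStream stream, String tag, List<String> log) {
        Thread thread = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    synchronized (log) {
                        log.add(tag + line); // order within one stream is kept
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        thread.start();
        return thread;
    }

    /** Waits for the given reader threads to finish draining their streams. */
    static void await(Thread... threads) {
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

For a real Process the two calls would be pump(process.getInputStream(), ...) for stdout and pump(process.getErrorStream(), ...) for stderr. Lines from one stream always stay in order, but how the two streams interleave depends on buffering and thread scheduling.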

Conductor waits for the procedure to complete before proceeding. However, it will not wait longer than the number of seconds from the Time_Limit field (which may have embedded references and may be a mathematical expression). If the value is NULL, empty or zero then there is no limit to the amount of time Conductor will wait for the procedure to complete. It is generally a good idea to place a maximum running time limit on any procedure that could become "hung" (for example in a loop or on an inaccessible resource). If the time limit is reached Conductor will destroy the procedure. This is done by sending the procedure a terminate signal (SIGTERM). This signal can be caught by the procedure so it has an opportunity to clean up open files or child processes of its own. For scripts that have launched long running computational programs it is correct practice to catch the terminate signal and halt these programs; failure to do so is likely to leave these child programs running as orphans. If a procedure does not catch the terminate signal it will be automatically terminated by the operating system. If Conductor must terminate a procedure due to a timeout the source record's Status field will be updated with Conductor's PROCEDURE_TIMEOUT error status and the log will be written with notice of the timeout. If the procedure completes normally, then the standard output streams are drained and copied to the log file and the exit status from the procedure is also noted in the log file.
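Running a procedure with a time limit can be sketched with the standard Process API. The negative code values below are placeholders, not Conductor's actual completion codes, and the output is discarded rather than logged to keep the sketch short. ProcessBuilder.Redirect.DISCARD requires Java 9 or later. Whether Process.destroy() delivers a catchable terminate signal is platform dependent.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Sketch: running a command, without a shell, under a time limit. */
final class TimedProcedureSketch {
    // Placeholder values only; Conductor defines its own negative codes.
    static final int NO_PROCEDURE = -1;
    static final int PROCEDURE_TIMEOUT = -2;

    /** Returns the exit status, or a negative placeholder code. */
    static int run(List<String> commandAndArguments, long timeLimitSeconds) {
        Process process;
        try {
            process = new ProcessBuilder(commandAndArguments)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            return NO_PROCEDURE; // the command could not be executed
        }
        try {
            if (timeLimitSeconds <= 0) {
                return process.waitFor(); // zero or negative: unlimited wait
            }
            if (process.waitFor(timeLimitSeconds, TimeUnit.SECONDS)) {
                return process.exitValue();
            }
            process.destroy(); // request termination after the timeout
            return PROCEDURE_TIMEOUT;
        } catch (InterruptedException e) {
            process.destroy();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

Because the command is passed as a list of a name and its arguments, no shell interprets it; wildcards, pipes and redirections in the command line would reach the program as literal text.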

Procedure Status

When the procedure execution is done the Completion_Number parameter is updated. This will be a negative value if the procedure did not run to completion (could not be executed or exceeded the Time_Limit), otherwise it will be the procedure's exit status value.

When a procedure completes normally Conductor uses either the Success_Status or Success_Message field values to determine if the procedure completed successfully. Usually the exit status is set by the procedure to a value that indicates if it succeeded. However, it may be necessary to examine the output of the procedure if the exit status is not reliable. There may also be unfortunate cases where there is no reliable indicator and all that can be done is assume that because the procedure completed it was successful.

If neither the Success_Status nor the Success_Message field value has been set to a non-empty value and an Empty_Success_Any configuration parameter was found with a "true" value then the success of the procedure is implied (i.e. in this case the procedure is always successful if it completes normally); otherwise the Success_Status value is asserted to be "0".

If the Success_Status field is not empty it is reference resolved. The result is evaluated as a logical expression and if a result is obtained it determines if the procedure was successful. Typically, the expression uses a reference to the Completion_Number parameter. The logical operators &, |, ~, =, <, >, <>, <=, >= may be used. The words "and", "or", and "not" can be used in place of &, |, and ~. Caution: Use &, not &&; |, not ||; ~, not !; =, not ==; and <>, not !=. A logical expression may contain embedded numeric expressions as well.

If the Success_Status does not contain a valid logical expression it is evaluated as a numeric expression. If the result, cast as an integer value, is equal to the procedure's exit status, then the procedure succeeded; otherwise it failed. A numeric expression may simply be a constant value (the symbols "pi" and "e" are recognized as constants) or may use the +, -, *, /, ^ operators; ** may be used instead of the ^ exponentiation operator. The ternary operator ? with : may be used following an embedded logical expression such that if the logical expression is true then the value before the : is used, else the value after the : is used (e.g. (4<5)?1:2 evaluates to 1). The functions sin, cos, tan, cot, sec, csc, arcsin, arccos, arctan, exp, ln, log2, log10, sqrt, cubert, abs, round, floor, ceiling, trunc may also be used with their argument following inside parentheses.

The Success_Message, if not empty, is used if the Success_Status is empty. It is reference resolved and then matched, as a regular expression, against what was obtained from the procedure's stdout and stderr. If there is a match with either output, then the procedure succeeded; otherwise it failed. A resolved Success_Message value that does not produce a valid regular expression is fatal to Conductor. Note: Regular expressions provide a very powerful pattern matching syntax, similar to that used by PERL, but they can be daunting to the beginner.
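The order in which the success criteria above are consulted can be summarized in a short sketch. This is not Conductor's implementation: the class and method names are hypothetical, reference resolution is omitted, Success_Status is simplified to a plain integer constant (no logical or numeric expression evaluation), whitespace-only values are treated as empty, and the regular expression is assumed to match anywhere in the output.

```java
import java.util.regex.Pattern;

final class SuccessCheckSketch {
    private static boolean isEmpty(String value) {
        return value == null || value.trim().isEmpty();
    }

    static boolean succeeded(String successStatus, String successMessage,
            boolean emptySuccessAny, int exitStatus,
            String stdout, String stderr) {
        if (isEmpty(successStatus) && isEmpty(successMessage)) {
            if (emptySuccessAny) {
                return true;            // any normal completion is a success
            }
            successStatus = "0";        // otherwise Success_Status is taken as "0"
        }
        if (!isEmpty(successStatus)) {
            // Simplification: integer constants only. Anything else throws
            // NumberFormatException in this sketch.
            return Integer.parseInt(successStatus.trim()) == exitStatus;
        }
        // Success_Message is used only when Success_Status is empty.
        // An invalid pattern throws PatternSyntaxException.
        Pattern pattern = Pattern.compile(successMessage);
        return pattern.matcher(stdout).find() || pattern.matcher(stderr).find();
    }
}
```

Note that a non-empty Success_Status takes precedence: in this sketch the Success_Message is never consulted when both fields are set.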

Regardless of the outcome of procedure execution the Status field of the source record is updated in the database with the procedure's status indicator. This indicator always includes the Conductor completion code which can be translated into a descriptive line of text by the Status_Conductor_Code_Description static method. If the procedure completed with an exit status that value is included in the status indicator. The meaning of this value is procedure dependent. Of course the log file is also annotated accordingly.

On Failure

When Conductor determines that the procedure completed successfully it repeats the procedure execution operation with the next procedure in the pipeline Sequence. If Conductor determines that the procedure did not complete successfully, then it resolves any embedded references in the On_Failure field value and uses that as a command line for a procedure to be executed. This procedure is executed without any time limit. In the log file, where a normal procedure would have a PROCEDURE_LOG_DELIMITER the On_Failure procedure has an ON_FAILURE_PROCEDURE_LOG_DELIMITER and no Description. Although the completion status of this procedure is not included in the final status indicator of the source record's Status field it is noted in the log file.
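The control flow for one source record can be sketched as follows. The Procedure holder class, the method name and the Predicate standing in for "execute and decide success" are all hypothetical, and the sketch assumes that an empty On_Failure value simply means nothing further is run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ProcedureSequenceSketch {
    /** Hypothetical holder for two Procedures table fields. */
    static final class Procedure {
        final String commandLine;
        final String onFailure;

        Procedure(String commandLine, String onFailure) {
            this.commandLine = commandLine;
            this.onFailure = onFailure;
        }
    }

    /** Returns the command lines executed, in order, for one source record. */
    static List<String> processSource(List<Procedure> pipeline,
            Predicate<String> succeeds) {
        List<String> executed = new ArrayList<>();
        for (Procedure procedure : pipeline) {
            executed.add(procedure.commandLine);
            if (succeeds.test(procedure.commandLine)) {
                continue;               // move on to the next procedure
            }
            // First failure: run its On_Failure command (if any) and stop.
            if (procedure.onFailure != null && !procedure.onFailure.isEmpty()) {
                executed.add(procedure.onFailure);
            }
            break;
        }
        return executed;
    }
}
```

With three procedures where the second fails, the executed list would be the first command, the second command, and the second procedure's On_Failure command; the third procedure is never run.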

When the number of sequential source processing failures reaches the Stop_on_Failure amount, further processing is halted after the On_Failure procedure has been run.
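A minimal sketch of the sequential failure accounting might look like the following. It rests on two assumptions that the text above does not confirm: a successfully processed source resets the count, and a limit of zero or less disables the check. The class and method names are hypothetical.

```java
final class SequentialFailureSketch {
    private final int stopOnFailure;
    private int sequentialFailures = 0;

    SequentialFailureSketch(int stopOnFailure) {
        this.stopOnFailure = stopOnFailure;
    }

    /** Records one source result; returns true if processing should halt. */
    boolean record(boolean sourceSucceeded) {
        if (sourceSucceeded) {
            sequentialFailures = 0;     // assumption: success resets the run
        } else {
            sequentialFailures++;
        }
        return stopOnFailure > 0 && sequentialFailures >= stopOnFailure;
    }

    int sequentialFailures() {
        return sequentialFailures;
    }
}
```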

When operating in Monitor mode the default is to stop processing on each failure and wait for the user to restart processing. In batch mode, however, Conductor will send an email message to the Notify list reporting this condition.

Sources Completion

The completion of the last of the Procedures in pipeline sequence, or the first On_Failure procedure, completes the processing of a source record. The log file is now closed. While there is another source record in the cache Conductor will continue trying to acquire an exclusive database lock. Once the cache is exhausted Conductor refreshes it from the Sources table with any new unprocessed records. If no unprocessed records are available then Conductor will sleep for the number of seconds indicated by the Poll_Interval configuration parameter. If no Poll_Interval parameter is present the default interval of 30 seconds is used. If the interval is less than or equal to 0, then Conductor processing will stop when it can find no source records to process.
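The refresh-and-poll cycle described above could be sketched like this. The Supplier stands in for refreshing the cache from the Sources table, and "processing" is reduced to counting records; both are simplifications, and the class and method names are hypothetical. Only the 30 second default and the stop-on-non-positive-interval rule come from the text.

```java
import java.util.List;
import java.util.function.Supplier;

final class PollingSketch {
    static final int DEFAULT_POLL_INTERVAL = 30;   // seconds, per the text above

    /** A missing Poll_Interval parameter falls back to the default. */
    static int effectiveInterval(Integer configured) {
        return configured == null ? DEFAULT_POLL_INTERVAL : configured;
    }

    /** Processes records until none remain and polling is disabled. */
    static int run(Supplier<List<String>> refreshCache, Integer configured) {
        int interval = effectiveInterval(configured);
        int processed = 0;
        while (true) {
            List<String> cache = refreshCache.get();
            if (!cache.isEmpty()) {
                processed += cache.size();  // stand-in for real processing
                continue;                   // cache exhausted: refresh again
            }
            if (interval <= 0) {
                return processed;           // nothing to do and no polling
            }
            try {
                Thread.sleep(interval * 1000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return processed;
            }
        }
    }
}
```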

System Dependencies

Java

Conductor is known to compile and run correctly with Java 1.4 and 1.5. Java was chosen for the implementation to maximize portability: as long as the host system provides a standard Java environment it should be able to run Conductor.

Process Patch

Note: The implementation of the abstract Process class distributed with the Java foundation runtime classes does not provide access to the host system process ID nor allow limiting the wait time for the process to complete. Conductor is distributed with modified versions of the java.lang.Process and java.lang.UNIXProcess class implementation code in the Conductor/Process.patch subdirectory that correct this deficiency by providing the Process.ID() method and a Process.waitFor(int) method that includes the timeout argument. In order for Conductor to run, these classes must be updated with the modified versions in the Java Foundation Classes (JFC) jar file used by the Java Virtual Machine (JVM) on the host system. The Conductor/Process.patch/README file describes what needs to be done. While applying the Process patch is very simple, it is likely to require assistance from a systems administrator with the appropriate system permissions. The modifications only provide access to information already present in Sun's implementation; they do not in any way affect or alter the original functionality nor have any effect on JVM security.
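One way to check whether a JVM carries the two patched methods named above is a reflection probe, sketched below. This is an assumption about how such a check could be written, not a description of how Conductor's own Process_Patched method works; the class and method names are hypothetical.

```java
final class ProcessPatchCheckSketch {
    /** True if the type exposes a public method with this name and signature. */
    static boolean hasMethod(Class<?> type, String name,
            Class<?>... parameterTypes) {
        try {
            type.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** True only if both Process.ID() and Process.waitFor(int) are present. */
    static boolean looksPatched() {
        return hasMethod(Process.class, "ID")
                && hasMethod(Process.class, "waitFor", int.class);
    }
}
```

On an unpatched JVM looksPatched() returns false. For comparison, stock Java releases later than those named in this document offer similar capabilities without a patch: Process.waitFor(long, TimeUnit) since Java 8 and Process.pid() since Java 9.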

Conductor_ID

The Conductor_ID value will include the process ID of Conductor if it is available. Obtaining the process ID (PID) of Conductor - i.e. the Java Virtual Machine (JVM) that runs the Java classes - requires using a Java Native Interface (JNI) to the host system function that provides this information. Though this is trivial to implement it is outside the pure Java implementation of Conductor. Without the JNI code the Conductor_ID will only be the hostname of the system on which Conductor is running. With the JNI code the Conductor_ID will include the JVM PID after the hostname separated by a colon (':') character. The availability of the JVM PID will not have any effect on Conductor's operation. However, it is quite useful to have the JVM PID for procedures to use in disambiguating filenames in a parallel processing shared storage environment, and it can assist in systems administration work.

A Native_Methods.c file that provides JNI access to the required system function is included in the source code distribution of Conductor. When the Conductor source code is compiled Native_Methods.c is also compiled to produce a dynamically loadable Native_Methods.so (or .jnilib on Apple OS X/Darwin systems) shared object library file in the Conductor/<OS>.<ARCH> subdirectory, where <OS> is the name of the host operating system (e.g. Darwin, FreeBSD, Linux or SunOS) and <ARCH> is the host system hardware architecture (e.g. i386, powerpc, x86_64 or sparc). N.B.: The library file must be copied to a location on the host system where it can be found by the dynamic linker (e.g. /usr/local/lib) when Conductor runs. The JNI library file can be built separately from the Java class files - if, for example, multiple operating systems and/or architectures are being used - by running "make jni" (GNU make may be named gmake) in the Conductor source code directory. The Native_Methods.c file requires the $(JNI_ROOT)/include/jni.h file included with the Java Software Development Kit (SDK) distribution. $(JNI_ROOT) is /usr/java by default, but a JNI_ROOT environment variable can be set with an alternative location before the JNI library is compiled.
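The hostname and hostname:PID forms of the Conductor_ID can be illustrated with a small sketch. The class and method names are hypothetical, and the main method obtains the PID with ProcessHandle from stock Java 9+, which is not the JNI route that Conductor itself uses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ConductorIdSketch {
    /** Formats an ID as "hostname" or, when a PID is known, "hostname:PID". */
    static String conductorId(String hostname, Long pid) {
        return pid == null ? hostname : hostname + ":" + pid;
    }

    public static void main(String[] args) throws UnknownHostException {
        String hostname = InetAddress.getLocalHost().getHostName();
        // Java 9+ substitute for the JNI lookup described above.
        long pid = ProcessHandle.current().pid();
        System.out.println(conductorId(hostname, pid));
    }
}
```

A procedure that receives such an ID could append it to temporary filenames so that Conductors running on different hosts, or several on the same host, do not collide on shared storage.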

Version:
2.20
Author:
Bradford Castalia, Christian Schaller - UA/PIRL
See Also:
PIRL.Database

Field Summary
static int BAD_REGEX
          Conductor procedure completion code.
protected  String Catalog
          The name of the database catalog containing the pipeline tables.
static String CATALOG_PARAMETER
          Conductor Configuration parameters.
static int COMMAND_LINE_FIELD
          Procedures table fields indexes.
static String CONDUCTOR_GROUP
          Conductor Configuration parameters.
static int CONDUCTOR_ID_FIELD
          Sources table fields indexes.
static String CONDUCTOR_ID_PARAMETER
          Conductor Configuration parameters.
static String CONFIGURATION_SOURCE_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_HOSTNAME_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_SERVER_NAME_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_SERVER_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_TYPE_PARAMETER
          Conductor Configuration parameters.
static String DEFAULT_CONFIGURATION_FILENAME
          The default configuration filename.
static int DEFAULT_DUPLICATE_PARAMETER_ACTION
          The default action should a duplicate parameter pathname occur in the Conductor Configuration file.
static boolean DEFAULT_EMPTY_SUCCESS_ANY
          The default for whether or not empty Success_Status and Success_Message fields in a procedure definition may imply any completion of the procedure is a success.
static int DEFAULT_POLL_INTERVAL
          The default polling interval, in seconds, for unprocessed source records when no unprocessed source records are obtained.
static int DEFAULT_STOP_ON_FAILURE
          The default maximum number of sequential source processing failures.
static int DESCRIPTION_FIELD
          Procedures table fields indexes.
static String EMPTY_SUCCESS_ANY_PARAMETER
          Conductor Configuration parameters.
static int EXIT_COMMAND_LINE_SYNTAX
          Command line syntax problem exit status (1).
static int EXIT_CONFIGURATION_PROBLEM
          Configuration problem exit status (2).
static int EXIT_DATABASE_PROBLEM
          Database problem exit status (3).
static int EXIT_IO_FAILURE
          I/O failure exit status (4).
static int EXIT_PROCESS_PATCH
          Required Process patch not present exit status (5).
static int EXIT_STAGE_MANAGER
          The required Stage_Manager connection could not be established.
static int EXIT_SUCCESS
          Conductor success exit status (0).
static int EXIT_TOO_MANY_FAILURES
          The number of sequential source processing failures reached the Stop-on-Failure amount.
static int EXIT_UNEXPECTED_EXCEPTION
          An unexpected exception occurred (9).
static String[] FAILURE_DESCRIPTION
          Conductor status failure code descriptions.
static int HALTED
          Processing state: A failure condition caused processing to halt.
static String HOSTNAME_PARAMETER
          Conductor Configuration parameters.
static String ID
          Class identification name with source code version and date.
static int INACCESSIBLE_FILE
          Conductor procedure completion code.
static int INVALID_DATABASE_ENTRY
          Conductor procedure completion code.
static String LOG_DIRECTORY_PARAMETER
          Conductor Configuration parameters.
static String LOG_FILENAME_PARAMETER
          Conductor Configuration parameters.
static int LOG_PATHNAME_FIELD
          Sources table fields indexes.
static String LOG_PATHNAME_PARAMETER
          Conductor Configuration parameters.
static int MAX_SOURCE_RECORDS_DEFAULT
          The default maximum number of unprocessed source records that will be obtained when the Conductor cache is refreshed.
static String MAX_SOURCE_RECORDS_PARAMETER
          Conductor Configuration parameters.
static int MIN_SOURCE_RECORDS
          The minimum value for the MAX_SOURCE_RECORDS_PARAMETER in the configuration.
protected static String NL
           
static int NO_PROCEDURE
          Conductor procedure completion code.
static String NOTIFY_PARAMETER
          Conductor Configuration parameters.
static int ON_FAILURE_FIELD
          Procedures table fields indexes.
static String ON_FAILURE_PROCEDURE_LOG_DELIMITER
          Marks the beginning of On_Failure procedure processing in a log file.
protected static String Pipeline
          The name of the pipeline (.) being managed.
static String PIPELINE_PARAMETER
          Conductor Configuration parameters.
static String POLL_INTERVAL_PARAMETER
          Conductor Configuration parameters.
static int POLLING
          Processing state: No unprocessed source records are available and poll interval for new records is positive.
static String PROCEDURE_COMPLETION_NUMBER_PARAMETER
          Conductor Configuration parameters.
static String PROCEDURE_COUNT_PARAMETER
          Conductor Configuration parameters.
static int PROCEDURE_FAILURE
          Conductor procedure completion code.
static String PROCEDURE_LOG_DELIMITER
          Marks the beginning of procedure processing in a log file.
protected  Vector<Vector<String>> Procedure_Records
          The content of the pipeline procedures table, without the field names, sorted by sequence number.
static String PROCEDURE_SEQUENCE_PARAMETER
          Conductor Configuration parameters.
static int PROCEDURE_SUCCESS
          Conductor procedure completion code.
static int PROCEDURE_TIMEOUT
          Conductor procedure completion code.
static String[] PROCEDURES_FIELD_NAMES
          Procedures table field names.
protected  Fields_Map Procedures_Map
           
protected  String Procedures_Table
          The name of the pipeline procedures table in the database.
static String PROCEDURES_TABLE_NAME_SUFFIX
          Procedures table name suffix.
static String PROCEDURES_TABLE_PARAMETER
          Conductor Configuration parameters.
static String RECONNECT_DELAY_PARAMETER
          Conductor Configuration parameters.
static String RECONNECT_TRIES_PARAMETER
          Conductor Configuration parameters.
static boolean Require_Stage_Manager
          If true and a Stage_Manager connection cannot be established Conductor will throw an exception; otherwise Conductor will proceed without a Stage_Manager.
static String REQUIRE_STAGE_MANAGER_PARAMETER
          Conductor Configuration parameters.
protected  Reference_Resolver Resolver
          The Reference_Resolver object being used.
static String RESOLVER_DEFAULT_VALUE
          The default value to be used by the Reference_Resolver if a reference can not be resolved.
static int RUN_TO_WAIT
          Processing state: When the current source record completes processing the WAITING state will be entered unless a failure condition caused the HALTED state to occur.
static int RUNNING
          Processing state: Source records are being processed.
static int SEQUENCE_FIELD
          Procedures table fields indexes.
static int SOURCE_AVAILABLE_TRIES_DEFAULT
          Default number of source file availability tests.
static int SOURCE_AVAILABLE_TRIES_MAX
          Maximum number of source file availability tests.
static String SOURCE_AVAILABLE_TRIES_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_DIRECTORY_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_FAILURE_COUNT
          Conductor Configuration parameters.
static String SOURCE_FILE_LOG_DELIMITER
          Marks the beginning of source file processing in a log file.
static String SOURCE_FILENAME_EXTENSION_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_FILENAME_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_FILENAME_ROOT_PARAMETER
          Conductor Configuration parameters.
static int SOURCE_ID_FIELD
          Sources table fields indexes.
static String SOURCE_ID_PARAMETER
          Conductor Configuration parameters.
static int SOURCE_NUMBER_FIELD
          Sources table fields indexes.
static String SOURCE_NUMBER_PARAMETER
          Conductor Configuration parameters.
static int SOURCE_PATHNAME_FIELD
          Sources table fields indexes.
static String SOURCE_PATHNAME_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_SUCCESS_COUNT
          Conductor Configuration parameters.
static String[] SOURCES_FIELD_NAMES
          Sources table field names.
static String SOURCES_TABLE_NAME_SUFFIX
          Sources table name suffix.
static String SOURCES_TABLE_PARAMETER
          Conductor Configuration parameters.
static String STAGE_MANAGER_PASSWORD_PARAMETER
          Conductor Configuration parameters.
static String STAGE_MANAGER_PORT_PARAMETER
          Conductor Configuration parameters.
static String STAGE_MANAGER_TIMEOUT_PARAMETER
          Conductor Configuration parameters.
static int STATUS_FIELD
          Sources table fields indexes.
static String STDERR_NAME
          Prefix applied to procedure stderr lines.
static String STDOUT_NAME
          Prefix applied to procedure stdout lines.
static String STOP_ON_FAILURE_PARAMETER
          Conductor Configuration parameters.
static int SUCCESS_MESSAGE_FIELD
          Procedures table fields indexes.
static int SUCCESS_STATUS_FIELD
          Procedures table fields indexes.
protected  Configuration The_Configuration
          The Configuration object containing the configuration parameters.
protected  Database The_Database
          The Database object used to access the database server.
static int TIME_LIMIT_FIELD
          Procedures table fields indexes.
static String TOTAL_FAILURE_COUNT
          Conductor Configuration parameters.
static String TOTAL_PROCEDURE_RECORDS_PARAMETER
          Conductor Configuration parameters.
static int UNRESOLVABLE_REFERENCE
          Conductor procedure completion code.
static String UNRESOLVED_REFERENCE_PARAMETER
          Conductor Configuration parameters.
static String UNRESOLVED_REFERENCE_THROWS
          Conductor Configuration parameters.
static int WAITING
          Processing state: Idle; waiting for a start request.
 
Constructor Summary
protected Conductor()
          Constructs an uninitialized Conductor.
  Conductor(String pipeline, Configuration configuration)
          Constructs a Conductor for a pipeline from a Configuration.
  Conductor(String pipeline, Configuration configuration, String database_server_name)
          Constructs a Conductor for a pipeline from a Configuration.
 
Method Summary
 Management Add_Log_Writer(Writer writer)
           
 Management Add_Processing_Listener(Processing_Listener listener)
           
static String Config_Pathname(String name)
          Get an absolute Configuration pathname.
protected  String Config_Value(String name)
          Get a String parameter value from the configuration.
protected  boolean Config_Value(String name, Object value)
          Set a parameter in the configuration.
 Configuration Configuration()
          Get the Conductor Configuration.
protected  Database_Exception Connect_to_Database()
          Establish the Database connection.
 boolean Connected_to_Stage_Manager()
          Test if the Conductor is connected to the Stage_Manager.
 Management Enable_Log_Writer(Writer writer, boolean enable)
           
protected  Vector<Vector<String>> Get_Procedures_Table()
          Get the Procedures_Table from the Database.
 Message Identity()
          Get the Conductor identity.
protected  void Load_Procedure_Records()
          Load the Procedure_Records table.
protected  boolean Load_Source_Records()
          Load the Sources_Records table.
protected  void Log_Message(String message)
          Logs a message to the Logger.
protected  void Log_Message(String message, AttributeSet style)
          Logs a message to the Logger.
static void main(String[] args)
           
static String[] Parse_Command_Line(String command_line)
          Parse a String into command line arguments.
 int Poll_Interval()
           
 Management Poll_Interval(int seconds)
          Set the time interval to poll for unprocessed source records.
protected  void Postconfigure(Configuration configuration)
          Update the configuration and application control values after the Database and Reference_Resolver have been constructed.
protected  Configuration Preconfigure(Configuration configuration)
          Set the effective configuration.
 Vector<Vector<String>> Procedures()
          Get the procedures table.
static boolean Process_Patched()
          Confirms that the JFC Process class has been patched.
 Exception Processing_Exception()
           
 int Processing_State()
          Get the current Conductor processing state.
 void Quit()
          Immediately stop processing and exit.
 boolean Remove_Log_Writer(Writer writer)
           
 boolean Remove_Processing_Listener(Processing_Listener listener)
           
 Management Reset_Sequential_Failures()
           
 String Resolver_Default_Value()
           
 Management Resolver_Default_Value(String value)
           
 int Sequential_Failures()
           
 Vector<Vector<String>> Sources()
          Get the sources table.
 void Start()
          Start pipeline processing.
 Processing_Changes State()
          Get the current Conductor processing conditions state.
static String Status_Conductor_Code_Description(int code)
          Gets a description String for a Conductor procedure completion code.
static int Status_Conductor_Code(String status)
          Gets the Conductor procedure completion status code value from a procedure status indicator String.
static String Status_Field_Value(Vector<String> status)
          Assemble a properly formatted String for a Source table Status field value.
static String Status_Indicator(int conductor_status)
          Assemble a properly formatted procedure status indicator String as used in a Sources table Status field value.
static String Status_Indicator(int conductor_status, int procedure_status)
          Assemble a properly formatted procedure status indicator String as used in a Sources table Status field value.
static Vector<String>