PIRL

PIRL.Conductor
Class Conductor

java.lang.Object
  extended by PIRL.Conductor.Conductor
All Implemented Interfaces:
Management

public class Conductor
extends Object
implements Management

Conductor is a queue management mechanism for the sequential processing of data files.

A Conductor processes a list of data source files through a list of procedures to be invoked on each file. These lists are obtained from a Database as a pair of tables. The list of files is contained in a Sources table and the procedures are defined in a Procedures table. A pair of Sources and Procedures tables constitutes a Pipeline: each Sources record is processed in the order it occurs in the table (FIFO); each procedure specified by a Procedures record is executed in sequence number order. Each pipeline has a name that is used to find its database tables, where the pipeline tables are named:

<pipeline>_Sources
<pipeline>_Procedures
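
As an illustration of the naming convention, the following minimal sketch derives the two table names from a pipeline name. The class and method names, and the example catalog "Mission" and pipeline "Calibration", are hypothetical and are not part of the Conductor API. The period separator between catalog and pipeline is taken from the Log_Filename description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PIRL: builds the pipeline table names.
class PipelineTables {
    /** Returns the Sources and Procedures table names, in that order. */
    static List<String> tableNames(String catalog, String pipeline) {
        // An absent catalog yields unqualified table names.
        String prefix = (catalog == null || catalog.isEmpty()) ? "" : catalog + ".";
        return Arrays.asList(
            prefix + pipeline + "_Sources",
            prefix + pipeline + "_Procedures");
    }

    public static void main(String[] args) {
        System.out.println(tableNames("Mission", "Calibration"));
    }
}
```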

A Conductor must be run with a command line argument that specifies the pipeline it is to process. Multiple Conductors may safely process the same pipeline from the same or separate host systems.

Database

Conductor requires a database containing the pipeline tables to be accessible. It is accessed using the PIRL Database package. This package abstracts the particulars of database access. The Conductor configuration file specifies the database access information.

Sources Table

A Sources table must contain at least these fields:

Source_Number
Must be a non-NULL integer value unique for each record. While the order does not matter, it is easiest to make this a field with a value that is automatically assigned and incremented by the database server.
Source_ID
A string that identifies the file in some user specific manner. It is used, along with the Source_Number, to produce what is expected to be a unique log filename. If the field value is NULL or empty then the filename portion of the Source_Pathname, with any extension removed, will be used and this field will be updated.
Source_Pathname
This is the pathname of the file that is to be processed. The pathname is in the syntax of the host system. It is not required that the pathname be fully qualified, only that the file can be found using the pathname. This field value must not be NULL or empty.
Conductor_ID
This is a text field that must be NULL (not just empty) for each record to be processed. It is filled in with the name of the Conductor host system, possibly supplemented by the process ID of the Conductor (see the System Dependencies, Conductor_ID section, below), when the record is acquired for processing. This field provides an exclusive lock against other Conductor processes acquiring the record: Once a Conductor sets this field the record is no longer available for processing.
Status
An indicator of the status of each procedure invoked on the source file is recorded in this field. As each procedure is invoked on the source file this field is kept current with the status of the procedure. The format of the field value is described for the Status_Indicators method which, along with other Status_xxx methods, provides a convenient means for other Java classes to interpret and manipulate these field values. Note: This field should initially be NULL or empty. It is important that this field only be modified consistent with the operation of Conductor (i.e. change at your own risk!).
Log_Pathname
The pathname for the log file where the processing of the source file is recorded. Normally this field is left empty (or NULL) and the Log_Pathname (or Log_Directory) parameter of the Configuration file will be used. If either of these is a pathname that refers to an existing directory, then Conductor will generate an appropriate log filename (see the description of the Log_Directory and Log_Filename parameters, below). If this field is empty and both parameters are empty or not present, the log file will be written to the current working directory (the default Log_Directory). However, if either this field or the Log_Pathname (or Log_Directory) parameter is not empty and does not refer to an existing directory, then the pathname is taken to refer to a regular file and the source file log will be appended to that file (a new file will be created if it does not yet exist). Both this field value and any parameter used will be reference resolved if not empty. N.B.: If the pathname of a log file is not specified by this field or a configuration parameter, then any existing file at the generated pathname is overwritten. This field is always updated with whatever actual log file pathname is used.

A Sources table may contain the required fields in any order, and it may contain additional fields as desired. For example, it is recommended that a timestamp field be provided that will automatically be updated with the last update time of each record.

Procedures Table

A Procedures table must contain at least these fields:

Sequence
A real number value that orders the procedure in the sequence of processing the source file. The values need not be sequential nor must they be in any particular order in the table. All of the procedure records will be sorted numerically on this field value so that processing of the source file will occur in sequence order. It is strongly recommended that the values be unique in the table, but this is not required; however there is no certainty of the order of processing for procedures with the same sequence number.
Command_Line
The command line to be submitted to the system for executing the procedure. The command line may contain embedded field and/or parameter references to be substituted with database field values and/or configuration parameter values. Obviously this field must be neither empty nor NULL.
Success_Status
An integer value that matches the exit status of the procedure when the procedure has completed successfully. The value can be text that is subject to embedded reference resolving, but must ultimately be convertible to an integer value. If this field is NULL or empty and the Success_Message field is not, then the latter field is used instead. If both fields are empty then the Empty_Success_Any parameter controls how this will be interpreted.
Success_Message
Text to match on the procedure output lines - either stdout or stderr - to determine if the procedure completed successfully. This field is only used if the Success_Status field is empty or NULL. The text in this field is reference resolved. The match against the lines of procedure output treats the resolved text as a regular expression pattern.
Time_Limit
The maximum amount of time, in seconds, to wait for the procedure to complete. The value can be text that is subject to embedded reference resolving. After resolving any references the result is treated as a mathematical expression that must produce a single integer value. A 0 (zero) or negative value indicates an unlimited wait time. An empty or NULL value is equivalent to 0. If the procedure does not complete within the specified amount of time it is killed.
On_Failure
A command line just like the Command_Line field. If the procedure defined by the Command_Line field fails to complete successfully, then the procedure defined by the On_Failure field is run. No time limit is applied to this procedure.

A Procedures table may contain the required fields in any order, and it may contain additional fields as desired. If a "Description" field is present, which is highly recommended, it is used as a text description of each procedure included in the processing log. It is also recommended that a timestamp field be provided that will automatically be updated with the last update time of each record.

Database Connection Resilience

A Conductor maintains a connection with the Database server while it is operating. If any access to the Database by Conductor fails due to a loss of the connection, Conductor will attempt to reconnect and, if successful, repeat the access operation again (a connection failure of the repeated access operation does not result in a reconnection attempt). If the reconnection fails because the connection can not be established with the server, the attempt will be retried after a delay period of 5 minutes (this can be overridden with the RECONNECT_DELAY_PARAMETER). Up to 16 retries (this can be overridden with the RECONNECT_TRIES_PARAMETER) will be attempted before a database access failure is deemed to have occurred. N.B.: Database connection resilience does not automatically apply to any database access operations by pipeline procedures.
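
The retry behavior described above can be sketched as follows. This is a minimal illustration, not the Conductor implementation: the class name, the method signature and the use of a BooleanSupplier to stand in for a reconnection attempt are all assumptions. The delay is passed in milliseconds so that the sketch can be exercised without waiting.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of reconnect-with-retries; not the PIRL implementation.
class ReconnectPolicy {
    static final int DEFAULT_TRIES = 16;            // documented default retries
    static final long DEFAULT_DELAY_SECONDS = 300;  // documented default delay

    /**
     * Makes one immediate connection attempt. After each failure it
     * sleeps for delayMillis and tries again, up to the given number
     * of retries. Returns true as soon as an attempt succeeds.
     */
    static boolean reconnect(BooleanSupplier connect, int retries, long delayMillis) {
        if (connect.getAsBoolean()) {
            return true;
        }
        for (int attempt = 1; attempt <= retries; attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;  // give up if the waiting thread is interrupted
            }
            if (connect.getAsBoolean()) {
                return true;
            }
        }
        return false;  // a database access failure is deemed to have occurred
    }
}
```

With the documented defaults this would be called as reconnect(attempt, DEFAULT_TRIES, DEFAULT_DELAY_SECONDS * 1000).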

Configuration

When a Conductor is started it first reads its Configuration file. This is "Conductor.conf" by default, but another filename may be specified on the command line. The configuration file contains parameter definitions in Parameter Value Language (PVL) format. This file contains the information needed to access the database used by Conductor as well as any other parameters that may be useful to resolve parameter references embedded in procedure definition record field values. Since references may contain nested references it is quite appropriate for users to provide configuration parameters with values that are database field references (perhaps with complex conditionals and multiple field combinations) so that Command_Line (for example) definitions use the user specified parameter references rather than the more complicated definitions. This also makes it easy to modify the field reference definitions, just by editing the configuration file, without necessarily needing to change the contents of a Procedures pipeline table.

N.B.: By default, when the configuration file is read during startup by the application's main method, should any parameters have the same pathname, the last duplicate encountered is given preference. This is especially important to keep in mind when the configuration file includes another configuration file, such as a site-wide configuration. For example, a site-wide configuration file included in all pipeline-specific configuration files (a typical scenario) might have a Conductor group that includes default parameters such as Stop_on_Failure with a small value (e.g. 1 or 2) to prevent a bug in a pipeline procedure from generating a large number of source processing failures, while a configuration file for a specific pipeline that is expected to have failures (which may actually be branches off to some other pipeline depending on the outcome of some condition testing procedure) might have a Conductor group that includes a Stop_on_Failure with a large (or possibly zero) value. As long as the site-wide configuration file is included before the pipeline-specific Conductor/Stop_on_Failure parameter is specified, the latter will take precedence over the former.
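
The "last duplicate wins" rule can be pictured with an ordinary map keyed by parameter pathname. This is only an analogy under stated assumptions: the class below is hypothetical, and the real Configuration class works on PVL parameter groups rather than a flat map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical analogy for duplicate-parameter precedence; not PIRL code.
class ParameterPrecedence {
    /**
     * Applies {pathname, value} definitions in the order they are read.
     * A later definition with the same pathname replaces the earlier one.
     */
    static Map<String, String> merge(String[][] definitionsInReadOrder) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (String[] definition : definitionsInReadOrder) {
            effective.put(definition[0], definition[1]);
        }
        return effective;
    }

    public static void main(String[] args) {
        String[][] definitions = {
            {"Conductor/Stop_on_Failure", "2"},  // from the included site-wide file
            {"Conductor/Stop_on_Failure", "0"},  // from the pipeline-specific file
        };
        System.out.println(merge(definitions));
    }
}
```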

Parameters

The Conductor automatically provides a set of parameters in the configuration Conductor group:

Class_ID
The fully qualified Conductor class name with its revision number and date.
Configuration_Source
The name of the configuration file source. This may be the name of a file on the Conductor host system or a URL for a file obtained remotely.
Conductor_ID
The Conductor identification (see the System Dependencies, Conductor_ID section, below).
Database_Server_Name
The name of the database Server configuration parameters group. If no Server name could be determined, this parameter will not be included.
Hostname
The fully qualified hostname of the system where Conductor is running. If the hostname can not be obtained the IP (internet protocol) address will be used.
Database_Hostname
The hostname of the database server system.
Database_Type
The type of database server, as known to the Database access package.
Pipeline
The simple pipeline name (without any catalog prefix).
Catalog
The name of the database catalog where the pipeline tables are located.
Sources_Table
The full name of the table, including the Catalog prefix, containing the Source file records.
Procedures_Table
The full name of the table containing the procedure definitions.

Dynamic source parameters

The following parameters are reset for each source record being processed:

Log_Directory
The pathname to the directory where the log file is written. If a Log_Pathname field value is present it is used to determine the Log_Directory, but only for the current source record. Otherwise the Log_Pathname configuration parameter is used. If it is not present or empty the Log_Directory parameter is used instead. The default Log_Directory is Conductor's current working directory.
Log_Filename
The filename (without the directory path) of the source log file. A non-empty value of the source record Log_Pathname field, or of the Log_Pathname or Log_Directory parameters, that does not refer to an existing directory is taken to be the pathname to the log file. If none of these has such a value, then a default filename will be generated that has the form:
<Pipeline>-<Source_ID>_<Source_Number>.log

The Pipeline name includes the leading database catalog name separated by a period ('.') character.

The Source_ID and Source_Number are obtained from the current source record. Note, however, that there is a chance that the Source_ID will include characters that are unsafe for use as part of a filename. Assuming that the only unsafe character is the system property "file.separator" character ('/' for Unix), it will be replaced with a percent ('%') character.

Note: If either the source record field or configuration parameter value is not empty and does not refer to an existing directory, then that value will be used unconditionally to determine the log filename.

Source_Number
The Source_Number field value of the current source record.
Source_ID
The Source_ID field value of the current source record.
Source_Pathname
The Source_Pathname field value of the current source record in the filesystem's fully qualified (absolute) form.
Source_Directory
The directory path portion of the Source_Pathname value.
Source_Filename
The filename portion (without the directory pathname) of the Source_Pathname value.
Source_Filename_Extension
The portion of the Source_Filename value following the last period ('.') character in the name. This will be the empty string if there is no extension.
Source_Filename_Root
The portion of the Source_Filename value without the extension (the portion preceding the last period character). This may be the empty string.
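
The filename-related parameters above, and the default Log_Filename form, can be sketched with plain string operations. The class and method names are hypothetical. The sketch assumes, as the Log_Filename description does, that the only unsafe character in a Source_ID is the "file.separator" character.

```java
import java.io.File;

// Hypothetical helpers illustrating the dynamic source parameters.
class SourceParameters {
    /** Source_Filename: the pathname without its directory portion. */
    static String filename(String pathname) {
        return new File(pathname).getName();
    }

    /** Source_Filename_Extension: text after the last '.', or "" if none. */
    static String extension(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1);
    }

    /** Source_Filename_Root: text before the last '.', possibly empty. */
    static String root(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? filename : filename.substring(0, dot);
    }

    /** Default log filename: <Pipeline>-<Source_ID>_<Source_Number>.log */
    static String defaultLogFilename(String pipeline, String sourceId, long sourceNumber) {
        // Replace the host's file separator with '%' to keep the name safe.
        String safeId = sourceId.replace(File.separator, "%");
        return pipeline + "-" + safeId + "_" + sourceNumber + ".log";
    }
}
```

Note that a filename such as ".hidden" yields an empty root and the extension "hidden", which matches the statement that the root may be the empty string.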

Dynamic procedure parameters

The following parameters are reset for each procedure record being processed:

Total_Procedure_Records
The total number of procedure definition records in the procedures table. Note: This is a dynamic parameter because the procedures table is refreshed each time pipeline processing is started and the table may have been changed while the Conductor was waiting.
Procedure_Count
The procedure definition record count for the current, or last, procedure sequence. The first procedure definition record has a count of one. A Procedure_Count of zero means that processing for the current source record has not yet commenced.
Sequence
The Sequence field value of the current, or last, procedure record.
Completion_Number
The completion number for the last procedure that was executed. If the procedure ran to completion, whether it was successful or not, it will be the exit status value (a non-negative integer value) of the procedure. If the procedure did not complete for any reason it will be a negative Conductor completion code; the Status_Conductor_Code_Description static method may be used to obtain a brief, one line description of this code.

Conductor control parameters

The following configuration parameters will be used if they are present in the Conductor group or parent of this group:

Unresolved_Reference
The value to use for an unresolved reference. By default an unresolved reference throws a Database_Exception. This can be specified with a value beginning with the word "throw" (case insensitive). Note: All parameters used by Conductor are reference resolved. Those with unresolved references that would throw an exception are deemed to be missing parameters. Those values that have incorrect reference syntax are left unresolved.
Empty_Success_Any
If "true", when Success_Status and Success_Message are both empty or NULL the corresponding procedure is always deemed successful when it completes. Otherwise this condition implies a zero (0) Success_Status. Default: false.
Min_Source_Records
The minimum number of source records to be processed in batch mode before Conductor stops. Default: 1.
Max_Source_Records
The maximum number of source records to obtain at any one time from the database. This prevents memory exhaustion if the number of unprocessed source records is very large. Must not be less than Min_Source_Records. Default: 1000.
Poll_Interval
The amount of time (in seconds) to wait before trying to obtain more unprocessed source records when querying the source table found no unprocessed records. If this value is zero or negative Conductor processing will stop instead of waiting to try again. Default: 30.
Source_Available_Tries
The number of tries that will be made to confirm that the Source_Pathname is accessible - i.e. exists as a regular file that can be read - before giving up and declaring the file to be inaccessible. After each accessibility check failure, and before the next try, a ten second pause will be provided. The intention is to give filesystem directory caches time to be synchronized with newly created files on remote filesystems. A maximum of 180 retries (30 minutes wait time) is allowed to prevent Conductor from waiting indefinitely. If the value is negative no Source_Pathname confirmation will be done. This can be useful if the pipeline does not process a source file. Default: 12.
Reconnect_Tries
The maximum number of reconnection tries if the database connection is lost. Default: 16.
Reconnect_Delay
The delay, in seconds, between database reconnection retry attempts. Default: 300.
Stop_on_Failure
The number of sequential source processing failures that will cause Conductor to stop further processing. Zero means source failures will never cause processing to stop. "true" or "yes" is equivalent to 1; "false" or "no" is equivalent to 0. Default: 0.
Notify
A list of zero or more email addresses that will be sent a notification if Conductor processing halts. The reason processing halted will be in the email message.
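
As an example of how a control parameter value might be interpreted, the following sketch parses a Stop_on_Failure value according to the description above. The class and method names are hypothetical. Treating the words case-insensitively and treating an empty value as the default are assumptions of this sketch.

```java
// Hypothetical interpretation of the Stop_on_Failure parameter value.
class StopOnFailure {
    /** Converts the parameter text to a failure count; 0 means never stop. */
    static int parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return 0;  // documented default (empty-as-default is assumed)
        }
        String text = value.trim();
        if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("yes")) {
            return 1;
        }
        if (text.equalsIgnoreCase("false") || text.equalsIgnoreCase("no")) {
            return 0;
        }
        return Integer.parseInt(text);  // throws NumberFormatException otherwise
    }

    /** True when the run of sequential failures has reached the limit. */
    static boolean shouldStop(int sequentialFailures, int stopOnFailure) {
        return stopOnFailure > 0 && sequentialFailures >= stopOnFailure;
    }
}
```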

Stage_Manager parameters

The Conductor will try to connect to a Stage_Manager process on the local host system. The Stage_Manager provides remote Management capabilities. A host system may be running multiple Stage_Managers, each using its own communications port, to provide multiple Theater management contexts for different sets of Conductors. A Conductor can operate in only one Theater as determined by its Stage_Manager parameters. The following configuration parameters, which must be in a Stage_Manager sub-group of the Conductor group, control the Conductor connection to the Theater's Stage_Manager:

Require_Stage_Manager
If "enabled", "true", "yes" or 1 a connection to a Stage_Manager is required for this Conductor or it will not run. N.B.: This parameter is only read once when the Conductor is first configured; when the Conductor is reconfigured on being restarted after a Stop or Halt state the initial value is reset in the internal Configuration regardless of any change to the external configuration source.
Timeout
The amount of time (seconds) to wait for a Stage_Manager connection to complete before the connection attempt fails.
Password
The password required to authenticate a Stage_Manager connection. The value is a text string of any length. If this parameter is present its value is masked out in the internal Configuration. If this parameter is not present, or has an empty value, and the Stage_Manager requires an authenticated connection the connection attempt will fail.
HELLO_PORT_PARAMETER_NAME - Get and Set
The port number to use when listening for a Stage_Manager "Hello" broadcast that it is ready for connections.
HELLO_ADDRESS_PARAMETER_NAME - Get and Set
The multicast address to use when listening for a Stage_Manager "Hello" broadcast that it is ready for connections.

Processing

Procedure Records

After the Conductor has been initialized using its configuration file and connected to the database it begins to process the pipeline. Note: If the Conductor is run with a Manager (using the -Manager command line option), processing does not begin immediately; the Conductor will wait until it is told to start by the Manager. The records from the Procedures table are read, checked to confirm that they contain all necessary fields, and sorted into Sequence number order. A Conductor may be told to stop processing by a Manager (including a remote Manager), which will cause the Conductor to stop further processing when the current source record processing is complete. Processing is resumed with the next source record when a Manager sends the start signal. Each time pipeline processing is started the Procedures table is loaded again and the configuration file is read again and used to reconfigure the Conductor. Changes to the Procedures records and configuration file can safely be made while Conductor is running.

Source Records

When pipeline processing has been started Conductor begins processing records from its Sources table. All unprocessed records, up to a maximum (set by default to 1000 to prevent memory exhaustion) configurable with the Max_Source_Records parameter, are read into an internal cache. An unprocessed record has a NULL in its Conductor_ID field. These records are processed in first-in first-out (FIFO) order.

An exclusive lock must be acquired on a record before it can be processed. To acquire a lock on a source record an attempt is made to update the record's Conductor_ID field to the Conductor's identification value (hostname and possibly process ID) with the condition that the field value is currently NULL. The update operation by the database server is atomic; once the operation is started by the database server it will go to completion without the possibility of interruption by any other database operation. This guarantees that only one process will be able to gain access to any source record even in the context of multiple processes contending for the same record at the same time. If some other Conductor has already acquired the record, as indicated by the failure of the update operation (because the Conductor_ID field is no longer NULL as required), the record will be removed from the cache and the Conductor will try to acquire a lock on the next record in the cache. If the update succeeds then the Conductor has acquired exclusive control of the record. It will be safe to process the source without concern that some other process may interfere.
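
The conditional update that implements the lock can be sketched in plain JDBC. This is an assumption for illustration only: Conductor accesses the database through the PIRL Database package, and the class, method names and exact SQL text below are hypothetical. The record is acquired when exactly one row is updated.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical JDBC sketch of the Conductor_ID locking update.
class SourceLock {
    /** Builds the conditional update; the table name comes from trusted configuration. */
    static String lockSql(String sourcesTable) {
        return "UPDATE " + sourcesTable
             + " SET Conductor_ID = ?"
             + " WHERE Source_Number = ? AND Conductor_ID IS NULL";
    }

    /**
     * Tries to claim a source record. Returns true if this caller set
     * Conductor_ID, false if the field was already non-NULL because
     * another process acquired the record first.
     */
    static boolean acquire(Connection connection, String sourcesTable,
                           long sourceNumber, String conductorId) throws SQLException {
        try (PreparedStatement update = connection.prepareStatement(lockSql(sourcesTable))) {
            update.setString(1, conductorId);
            update.setLong(2, sourceNumber);
            return update.executeUpdate() == 1;  // one row changed: lock acquired
        }
    }
}
```

Because the NULL test and the assignment are part of a single UPDATE statement, no separate read-then-write step exists in which two processes could both see the record as available.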

The first step in processing a source record is to open a log file to record the processing. If the source record Log_Pathname field is empty the user's Log_Pathname configuration parameter will be used. If that is absent or empty the Log_Directory parameter will be used. If that, too, is absent or empty a default filename will be generated - as described in the Log_Filename parameter description, above - and the log file will be written to the current working directory. If either the source record field or configuration parameter is not empty the value is resolved for any embedded references. If the resulting pathname refers to an existing directory the default filename is added. Otherwise the pathname is taken to refer to a file to which processing log output is to be appended (the file will be created if it does not exist). N.B.: In all other cases - whenever a default filename is generated - an existing file will be overwritten. The source record Log_Pathname field is always updated with the log file pathname that is used. Note: A source record will not be processed without a log file; it is fatal to Conductor not to have a writable log file available for each source record.
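
The selection of the log file pathname can be summarized in a short sketch. The class and method names are hypothetical, and the values passed in are assumed to have been reference resolved already. The default filename is supplied by the caller.

```java
import java.io.File;

// Hypothetical sketch of log file pathname selection; not PIRL code.
class LogPathname {
    /** Returns the first candidate that is neither null nor empty, else null. */
    static String firstNonEmpty(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isEmpty()) {
                return candidate;
            }
        }
        return null;
    }

    /** Chooses the log file from the field, then Log_Pathname, then Log_Directory. */
    static File resolve(String field, String logPathnameParameter,
                        String logDirectoryParameter, String defaultFilename) {
        String chosen = firstNonEmpty(field, logPathnameParameter, logDirectoryParameter);
        if (chosen == null) {
            // Nothing specified: default filename in the current working directory.
            return new File(System.getProperty("user.dir"), defaultFilename);
        }
        File target = new File(chosen);
        // An existing directory gets the default filename added to it;
        // anything else is taken to be the log file itself.
        return target.isDirectory() ? new File(target, defaultFilename) : target;
    }

    /** True if the log is appended to; false if a generated name is overwritten. */
    static boolean appends(String field, String logPathnameParameter,
                           String logDirectoryParameter) {
        String chosen = firstNonEmpty(field, logPathnameParameter, logDirectoryParameter);
        return chosen != null && !new File(chosen).isDirectory();
    }
}
```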

The log file always begins with the Conductor class identification:

PIRL.Conductor.Conductor (2.47 2012/04/16 06:04:09)

This line is immediately followed by a SOURCE_FILE_LOG_DELIMITER line which is expected to be 70 equals ('=') characters. This is followed by a date and time stamp and the source record description including the database server type and hostname, the fully qualified name of the Sources Table (Sources_Table) and the Source_Number, Source_ID, and Source_Pathname values.

The Status field of the source record is checked for any status indicators from possible previous processing. If present they are logged. If the last status indicator is for a failure condition this is logged and any further processing of the source is skipped. Note: It is possible to (re)start processing of a source midway in its procedures pipeline. This is done by first setting its Status field to include status indicators for procedures to be skipped; e.g. by removing a failure indicator at the end of the list after correcting the cause of the failure. Then the Conductor_ID field is set to NULL (as long as this is done last it will be safe even if the pipeline is actively being processed by a Conductor). When the source record is acquired by a Conductor its processing will begin with the next procedure without a status indicator, and log output will be appended to its previous log file if present or a new file if needed. Caution: Procedure pipelines to be used in this way must, of course, ensure that any dependencies on previous procedures are taken into account.

At this point configuration parameters dependent on the current source record are updated.

The Source_Pathname file is confirmed to be a normal file (not a directory) that is readable. If the file is not accessible the Status field of the source record is updated in the database with the INACCESSIBLE_FILE Conductor completion code, the Completion_Number parameter is set to the same value, the condition is logged and further processing of the source record is canceled.

Procedures Pipeline

The source now enters the procedures pipeline. The procedure definition records are all cached and sorted by their Sequence number when Conductor first starts. Thus changes to a Procedures table will not take effect until after a Conductor (re)starts, and it is safe to change a Procedures table while a Conductor is running.

Each procedure record is applied to the current source record in Sequence number order; the Sequence parameter is updated before each procedure is processed.
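
Ordering the cached procedure records amounts to a numeric sort on the Sequence field. The sketch below uses a hypothetical Procedure class holding only two of the required fields. Records with equal Sequence values keep their input order here because the sort is stable, but, as noted for the Sequence field, no such order should be relied upon.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sorting procedure records by Sequence.
class ProcedureOrder {
    /** Minimal stand-in for a Procedures table record. */
    static final class Procedure {
        final double sequence;     // the Sequence field is a real number
        final String commandLine;  // the Command_Line field

        Procedure(double sequence, String commandLine) {
            this.sequence = sequence;
            this.commandLine = commandLine;
        }
    }

    /** Returns a copy of the records sorted numerically by Sequence. */
    static List<Procedure> inSequenceOrder(List<Procedure> tableOrder) {
        List<Procedure> sorted = new ArrayList<>(tableOrder);
        sorted.sort(Comparator.comparingDouble((Procedure p) -> p.sequence));
        return sorted;
    }
}
```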

Embedded References

All of the required Procedure fields, except the Sequence number, may contain embedded references. Each embedded reference is effectively a variable that is substituted with the value from a database field or configuration parameter specified by the reference. References may be arbitrarily nested; for example, the condition for selecting a record in a database field reference may be a parameter reference supplied in a configuration parameter. Reference resolution is also recursive; the value obtained from resolving a reference may itself contain embedded references. Thus a parameter reference may resolve to a parameter value that contains references. This allows the values of database fields to contain references to user defined parameters that are set as desired in the configuration file without needing to change the contents of database tables to effect the change.

Reference resolved values in Procedure fields allow dynamic definition of procedure attributes. References that are unresolved are fatal to Conductor unless the Unresolved_Reference configuration parameter has been set to a substitute string (e.g. ""). References that have incorrect syntax (e.g. unbalanced curly brace enclosures) are always fatal to Conductor.
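
Nested and recursive resolution of parameter references can be illustrated with a small resolver. Everything here is an assumption made for illustration: the ${name} reference syntax, the class name, and the use of a plain map for parameters. Database field references and the actual PIRL reference syntax are not modeled. A null substitute stands for the default "throw" behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for ${name} parameter references; not PIRL syntax.
class ReferenceSketch {
    // Matches an innermost reference: ${...} with no '$', '{' or '}' inside.
    private static final Pattern INNERMOST =
        Pattern.compile("\\$\\{([^\\$\\{\\}]+)\\}");
    private static final int MAX_SUBSTITUTIONS = 1000;  // guards against cycles

    /**
     * Repeatedly replaces the first innermost reference with its value.
     * Resolving innermost references first handles nesting; rescanning
     * the result handles values that themselves contain references.
     * An unknown name is replaced by unresolvedSubstitute, or causes an
     * exception when unresolvedSubstitute is null.
     */
    static String resolve(String text, Map<String, String> parameters,
                          String unresolvedSubstitute) {
        int substitutions = 0;
        Matcher match = INNERMOST.matcher(text);
        while (match.find()) {
            if (++substitutions > MAX_SUBSTITUTIONS) {
                throw new IllegalStateException("Possible circular reference: " + text);
            }
            String name = match.group(1);
            String value = parameters.get(name);
            if (value == null) {
                if (unresolvedSubstitute == null) {
                    throw new IllegalArgumentException("Unresolved reference: " + name);
                }
                value = unresolvedSubstitute;
            }
            text = text.substring(0, match.start()) + value + text.substring(match.end());
            match = INNERMOST.matcher(text);  // rescan the updated text
        }
        return text;
    }
}
```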

Procedure Execution

The Command_Line value is reference resolved and parsed into an initial command name and command arguments. An empty or NULL Command_Line is fatal. Before each procedure is run the log file is written with the PROCEDURE_LOG_DELIMITER line. This is followed by a date and time stamp, the Sequence number, the Description field value (if it is not empty), and then the command line to be executed.

The command name and arguments are passed to the Java Runtime for execution as a Process by the host operating system. Note: the command is not run in a shell. It is, however, quite appropriate to run shell, or any other interpreted language, scripts (e.g. PERL). The only restriction on the procedure to be run is that it is accessible and executable. If the procedure can not be executed for any reason the source record's Status field value is appended with Conductor's NO_PROCEDURE error status. Otherwise the Status field is updated with the host system Process ID for the executed procedure; this is always an integer value greater than 1 that uniquely identifies the executing procedure in the host operating system.

All standard output from the procedure is copied into the log file with an annotation before each line that indicates whether the source is the procedure's stdout or stderr streams. Because these are separate streams read by asynchronous threads attached to each process stream there can be no guarantee of the relative logging order of lines from the two sources; while each stream is always logged in the order in which the procedure output to it, the uncertainties of system stream buffering and thread scheduling are likely to result in lines from stdout appearing in the log before or after where they might occur relative to stderr lines appearing in a shell terminal listing.
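
The two-thread logging arrangement can be sketched as follows. The class name and the "stdout> " and "stderr> " annotations are hypothetical; the actual annotation text written by Conductor is not shown in this description. Each stream's lines stay in order, while the interleaving between the two streams depends on thread scheduling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of logging two output streams with separate threads.
class StreamLogger {
    /** Starts a thread that copies each line of the stream, tagged, to the log. */
    static Thread pump(InputStream stream, String tag, List<String> log) {
        Thread thread = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    log.add(tag + line);
                }
            } catch (IOException e) {
                log.add(tag + "[read error: " + e.getMessage() + "]");
            }
        });
        thread.start();
        return thread;
    }

    /** Logs both streams concurrently and returns once both are drained. */
    static List<String> logBoth(InputStream stdout, InputStream stderr) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread out = pump(stdout, "stdout> ", log);
        Thread err = pump(stderr, "stderr> ", log);
        try {
            out.join();
            err.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }
}
```

With a real procedure the two arguments would be the Process getInputStream() and getErrorStream() streams.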

Conductor waits for the procedure to complete before proceeding. However, it will not wait longer than the number of seconds from the Time_Limit field (which may have embedded references and may be a mathematical expression). If the value is NULL, empty or zero then there is no limit to the amount of time Conductor will wait for the procedure to complete. It is generally a good idea to place a maximum running time limit on any procedure that could become "hung" (for example in a loop or on an inaccessible). If the time limit is reached Conductor will destroy the procedure. This is done by sending the procedure a terminate signal (SIGTERM). This signal can be caught by the procedure so it has an opportunity to clean up open files or child processes of its own. For scripts that have launched long running computational programs it is correct practice to catch the terminate signal and halt these programs; failure to do so is likely to leave these child programs running as orphans. If a procedure does not catch the terminate signal it will be automatically terminated by the operating system. If Conductor must terminate a procedure due to a timeout the source record's Status field will be updated with Conductor's PROCEDURE_TIMEOUT error status and the log will be written with notice of the timeout. If the procedure completes normally, then the standard output streams are drained and copied to the log file and the exit status from the procedure is also noted in the log file.
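
Waiting with a time limit can be sketched with the standard Process API. This is a minimal illustration under assumptions: the class name and the TIMED_OUT value are hypothetical placeholders (the actual PROCEDURE_TIMEOUT code is not given here), and output is simply inherited rather than logged. On typical Unix Java runtimes Process.destroy() delivers a terminate signal, but the Java documentation leaves the exact mechanism implementation dependent.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of running a procedure with a time limit.
class TimedProcedure {
    static final int TIMED_OUT = -1;  // placeholder, not Conductor's actual code

    /** A zero or negative limit means an unlimited wait. */
    static boolean isUnlimited(long timeLimitSeconds) {
        return timeLimitSeconds <= 0;
    }

    /**
     * Runs the command directly (not in a shell) and returns its exit
     * status, or TIMED_OUT if it had to be destroyed at the time limit.
     */
    static int run(List<String> command, long timeLimitSeconds)
            throws IOException, InterruptedException {
        // inheritIO keeps the child from blocking on a full output pipe.
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (isUnlimited(timeLimitSeconds)) {
            return process.waitFor();
        }
        if (process.waitFor(timeLimitSeconds, TimeUnit.SECONDS)) {
            return process.exitValue();
        }
        process.destroy();  // request termination of the overdue procedure
        return TIMED_OUT;
    }
}
```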

Procedure Status

When the procedure execution is done the Completion_Number parameter is updated. This will be a negative value if the procedure did not run to completion (could not be executed or exceeded the Time_Limit), otherwise it will be the procedure's exit status value.

When a procedure completes normally Conductor uses either the Success_Status or Success_Message field values to determine if the procedure completed successfully. Usually the exit status is set by the procedure to a value that indicates if it succeeded. However, it may be necessary to examine the output of the procedure if the exit status is not reliable. There may also be unfortunate cases where there is no reliable indicator and all that can be done is assume that because the procedure completed it was successful.

If neither the Success_Status nor the Success_Message field value has been set to a non-empty value and an Empty_Success_Any configuration parameter was found with a "true" value then the success of the procedure is implied (i.e. in this case the procedure is always successful if it completes normally); otherwise the Success_Status value is asserted to be "0".

If the Success_Status field is not empty its embedded references are resolved. The result is evaluated as a logical expression and, if a result is obtained, it determines whether the procedure was successful. Typically, the expression uses a reference to the Completion_Number parameter. The logical operators &, |, ~, =, <, >, <>, <=, >= may be used. The words "and", "or", and "not" can be used in place of &, |, and ~. Caution: Use &, not &&; |, not ||; ~, not !; =, not ==; and <>, not !=. A logical expression may contain embedded numeric expressions as well.
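To make the operator spelling concrete, the following is a small illustrative evaluator for a subset of such expressions. It is a sketch only and is not the evaluator Conductor uses: it assumes that all references have already been replaced by integer values, it accepts only integer operands, and the precedence it applies (~ binds tighter than &, which binds tighter than |) is an assumption that this documentation does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogicalExpressionSketch {
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(<=|>=|<>|[&|~=<>()]|-?\\d+|and|or|not)", Pattern.CASE_INSENSITIVE);

    private final List<String> tokens = new ArrayList<>();
    private int position = 0;

    private LogicalExpressionSketch(String text) {
        Matcher matcher = TOKEN.matcher(text);
        int end = 0;
        while (matcher.lookingAt()) {           // match one token at the region start
            tokens.add(matcher.group(1).toLowerCase());
            end = matcher.end();
            matcher.region(end, text.length());
        }
        if (!text.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected text: " + text.substring(end));
        }
    }

    /** Evaluates an already reference-resolved logical expression. */
    static boolean evaluate(String text) {
        LogicalExpressionSketch parser = new LogicalExpressionSketch(text);
        boolean result = parser.or();
        if (parser.position != parser.tokens.size()) {
            throw new IllegalArgumentException("Trailing tokens in: " + text);
        }
        return result;
    }

    private String peek() {
        return position < tokens.size() ? tokens.get(position) : "";
    }

    private String next() {
        if (position >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of expression");
        }
        return tokens.get(position++);
    }

    private boolean accept(String symbol, String word) {
        if (peek().equals(symbol) || peek().equals(word)) {
            position++;
            return true;
        }
        return false;
    }

    private boolean or() {
        boolean value = and();
        while (accept("|", "or")) {
            boolean right = and();
            value = value || right;
        }
        return value;
    }

    private boolean and() {
        boolean value = not();
        while (accept("&", "and")) {
            boolean right = not();
            value = value && right;
        }
        return value;
    }

    private boolean not() {
        if (accept("~", "not")) {
            return !not();
        }
        if (accept("(", "(")) {                 // parentheses group logical terms only
            boolean value = or();
            if (!accept(")", ")")) {
                throw new IllegalArgumentException("Missing )");
            }
            return value;
        }
        return comparison();
    }

    private boolean comparison() {
        long left = Long.parseLong(next());
        String operator = next();
        long right = Long.parseLong(next());
        switch (operator) {
            case "=":  return left == right;
            case "<>": return left != right;
            case "<":  return left < right;
            case ">":  return left > right;
            case "<=": return left <= right;
            case ">=": return left >= right;
            default:
                throw new IllegalArgumentException("Not a comparison operator: " + operator);
        }
    }
}
```

For example, with a Completion_Number of 3 already substituted, the text "3 = 0 | 3 = 3" is accepted by this sketch, whereas the Java-style "3 == 0 || 3 == 3" is rejected, which mirrors the caution above.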

If the Success_Status does not contain a valid logical expression it is evaluated as a numeric expression. If the result, cast as an integer value, is equal to the procedure's exit status, then the procedure succeeded; otherwise it failed. A numeric expression may simply be a constant value (the symbols "pi" and "e" are recognized as constants) or may use the +, -, *, /, ^ operators; ** may be used instead of the ^ exponentiation operator. The ternary operator ? with : may be used following an embedded logical expression such that if the logical expression is true then the value before the : is used, else the value after the : is used (e.g. (4<5)?1:2 evaluates to 1). The functions sin, cos, tan, cot, sec, csc, arcsin, arccos, arctan, exp, ln, log2, log10, sqrt, cubert, abs, round, floor, ceiling, trunc may also be used with their argument following inside parentheses.

The Success_Message, if not empty, is used if the Success_Status is empty. Its embedded references are resolved and it is then matched, as a regular expression, against what was obtained from the procedure's stdout and stderr. If there is a match with either output, then the procedure succeeded; otherwise it failed. A resolved Success_Message value that does not produce a valid regular expression is fatal to Conductor. Note: Regular expressions are a very powerful pattern matching syntax similar to that used by Perl, but they can also be daunting to the beginner.
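The order in which the success criteria are consulted can be summarized with a short sketch. It is an illustration under stated assumptions, not Conductor's code: the class and method names are hypothetical, the field values are assumed to be already reference resolved, Success_Status is simplified to a plain integer constant, and "matched" is interpreted as finding the pattern anywhere in the output (Matcher.find), which this documentation does not specify.

```java
import java.util.regex.Pattern;

class SuccessDecisionSketch {
    /**
     * Decides success for a procedure that completed normally.
     * All String arguments are assumed to be already reference resolved.
     */
    static boolean succeeded(String successStatus, String successMessage,
            boolean emptySuccessAny, int exitStatus, String stdout, String stderr) {
        boolean hasStatus = successStatus != null && !successStatus.trim().isEmpty();
        boolean hasMessage = successMessage != null && !successMessage.trim().isEmpty();

        if (!hasStatus && hasMessage) {
            // Throws PatternSyntaxException for an invalid expression,
            // the condition the text above describes as fatal.
            Pattern pattern = Pattern.compile(successMessage);
            return pattern.matcher(stdout).find() || pattern.matcher(stderr).find();
        }
        if (!hasStatus) {
            if (emptySuccessAny) {
                return true;          // any normal completion counts as success
            }
            successStatus = "0";      // otherwise Success_Status is taken to be "0"
        }
        // Simplification: only an integer constant is handled here, not the
        // logical and numeric expression forms described above.
        return Integer.parseInt(successStatus.trim()) == exitStatus;
    }
}
```

Note that a non-empty Success_Status takes priority: in this sketch a Success_Message is never consulted when Success_Status has a value.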

Regardless of the outcome of procedure execution the Status field of the source record is updated in the database with the procedure's status indicator. This indicator always includes the Conductor completion code which can be translated into a descriptive line of text by the Status_Conductor_Code_Description static method. If the procedure completed with an exit status that value is included in the status indicator. The meaning of this value is procedure dependent. Of course the log file is also annotated accordingly.

On Failure

When Conductor determines that the procedure completed successfully it repeats the procedure execution operation with the next procedure in the pipeline Sequence. If Conductor determines that the procedure did not complete successfully, then it resolves any embedded references in the On_Failure field value and uses that as a command line for a procedure to be executed. This procedure is executed without any time limit. In the log file, where a normal procedure would have a PROCEDURE_LOG_DELIMITER the On_Failure procedure has an ON_FAILURE_PROCEDURE_LOG_DELIMITER and no Description. Although the completion status of this procedure is not included in the final status indicator of the source record's Status field it is noted in the log file.

When the number of sequential source processing failures reaches the Stop_on_Failure amount further processing is halted after the On_Failure procedure has been run.
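The sequential failure rule can be pictured as a counter that is cleared by every successful source and compared against the limit after every failure. The sketch below is illustrative only: the class is hypothetical, and treating a non-positive limit as "never halt" is an assumption of the sketch rather than documented behaviour.

```java
class FailureCounterSketch {
    private final int stopOnFailure;
    private int sequentialFailures = 0;

    FailureCounterSketch(int stopOnFailure) {
        this.stopOnFailure = stopOnFailure;
    }

    /** Records the outcome of one source; returns true if processing should halt. */
    boolean record(boolean sourceSucceeded) {
        // A success breaks the run of failures; a failure extends it.
        sequentialFailures = sourceSucceeded ? 0 : sequentialFailures + 1;
        // Assumption: a limit of zero or less disables the check.
        return stopOnFailure > 0 && sequentialFailures >= stopOnFailure;
    }

    int sequentialFailures() {
        return sequentialFailures;
    }

    /** Clears the count, in the spirit of Reset_Sequential_Failures. */
    void reset() {
        sequentialFailures = 0;
    }
}
```

With a limit of 2, the outcomes failure, success, failure, failure halt only on the last source, because the intervening success resets the count.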

When operating under the direction of a Manager (local and/or remote) if the Conductor halts for any reason it will send an email message to its Notify list and then wait to be told to start processing again. Without the possibility of a Manager to take charge the Conductor will exit after halting.

Sources Completion

The completion of the last of the Procedures in pipeline sequence, or the first On_Failure procedure, completes the processing of a source record. The log file is now closed. While there is another source record in the cache Conductor will continue trying to acquire an exclusive database lock. Once the cache is exhausted Conductor refreshes it from the Sources table with any new unprocessed records. If no unprocessed records are available then Conductor will sleep for the number of seconds indicated by the Poll_Interval configuration parameter. If no Poll_Interval parameter is present the default interval of 30 seconds is used. If the interval is less than or equal to 0, then Conductor processing will stop when it can find no source records to process.
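The refresh, sleep and stop behaviour described above can be sketched as a simple loop. The names are hypothetical and the database access is replaced by a Supplier that returns the next batch of unprocessed records; locking and the actual processing of each record are omitted.

```java
import java.util.List;
import java.util.function.Supplier;

class PollLoopSketch {
    /** The default interval stated in the text above. */
    static final int DEFAULT_POLL_INTERVAL_SECONDS = 30;

    /**
     * Processes batches of source records until none are available and the
     * poll interval is not positive. Returns the number of records handled.
     */
    static int run(Supplier<List<String>> unprocessedSources, int pollIntervalSeconds) {
        int processed = 0;
        while (true) {
            List<String> cache = unprocessedSources.get();   // refresh the cache
            if (cache.isEmpty()) {
                if (pollIntervalSeconds <= 0) {
                    return processed;                        // nothing to do: stop
                }
                try {
                    Thread.sleep(pollIntervalSeconds * 1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return processed;
                }
                continue;                                    // poll again
            }
            for (String source : cache) {
                processed++;                                 // stand-in for pipeline processing
            }
        }
    }
}
```

Calling run with an interval of 0 gives the batch style of operation: the loop ends as soon as a refresh returns no records.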

System Dependencies

Java

Conductor is known to compile and run correctly with Java 1.4 and 1.5. Java was chosen for the implementation to maximize portability: as long as the host system provides a standard Java environment it should be able to run Conductor.

Conductor_ID

The Conductor_ID value will include the process ID of Conductor if it is available. Obtaining the process ID (PID) of Conductor - i.e. the Java Virtual Machine (JVM) that runs the Java classes - requires using a Java Native Interface (JNI) to the host system function that provides this information. Though this is trivial to implement it is outside the pure Java implementation of Conductor. Without the JNI code the Conductor_ID will only be the hostname of the system on which Conductor is running. With the JNI code the Conductor_ID will include the JVM PID after the hostname separated by a colon (':') character.

The availability of the JVM PID will not have any effect on Conductor's operation. However, it is quite useful to have the JVM PID for procedures to use in disambiguating filenames in a parallel processing shared storage environment, and it can assist in systems administration work.

A Native_Methods.c file that provides JNI access to the required system function is included in the source code distribution of Conductor. When the Conductor source code is compiled Native_Methods.c is also compiled to produce a dynamically loadable Native_Methods.so (or .jnilib on Apple OS X/Darwin systems) shared object library file in the Conductor/<OS>.<ARCH> subdirectory, where <OS> is the name of the host operating system (e.g. Darwin, FreeBSD, Linux or SunOS) and <ARCH> is the host system hardware architecture (e.g. i386, powerpc, x86_64 or sparc).

N.B.: The library file must be copied to a location on the host system where it can be found by the dynamic linker (e.g. /usr/local/lib) when Conductor runs.

The JNI library file can be built separately from the Java class files - if, for example, multiple operating systems and/or architectures are being used - by running "make jni" (GNU make may be named gmake) in the Conductor source code directory. The Native_Methods.c file requires the $(JNI_ROOT)/include/jni.h file included with the Java Software Development Kit (SDK) distribution. $(JNI_ROOT) is /usr/java by default, but a JNI_ROOT environment variable can be set with an alternative location before the JNI library is compiled.
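As an illustration of the Conductor_ID layout described above (hostname alone, or hostname and PID separated by a colon), a procedure that receives the value might compose or split it as in the following sketch. The class and method names are hypothetical and are not part of the Conductor API.

```java
class ConductorIdSketch {
    /** Builds an ID of the form "hostname" or "hostname:pid". */
    static String conductorId(String hostname, Long pid) {
        return pid == null ? hostname : hostname + ":" + pid;
    }

    /** Returns the PID part of an ID, or null if the ID has no numeric PID part. */
    static Long pidOf(String conductorId) {
        int colon = conductorId.lastIndexOf(':');
        if (colon < 0) {
            return null;                      // hostname only: no PID available
        }
        try {
            return Long.valueOf(conductorId.substring(colon + 1));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

A procedure could, for example, append the PID to a scratch filename so that several Conductors sharing the same storage do not overwrite each other's files.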

Version:
2.47
Author:
Bradford Castalia, Christian Schaller - UA/PIRL
See Also:
PIRL.Database, PIRL.Conductor.Maestro

Field Summary
static int BAD_REGEX
          Conductor procedure completion code.
static int BATCH_POLL_INTERVAL
          The polling interval, in seconds, for unprocessed source records when no unprocessed source records are obtained from the Sources table.
protected  String Catalog
          The name of the database catalog containing the pipeline tables.
static String CATALOG_PARAMETER
          Conductor Configuration parameters.
static int COMMAND_LINE_FIELD
          Procedures table fields indexes.
static String CONDUCTOR_GROUP
          Conductor Configuration parameters.
static int CONDUCTOR_ID_FIELD
          Sources table fields indexes.
static String CONDUCTOR_ID_PARAMETER
          Conductor Configuration parameters.
static String CONFIGURATION_SOURCE_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_HOSTNAME_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_SERVER_NAME_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_SERVER_PARAMETER
          Conductor Configuration parameters.
static String DATABASE_TYPE_PARAMETER
          Conductor Configuration parameters.
static String DEFAULT_CONFIGURATION_FILENAME
          The default configuration filename.
static int DEFAULT_DUPLICATE_PARAMETER_ACTION
          The default action should a duplicate parameter pathname occur in the Conductor Configuration file.
static boolean DEFAULT_EMPTY_SUCCESS_ANY
          The default for whether or not empty Success_Status and Success_Message fields in a procedure definition may imply any completion of the procedure is a success.
static int DEFAULT_POLL_INTERVAL
          The polling interval, in seconds, for unprocessed source records when no unprocessed source records are obtained from the Sources table.
static int DEFAULT_STOP_ON_FAILURE
          The default maximum number of sequential source processing failures.
static int DESCRIPTION_FIELD
          Procedures table fields indexes.
static String EMPTY_SUCCESS_ANY_PARAMETER
          Conductor Configuration parameters.
static int EXIT_COMMAND_LINE_SYNTAX
          Command line syntax problem exit status (1).
static int EXIT_CONFIGURATION_PROBLEM
          Configuration problem exit status (2).
static int EXIT_DATABASE_PROBLEM
          Database problem exit status (3).
static int EXIT_IO_FAILURE
          I/O failure exit status (4).
static int EXIT_STAGE_MANAGER
          The required Stage_Manager connection could not be established.
static int EXIT_SUCCESS
          Conductor success exit status (0).
static int EXIT_TOO_MANY_FAILURES
          The number of sequential source processing failures reached the Stop-on-Failure amount.
static int EXIT_UNEXPECTED_EXCEPTION
          An unexpected exception occurred (9).
static String[] FAILURE_DESCRIPTION
          Conductor status failure code descriptions.
static int HALTED
          Processing state: A failure condition caused processing to halt.
static String HELLO_ADDRESS_PARAMETER_NAME
          Conductor Configuration parameters.
static String HELLO_PORT_PARAMETER_NAME
          Conductor Configuration parameters.
static String HOSTNAME_PARAMETER
          Conductor Configuration parameters.
static String ID
          Class identification name with source code version and date.
static int INACCESSIBLE_FILE
          Conductor procedure completion code.
static int INVALID_DATABASE_ENTRY
          Conductor procedure completion code.
static String LOG_DIRECTORY_PARAMETER
          Conductor Configuration parameters.
static String LOG_FILENAME_PARAMETER
          Conductor Configuration parameters.
static int LOG_PATHNAME_FIELD
          Sources table fields indexes.
static String LOG_PATHNAME_PARAMETER
          Conductor Configuration parameters.
static int MAX_SOURCE_RECORDS_DEFAULT
          The maximum number of unprocessed source records that will be obtained when the Conductor cache is refreshed.
static String MAX_SOURCE_RECORDS_PARAMETER
          Conductor Configuration parameters.
static int MIN_SOURCE_RECORDS_DEFAULT
          The minimum value for the Max_Source_Records value.
static String MIN_SOURCE_RECORDS_PARAMETER
          Conductor Configuration parameters.
protected static String NL
           
static int NO_PROCEDURE
          Conductor procedure completion code.
static String NOTIFY_PARAMETER
          Conductor Configuration parameters.
static int ON_FAILURE_FIELD
          Procedures table fields indexes.
static String ON_FAILURE_PROCEDURE_LOG_DELIMITER
          Marks the beginning of On_Failure procedure processing in a log file.
protected static String Pipeline
          The name of the pipeline (.) being managed.
static String PIPELINE_PARAMETER
          Conductor Configuration parameters.
static String POLL_INTERVAL_PARAMETER
          Conductor Configuration parameters.
static int POLLING
          Processing state: No unprocessed source records are available and the poll interval for new records is positive.
static String PROCEDURE_COMPLETION_NUMBER_PARAMETER
          Conductor Configuration parameters.
static String PROCEDURE_COUNT_PARAMETER
          Conductor Configuration parameters.
static int PROCEDURE_FAILURE
          Conductor procedure completion code.
static String PROCEDURE_LOG_DELIMITER
          Marks the beginning of procedure processing in a log file.
protected  Vector<Vector<String>> Procedure_Records
          The content of the pipeline procedures table, without the field names, sorted by sequence number.
static String PROCEDURE_SEQUENCE_PARAMETER
          Conductor Configuration parameters.
static int PROCEDURE_SUCCESS
          Conductor procedure completion code.
static int PROCEDURE_TIMEOUT
          Conductor procedure completion code.
static String[] PROCEDURES_FIELD_NAMES
          Procedures table field names.
protected  Fields_Map Procedures_Map
           
protected  String Procedures_Table
          The name of the pipeline procedures table in the database.
static String PROCEDURES_TABLE_NAME_SUFFIX
          Procedures table name suffix.
static String PROCEDURES_TABLE_PARAMETER
          Conductor Configuration parameters.
static String RECONNECT_DELAY_PARAMETER
          Conductor Configuration parameters.
static String RECONNECT_TRIES_PARAMETER
          Conductor Configuration parameters.
static boolean Require_Stage_Manager
          Flag that determines if the Conductor requires a Stage_Manager.
static String REQUIRE_STAGE_MANAGER_PARAMETER
          Conductor Configuration parameters.
protected  Reference_Resolver Resolver
          The Reference_Resolver object being used.
static String RESOLVER_DEFAULT_VALUE
          The default value to be used by the Reference_Resolver if a reference can not be resolved.
static int RUN_TO_WAIT
          Processing state: When the current source record completes processing the WAITING state will be entered unless a failure condition caused the HALTED state to occur.
static int RUNNING
          Processing state: Source records are being processed.
static int SEQUENCE_FIELD
          Procedures table fields indexes.
static int SOURCE_AVAILABLE_NO_CHECK
          When the SOURCE_AVAILABLE_TRIES_PARAMETER is this value source file availability confirmation is disabled.
static int SOURCE_AVAILABLE_TRIES_DEFAULT
          Default number of source file availability tests.
static int SOURCE_AVAILABLE_TRIES_MAX
          Maximum number of source file availability tests.
static String SOURCE_AVAILABLE_TRIES_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_DIRECTORY_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_FAILURE_COUNT
          Conductor Configuration parameters.
static String SOURCE_FILE_LOG_DELIMITER
          Marks the beginning of source file processing in a log file.
static String SOURCE_FILENAME_EXTENSION_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_FILENAME_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_FILENAME_ROOT_PARAMETER
          Conductor Configuration parameters.
static int SOURCE_ID_FIELD
          Sources table fields indexes.
static String SOURCE_ID_PARAMETER
          Conductor Configuration parameters.
static int SOURCE_NUMBER_FIELD
          Sources table fields indexes.
static String SOURCE_NUMBER_PARAMETER
          Conductor Configuration parameters.
static int SOURCE_PATHNAME_FIELD
          Sources table fields indexes.
static String SOURCE_PATHNAME_PARAMETER
          Conductor Configuration parameters.
static String SOURCE_SUCCESS_COUNT
          Conductor Configuration parameters.
static String[] SOURCES_FIELD_NAMES
          Sources table field names.
static String SOURCES_TABLE_NAME_SUFFIX
          Sources table name suffix.
static String SOURCES_TABLE_PARAMETER
          Conductor Configuration parameters.
static String STAGE_MANAGER_PASSWORD_PARAMETER
          Conductor Configuration parameters.
static String STAGE_MANAGER_PORT_PARAMETER
          Conductor Configuration parameters.
static String STAGE_MANAGER_TIMEOUT_PARAMETER
          Conductor Configuration parameters.
static int STATUS_FIELD
          Sources table fields indexes.
static String STDERR_NAME
          Prefix applied to procedure stderr lines.
static String STDOUT_NAME
          Prefix applied to procedure stdout lines.
static String STOP_ON_FAILURE_PARAMETER
          Conductor Configuration parameters.
static int SUCCESS_MESSAGE_FIELD
          Procedures table fields indexes.
static int SUCCESS_STATUS_FIELD
          Procedures table fields indexes.
protected  Configuration The_Configuration
          The Configuration object containing the configuration parameters.
protected  Database The_Database
          The Database object used to access the database server.
static int TIME_LIMIT_FIELD
          Procedures table fields indexes.
static String TOTAL_FAILURE_COUNT
          Conductor Configuration parameters.
static String TOTAL_PROCEDURE_RECORDS_PARAMETER
          Conductor Configuration parameters.
static int UNRESOLVABLE_REFERENCE
          Conductor procedure completion code.
static String UNRESOLVED_REFERENCE_PARAMETER
          Conductor Configuration parameters.
static String UNRESOLVED_REFERENCE_THROWS
          Conductor Configuration parameters.
static int WAITING
          Processing state: Idle; waiting for a start request.
 
Constructor Summary
protected Conductor()
          Constructs an uninitialized Conductor.
  Conductor(String pipeline, Configuration configuration)
          Construct a Conductor for a pipeline from a Configuration.
  Conductor(String pipeline, Configuration configuration, String database_server_name)
          Construct a Conductor for a pipeline from a Configuration.
 
Method Summary
 Management Add_Log_Writer(Writer writer)
          Register a Writer to receive processing log stream output.
 Management Add_Processing_Listener(Processing_Listener listener)
          Register a processing state change listener.
static String Config_Pathname(String name)
          Get an absolute Conductor Configuration pathname.
protected  String Config_Value(String name)
          Get a String parameter value from the configuration.
protected  boolean Config_Value(String name, Object value)
          Set a parameter in the configuration.
 Configuration Configuration()
          Get the Conductor Configuration.
protected  Database_Exception Connect_to_Database()
          Establish the Database connection.
 boolean Connected_to_Stage_Manager()
          Test if the Conductor is connected to a Stage_Manager.
 Management Enable_Log_Writer(Writer writer, boolean enable)
          Enable or disable output to a registered log stream Writer.
protected  Vector<Vector<String>> Get_Procedures_Table()
          Get the Procedures_Table from the Database.
 Message Identity()
          Get the identity description Message for this Conductor.
protected  void Load_Procedure_Records()
          Load the Procedure_Records table.
protected  boolean Load_Source_Records()
          Load the Sources_Records table.
protected  void Log_Message(String message)
          Logs a message to the Logger.
protected  void Log_Message(String message, AttributeSet style)
          Logs a message to the Logger.
static void main(String[] args)
          Instantiate a Conductor application.
static String[] Parse_Command_Line(String command_line)
          Parse a String into command line arguments.
 int Poll_Interval()
          Get the interval at which the Conductor will poll for unprocessed source records.
 Management Poll_Interval(int seconds)
          Set the time interval to poll for unprocessed source records.
protected  void Postconfigure(Configuration configuration)
          Update the configuration and application control values after the Database and Reference_Resolver have been constructed.
protected  Configuration Preconfigure(Configuration configuration)
          Set the effective configuration.
 Vector<Vector<String>> Procedures()
          Get the procedures table.
 Exception Processing_Exception()
          Get the exception that caused Conductor processing to halt.
 int Processing_State()
          Get the current Conductor processing state.
 void Quit()
          Immediately stop processing and exit.
 boolean Remove_Log_Writer(Writer writer)
          Unregister a log Writer.
 boolean Remove_Processing_Listener(Processing_Listener listener)
          Unregister a processing state change listener.
 Management Reset_Sequential_Failures()
          Reset the count of sequential source processing failures that the Conductor has accumulated.
protected  String Resolve(String reference)
          Resolve a reference.
 String Resolver_Default_Value()
          Get the Conductor default Reference_Resolver value.
 Management Resolver_Default_Value(String value)
          Set the Conductor default Reference_Resolver value.
 int Sequential_Failures()
          Get the count of sequential source processing failures that the Conductor has accumulated.
 Vector<Vector<String>> Sources()
          Get the current cache of source records.
 void Start()
          Start pipeline processing.
 Processing_Changes State()
          Get the current Conductor processing conditions state.
static String Status_Conductor_Code_Description(int code)
          Get a description String for a Conductor procedure completion code.
static int Status_Conductor_Code(String status)
          Get the Conductor procedure completion status code value from a procedure status indicator String.
static String Status_Field_Value(Vector<String> status)
          Assemble a properly formatted String for a Source table Status field value.
static String Status_Indicator(int conductor_status)
          Assemble a properly formatted procedure status indicator String as used in a Sources table Status field value.
static String Status_Indicator(int conductor_status, int procedure_status)
          Assemble a properly formatted procedure status indicator String as used in a Sources table Status field value.
static Vector<String> Status_Indicators(String status_field)
          Parse a Source table Status field value into a Vector of procedure status indicator Strings.
static int Status_Procedure_Exit_Value(String status)
          Get the procedure exit status value from a procedure status indicator String.
 int Stop_on_Failure()
          Get the number of Conductor sequential source processing failures at which to stop processing.
 Management Stop_on_Failure(int failure_count)
          Set the sequential failure limit at which to halt processing source records.
 void Stop()
          Stop pipeline processing after the current source processing has completed.
static void Usage()
          Prints the command line usage syntax.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ID

public static final String ID
Class identification name with source code version and date.

See Also:
Constant Field Values

DEFAULT_CONFIGURATION_FILENAME

public static final String DEFAULT_CONFIGURATION_FILENAME
The default configuration filename.

See Also:
Constant Field Values

The_Configuration

protected Configuration The_Configuration
The Configuration object containing the configuration parameters.


DEFAULT_DUPLICATE_PARAMETER_ACTION

public static final int DEFAULT_DUPLICATE_PARAMETER_ACTION
The default action should a duplicate parameter pathname occur in the Conductor Configuration file.

See Also:
Constant Field Values

CONDUCTOR_GROUP

public static final String CONDUCTOR_GROUP
Conductor Configuration parameters.

See Also:
Constant Field Values

CONFIGURATION_SOURCE_PARAMETER

public static final String CONFIGURATION_SOURCE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

DATABASE_SERVER_NAME_PARAMETER

public static final String DATABASE_SERVER_NAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

DATABASE_SERVER_PARAMETER

public static final String DATABASE_SERVER_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

HOSTNAME_PARAMETER

public static final String HOSTNAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

DATABASE_HOSTNAME_PARAMETER

public static final String DATABASE_HOSTNAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

DATABASE_TYPE_PARAMETER

public static final String DATABASE_TYPE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

PIPELINE_PARAMETER

public static final String PIPELINE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

CATALOG_PARAMETER

public static final String CATALOG_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

PROCEDURES_TABLE_PARAMETER

public static final String PROCEDURES_TABLE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCES_TABLE_PARAMETER

public static final String SOURCES_TABLE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

UNRESOLVED_REFERENCE_PARAMETER

public static final String UNRESOLVED_REFERENCE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

UNRESOLVED_REFERENCE_THROWS

public static final String UNRESOLVED_REFERENCE_THROWS
Conductor Configuration parameters.

See Also:
Constant Field Values

EMPTY_SUCCESS_ANY_PARAMETER

public static final String EMPTY_SUCCESS_ANY_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

MIN_SOURCE_RECORDS_PARAMETER

public static final String MIN_SOURCE_RECORDS_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

MAX_SOURCE_RECORDS_PARAMETER

public static final String MAX_SOURCE_RECORDS_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

POLL_INTERVAL_PARAMETER

public static final String POLL_INTERVAL_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_AVAILABLE_TRIES_PARAMETER

public static final String SOURCE_AVAILABLE_TRIES_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

STOP_ON_FAILURE_PARAMETER

public static final String STOP_ON_FAILURE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

RECONNECT_TRIES_PARAMETER

public static final String RECONNECT_TRIES_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

RECONNECT_DELAY_PARAMETER

public static final String RECONNECT_DELAY_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

NOTIFY_PARAMETER

public static final String NOTIFY_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_NUMBER_PARAMETER

public static final String SOURCE_NUMBER_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_PATHNAME_PARAMETER

public static final String SOURCE_PATHNAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_ID_PARAMETER

public static final String SOURCE_ID_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

CONDUCTOR_ID_PARAMETER

public static final String CONDUCTOR_ID_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_DIRECTORY_PARAMETER

public static final String SOURCE_DIRECTORY_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_FILENAME_PARAMETER

public static final String SOURCE_FILENAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_FILENAME_ROOT_PARAMETER

public static final String SOURCE_FILENAME_ROOT_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_FILENAME_EXTENSION_PARAMETER

public static final String SOURCE_FILENAME_EXTENSION_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

LOG_PATHNAME_PARAMETER

public static final String LOG_PATHNAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

LOG_DIRECTORY_PARAMETER

public static final String LOG_DIRECTORY_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

LOG_FILENAME_PARAMETER

public static final String LOG_FILENAME_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_SUCCESS_COUNT

public static final String SOURCE_SUCCESS_COUNT
Conductor Configuration parameters.

See Also:
Constant Field Values

SOURCE_FAILURE_COUNT

public static final String SOURCE_FAILURE_COUNT
Conductor Configuration parameters.

See Also:
Constant Field Values

TOTAL_FAILURE_COUNT

public static final String TOTAL_FAILURE_COUNT
Conductor Configuration parameters.

See Also:
Constant Field Values

TOTAL_PROCEDURE_RECORDS_PARAMETER

public static final String TOTAL_PROCEDURE_RECORDS_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

PROCEDURE_COUNT_PARAMETER

public static final String PROCEDURE_COUNT_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

PROCEDURE_SEQUENCE_PARAMETER

public static final String PROCEDURE_SEQUENCE_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

PROCEDURE_COMPLETION_NUMBER_PARAMETER

public static final String PROCEDURE_COMPLETION_NUMBER_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

REQUIRE_STAGE_MANAGER_PARAMETER

public static final String REQUIRE_STAGE_MANAGER_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

STAGE_MANAGER_PASSWORD_PARAMETER

public static final String STAGE_MANAGER_PASSWORD_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

STAGE_MANAGER_PORT_PARAMETER

public static final String STAGE_MANAGER_PORT_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

STAGE_MANAGER_TIMEOUT_PARAMETER

public static final String STAGE_MANAGER_TIMEOUT_PARAMETER
Conductor Configuration parameters.

See Also:
Constant Field Values

HELLO_PORT_PARAMETER_NAME

public static final String HELLO_PORT_PARAMETER_NAME
Conductor Configuration parameters.

See Also:
Constant Field Values

HELLO_ADDRESS_PARAMETER_NAME

public static final String HELLO_ADDRESS_PARAMETER_NAME
Conductor Configuration parameters.

See Also:
Constant Field Values

The_Database

protected Database The_Database
The Database object used to access the database server.


Catalog

protected String Catalog
The name of the database catalog containing the pipeline tables.


Pipeline

protected static String Pipeline
The name of the pipeline (.) being managed.


Resolver

protected Reference_Resolver Resolver
The Reference_Resolver object being used.


RESOLVER_DEFAULT_VALUE

public static final String RESOLVER_DEFAULT_VALUE
The default value to be used by the Reference_Resolver if a reference can not be resolved.

A null value means that the Reference_Resolver will throw an exception if a reference can not be resolved.

This value may be overridden by the UNRESOLVED_REFERENCE_PARAMETER of the configuration file. A parameter value of UNRESOLVED_REFERENCE_THROWS (case insensitive) is equivalent to a null default value.
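The override rule can be expressed as a small selection function. This is a sketch with hypothetical names; the actual String value of UNRESOLVED_REFERENCE_THROWS is not given here, so it is passed in as an argument.

```java
class ResolverDefaultSketch {
    /**
     * Chooses the effective Reference_Resolver default value.
     * A null result means "throw an exception on an unresolved reference".
     *
     * @param configured     the configuration parameter value, or null if absent
     * @param builtInDefault the built-in default (RESOLVER_DEFAULT_VALUE)
     * @param throwsKeyword  the value of UNRESOLVED_REFERENCE_THROWS
     */
    static String effectiveDefault(String configured, String builtInDefault,
            String throwsKeyword) {
        if (configured == null) {
            return builtInDefault;        // no override in the configuration
        }
        if (configured.equalsIgnoreCase(throwsKeyword)) {
            return null;                  // keyword (any case) means "throw"
        }
        return configured;                // use the configured default value
    }
}
```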


SOURCES_FIELD_NAMES

public static final String[] SOURCES_FIELD_NAMES
Sources table field names.


SOURCE_NUMBER_FIELD

public static int SOURCE_NUMBER_FIELD
Sources table field indexes.


SOURCE_PATHNAME_FIELD

public static int SOURCE_PATHNAME_FIELD
Sources table field indexes.


SOURCE_ID_FIELD

public static int SOURCE_ID_FIELD
Sources table field indexes.


CONDUCTOR_ID_FIELD

public static int CONDUCTOR_ID_FIELD
Sources table field indexes.


STATUS_FIELD

public static int STATUS_FIELD
Sources table field indexes.


LOG_PATHNAME_FIELD

public static int LOG_PATHNAME_FIELD
Sources table field indexes.


SOURCES_TABLE_NAME_SUFFIX

public static final String SOURCES_TABLE_NAME_SUFFIX
Sources table name suffix.

See Also:
Constant Field Values

MIN_SOURCE_RECORDS_DEFAULT

public static final int MIN_SOURCE_RECORDS_DEFAULT
The minimum value for the Max_Source_Records value. The value may be set in the configuration by the MIN_SOURCE_RECORDS_PARAMETER. The default is MIN_SOURCE_RECORDS_DEFAULT.

When operating in batch mode, this value controls the minimum number of source records to be processed before stopping.

See Also:
MAX_SOURCE_RECORDS_DEFAULT, Constant Field Values

MAX_SOURCE_RECORDS_DEFAULT

public static final int MAX_SOURCE_RECORDS_DEFAULT
The maximum number of unprocessed source records that will be obtained when the Conductor cache is refreshed.

This value may be set in the configuration by the MAX_SOURCE_RECORDS_PARAMETER. The default is MAX_SOURCE_RECORDS_DEFAULT.

See Also:
MIN_SOURCE_RECORDS_DEFAULT, Constant Field Values

SOURCE_AVAILABLE_TRIES_DEFAULT

public static final int SOURCE_AVAILABLE_TRIES_DEFAULT
Default number of source file availability tests.

Due to NFS filesystem latency it is possible that a source file that has just been created and registered in a Sources table by a Conductor on one system will not appear if accessed too soon by a Conductor on another system. The workaround is to repeatedly try to see the file, with a ten second delay between tries. The difficulty is knowing how many tries to make before giving up. As of this writing it seems that sufficient tries should be allowed for up to two minutes of trying.

The Configuration SOURCE_AVAILABLE_TRIES_PARAMETER can be used to override the default.

See Also:
Constant Field Values

SOURCE_AVAILABLE_TRIES_MAX

public static final int SOURCE_AVAILABLE_TRIES_MAX
Maximum number of source file availability tests. A maximum of 30 minutes of trying is allowed.

See Also:
SOURCE_AVAILABLE_TRIES_DEFAULT, Constant Field Values
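The relationship between a time budget and a number of tries, given the ten second delay described above, can be worked out as follows. This is a rough sketch with hypothetical names; it does not state the actual constant values, which are listed under Constant Field Values.

```java
/** Hypothetical arithmetic helper; not part of the Conductor API. */
final class SourceTriesSketch {
    // Delay between availability tries, as stated in the field description.
    static final int DELAY_SECONDS = 10;

    /** Approximate number of tries that fit in a time budget; at least one try is made. */
    static int triesFor(int budgetSeconds) {
        return Math.max(1, budgetSeconds / DELAY_SECONDS);
    }
}
```

Under this approximation two minutes of trying is triesFor(120) = 12 tries, and thirty minutes is triesFor(1800) = 180 tries.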

SOURCE_AVAILABLE_NO_CHECK

public static final int SOURCE_AVAILABLE_NO_CHECK
When the SOURCE_AVAILABLE_TRIES_PARAMETER is this value, source file availability confirmation is disabled.

See Also:
Constant Field Values

DEFAULT_POLL_INTERVAL

public static final int DEFAULT_POLL_INTERVAL
The polling interval, in seconds, for unprocessed source records when no unprocessed source records are obtained from the Sources table.

This value may be set by the configuration POLL_INTERVAL_PARAMETER.

A value of zero sets batch mode: Conductor is to stop when no unprocessed source records are available and at least Min_Source_Records have been processed. If polling for more sources is required in batch mode a polling interval of BATCH_POLL_INTERVAL seconds will be used.

See Also:
MIN_SOURCE_RECORDS_DEFAULT, Constant Field Values
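The batch mode rule described above can be expressed as a small decision function. This is a sketch under stated assumptions: the class and method names are hypothetical, and the real Conductor may take other conditions into account.

```java
/** Hypothetical decision helper; not part of the Conductor API. */
final class BatchModeSketch {
    /** True when a Conductor in this situation would stop instead of polling again. */
    static boolean shouldStop(int pollIntervalSeconds, boolean unprocessedAvailable,
                              int processedCount, int minSourceRecords) {
        boolean batchMode = pollIntervalSeconds == 0;   // zero interval selects batch mode
        return batchMode && !unprocessedAvailable && processedCount >= minSourceRecords;
    }

    /** Seconds to sleep before the next poll when processing is not stopping. */
    static int sleepSeconds(int pollIntervalSeconds, int batchPollIntervalSeconds) {
        // In batch mode the separate batch polling interval is used.
        return pollIntervalSeconds == 0 ? batchPollIntervalSeconds : pollIntervalSeconds;
    }
}
```

For example, in batch mode with no unprocessed records and fewer than the minimum processed, shouldStop returns false and polling continues at the batch interval.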

BATCH_POLL_INTERVAL

public static final int BATCH_POLL_INTERVAL
The polling interval, in seconds, for unprocessed source records when no unprocessed source records are obtained from the Sources table.

This value may be set by the configuration POLL_INTERVAL_PARAMETER.

A value of zero sets batch mode: Conductor is to stop when no unprocessed source records are available and at least Min_Source_Records have been processed. If polling for more sources is required in batch mode a polling interval of BATCH_POLL_INTERVAL seconds will be used.

See Also:
MIN_SOURCE_RECORDS_DEFAULT, Constant Field Values

PROCEDURES_FIELD_NAMES

public static final String[] PROCEDURES_FIELD_NAMES
Procedures table field names.


SEQUENCE_FIELD

public static final int SEQUENCE_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

DESCRIPTION_FIELD

public static final int DESCRIPTION_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

COMMAND_LINE_FIELD

public static final int COMMAND_LINE_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

SUCCESS_STATUS_FIELD

public static final int SUCCESS_STATUS_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

SUCCESS_MESSAGE_FIELD

public static final int SUCCESS_MESSAGE_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

TIME_LIMIT_FIELD

public static final int TIME_LIMIT_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

ON_FAILURE_FIELD

public static final int ON_FAILURE_FIELD
Procedures table field indexes.

See Also:
Constant Field Values

Procedures_Map

protected Fields_Map Procedures_Map

PROCEDURES_TABLE_NAME_SUFFIX

public static final String PROCEDURES_TABLE_NAME_SUFFIX
Procedures table name suffix.

See Also:
Constant Field Values

Procedures_Table

protected String Procedures_Table
The name of the pipeline procedures table in the database.


Procedure_Records

protected Vector<Vector<String>> Procedure_Records
The content of the pipeline procedures table, without the field names, sorted by sequence number.


PROCEDURE_SUCCESS

public static final int PROCEDURE_SUCCESS
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

PROCEDURE_FAILURE

public static final int PROCEDURE_FAILURE
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

INACCESSIBLE_FILE

public static final int INACCESSIBLE_FILE
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

UNRESOLVABLE_REFERENCE

public static final int UNRESOLVABLE_REFERENCE
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

NO_PROCEDURE

public static final int NO_PROCEDURE
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

PROCEDURE_TIMEOUT

public static final int PROCEDURE_TIMEOUT
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

BAD_REGEX

public static final int BAD_REGEX
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

INVALID_DATABASE_ENTRY

public static final int INVALID_DATABASE_ENTRY
Conductor procedure completion code.

See Also:
Status_Indicators(String), Constant Field Values

FAILURE_DESCRIPTION

public static final String[] FAILURE_DESCRIPTION
Conductor status failure code descriptions.

See Also:
Status_Conductor_Code_Description(int)

DEFAULT_EMPTY_SUCCESS_ANY

public static final boolean DEFAULT_EMPTY_SUCCESS_ANY
The default for whether or not empty Success_Status and Success_Message fields in a procedure definition may imply any completion of the procedure is a success.

This value may be overridden by the EMPTY_SUCCESS_ANY_PARAMETER configuration parameter.

See Also:
Constant Field Values

DEFAULT_STOP_ON_FAILURE

public static final int DEFAULT_STOP_ON_FAILURE
The default maximum number of sequential source processing failures.

See Also:
Constant Field Values

RUNNING

public static final int RUNNING
Processing state: Source records are being processed.

See Also:
Constant Field Values

POLLING

public static final int POLLING
Processing state: No unprocessed source records are available and the poll interval for new records is positive.

Between attempts to load unprocessed source records the processing thread will sleep for the poll interval. Once additional unprocessed source records have been obtained the RUNNING state will be entered, unless the RUN_TO_WAIT state has been set.

See Also:
Constant Field Values

RUN_TO_WAIT

public static final int RUN_TO_WAIT
Processing state: When the current source record completes processing the WAITING state will be entered unless a failure condition caused the HALTED state to occur.

N.B.: If the Conductor is repeatedly trying to confirm the existence of a source file, or to reconnect to the database, when the stop request is received, retrying will be discontinued. This can be expected to result in a source record processing failure or a database connectivity failure halt.

This state is only entered when the user requests that processing stop while the RUNNING or POLLING state is in effect.

See Also:
Constant Field Values

WAITING

public static final int WAITING
Processing state: Idle; waiting for a start request.

See Also:
Constant Field Values

HALTED

public static final int HALTED
Processing state: A failure condition caused processing to halt.

The problem may be the result of the maximum number of sequential source record processing failures having occurred, a database access failure, or some other system error. This state remains in effect until a start request is received.

See Also:
Constant Field Values
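The transitions between the five processing states described above can be summarized as a small state machine. This is a sketch only: the enum and its method names are hypothetical, the numeric state codes are deliberately omitted, and a start request from WAITING or HALTED is assumed to lead to RUNNING.

```java
/** Hypothetical model of the processing states; not the Conductor implementation. */
enum StateSketch {
    RUNNING, POLLING, RUN_TO_WAIT, WAITING, HALTED;

    /** A stop request only has an effect while RUNNING or POLLING. */
    StateSketch onStopRequest() {
        return (this == RUNNING || this == POLLING) ? RUN_TO_WAIT : this;
    }

    /** A start request leaves WAITING or HALTED and cancels a pending RUN_TO_WAIT. */
    StateSketch onStartRequest() {
        return (this == WAITING || this == HALTED || this == RUN_TO_WAIT) ? RUNNING : this;
    }

    /** The current source record finished; a failure condition takes precedence. */
    StateSketch onSourceCompleted(boolean failureHalt) {
        if (failureHalt) {
            return HALTED;
        }
        return this == RUN_TO_WAIT ? WAITING : this;
    }

    /** No unprocessed records were found and the poll interval is positive. */
    StateSketch onNoSourcesAvailable() {
        return this == RUNNING ? POLLING : this;
    }

    /** Polling found new records; RUN_TO_WAIT is not overridden. */
    StateSketch onSourcesLoaded() {
        return this == POLLING ? RUNNING : this;
    }
}
```

For instance, StateSketch.RUNNING.onStopRequest().onSourceCompleted(false) ends in WAITING, while a failure halt during the same sequence ends in HALTED.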

NL

protected static final String NL

STDOUT_NAME

public static final String STDOUT_NAME
Prefix applied to procedure stdout lines.

See Also:
Constant Field Values

STDERR_NAME

public static final String STDERR_NAME
Prefix applied to procedure stderr lines.

See Also:
Constant Field Values

SOURCE_FILE_LOG_DELIMITER

public static final String SOURCE_FILE_LOG_DELIMITER
Marks the beginning of source file processing in a log file.


PROCEDURE_LOG_DELIMITER

public static final String PROCEDURE_LOG_DELIMITER
Marks the beginning of procedure processing in a log file.


ON_FAILURE_PROCEDURE_LOG_DELIMITER

public static final String ON_FAILURE_PROCEDURE_LOG_DELIMITER
Marks the beginning of On_Failure procedure processing in a log file.


EXIT_SUCCESS

public static final int EXIT_SUCCESS
Conductor success exit status (0).

See Also:
Constant Field Values

EXIT_COMMAND_LINE_SYNTAX

public static final int EXIT_COMMAND_LINE_SYNTAX
Command line syntax problem exit status (1).

See Also:
Constant Field Values

EXIT_CONFIGURATION_PROBLEM

public static final int EXIT_CONFIGURATION_PROBLEM
Configuration problem exit status (2).

See Also:
Constant Field Values

EXIT_DATABASE_PROBLEM

public static final int EXIT_DATABASE_PROBLEM
Database problem exit status (3).

See Also:
Constant Field Values

EXIT_IO_FAILURE

public static final int EXIT_IO_FAILURE
I/O failure exit status (4).

See Also:
Constant Field Values

EXIT_TOO_MANY_FAILURES

public static final int EXIT_TOO_MANY_FAILURES
The number of sequential source processing failures reached the Stop-on-Failure amount.

See Also:
Constant Field Values

EXIT_STAGE_MANAGER

public static final int EXIT_STAGE_MANAGER
The required Stage_Manager connection could not be established.

See Also:
Constant Field Values

EXIT_UNEXPECTED_EXCEPTION

public static final int EXIT_UNEXPECTED_EXCEPTION
An unexpected exception occurred (9).

See Also:
Constant Field Values

Require_Stage_Manager

public static boolean Require_Stage_Manager
Flag that determines if the Conductor requires a Stage_Manager.

If true and a Stage_Manager connection can not be established the Conductor will throw an exception; otherwise the Conductor will proceed without a Stage_Manager.

The initial value is false.

Constructor Detail

Conductor

public Conductor(String pipeline,
                 Configuration configuration,
                 String database_server_name)
          throws Database_Exception,
                 Configuration_Exception,
                 IOException
Construct a Conductor for a pipeline from a Configuration.

The Configuration object is expected to contain all the necessary information Conductor needs to connect to the database as well as any other Conductor parameters it might use.

Parameters:
pipeline - The name of the pipeline to be managed.
configuration - A Configuration object.
database_server_name - The name of a Configuration Group that will provide the database server access parameters.
Throws:
Database_Exception - if there is a problem connecting to the database.
Configuration_Exception - if there is a problem with the configuration file.
IOException - if a connection could not be made to the Stage_Manager and Require_Stage_Manager is true.

Conductor

public Conductor(String pipeline,
                 Configuration configuration)
          throws Database_Exception,
                 Configuration_Exception,
                 IOException
Construct a Conductor for a pipeline from a Configuration.

The Configuration object is expected to contain all the necessary information Conductor needs to connect to the database as well as any other Conductor parameters it might use.

The database server access parameters will be obtained from the Configuration Group named by the first entry in the Server parameter list.

Parameters:
pipeline - The name of the pipeline to be managed.
configuration - A Configuration object.
Throws:
Database_Exception - if there is a problem connecting to the database.
Configuration_Exception - if there is a problem with the configuration file.
IOException - if a connection could not be made to the Stage_Manager and Require_Stage_Manager is true.

Conductor

protected Conductor()
Constructs an uninitialized Conductor.

Method Detail

Preconfigure

protected Configuration Preconfigure(Configuration configuration)
                              throws Configuration_Exception
Set the effective configuration.

If no Configuration is provided an effort is made to load the DEFAULT_CONFIGURATION_FILENAME.

Essential configuration information is obtained from, and set in, the CONDUCTOR_GROUP parameters of the Configuration:

Theater.CLASS_ID_PARAMETER_NAME - Set
The ID of this class.
CONFIGURATION_SOURCE_PARAMETER - Set
The source of the Configuration that is being used.
HOSTNAME_PARAMETER - Set
The Host.FULL_HOSTNAME of the host system.
CONDUCTOR_ID_PARAMETER - Set
The identifying name of this Conductor. This will be the Host.SHORT_HOSTNAME of the system. If the host system process ID can be obtained it is appended to the hostname after a colon (':') delimiter.
NOTIFY_PARAMETER - Get
A list of email addresses to be sent a Notify message if Conductor stops because the STOP_ON_FAILURE_PARAMETER limit has been reached or an exception has been thrown. If this parameter is not present or has an empty value no notification message will be sent.
REQUIRE_STAGE_MANAGER_PARAMETER - Get and Set
A flag that determines if a connection to a Stage_Manager is required for this Conductor. N.B.: This parameter is only read once when the Conductor is first configured; when the Conductor is reconfigured on being restarted after a Stop or Halt state the initial value is reset in the internal Configuration regardless of any change to the external configuration source. Default: Require_Stage_Manager.
STAGE_MANAGER_PORT_PARAMETER - Get and Set
The port number to use for connecting to a Stage_Manager. Default: the default port for Theater Management.
STAGE_MANAGER_TIMEOUT_PARAMETER - Get and Set
The amount of time (seconds) to wait for a Stage_Manager connection to complete before the connection attempt fails.
STAGE_MANAGER_PASSWORD_PARAMETER - Get
The password required to authenticate a Stage_Manager connection. If this parameter is present its value is masked out in the internal Configuration. If this parameter is not present, or its value is empty, and the Stage_Manager requires an authenticated connection the connection attempt will fail.
HELLO_PORT_PARAMETER_NAME - Get and Set
The port number to use when listening for a Stage_Manager "Hello" broadcast that it is ready for connections.
HELLO_ADDRESS_PARAMETER_NAME - Get and Set
The multicast address to use when listening for a Stage_Manager "Hello" broadcast that it is ready for connections.
SOURCE_SUCCESS_COUNT - Set
The number of source records that have been successfully processed since this Conductor was started.
SOURCE_FAILURE_COUNT - Set
The number of source records that have resulted in a failed procedure condition during processing.
TOTAL_FAILURE_COUNT - Set
The total number of source records that failed to be processed for any reason.

Parameters:
configuration - The Configuration to be used. If null the DEFAULT_CONFIGURATION_FILENAME will be tried. If that fails the filename will be qualified relative to the location of this class in the CLASSPATH.
Returns:
The Configuration that was used. This will be the same as the specified configuration if it was not null; otherwise it will be the default configuration that was loaded.
Throws:
Configuration_Exception - If the specified configuration is null and a default configuration source could not be found and loaded; or there was a problem setting a configuration value.
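The format of the CONDUCTOR_ID_PARAMETER value described above can be illustrated with a short helper. The class and method names are hypothetical, and how the hostname and process ID are obtained is left out of the sketch.

```java
/** Hypothetical formatting helper; not part of the Conductor API. */
final class ConductorIdSketch {
    /**
     * Builds an identifier from the short hostname and an optional process ID.
     * A null pid stands for "the process ID could not be obtained".
     */
    static String conductorId(String shortHostname, Long pid) {
        return pid == null ? shortHostname : shortHostname + ":" + pid;
    }
}
```

With these assumptions, a host named "node1" running process 4242 would be identified as "node1:4242", and as "node1" when no process ID is available.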

Postconfigure

protected void Postconfigure(Configuration configuration)
                      throws Configuration_Exception
Update the configuration and application control values after the Database and Reference_Resolver have been constructed.

The following CONDUCTOR_GROUP parameters of the Configuration are affected:

DATABASE_SERVER_NAME_PARAMETER - Set
The name of the Group of parameters that provided the database server access information used to establish a connection.
DATABASE_TYPE_PARAMETER - Set
The Type of Data_Port class used to access the database server.
DATABASE_HOSTNAME_PARAMETER - Set
The hostname where the database server is located.
CATALOG_PARAMETER - Get and Set
The name of the database catalog containing the pipeline tables. This value is only obtained once from the initial Configuration, and then it is obtained from the CONDUCTOR_GROUP only if it could not be obtained as part of the user specified pipeline name nor from the database access parameters of the DATABASE_SERVER_NAME_PARAMETER Group. This is a required value. If it can not be obtained a Configuration_Exception will be thrown. It is always set in the specified configuration.
PIPELINE_PARAMETER - Set
The name of the pipeline to be processed. This required parameter is user specified.
PROCEDURES_TABLE_PARAMETER - Set
The name of the procedures definitions table in the database.
SOURCES_TABLE_PARAMETER - Set
The name of the sources records table in the database.
RECONNECT_TRIES_PARAMETER - Get and Set
The maximum number of database server reconnect tries to do if the database connection is lost.
UNRESOLVED_REFERENCE_PARAMETER - Get and Set
The default value to be used by the Reference_Resolver if a reference can not be resolved. A parameter value of UNRESOLVED_REFERENCE_THROWS (case insensitive) means that the Reference_Resolver will throw an exception if a reference can not be resolved. If the Configuration does not contain this parameter the RESOLVER_DEFAULT_VALUE will be used; a null default value means that the Reference_Resolver will throw an exception if a reference can not be resolved.
LOG_PATHNAME_PARAMETER or LOG_DIRECTORY_PARAMETER - Get
The directory where log files will be written if a source record does not specify a log directory. The default log directory is the current working directory.
EMPTY_SUCCESS_ANY_PARAMETER - Get and Set
A flag that determines whether or not empty Success_Status and Success_Message fields in a procedure definition may imply any completion of the procedure is a success. If this parameter is not present the DEFAULT_EMPTY_SUCCESS_ANY value will be used.
MIN_SOURCE_RECORDS_PARAMETER - Get and Set
The minimum number of unprocessed source records to be obtained from the database when the Conductor cache is refreshed. When Conductor is operating in batch mode (the polling interval is zero) this value specifies the minimum number of source records to be processed before processing will stop. The minimum for this value is 1.
MAX_SOURCE_RECORDS_PARAMETER - Get and Set
The maximum number of unprocessed source records to be obtained from the database when the Conductor cache is refreshed. This value is used to prevent excessive memory consumption when a large number of unprocessed source records are available. The minimum for this value is limited to Min_Source_Records.
SOURCE_AVAILABLE_TRIES_PARAMETER - Get and Set
The maximum number of times to try to confirm that the Source_Pathname is available before the source record is processed. If this parameter is not present the SOURCE_AVAILABLE_TRIES_DEFAULT will be used. The value is limited to be no more than SOURCE_AVAILABLE_TRIES_MAX. If the value is negative it will be set to SOURCE_AVAILABLE_NO_CHECK and source availability confirmation will be disabled. N.B.: Source availability confirmation ensures that procedures will not fail due to a missing source file; disabling this confirmation would be appropriate for a pipeline that does not process a Source_Pathname, or for which the Source_Pathname is not a filename but some other information used by a procedure.
POLL_INTERVAL_PARAMETER - Get and Set
The polling interval, in seconds, for unprocessed source records when no unprocessed source records are obtained. A value of zero means that Conductor is to stop when no unprocessed source records are available. If this parameter is not present the DEFAULT_POLL_INTERVAL will be used.
STOP_ON_FAILURE_PARAMETER - Get and Set
The number of sequential source record processing failures at which Conductor is to stop processing. A value less than or equal to zero means that processing is never to stop due to sequential failures. This parameter may have a boolean value which, if true, is equivalent to the number one; if false sequential failures will never cause processing to stop. If this parameter is not present the DEFAULT_STOP_ON_FAILURE value will be used.

Parameters:
configuration - The Configuration to be used. If null nothing is done.
Throws:
Configuration_Exception - If there was a problem setting a configuration value.
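Several of the parameters above are limited or normalized when they are read. The following sketch shows one way those rules could be written. All names are hypothetical, the accepted boolean spellings ("true" and "false") are an assumption, and mapping a negative stop-on-failure value to zero is a simplification of "never stop".

```java
/** Hypothetical normalization helpers; not the Conductor implementation. */
final class PostconfigureSketch {
    /** Stop-on-failure limit: a number or a boolean word; zero means never stop. */
    static int stopOnFailure(String value, int defaultLimit) {
        if (value == null) {
            return defaultLimit;                 // parameter not present
        }
        String text = value.trim();
        if (text.equalsIgnoreCase("true")) {
            return 1;                            // true is equivalent to the number one
        }
        if (text.equalsIgnoreCase("false")) {
            return 0;                            // false: sequential failures never stop processing
        }
        return Math.max(Integer.parseInt(text), 0);
    }

    /** The minimum number of source records is never less than 1. */
    static int minSourceRecords(int requested) {
        return Math.max(1, requested);
    }

    /** The maximum number of source records is never less than the minimum. */
    static int maxSourceRecords(int requested, int minSourceRecords) {
        return Math.max(requested, minSourceRecords);
    }

    /** Negative disables the check; otherwise the value is capped at the maximum. */
    static int sourceAvailableTries(int requested, int maxTries, int noCheck) {
        return requested < 0 ? noCheck : Math.min(requested, maxTries);
    }
}
```

The caller supplies the limits, so the sketch does not depend on the actual constant values.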

Config_Value

protected String Config_Value(String name)
Get a String parameter value from the configuration.

The parameter is sought in the CONDUCTOR_GROUP, but defaults are acceptable.

If a value is found it is resolved for embedded references. An unresolved reference that would throw a Database_Exception results in a null value. A syntax error (ParseException) leaves the value unchanged.

Parameters:
name - The name of the Assignment parameter for which to get a value. If not an absolute pathname an absolute pathname for the name in the CONDUCTOR_GROUP is used.
Returns:
The first value String for the named parameter. This will be null if no parameter by that name exists, or it contained an unresolvable reference.

Config_Value

protected boolean Config_Value(String name,
                               Object value)
                        throws Configuration_Exception
Set a parameter in the configuration.

The parameter is set in the CONDUCTOR_GROUP.

Parameters:
name - The name of the Assignment parameter to have its value set. If not an absolute pathname an absolute pathname for the name in the CONDUCTOR_GROUP is used.
value - An Object to use for the parameter's value. If null, the parameter will have no value; it will be a Token.
Returns:
true if an existing parameter by the same name was replaced; false if the parameter is being set for the first time.
Throws:
Configuration_Exception - If there was a problem setting the parameter.
See Also:
Configuration.Set(String, Object)

Config_Pathname

public static String Config_Pathname(String name)
Get an absolute Conductor Configuration pathname.

If the specified name is not an absolute pathname the CONDUCTOR_GROUP is used as the root to make an absolute pathname from the name.

Parameters:
name - The parameter name to be made absolute.
Returns:
An absolute Configuration pathname String.
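The pathname rule can be sketched as follows. Both constants are assumptions made for illustration: a '/' path delimiter and a group named "Conductor" are not confirmed by this section, and the class is hypothetical.

```java
/** Hypothetical pathname helper; not the Conductor implementation. */
final class ConfigPathnameSketch {
    static final char DELIMITER = '/';          // assumed Configuration path delimiter
    static final String GROUP = "Conductor";    // assumed CONDUCTOR_GROUP name

    /** Returns the name unchanged if already absolute, else roots it in the group. */
    static String absolute(String name) {
        if (!name.isEmpty() && name.charAt(0) == DELIMITER) {
            return name;
        }
        return DELIMITER + GROUP + DELIMITER + name;
    }
}
```

Under these assumptions absolute("Catalog") gives "/Conductor/Catalog", and a name that already starts with the delimiter is returned as is.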

Configuration

public Configuration Configuration()
Get the Conductor Configuration.

N.B.: All "password" (case insensitive) parameters will have their values masked out.

Specified by:
Configuration in interface Management
Returns:
A Configuration containing a copy of the current Conductor Configuration.
See Also:
Preconfigure(Configuration), Postconfigure(Configuration)
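The password masking noted above can be illustrated on a plain map of parameter names to values. This is a sketch only: the mask text is hypothetical, a Map stands in for the Configuration object, and matching any name that contains "password" is an assumption about how the case insensitive test is applied.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical masking helper; not the Conductor implementation. */
final class PasswordMaskSketch {
    static final String MASK = "****";  // hypothetical mask text

    /** Returns a copy in which every "password" parameter value is masked. */
    static Map<String, String> masked(Map<String, String> parameters) {
        Map<String, String> copy = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            boolean isPassword =
                entry.getKey().toLowerCase(Locale.ROOT).contains("password");
            copy.put(entry.getKey(), isPassword ? MASK : entry.getValue());
        }
        return copy;    // the original map is left untouched
    }
}
```

Returning a copy mirrors the documented behavior that callers receive a copy of the current Configuration rather than the internal object.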

Connect_to_Database

protected Database_Exception Connect_to_Database()
Establish the Database connection.

The connection will be retried up to the number of times specified by the RECONNECT_TRIES_PARAMETER with the number of seconds delay between retries specified by the RECONNECT_DELAY_PARAMETER. Connection retries stop when the connection to the database is successful, the number of retries is exhausted, or an exception occurs that is not associated with a connection failure.

Returns:
The last Database_Exception that occurred. This will be null if the connection was successful.
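The retry behavior described above follows a common pattern, sketched here in plain Java. The types are hypothetical stand-ins: ConnectFailure plays the role of a connection-related exception, Connector plays the role of the database connect call, and one initial attempt followed by the given number of retries is an assumption.

```java
/** Hypothetical retry loop; not the Conductor implementation. */
final class ReconnectSketch {
    /** Stand-in for an exception that signals a connection failure. */
    static class ConnectFailure extends Exception {
        ConnectFailure(String message) { super(message); }
    }

    /** Stand-in for the operation that opens the connection. */
    interface Connector {
        void connect() throws Exception;
    }

    /** Returns null on success, otherwise the last exception that occurred. */
    static Exception connectWithRetries(Connector connector, int retries, long delayMillis) {
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                connector.connect();
                return null;                    // connected: stop retrying
            } catch (ConnectFailure failure) {
                last = failure;                 // connection failure: eligible for retry
            } catch (Exception other) {
                return other;                   // unrelated problem: stop immediately
            }
            if (attempt < retries) {
                try {
                    Thread.sleep(delayMillis);  // delay between retries
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    return last;
                }
            }
        }
        return last;                            // retries exhausted
    }
}
```

Returning the exception instead of throwing it matches the documented signature, which lets the caller decide whether a failed connection is fatal.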

Resolve

protected String Resolve(String reference)
                  throws Database_Exception,
                         ParseException,
                         Unresolved_Reference
Resolve a reference.

If the reference can not be resolved because the Database has become disconnected, the Database is reconnected and the reference resolution is tried again.

Parameters:
reference - The reference String to be resolved.
Returns:
The resolved reference String.
Throws:
Database_Exception - If there was a problem accessing the Database.
ParseException - If the reference contained a mathematical expression that could not be correctly parsed.
Unresolved_Reference - If the reference could not be resolved and a non-null default value had not been assigned.
See Also:
Reference_Resolver

Resolver_Default_Value

public Management Resolver_Default_Value(String value)
Set the Conductor default Reference_Resolver value.

Normally when the Conductor Reference_Resolver is unable to resolve a reference it will throw an exception and enter the halted processing state, or exit if it is not connected to a Stage_Manager at the time. If, however, the Reference_Resolver default value is set to a non-null String, that value will be used for the unresolved reference instead of throwing an exception.

If the specified value is different from the current Reference_Resolver default value the Configuration UNRESOLVED_REFERENCE_PARAMETER is set to this value (or UNRESOLVED_REFERENCE_THROWS if the value is null) and processing event notification with the updated Configuration is sent to all listeners.

Specified by:
Resolver_Default_Value in interface Management
Parameters:
value - The default Reference_Resolver String value. If this starts with UNRESOLVED_REFERENCE_THROWS (case insensitive) null will be used.
Returns:
This Conductor Management object.
See Also:
Reference_Resolver

Resolver_Default_Value

public String Resolver_Default_Value()
Get the Conductor default Reference_Resolver value.

Specified by:
Resolver_Default_Value in interface Management
Returns:
The default Reference_Resolver String value.
See Also:
Resolver_Default_Value(String)

Procedures

public Vector<Vector<String>> Procedures()
Get the procedures table.

The table is expected to be delivered with the record fields in PROCEDURES_FIELD_NAMES order, and with the records in processing order.

N.B.: The entire contents of the database procedures table is delivered.

Specified by:
Procedures in interface Management
Returns:
A table Vector of record Vectors of field Strings. The first record contains the field names.

Get_Procedures_Table

protected Vector<Vector<String>> Get_Procedures_Table()
                                               throws Database_Exception
Get the Procedures_Table from the Database.

The entire Procedures_Table is obtained from the Database.

If a disconnected Database_Exception is thrown an attempt is made to reconnect to the Database.

Returns:
A Vector containing the procedures table. Each entry is a Vector of fields with the first entry being the field names for the records that follow.
Throws:
Database_Exception - If the procedures table could not be obtained.

Load_Procedure_Records

protected void Load_Procedure_Records()
                               throws Database_Exception,
                                      Configuration_Exception
Load the Procedure_Records table.

The entire Procedures table is cached. The first, field names, record is removed and used to construct a Fields_Map of required field names/indexes to the available field names.

The database table is first loaded into a temporary records list. A temporary Fields_Map is constructed from the first, field names, record which is removed from the table. The map is used to confirm that all the required fields are present. Then the fields of each record are mapped to their expected order. All the records are then sorted on the SEQUENCE_FIELD using a String_Vector_Comparator to do the comparisons.

If the tentative new table content is different than the current Procedure_Records table the latter is set to the former, and the Procedures_Map set to the new map, while synchronized to prevent the Management from obtaining it while it is in an inconsistent state. Then the Configuration is updated: the TOTAL_PROCEDURE_RECORDS_PARAMETER is set to the number of table records (not including the field names record, which was removed to construct the Fields_Map), the PROCEDURE_COUNT_PARAMETER is reset to zero, the PROCEDURE_SEQUENCE_PARAMETER is reset to the empty string, and the PROCEDURE_COMPLETION_NUMBER_PARAMETER is reset to PROCEDURE_SUCCESS. Finally a processing event report is sent with procedures changed set.

N.B.: The Procedure_Records and the Procedures_Map are not changed, nor a processing event reported, unless a valid table with different contents from the current table is loaded.

Throws:
Database_Exception - If the procedures table could not be obtained from the Database, it was empty, or any required PROCEDURES_FIELD_NAMES are missing.
Configuration_Exception - If the Configuration could not be updated with the total number of procedures records.
See Also:
Fields_Map
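The load steps described above (remove the field names record, check the required fields, reorder each record, sort by sequence) can be sketched without the PIRL classes. Everything here is hypothetical: Lists stand in for Vectors, field names are matched exactly, the sequence field is assumed to be the first required field, and sequence values are compared numerically because the String_Vector_Comparator rules are not described in this section.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical table loader; not the Conductor implementation. */
final class ProcedureRecordsSketch {
    /**
     * Maps a raw table (first row holds the field names) into required-field
     * order, drops the names row, and sorts on the first required field.
     */
    static List<List<String>> load(List<List<String>> table, List<String> required) {
        if (table.size() < 2) {
            throw new IllegalArgumentException("empty procedures table");
        }
        List<String> names = table.get(0);
        int[] index = new int[required.size()];
        for (int i = 0; i < index.length; i++) {
            index[i] = names.indexOf(required.get(i));   // where the required field lives
            if (index[i] < 0) {
                throw new IllegalArgumentException("missing field: " + required.get(i));
            }
        }
        List<List<String>> records = new ArrayList<>();
        for (List<String> row : table.subList(1, table.size())) {
            List<String> mapped = new ArrayList<>();
            for (int source : index) {
                mapped.add(row.get(source));             // reorder to the expected layout
            }
            records.add(mapped);
        }
        // Assumed numeric ordering on the sequence value.
        records.sort(Comparator.comparingDouble(
            (List<String> record) -> Double.parseDouble(record.get(0))));
        return records;
    }
}
```

Building the result in a temporary list before it replaces any cached table follows the description above, where the current table is only swapped once a valid new table has been loaded.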

Sources

public Vector<Vector<String>> Sources()
Get the current cache of source records.

The table will be delivered with the record fields in SOURCES_FIELD_NAMES order, and with the records sorted by increasing SEQUENCE_FIELD order.

N.B.: Only the contents of the Conductor source records cache is delivered. The contents of the database sources table may be much, much larger.

Specified by:
Sources in interface Management
Returns:
A table Vector of record Vectors of field Strings. The first record contains the field names.

Load_Source_Records

protected boolean Load_Source_Records()
                               throws Database_Exception
Load the Sources_Records table.

If the source records cache is not empty nothing is done.

Records in the Sources table for which the Processing_Host field is NULL are cached. The maximum size of the cache is determined by the Max_Source_Records value, which can not be less than the Min_Source_Records value.

The first, field names, record from the Database is removed and used to construct the Sources_Map Fields_Map of required field names/indexes to the available field names.

Returns:
true if additional source records were loaded; false if no unprocessed source records are available.
Throws:
Database_Exception - If the sources table records could not be obtained from the Database or any required SOURCES_FIELD_NAMES are missing.
See Also:
Fields_Map

Processing_State

public int Processing_State()
Get the current Conductor processing state.

The possible processing state values are:

RUNNING
Source records are being processed.
POLLING
No unprocessed source records are currently available for processing; the Conductor is polling for source records to process.
RUN_TO_WAIT
When processing of the current source record completes Conductor will go into the waiting state.
WAITING
The Conductor is waiting to be told to begin processing.
HALTED
A problem condition caused the Conductor to halt processing. The problem may be the result of the maximum number of sequential failures of source record processing having occurred, a database access failure, or some other system error.

The WAITING and HALTED state codes are negative; all others are positive.

Specified by:
Processing_State in interface Management
Returns:
A Conductor processing state code.

State

public Processing_Changes State()
Get the current Conductor processing conditions state.

All Processing_Changes state variables will be set to the current values from this Conductor, except that the flag variables (Processing_Changes.Sources_Refreshed(boolean), Processing_Changes.Procedures_Changed(boolean) and Processing_Changes.Exiting(boolean)) will always be false.

Specified by:
State in interface Management
Returns:
A Processing_Changes object containing the values of the current Conductor processing state variables.
See Also:
Processing_Changes

Start

public void Start()
Start pipeline processing.

If a pipeline processing thread is not running one is started. Otherwise, if the RUN_TO_WAIT state is in effect it is reset to the RUNNING state.

Pipeline Processing:

The Conductor reconfigures itself by re-reading its Configuration source in case it was changed while processing was waiting. N.B.: The poll interval and stop-on-failure limit are not reset if they were set to a non-negative value.

The table of procedures is refreshed in case it was changed while processing was waiting.

Source processing will continue indefinitely unless: the polling interval for newly available sources is zero (i.e. batch mode) and at least Min_Source_Records have been processed in the current batch; processing of the current source has completed and further processing has been flagged to stop; the number of sequential procedure failures has reached its Stop_on_Failure limit; or an unrecoverable exception occurred during processing, in which case the processing exception will be non-null. The possible processing exception types are Configuration_Exception, Database_Exception or IOException; any other exception is due to a programming error (any exception is caught and saved for possible subsequent retrieval).

The current contents of the source records cache are processed first. An unprocessed source record is acquired from the cache and processed. Source record acquisition is always done in the order in which source records occur in the list of records obtained from the database; there is no priority ordering. Only after the cache is empty and the polling interval sleep time has passed will it be refreshed. If there is no polling interval the cache will not be refreshed and processing will stop. The polling interval may be interrupted by stopping and restarting processing.

Specified by:
Start in interface Management
See Also:
Start()

Poll_Interval

public Management Poll_Interval(int seconds)
Set the time interval to poll for unprocessed source records.

The new value will override the value set when the Conductor is reconfigured after being stopped. However, if the value is negative then the value from the reconfiguration will be used. In this case a zero value will still be set for the current poll interval; this will cause the Conductor to stop if it is currently polling or when no unprocessed source records can be obtained from the database.

Specified by:
Poll_Interval in interface Management
Parameters:
seconds - The number of seconds between querying the database for unprocessed source records. If negative, zero will be used.
Returns:
This Management interface.
See Also:
Poll_Interval(int)
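
The override rule described above can be illustrated with a small stand-alone model. This is only a sketch of the documented behavior, not the Conductor source: the class PollIntervalSketch and its set, reconfigure and current methods are hypothetical names, and reconfigure stands in for whatever the Conductor does when it re-reads its Configuration.

```java
/** Hypothetical model of the documented Poll_Interval(int) override rule. */
final class PollIntervalSketch {
    private int currentSeconds;
    private boolean overridesConfiguration;

    PollIntervalSketch(int configuredSeconds) {
        currentSeconds = Math.max(0, configuredSeconds);
    }

    /** Models Poll_Interval(int seconds). */
    void set(int seconds) {
        if (seconds < 0) {
            // Negative: the next reconfiguration supplies the value,
            // but the current interval is still set to zero.
            overridesConfiguration = false;
            currentSeconds = 0;
        } else {
            // Non-negative: this value survives reconfiguration.
            overridesConfiguration = true;
            currentSeconds = seconds;
        }
    }

    /** Models re-reading the configuration when processing is restarted. */
    void reconfigure(int configuredSeconds) {
        if (!overridesConfiguration) {
            currentSeconds = Math.max(0, configuredSeconds);
        }
    }

    int current() {
        return currentSeconds;
    }
}
```

In this model a call to set(5) followed by reconfigure(30) leaves the interval at 5, while set(-1) drops the interval to zero until the next reconfigure(30) restores the configured value.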

Poll_Interval

public int Poll_Interval()
Get the interval at which the Conductor will poll for unprocessed source records.

Specified by:
Poll_Interval in interface Management
Returns:
The interval, in seconds, at which the Conductor will poll for unprocessed source records. If zero, polling has been disabled.
See Also:
Poll_Interval(int)

Stop_on_Failure

public Management Stop_on_Failure(int failure_count)
Set the sequential failure limit at which to halt processing source records.

The new value will override the value set when the Conductor is reconfigured after being stopped. However, if the value is negative then the value from the reconfiguration will be used. In this case the current value will not be changed.

Specified by:
Stop_on_Failure in interface Management
Parameters:
failure_count - The number of sequential failures at which to halt processing. Zero means never halt. If negative the current value is not changed.
Returns:
This Management interface.
See Also:
Management.Sequential_Failures(), Stop_on_Failure(int)

Stop_on_Failure

public int Stop_on_Failure()
Get the number of Conductor sequential source processing failures at which to stop processing.

Specified by:
Stop_on_Failure in interface Management
Returns:
The number of Conductor sequential source processing failures at which to stop processing. If zero, sequential processing failures will never cause processing to stop.
See Also:
Stop_on_Failure(int)

Sequential_Failures

public int Sequential_Failures()
Get the count of sequential source processing failures that the Conductor has accumulated.

N.B.: A processing event notification is sent to all listeners.

Specified by:
Sequential_Failures in interface Management
Returns:
The count of sequential source processing failures that the Conductor has accumulated.
See Also:
Stop_on_Failure(int), Reset_Sequential_Failures()

Reset_Sequential_Failures

public Management Reset_Sequential_Failures()
Reset the count of sequential source processing failures that the Conductor has accumulated.

If the processing state is HALTED it is reset to WAITING.

If the count of sequential source processing failures and/or the processing state was changed a processing event notification with these changes is sent to all listeners.

Specified by:
Reset_Sequential_Failures in interface Management
Returns:
This Conductor Management interface.
See Also:
Management.Stop_on_Failure(int), Reset_Sequential_Failures()
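
The interaction between the sequential failure count, the Stop_on_Failure limit and the reset operation might be modeled as follows. This is a sketch under stated assumptions rather than the Conductor implementation: FailureCounterSketch and its members are hypothetical, and the rule that a successful source resets the count is inferred from the word "sequential" and is not stated in this section.

```java
/** Hypothetical model of sequential failure counting and reset. */
final class FailureCounterSketch {
    private final int stopOnFailure;   // zero means never halt
    private int sequentialFailures;
    private boolean halted;

    FailureCounterSketch(int stopOnFailure) {
        this.stopOnFailure = stopOnFailure;
    }

    /** Record the outcome of processing one source record. */
    void record(boolean succeeded) {
        if (succeeded) {
            // Assumption: a success breaks the run of failures.
            sequentialFailures = 0;
            return;
        }
        sequentialFailures++;
        if (stopOnFailure > 0 && sequentialFailures >= stopOnFailure) {
            halted = true;
        }
    }

    /** Models Reset_Sequential_Failures(): clear the count and leave the halted state. */
    void reset() {
        sequentialFailures = 0;
        halted = false;
    }

    int sequentialFailures() {
        return sequentialFailures;
    }

    boolean halted() {
        return halted;
    }
}
```

With a limit of 2, the sequence failure, success, failure does not halt the model; a further failure does, and reset() returns it to a non-halted state with a count of zero.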

Processing_Exception

public Exception Processing_Exception()
Get the exception that caused Conductor processing to halt.

N.B.: When Conductor processing is started the previous processing exception is cleared.

Specified by:
Processing_Exception in interface Management
Returns:
The Exception that caused processing to halt. This will be null if processing did not halt as the result of an exception, or the current processing state is not halted.

Stop

public void Stop()
Stop pipeline processing after the current source processing has completed.

If the Conductor is in a positive processing state it will enter the run-to-wait state, in which it will enter the waiting state when the current source record completes processing. If the Conductor is in the polling state it will immediately stop polling for new source records. There will be no effect for any negative state.

Specified by:
Stop in interface Management
See Also:
Stop()
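
The state transitions described for Stop can be sketched as a pure function over the processing states. The enum below is a hypothetical stand-in for the integer state codes, whose actual values are not given here, and the transition from POLLING to WAITING is an assumption about what "immediately stop polling" leads to.

```java
/** Hypothetical model of the Stop() state transitions. */
final class StopSketch {
    enum State { RUNNING, POLLING, RUN_TO_WAIT, WAITING, HALTED }

    /** The state after a stop request. */
    static State stop(State state) {
        switch (state) {
            case RUNNING:
                return State.RUN_TO_WAIT;   // finish the current source first
            case POLLING:
                return State.WAITING;       // assumption: polling ends at once
            default:
                return state;               // RUN_TO_WAIT, WAITING, HALTED unchanged
        }
    }

    /** The state after the current source record completes processing. */
    static State sourceCompleted(State state) {
        return (state == State.RUN_TO_WAIT) ? State.WAITING : state;
    }
}
```

A stop request while RUNNING therefore takes two steps to reach WAITING: stop() yields RUN_TO_WAIT, and sourceCompleted() then yields WAITING.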

Quit

public void Quit()
Immediately stop processing and exit.

Any open log file is closed. The database server is disconnected. An exiting processing event is sent to all processing listeners. The application exits with a success status.

N.B.: If source processing is running it is aborted.

Specified by:
Quit in interface Management
See Also:
Quit()

Add_Processing_Listener

public Management Add_Processing_Listener(Processing_Listener listener)
Register a processing state change listener.

The Conductor sends its processing event notifications to all registered listeners.

Specified by:
Add_Processing_Listener in interface Management
Parameters:
listener - A Processing_Listener.
Returns:
This Conductor Management object.
See Also:
Processing_Listener

Remove_Processing_Listener

public boolean Remove_Processing_Listener(Processing_Listener listener)
Unregister a processing state change listener.

Specified by:
Remove_Processing_Listener in interface Management
Parameters:
listener - The Processing_Listener to be removed from the Management list of registered listeners.
Returns:
true if the listener was registered and is now removed; false if it was not registered.
See Also:
Add_Processing_Listener(Processing_Listener)

Parse_Command_Line

public static String[] Parse_Command_Line(String command_line)
Parse a String into command line arguments.

The command line arguments are delimited by any combination of space (' '), tab ('\t'), new-line ('\n') or carriage return ('\r') characters. However, character sequences in quotes - either single ('\'') or double ('"') quote characters - remain unbroken.

After parsing each argument String is also scanned to convert escaped characters - preceded by a backslash ('\') - into their unescaped character equivalents.

Parameters:
command_line - A String to be parsed.
Returns:
An array of argument Strings.
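
The parsing rules above can be illustrated with an independent sketch. It is not the Conductor implementation: CommandLineSketch is a hypothetical class, and it assumes that the enclosing quote characters are dropped from the resulting argument, that a backslash protects the character that follows it during splitting, and that only \n, \t and \r map to control characters while any other escaped character stands for itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer following the documented splitting rules. */
final class CommandLineSketch {
    static String[] parse(String commandLine) {
        List<String> arguments = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        boolean inToken = false;
        char quote = 0;   // the active quote character, or 0 when unquoted
        for (int i = 0; i < commandLine.length(); i++) {
            char c = commandLine.charAt(i);
            if (c == '\\' && i + 1 < commandLine.length()) {
                // Keep the escape pair intact; it is converted after splitting.
                token.append(c).append(commandLine.charAt(++i));
                inToken = true;
            } else if (quote != 0) {
                if (c == quote) quote = 0; else token.append(c);
            } else if (c == '\'' || c == '"') {
                quote = c;
                inToken = true;
            } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                if (inToken) {
                    arguments.add(unescape(token.toString()));
                    token.setLength(0);
                    inToken = false;
                }
            } else {
                token.append(c);
                inToken = true;
            }
        }
        if (inToken) arguments.add(unescape(token.toString()));
        return arguments.toArray(new String[0]);
    }

    /** Convert backslash escapes into their unescaped characters. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length()) {
                char next = text.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case 'r': out.append('\r'); break;
                    default:  out.append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions the input run -f 'my file.dat' splits into three arguments, the last of which keeps its embedded space.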

Status_Indicators

public static Vector<String> Status_Indicators(String status_field)
Parse a Source table Status field value into a Vector of procedure status indicator Strings.

A Source table Status field value contains a comma delimited list of procedure status indicators. Each has the form:

<PID> | <Conductor code>[(<procedure exit status>)]

The PID is the positive, non-zero value of the system's process ID for a running procedure. This value should only be present if the procedure is currently executing.

When the procedure processing has completed the PID is replaced with a Conductor procedure completion code. This will be zero if Conductor determined that the procedure completed successfully. It will be one (1) if the procedure completed unsuccessfully. It will be a negative value if the procedure could not be run to completion for any reason; the code value in this case indicates the reason.

When the procedure has been run to completion, whether successful or not, its exit status value is appended inside parentheses to the Conductor procedure completion code.

Parameters:
status_field - The String from a Source table Status field value (may be null).
Returns:
A Vector of procedure status indicator Strings.
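
As an illustration of the Status field layout, the following sketch splits a field value into its indicators. StatusFieldSketch is a hypothetical class, not the Conductor code; it assumes that a null or empty field yields an empty Vector and that surrounding whitespace is not significant.

```java
import java.util.Vector;

/** Hypothetical splitter for a comma delimited Status field value. */
final class StatusFieldSketch {
    static Vector<String> indicators(String statusField) {
        Vector<String> indicators = new Vector<>();
        if (statusField == null || statusField.isEmpty()) {
            return indicators;   // assumption: nothing to report
        }
        // A limit of -1 keeps trailing empty entries so that positions,
        // which correspond to procedures, are preserved.
        for (String indicator : statusField.split(",", -1)) {
            indicators.add(indicator.trim());
        }
        return indicators;
    }
}
```

For the field value 0(0),1(3),4567 this yields three indicators: two completed procedures and one that is still running with PID 4567.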

Status_Conductor_Code

public static int Status_Conductor_Code(String status)
                                 throws NumberFormatException
Get the Conductor procedure completion status code value from a procedure status indicator String.

Parameters:
status - A procedure status indicator String.
Returns:
The Conductor procedure completion status code value.
Throws:
NumberFormatException - if a value could not be formed.
See Also:
Status_Indicators(String)

Status_Conductor_Code_Description

public static String Status_Conductor_Code_Description(int code)
Get a description String for a Conductor procedure completion code.

If the code value is not a recognized value, the description will be "Unknown procedure completion code ()."

Parameters:
code - The code value.
Returns:
A String describing the meaning of the code value.
See Also:
Status_Conductor_Code(String)

Status_Procedure_Exit_Value

public static int Status_Procedure_Exit_Value(String status)
                                       throws NumberFormatException
Get the procedure exit status value from a procedure status indicator String.

Parameters:
status - A procedure status indicator String.
Returns:
The procedure exit status value.
Throws:
NumberFormatException - if a value could not be formed.
See Also:
Status_Indicators(String)
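
The two values carried by a procedure status indicator can be extracted along these lines. This is a minimal sketch with hypothetical names (StatusIndicatorSketch, conductorCode, exitValue); it assumes the leading integer is the value returned for the Conductor code, and that a missing parenthesized part is reported as a NumberFormatException.

```java
/** Hypothetical parser for one "<code>(<exit status>)" indicator. */
final class StatusIndicatorSketch {
    /** The integer before any parenthesized exit status. */
    static int conductorCode(String indicator) {
        int open = indicator.indexOf('(');
        String code = (open < 0) ? indicator : indicator.substring(0, open);
        return Integer.parseInt(code.trim());
    }

    /** The integer inside the parentheses. */
    static int exitValue(String indicator) {
        int open = indicator.indexOf('(');
        int close = indicator.indexOf(')', open + 1);
        if (open < 0 || close < 0) {
            throw new NumberFormatException("No exit status in \"" + indicator + "\"");
        }
        return Integer.parseInt(indicator.substring(open + 1, close).trim());
    }
}
```

For the indicator 1(255) the sketch gives a Conductor code of 1 and an exit value of 255; for -3, which has no exit status because the procedure did not run to completion, exitValue throws.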

Status_Field_Value

public static String Status_Field_Value(Vector<String> status)
Assemble a properly formatted String for a Source table Status field value.

This method is the converse of the Status_Indicators method.

Parameters:
status - A Vector of procedure status indicator Strings.
Returns:
A String suitably formatted for a Source table Status field value.

Status_Indicator

public static String Status_Indicator(int conductor_status,
                                      int procedure_status)
Assemble a properly formatted procedure status indicator String as used in a Sources table Status field value.

Parameters:
conductor_status - A conductor procedure completion code.
procedure_status - A procedure exit status value.
Returns:
A String suitably formatted for a Source table Status field value.

Status_Indicator

public static String Status_Indicator(int conductor_status)
Assemble a properly formatted procedure status indicator String as used in a Sources table Status field value.

Parameters:
conductor_status - A conductor procedure completion code.
Returns:
A String suitably formatted for a Source table Status field value.
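
Going in the other direction, indicators and the field value might be assembled as shown below. StatusFormatSketch is a hypothetical class; it assumes indicators are joined with a bare comma and no surrounding whitespace.

```java
import java.util.List;

/** Hypothetical formatter for status indicators and the Status field. */
final class StatusFormatSketch {
    /** A completion code with no exit status, e.g. for a procedure that could not run. */
    static String indicator(int conductorStatus) {
        return Integer.toString(conductorStatus);
    }

    /** A completion code with the exit status appended in parentheses. */
    static String indicator(int conductorStatus, int procedureStatus) {
        return conductorStatus + "(" + procedureStatus + ")";
    }

    /** Join indicators into a Status field value. */
    static String fieldValue(List<String> indicators) {
        return String.join(",", indicators);
    }
}
```

Here indicator(1, 255) produces 1(255), and joining indicator(0, 0) with indicator(-2) produces the field value 0(0),-2.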

Add_Log_Writer

public Management Add_Log_Writer(Writer writer)
Register a Writer to receive processing log stream output.

The Conductor writes its processing log reports, including the output from all pipeline procedures it runs, to all registered log Writers.

Specified by:
Add_Log_Writer in interface Management
Parameters:
writer - A Writer object.
Returns:
This Conductor Management object.
See Also:
Enable_Log_Writer(Writer, boolean), Remove_Log_Writer(Writer)

Remove_Log_Writer

public boolean Remove_Log_Writer(Writer writer)
Unregister a log Writer.

Specified by:
Remove_Log_Writer in interface Management
Parameters:
writer - A Writer object.
Returns:
true if the writer was registered and is now removed; false if it was not registered.
See Also:
Add_Log_Writer(Writer)

Enable_Log_Writer

public Management Enable_Log_Writer(Writer writer,
                                    boolean enable)
Enable or disable output to a registered log stream Writer.

Specified by:
Enable_Log_Writer in interface Management
Parameters:
writer - A Writer that has been registered to receive Conductor log stream output. If the writer is not registered to receive the Conductor log stream nothing is done.
enable - If false, Conductor log stream output to the Writer is suspended without having to unregister the Writer. If true, a Writer that has had its log stream output suspended will begin receiving it again.
Returns:
This Conductor Management object.
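
The register, enable and remove semantics for log Writers can be modeled with a small registry. This is a sketch only: LogWritersSketch and its methods are hypothetical, and it treats every Writer alike, closing and dropping any Writer that fails, whereas the Conductor handles its own log file Writer separately.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry of log Writers with per-Writer enable flags. */
final class LogWritersSketch {
    private final Map<Writer, Boolean> writers = new LinkedHashMap<>();

    void add(Writer writer) {
        writers.putIfAbsent(writer, Boolean.TRUE);
    }

    boolean remove(Writer writer) {
        return writers.remove(writer) != null;
    }

    /** Does nothing if the Writer is not registered. */
    void enable(Writer writer, boolean enable) {
        if (writers.containsKey(writer)) {
            writers.put(writer, enable);
        }
    }

    /** Write the message to every enabled Writer. */
    void log(String message) {
        Iterator<Map.Entry<Writer, Boolean>> entries = writers.entrySet().iterator();
        while (entries.hasNext()) {
            Map.Entry<Writer, Boolean> entry = entries.next();
            if (!entry.getValue()) continue;   // suspended, but still registered
            try {
                entry.getKey().write(message);
                entry.getKey().flush();
            } catch (IOException exception) {
                // A failing Writer is closed and dropped from the registry.
                try { entry.getKey().close(); } catch (IOException ignored) { }
                entries.remove();
            }
        }
    }
}
```

A suspended Writer misses the messages logged while it is disabled and resumes receiving output once it is enabled again, without having to be registered a second time.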

Log_Message

protected void Log_Message(String message,
                           AttributeSet style)
                    throws IOException
Logs a message to the Logger.

Parameters:
message - The message String to write to the Logger.
style - An AttributeSet style to apply to the message. This may be null to use the default text style.
Throws:
IOException - if the Log_File_Writer could not be written. If a Writer other than the Log_File_Writer throws an exception it is closed and removed from the Logger.

Log_Message

protected void Log_Message(String message)
                    throws IOException
Logs a message to the Logger.

The default text style is used.

Parameters:
message - The message String to write to the Logger.
Throws:
IOException - if the Log_File_Writer could not be written.
See Also:
Log_Message(String, AttributeSet)

Connected_to_Stage_Manager

public boolean Connected_to_Stage_Manager()
Test if the Conductor is connected to a Stage_Manager.

Specified by:
Connected_to_Stage_Manager in interface Management
Returns:
true if the Conductor is connected to a Stage_Manager via an open Local_Theater; false otherwise.
See Also:
Local_Theater

Identity

public Message Identity()
Get the identity description Message for this Conductor.

The identity Message contains the following parameters:

Message.ACTION_PARAMETER_NAME
Indicates an identity Message with the Message.IDENTITY_ACTION value.
Message.NAME_PARAMETER_NAME
The identity name is CONDUCTOR_GROUP.
HOSTNAME_PARAMETER
The Host.FULL_HOSTNAME of the host system.
CONDUCTOR_ID_PARAMETER
The Conductor ID is the Host.SHORT_HOSTNAME followed by a colon (':') and the system process ID of this Conductor. If the process ID cannot be obtained only the short hostname is included.
CATALOG_PARAMETER
The name of the Database catalog where the pipeline tables are located.
PIPELINE_PARAMETER
The name of the Conductor pipeline being managed.
CONFIGURATION_SOURCE_PARAMETER
The source of the Configuration that is being used.
Message.CLASS_ID_PARAMETER_NAME
The ID of this Conductor class.
DATABASE_SERVER_PARAMETER
Provides the name of the Configuration parameter group that contains the database access parameters.
A parameter group named the same as the value of the DATABASE_SERVER_PARAMETER. This group contains the following parameters:

Database.TYPE
The value is taken from the database Configuration parameter of the same name.
Configuration.HOST
The value is taken from the database Configuration parameter of the same name. However, if the value is "localhost" then the Host.FULL_HOSTNAME is used instead.
Configuration.USER
The value is taken from the database Configuration parameter of the same name.

Specified by:
Identity in interface Management
Returns:
A Message containing the identity description parameters for this Conductor.
See Also:
Identity()
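
The composition of the Conductor ID value described above is simple enough to state as a one-line rule. The helper below is a hypothetical illustration, with a null process ID standing for the case where the ID cannot be obtained.

```java
/** Hypothetical composition of the Conductor ID value. */
final class ConductorIdSketch {
    /** Short hostname, then ':' and the process ID when one is available. */
    static String conductorId(String shortHostname, Long processId) {
        return (processId == null)
            ? shortHostname
            : shortHostname + ":" + processId;
    }
}
```

Because the ID combines host and process, two Conductors working on the same pipeline from the same host still receive distinct IDs as long as their process IDs are known.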

main

public static void main(String[] args)
Instantiate a Conductor application.

Parameters:
args - The Conductor command line arguments.

Usage

public static void Usage()
Prints the command line usage syntax.

Usage: Conductor <Switches>
  Switches -
    [-Pipeline] <pipeline>
    [-Configuration <source>]
    [-Database|-Server <server name>]
    [-CAtalog <catalog>]
    [-Monitor]
    [-Wait-to-start]
    [-Version]
    [-Help]

The -Pipeline switch specifies which pipeline the Conductor is to manage. A pipeline must be specified. A command line argument without a preceding switch name is assumed to be the pipeline name. The pipeline name has the form:

[<catalog>.]<pipeline>

where <pipeline> is the name of the pipeline and is used as the prefix of the Sources and Procedures table names:

<pipeline>_Sources
<pipeline>_Procedures
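
How a pipeline argument maps to table names can be sketched as follows. PipelineNameSketch is a hypothetical helper, not part of the Conductor; it assumes the catalog is separated from the pipeline at the first '.' and that a missing catalog is represented by null, to be filled in from the -CAtalog option or the configuration.

```java
/** Hypothetical split of a "[<catalog>.]<pipeline>" argument. */
final class PipelineNameSketch {
    final String catalog;    // null when the argument names no catalog
    final String pipeline;

    PipelineNameSketch(String name) {
        int dot = name.indexOf('.');
        catalog = (dot < 0) ? null : name.substring(0, dot);
        pipeline = name.substring(dot + 1);
    }

    String sourcesTable() {
        return pipeline + "_Sources";
    }

    String proceduresTable() {
        return pipeline + "_Procedures";
    }
}
```

For the argument my_catalog.my_pipeline the sketch reports the catalog my_catalog and the tables my_pipeline_Sources and my_pipeline_Procedures.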

The -Configuration option is used to specify the filename, or URL (http or ftp), where the configuration parameters are to be found. If this option is not specified the "Conductor.conf" filename will be used. If the configuration file is not in the current working directory, it will be looked for in the user's home directory.

The configuration file must contain the necessary database access information needed to identify and connect with the database server (as required by the Database constructor and its Connect method). The database "Type" parameter must be provided that specifies the type of database server (e.g. "MySQL") that will be accessed. Additional database access parameters typically provided are the server "Host" name and database "User" and "Password" access values. Depending on the type of database server and the driver and its Data_Port implementation (e.g. MySQL_Data_Port) there may be other required and optional parameters that can be included, such as a "Port" parameter to specify the database host system's network port to use for server communications.

In addition to the "Catalog" parameter (described below), the "Log_Directory" parameter may optionally be located in the "/Conductor" group. The "Log_Directory" parameter specifies the directory where log files will be written. The value of this parameter may contain embedded references for the database Reference_Resolver. Other parameters that will be used if present in the Conductor configuration group are described in the Conductor control parameters section of the Conductor class description.

The -Server or -Database option may be used to specify the Group of database server access parameters in the configuration file. A configuration file may contain more than one Group of database server access parameters where the name of each such Group must be in the Server parameter list. By default the first name in the Server list is the Group of database access parameters that will be used.

The -CAtalog option may be used to specify the name of the database catalog containing the pipeline tables. However, it is ambiguous command line syntax to use this option and include a different <catalog> in the pipeline name.

The <catalog> name may alternatively be specified by a "Catalog" parameter in the configuration file. This parameter is first sought in the "/Conductor" parameters Group, then in the database server parameters Group or any parent of that group. It is necessary that a <catalog> name be found somewhere.

The -Manage or -Monitor option may be used to run Conductor with a Manager GUI. In this mode Conductor will not proceed to process the pipeline immediately. Instead, the Manager will enable processing to be started, stopped, aborted, and the pipeline processing continuously monitored. By default the Conductor is run without a Manager.

The -Wait-to-start option will cause the Conductor to remain in the wait state until it receives a message to start source record processing. If the Conductor is run with a Manager -Wait-to-start is implicit. Remote management can be provided via a Stage_Manager and a Kapellmeister client. By default the Conductor will not wait-to-start unless it is run with a Manager.

The -Version option will cause the Conductor version identification to be listed without running the Conductor.

The -Help option will list the brief command line syntax usage and then exit.

N.B.: This method always results in a System.exit with the EXIT_COMMAND_LINE_SYNTAX status value.


Copyright (C) 2003-2009 Bradford Castalia, University of Arizona