Words (PIRL Java Packages)

Field Summary
`static String`	`DEFAULT_DELIMITERS` The default word delimiters.
`static String`	`DEFAULT_MASK` The default word `mask`.
`static boolean`	`DELIMIT_AT_QUOTE` Whether or not to delimit words at quotes.
`int`	`End_Index` The index (exclusive) in the current string where the current word ends.
`static String`	`ID`
`Words.Word_Index`	`Mark_Index` The Word_Index of the last `marked` word.
`static boolean`	`PARENTHESIZED_WORDS` Whether or not to treat parenthesized strings as a word.
`static boolean`	`QUOTED_WORDS` Whether or not to treat quoted strings as a word.
`int`	`Start_Index` The index in the current string where the next word starts.

Constructor Summary
`Words()` Constructs Words with no characters.
`Words(String characters)` Constructs Words from a String of characters.

Method Summary
`Words`	`Characters(String characters)` Sets the String of characters.
`boolean`	`Delimit_at_Quote()` Test if quotes will delimit words.
`Words`	`Delimit_at_Quote(boolean enable)` Enable or disable delimiting words at quotes.
`String`	`Delimiters()` Gets the current delimiters.
`Words`	`Delimiters(String delimiters)` Sets the word delimiter characters.
`Words`	`Location(int location)` Moves the word indices to a new location.
`Words`	`Mark()` Marks the current word location.
`String`	`Mask()` Gets the word mask.
`Words`	`Mask(String mask)` Sets the mask to use when words are masked.
`Words`	`Mask(Vector<String> names)` Words preceded by any one of a set of names are masked.
`Words.Word_Index`	`Next_Location()` Moves the word indices to the location of the next word.
`String`	`Next_Word()` Gets the next word.
`boolean`	`Parenthesized_Words()` Test if parenthesized strings are treated as single words.
`Words`	`Parenthesized_Words(boolean enable)` Enable or disable the treatment of parenthesized strings as words.
`boolean`	`Quoted_Words()` Test if quoted strings are treated as single words.
`Words`	`Quoted_Words(boolean enable)` Enable or disable the treatment of quoted strings as words.
`Words`	`Restore()` Restores the current word to the last marked location.
`Vector<String>`	`Split()` Splits the remaining characters into words.
`Vector<String>`	`Split(int limit)` Splits the remaining characters into words.
`String`	`Substring(int start)` Gets a substring of the characters.
`String`	`Substring(int start, int end)` Gets a substring of the characters.
`String`	`Substring(Words.Word_Index word_index)` Gets a substring of the characters.
`String`	`toString()` Gets the Words characters.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Field Detail

ID

public static final String ID

See Also:: Constant Field Values

Start_Index

public int Start_Index

The index in the current string where the next word starts.

Initially, this is the beginning (0) of the words string. This will be -1 when there are no more words.

End_Index

public int End_Index

The index (exclusive) in the current string where the current word ends.

This will be zero if no Next_Word has been selected. This will be -1 when there are no more words.

Mark_Index

public Words.Word_Index Mark_Index

The Word_Index of the last marked word.

DEFAULT_DELIMITERS

public static final String DEFAULT_DELIMITERS

The default word delimiters.

The usual whitespace characters: " \n\r\t".

See Also:: Constant Field Values

DEFAULT_MASK

public static final String DEFAULT_MASK

The default word mask.

See Also:: Constant Field Values

QUOTED_WORDS

public static boolean QUOTED_WORDS

Whether or not to treat quoted strings as a word.

DELIMIT_AT_QUOTE

public static boolean DELIMIT_AT_QUOTE

Whether or not to delimit words at quotes.

PARENTHESIZED_WORDS

public static boolean PARENTHESIZED_WORDS

Whether or not to treat parenthesized strings as a word.

Constructor Detail

Words

public Words(String characters)

Constructs Words from a String of characters.

Parameters:: characters - The String of characters containing words.

Words

public Words()

Constructs Words with no characters.

See Also:: Characters(String)

Method Detail

toString

public String toString()

Gets the Words characters.

Overrides:: toString in class Object

Returns:: The current String of characters.

Characters

public Words Characters(String characters)

Sets the String of characters.

The current location is reset to the beginning of the string. The Mark_Index is reset to (0,0).

Parameters:: characters - The String of characters.
Returns:: This Words object.
See Also:: Location(int)

Delimiters

public Words Delimiters(String delimiters)

Sets the word delimiter characters.

A word is delimited by a contiguous sequence of characters that are all members of the delimiter characters. Note that a sequence of more than one of the same or different characters from the delimiters set does not result in empty words; i.e. any contiguous sequence of one or more delimiters is treated as a single word delimiter.

N.B.: Any character starting a special sequence should not be included as one of the delimiter characters. If one is, special sequence recognition will effectively be disabled.

Parameters:: delimiters - The String of delimiter characters. If null, the DEFAULT_DELIMITERS will be used.
Returns:: This Words object.
See Also:: Quoted_Words(boolean), Parenthesized_Words(boolean)
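The collapsing of delimiter runs into a single word boundary can be illustrated with a minimal, self-contained sketch. It is not the PIRL implementation. The class `DelimiterSketch` and its `words` method are hypothetical, and quoting, parentheses and backslash escapes are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PIRL source: splits on runs of delimiter characters only
// (no quote, parenthesis, or backslash handling).
final class DelimiterSketch {
    static final String DEFAULT_DELIMITERS = " \n\r\t"; // value documented for Words

    static List<String> words(String characters, String delimiters) {
        if (delimiters == null) {
            delimiters = DEFAULT_DELIMITERS; // a null argument selects the defaults
        }
        List<String> result = new ArrayList<>();
        int index = 0;
        int length = characters.length();
        while (index < length) {
            // Skip the entire run of delimiters, so a run never yields an empty word.
            while (index < length && delimiters.indexOf(characters.charAt(index)) >= 0) {
                index++;
            }
            if (index == length) {
                break;
            }
            int start = index;
            // Extend the word up to the next delimiter or the end of the string.
            while (index < length && delimiters.indexOf(characters.charAt(index)) < 0) {
                index++;
            }
            result.add(characters.substring(start, index));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("alpha,, beta ;gamma", ",; "));
    }
}
```

Running `main` prints `[alpha, beta, gamma]`. The runs `,, ` and ` ;` each act as a single delimiter, so no empty words appear.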

Delimiters

public String Delimiters()

Gets the current delimiters.

Returns:: The String of delimiter characters.

Quoted_Words

public Words Quoted_Words(boolean enable)

Enable or disable the treatment of quoted strings as words.

The enclosing quote characters are included in the word. If there is no matching unescaped quote character before the end of the string, the resulting word will not have the matching closing quote character at End_Index - 1.

Parameters:: enable - true if all characters within unescaped quotes (' or ") are to be treated as a single word; false otherwise.
Returns:: This Words object.
See Also:: Delimit_at_Quote(boolean), Next_Location()
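The end of a quoted word can be found as sketched below. The sketch covers backslash escapes and the unbalanced case. It assumes the scan begins on the opening quote, and the names `QuotedWordSketch` and `quotedWordEnd` are hypothetical. It is not taken from the PIRL source.

```java
// Hypothetical sketch, not the PIRL source: returns the exclusive end index of a
// quoted word whose opening quote (' or ") is at start.
final class QuotedWordSketch {
    static int quotedWordEnd(String characters, int start) {
        char quote = characters.charAt(start); // only this quote character can close the word
        int index = start + 1;
        while (index < characters.length()) {
            char c = characters.charAt(index);
            if (c == '\\') {
                index += 2; // the backslash and the escaped character stay in the word
                continue;
            }
            index++;
            if (c == quote) {
                return index; // index is already one past the closing quote
            }
        }
        return characters.length(); // unbalanced: no closing quote before the end
    }

    public static void main(String[] args) {
        String s = "say \"hello world\" now";
        System.out.println(s.substring(4, quotedWordEnd(s, 4)));
    }
}
```

For `say "hello world" now`, starting at index 4, the returned end is 17, so the word is `"hello world"` with both quotes included. For an unterminated `"open ended`, the end is the string length, and the character at End_Index - 1 is not a quote.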

Quoted_Words

public boolean Quoted_Words()

Test if quoted strings are treated as single words.

Returns:: true if all characters within unescaped quotes (' or ") will be treated as a single word; false otherwise.
See Also:: Quoted_Words(boolean)

Delimit_at_Quote

public Words Delimit_at_Quote(boolean enable)

Enable or disable delimiting words at quotes.

When unescaped quote characters are encountered, they may delimit a word even if no delimiter character precedes or follows the quote. Disabling quote delimiting causes contiguous non-delimiter characters to be included as part of the quoted string word. The quotes remain in the word in either case.

Parameters:: enable - true if unescaped quote characters delimit a word; false otherwise.
Returns:: This Words object.
See Also:: Quoted_Words(boolean)

Delimit_at_Quote

public boolean Delimit_at_Quote()

Test if quotes will delimit words.

Returns:: true if unescaped quote characters delimit a word; false otherwise.
See Also:: Delimit_at_Quote(boolean)

Parenthesized_Words

public Words Parenthesized_Words(boolean enable)

Enable or disable the treatment of parenthesized strings as words.

N.B.: Nested parenthesized strings are included in a parenthesized string.

The enclosing parentheses characters are included in the word. If there is no matching unescaped closing parenthesis character (ignoring nested parentheses) before the end of the string, the resulting word will not have one at End_Index - 1.

Parameters:: enable - true if all characters within unescaped parenthesized ('(' and ')') strings are to be treated as a single word; false otherwise.
Returns:: This Words object.
See Also:: Next_Location()
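Nesting is what makes parenthesized words differ from quoted words. The hypothetical helper below tracks the nesting level while scanning. It is a sketch under the assumption that the scan starts on the opening '(' and is not the PIRL implementation. Here `level` counts open parentheses including the outer one, so reaching zero on an unescaped ')' corresponds to the documented "parenthesis level zero".

```java
// Hypothetical sketch, not the PIRL source: finds the exclusive end of a
// parenthesized word, honoring nesting and backslash escapes.
final class ParenthesizedWordSketch {
    static int parenthesizedWordEnd(String characters, int start) {
        int level = 0;
        int index = start; // characters.charAt(start) is assumed to be '('
        while (index < characters.length()) {
            char c = characters.charAt(index);
            if (c == '\\') {
                index += 2; // the backslash and the escaped character stay in the word
                continue;
            }
            index++;
            if (c == '(') {
                level++;
            } else if (c == ')') {
                level--;
                if (level == 0) {
                    return index; // index is already one past the closing parenthesis
                }
            }
        }
        return characters.length(); // unbalanced: ran off the end of the string
    }

    public static void main(String[] args) {
        String s = "f (a (b c) d) e";
        System.out.println(s.substring(2, parenthesizedWordEnd(s, 2)));
    }
}
```

Running `main` prints `(a (b c) d)`. The inner `)` only lowers the level, so the word continues until the outer parenthesis closes.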

Parenthesized_Words

public boolean Parenthesized_Words()

Test if parenthesized strings are treated as single words.

Returns:: true if all characters within unescaped parenthesized strings will be treated as a single word; false otherwise.
See Also:: Parenthesized_Words(boolean)

Substring

public String Substring(int start,
                        int end)

Gets a substring of the characters.

Parameters:: start - The start index of the substring; end - The end index (exclusive) of the substring.
Returns:: The substring from the start index up to, but not including, the end index.
See Also:: StringBuffer.substring(int, int)

Substring

public String Substring(int start)

Gets a substring of the characters.

Parameters:: start - The start index of the substring.
Returns:: The substring from the start index to the end of the characters string.
See Also:: StringBuffer.substring(int)

Substring

public String Substring(Words.Word_Index word_index)

Gets a substring of the characters.

Parameters:: word_index - A Word_Index for the substring.
Returns:: The substring from the word_index.Start_Index up to, but not including, the word_index.End_Index.
See Also:: StringBuffer.substring(int, int)

Location

public Words Location(int location)
               throws IndexOutOfBoundsException

Moves the word indices to a new location.

The location must be within the words string.

Parameters:: location - An index in the words string.
Returns:: This Words object.
Throws:: IndexOutOfBoundsException - If the location is not within the words string.

Mark

public Words Mark()

Marks the current word location.

The current Start_Index and End_Index are stored in the Mark_Index.

Returns:: This Words object.
See Also:: Restore()

Restore

public Words Restore()

Restores the current word to the last marked location.

Returns:: This Words object.
Throws:: StringIndexOutOfBoundsException - If the Word_Index.Start_Index is less than zero or the Word_Index.End_Index is greater than the number of characters available.
See Also:: Mark()

Next_Location

public Words.Word_Index Next_Location()

Moves the word indices to the location of the next word.

Beginning at the current End_Index, all delimiter characters are skipped to find the new Start_Index. If the end of the characters string is reached without finding a non-delimiter character, then there are no more words available. In this case both the Start_Index and End_Index will be equal to the character string length and nothing more will be done.

N.B.: Any character starting a special sequence should not be included as one of the delimiter characters. If one is, special sequence recognition will effectively be disabled.

The character at the Start_Index is checked to see if it starts a special sequence. If quoted words are enabled, either a single (') or double (") quote character will be recognized and set as the end of sequence marker character. If parenthesized words are enabled, an opening parenthesis ('(') character will be recognized and the end of sequence marker character will be set to the closing parenthesis (')') character. A special sequence start character is included as part of the word.

When delimit at quote is enabled in addition to quoted words, quoted strings are delimited as separate words even if a contiguous non-delimiter character precedes and/or follows the enclosing quotes. When delimit at quote is disabled, the contiguous non-delimiter characters are treated as part of the word that includes the quoted string.

The word contains all characters up to and including an unescaped end of sequence marker character. For a parenthesized sequence the marker character must be at parenthesis level zero to end the sequence; unescaped nested parentheses increase the parenthesis level. Note that a special sequence may include what would otherwise be considered delimiter characters, and the enclosing characters - quotes or parentheses - are included as part of the word. If the end of the characters string is reached before the expected marker character is found the resulting word will be "unbalanced"; the character at End_Index - 1 will not be the marker character.

If no end of sequence marker character has been set, then the word will end when any unescaped delimiter character is found or the end of the characters string is reached. If quoted words are enabled a quote character will be recognized as a delimiter character. If parenthesized words are enabled an opening parenthesis character will be recognized as a delimiter character. The index of the delimiter character becomes the new End_Index; it is not included as part of the word.

Any character preceded by a backslash ('\') character is escaped from any special treatment. All escaped characters are taken to be part of the word, the backslash character included.

Returns:: A Word_Index for the next word. If there are no more words the word index will be set to the end of the characters.
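The rules above can be modeled by a compact scanner. This is a simplified sketch under stated assumptions, not the PIRL implementation. `NextLocationSketch`, its fields and its helpers are hypothetical names, and all three options are assumed enabled by default. The Delimit_at_Quote handling follows one reading of the description: with delimiting disabled, a quoted string and the non-delimiter characters touching it form one word. When no words remain, the sketch leaves both indices at the string length, as described for Next_Location.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the Next_Location rules; not the PIRL source.
final class NextLocationSketch {
    final String characters;
    String delimiters = " \n\r\t";
    boolean quotedWords = true;        // assumed setting for this sketch
    boolean parenthesizedWords = true; // assumed setting for this sketch
    boolean delimitAtQuote = true;     // assumed setting for this sketch
    int startIndex = 0;
    int endIndex = 0;

    NextLocationSketch(String characters) {
        this.characters = characters;
    }

    private boolean isDelimiter(char c) {
        return delimiters.indexOf(c) >= 0;
    }

    private static boolean isQuote(char c) {
        return c == '\'' || c == '"';
    }

    // Exclusive end of the quoted string whose opening quote is at index.
    private int skipQuoted(int index) {
        char quote = characters.charAt(index++);
        while (index < characters.length()) {
            char c = characters.charAt(index);
            if (c == '\\') { index += 2; continue; } // escaped character stays in the word
            index++;
            if (c == quote) return index;
        }
        return characters.length(); // unbalanced quote
    }

    // Exclusive end of the parenthesized string whose '(' is at index.
    private int skipParenthesized(int index) {
        int level = 0;
        while (index < characters.length()) {
            char c = characters.charAt(index);
            if (c == '\\') { index += 2; continue; }
            index++;
            if (c == '(') level++;
            else if (c == ')' && --level == 0) return index;
        }
        return characters.length(); // unbalanced parentheses
    }

    /** Moves startIndex/endIndex to the next word; returns false when none remain. */
    boolean nextLocation() {
        int length = characters.length();
        int index = endIndex;
        while (index < length && isDelimiter(characters.charAt(index))) index++;
        startIndex = index;
        if (index == length) {
            endIndex = length; // no more words: both indices sit at the end
            return false;
        }
        char first = characters.charAt(index);
        if (parenthesizedWords && first == '(') {
            endIndex = skipParenthesized(index);
            return true;
        }
        if (quotedWords && isQuote(first)) {
            index = skipQuoted(index);
            if (delimitAtQuote) {
                endIndex = index;
                return true;
            }
        }
        // Ordinary characters, or what follows a quoted string when delimitAtQuote is off.
        while (index < length) {
            char c = characters.charAt(index);
            if (c == '\\') { index += 2; continue; }
            if (isDelimiter(c) || (parenthesizedWords && c == '(')) break;
            if (quotedWords && isQuote(c)) {
                if (delimitAtQuote) break; // the quote starts a separate word
                index = skipQuoted(index); // absorb the quoted string into this word
                continue;
            }
            index++;
        }
        endIndex = Math.min(index, length);
        return true;
    }

    List<String> words() {
        List<String> result = new ArrayList<>();
        while (nextLocation()) result.add(characters.substring(startIndex, endIndex));
        return result;
    }
}
```

With the defaults, the input `key="a b"x (p (q) r) end` yields the words `key=`, `"a b"`, `x`, `(p (q) r)` and `end`. Setting `delimitAtQuote` to false merges the first three into `key="a b"x`. An escaped delimiter, as in `a\ b`, stays inside the word together with its backslash.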

Next_Word

public String Next_Word()

Gets the next word.

The Start_Index will be moved forward from the current End_Index over any Delimiters. Then the End_Index will be moved forward from the Start_Index until any Delimiters are found or the end of the string is reached.

Returns:: The substring from the next Start_Index up to, but not including, the next End_Index. If there are no more words, the empty String will be returned.
See Also:: Next_Location()
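The Next_Word contract, with its two moving indices and an empty String as the end-of-words signal, suggests a simple read loop. The class below is a hypothetical stand-in that applies only the delimiter rule. It is not the PIRL class and does not handle quotes, parentheses or escapes.

```java
// Hypothetical sketch of the Next_Word contract; not the PIRL source.
final class NextWordSketch {
    private final String characters;
    private final String delimiters;
    int startIndex = 0;
    int endIndex = 0;

    NextWordSketch(String characters, String delimiters) {
        this.characters = characters;
        this.delimiters = delimiters;
    }

    String nextWord() {
        int length = characters.length();
        int index = endIndex;
        // Move Start_Index forward from the current End_Index over any delimiters.
        while (index < length && delimiters.indexOf(characters.charAt(index)) >= 0) index++;
        startIndex = index;
        // Move End_Index forward until a delimiter or the end of the string.
        while (index < length && delimiters.indexOf(characters.charAt(index)) < 0) index++;
        endIndex = index;
        return characters.substring(startIndex, endIndex); // "" when no words remain
    }

    public static void main(String[] args) {
        NextWordSketch words = new NextWordSketch("  to be  or ", " ");
        // The empty String ends the loop, mirroring the documented return value.
        for (String word = words.nextWord(); !word.isEmpty(); word = words.nextWord()) {
            System.out.println(word);
        }
    }
}
```

Running `main` prints `to`, `be` and `or` on separate lines. The leading, doubled and trailing blanks never produce a word.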

Split

public Vector<String> Split(int limit)

Splits the remaining characters into words.

Beginning with the next word, words are collected into a Vector in the order they occur in the string.

If the limit is 0, all available words will be returned; no delimiters will be included in any word that is returned. If the limit is positive (> 0), no more than limit words will be returned; the last "word" will contain all characters, including any delimiters, following the start of the last word (delimiters preceding the last word will not be included). A negative limit acts the same as a positive limit except that the last "word" will contain all characters following the end of the previous word (delimiters preceding the last word will be included). Note that a limit of -1 will return all characters from the current End_Index to the end of the characters string.

Fewer than limit words may be returned. No empty words will be returned.

Parameters:: limit - The maximum number of words to return (0 for no limit; see above for negative values).
Returns:: A Vector of zero or more words.
See Also:: Next_Location()
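The three limit regimes can be compared in a short sketch. It uses the delimiter rule only, and `SplitSketch.split` is a hypothetical static helper rather than the PIRL method. One case is an assumption: when only delimiters remain, the sketch returns no word at all, following "No empty words will be returned".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Split(int limit) semantics; not the PIRL source.
final class SplitSketch {
    static List<String> split(String characters, String delimiters, int limit) {
        List<String> words = new ArrayList<>();
        int length = characters.length();
        int end = 0; // plays the role of End_Index
        int count = Math.abs(limit);
        while (true) {
            int start = end;
            while (start < length && delimiters.indexOf(characters.charAt(start)) >= 0) start++;
            if (start == length) break; // no further word starts; never add an empty word
            if (limit != 0 && words.size() == count - 1) {
                // Last allowed word: a positive limit starts it at the word itself,
                // a negative limit starts it right after the previous word.
                words.add(limit > 0 ? characters.substring(start) : characters.substring(end));
                break;
            }
            end = start;
            while (end < length && delimiters.indexOf(characters.charAt(end)) < 0) end++;
            words.add(characters.substring(start, end));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("a  b c  ", " ", 0));
        System.out.println(split("a  b c  ", " ", 2));
        System.out.println(split("a  b c  ", " ", -2));
    }
}
```

For `"a  b c  "`, a limit of 0 gives `a`, `b`, `c`. A limit of 2 gives `a` and `"b c  "` (trailing blanks kept). A limit of -2 gives `a` and `"  b c  "` (the preceding blanks kept as well). A limit of 5 returns only the three available words.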

Split

public Vector<String> Split()

Splits the remaining characters into words.

Beginning with the next word, words are collected into a Vector in the order they occur in the string.

Returns:: A Vector of zero or more words.
See Also:: Split(int)

Mask

public Words Mask(String mask)

Sets the mask to use when words are masked.

Parameters:: mask - The mask String. This may be null.
Returns:: This Words object.
See Also:: Mask(Vector)

Mask

public String Mask()

Gets the word mask.

Returns:: The String to be used when masking words.
See Also:: Mask(Vector)

Mask

public Words Mask(Vector<String> names)

Words preceded by any one of a set of names are masked.

The words are searched for matches with the names. When a match is found, the following word is replaced with the mask String. If the mask String is null, the preceding name as well as its word is deleted.

N.B.: The mask string may be one of the names. The mask substitution is never compared against the names list.

Parameters:: names - A Vector of names to find.
Returns:: This Words object.
See Also:: Mask(String), Next_Word()
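A typical use of this rule is hiding a value that follows a keyword, such as a password. The sketch below applies the rule to an already split list of words for clarity. The actual method rewrites the Words character string, and `MaskSketch.mask` is a hypothetical helper. Its treatment of a name appearing as the final word (kept unchanged) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Mask(Vector) rule on a pre-split word list; not the PIRL source.
final class MaskSketch {
    static List<String> mask(List<String> words, List<String> names, String mask) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            if (names.contains(word) && i + 1 < words.size()) {
                if (mask != null) {
                    result.add(word); // keep the name
                    result.add(mask); // replace the following word with the mask
                }
                // With a null mask, both the name and its word are dropped.
                i++; // skip the masked word; the mask is never compared against names
                continue;
            }
            result.add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("user", "alice", "password", "hunter2", "host", "x");
        List<String> names = List.of("password");
        System.out.println(mask(words, names, "****"));
        System.out.println(mask(words, names, null));
    }
}
```

Running `main` prints `[user, alice, password, ****, host, x]` and then `[user, alice, host, x]`. Because the substituted mask is not checked against the names, a mask equal to one of the names does not trigger further masking.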

PIRL.Strings Class Words
