java.lang.Object
PIRL.Strings.Words
public class Words
The Words class provides a mechanism to treat a String as a sequence of delimited words.
String_Buffer
Nested Class Summary | |
---|---|
class |
Words.Word_Index
A Word_Index provides a start,end string location for a word. |
Field Summary | |
---|---|
static String |
DEFAULT_DELIMITERS
The default word delimiters. |
static String |
DEFAULT_MASK
The default word mask. |
static boolean |
DELIMIT_AT_QUOTE
Whether or not to delimit words at quotes. |
int |
End_Index
The index (exclusive) in the current string where the current word ends. |
static String |
ID
|
Words.Word_Index |
Mark_Index
The Word_Index of the last marked word. |
static boolean |
PARENTHESIZED_WORDS
Whether or not to treat parenthesized strings as a word. |
static boolean |
QUOTED_WORDS
Whether or not to treat quoted strings as a word. |
int |
Start_Index
The index in the current string where the next word starts. |
Constructor Summary | |
---|---|
Words()
Constructs Words with no characters. |
|
Words(String characters)
Constructs Words from a String of characters. |
Method Summary | |
---|---|
Words |
Characters(String characters)
Sets the String of characters. |
boolean |
Delimit_at_Quote()
Test if quotes will delimit words. |
Words |
Delimit_at_Quote(boolean enable)
Enable or disable delimiting words at quotes. |
String |
Delimiters()
Gets the current delimiters. |
Words |
Delimiters(String delimiters)
Sets the word delimiter characters. |
Words |
Location(int location)
Moves the word indices to a new location. |
Words |
Mark()
Marks the current word location. |
String |
Mask()
Gets the word mask. |
Words |
Mask(String mask)
Sets the mask to use when words are masked. |
Words |
Mask(Vector<String> names)
Words preceded by any one of a set of names are masked. |
Words.Word_Index |
Next_Location()
Moves the word indices to the location of the next word. |
String |
Next_Word()
Gets the next word. |
boolean |
Parenthesized_Words()
Test if parenthesized strings are treated as single words. |
Words |
Parenthesized_Words(boolean enable)
Enable or disable the treatment of parenthesized strings as words. |
boolean |
Quoted_Words()
Test if quoted strings are treated as single words. |
Words |
Quoted_Words(boolean enable)
Enable or disable the treatment of quoted strings as words. |
Words |
Restore()
Restores the current word to the last marked location. |
Vector<String> |
Split()
Splits the remaining characters into words. |
Vector<String> |
Split(int limit)
Splits the remaining characters into words. |
String |
Substring(int start)
Gets a substring of the characters. |
String |
Substring(int start,
int end)
Gets a substring of the characters. |
String |
Substring(Words.Word_Index word_index)
Gets a substring of the characters. |
String |
toString()
Gets the Words characters. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String ID
public int Start_Index
Initially, this is the beginning (0) of the words string. This will be -1 when there are no more words.
public int End_Index
This will be zero if no Next_Word has been selected. This will be -1 when there are no more words.
public Words.Word_Index Mark_Index
The Word_Index of the last marked word.
public static final String DEFAULT_DELIMITERS
The usual whitespace characters: " \n\r\t".
public static final String DEFAULT_MASK
The default word mask.
public static boolean QUOTED_WORDS
public static boolean DELIMIT_AT_QUOTE
public static boolean PARENTHESIZED_WORDS
Constructor Detail |
---|
public Words(String characters)
characters
- The String of characters containing words.
public Words()
Characters(String)
Method Detail |
---|
public String toString()
toString
in class Object
public Words Characters(String characters)
The current location is reset to the beginning of the string.
The Mark_Index is reset to (0,0).
characters
- The String of characters.
Location(int)
public Words Delimiters(String delimiters)
A word is delimited by a contiguous sequence of characters that are all members of the delimiters set. Note that a sequence of more than one delimiter character, whether the same or different, does not result in empty words; i.e. any contiguous sequence of one or more delimiters is treated as a single word delimiter.
N.B.: Any character starting a special sequence should not be included as one of the delimiter characters. If they are then special sequence recognition will be effectively disabled.
delimiters
- The String of delimiter characters. If null, the DEFAULT_DELIMITERS will be used.
Quoted_Words(boolean)
,
Parenthesized_Words(boolean)
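The collapsing of delimiter runs described above can be sketched without the PIRL library. This is a minimal, self-contained illustration and not the Words implementation: the class name DelimiterRunSketch and its words method are hypothetical, and special sequences (quotes, parentheses, escapes) are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of PIRL.Strings.
final class DelimiterRunSketch {
    // Mirrors the documented DEFAULT_DELIMITERS value.
    static final String DEFAULT_DELIMITERS = " \n\r\t";

    // Collects the words of 'characters'; a null delimiters String
    // falls back to the default set, as the documentation describes.
    static List<String> words(String characters, String delimiters) {
        if (delimiters == null) {
            delimiters = DEFAULT_DELIMITERS;
        }
        List<String> found = new ArrayList<>();
        int index = 0;
        while (index < characters.length()) {
            // Skip a run of one or more delimiters; the run counts once.
            while (index < characters.length()
                    && delimiters.indexOf(characters.charAt(index)) >= 0) {
                index++;
            }
            int start = index;
            // Advance to the next delimiter or the end of the string.
            while (index < characters.length()
                    && delimiters.indexOf(characters.charAt(index)) < 0) {
                index++;
            }
            // Only non-empty spans are words, so no empty words appear.
            if (index > start) {
                found.add(characters.substring(start, index));
            }
        }
        return found;
    }
}
```

With the delimiters ",; " the string "a,, b;c" yields the three words a, b and c; the run ",, " between a and b does not produce an empty word.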
public String Delimiters()
public Words Quoted_Words(boolean enable)
The enclosing quote characters are included in the word. If there is no matching unescaped quote character before the end of the string, the resulting word will not have the matching closing quote character at End_Index - 1.
enable
- true if all characters within unescaped quotes
(' or ") are to be treated as a single word; false otherwise.
Delimit_at_Quote(boolean)
,
Next_Location()
public boolean Quoted_Words()
Quoted_Words(boolean)
public Words Delimit_at_Quote(boolean enable)
When unescaped quote characters are encountered they may delimit a word even if no delimiter character precedes or follows the quote. Disabling quote delimiting causes contiguous non-delimiter characters to be included as part of the quoted string word. The quotes remain in the word in either case.
enable
- true if unescaped quote characters delimit a word;
false otherwise.
Quoted_Words(boolean)
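The difference between the two settings is easiest to see on a quoted string that touches other characters. The sketch below is a simplified stand-in written for this page, not the PIRL code: the class DelimitAtQuoteSketch is hypothetical, quoted words are assumed to be enabled, whitespace is assumed to be the delimiter set, and backslash escapes are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of PIRL.Strings.
final class DelimitAtQuoteSketch {
    static boolean isDelimiter(char c) {
        return " \n\r\t".indexOf(c) >= 0;
    }

    static List<String> words(String s, boolean delimitAtQuote) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            while (i < n && isDelimiter(s.charAt(i))) {
                i++;
            }
            if (i == n) {
                break;
            }
            int start = i;
            while (i < n && !isDelimiter(s.charAt(i))) {
                char c = s.charAt(i);
                if (c == '"' || c == '\'') {
                    if (delimitAtQuote && i > start) {
                        break;              // the quote ends the preceding word
                    }
                    int close = s.indexOf(c, i + 1);
                    i = (close < 0) ? n : close + 1;  // unbalanced: run to the end
                    if (delimitAtQuote) {
                        break;              // the closing quote ends the quoted word
                    }
                } else {
                    i++;
                }
            }
            out.add(s.substring(start, i)); // quotes stay in the word either way
        }
        return out;
    }
}
```

For the input name="a b"end the sketch returns three words (name=, "a b", end) when quote delimiting is on, and the single word name="a b"end when it is off.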
public boolean Delimit_at_Quote()
Delimit_at_Quote(boolean)
public Words Parenthesized_Words(boolean enable)
N.B.: Nested parenthesized strings are included in a parenthesized string.
The enclosing parentheses characters are included in the word. If there is no matching unescaped closing parenthesis character (ignoring nested parentheses) before the end of the string, the resulting word will not have one at End_Index - 1.
enable
- true if all characters within unescaped parenthesized
('(' and ')') strings are to be treated as a single word; false
otherwise.
Next_Location()
public boolean Parenthesized_Words()
Parenthesized_Words(boolean)
public String Substring(int start, int end)
start
- The start index of the substring.
end
- The end index of the substring.
StringBuffer.substring(int, int)
public String Substring(int start)
start
- The start index of the substring.
StringBuffer.substring(int)
public String Substring(Words.Word_Index word_index)
word_index
- A Word_Index for the substring.
StringBuffer.substring(int, int)
public Words Location(int location) throws IndexOutOfBoundsException
The location must be within the words string.
location
- An index in the words string.
IndexOutOfBoundsException
- If the location is not within the words string.
public Words Mark()
The current Start_Index and End_Index are stored in the Mark_Index.
Restore()
public Words Restore()
StringIndexOutOfBoundsException
- If the Word_Index.Start_Index is less than zero or the Word_Index.End_Index is greater than the number of characters available.
Mark()
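The Mark and Restore pair behaves like a bookmark on the word cursor. The following is a small sketch of that idea under stated assumptions, not the library class: MarkRestoreSketch and its members are hypothetical names, words are split on whitespace only, and nextWord returning an empty String when no words remain is a choice made for the sketch.

```java
// Hypothetical illustration only; not part of PIRL.Strings.
final class MarkRestoreSketch {
    private final String characters;
    int start = 0;      // plays the role of Start_Index
    int end = 0;        // plays the role of End_Index
    int markStart = 0;  // the marked location, initially (0,0)
    int markEnd = 0;

    MarkRestoreSketch(String characters) {
        this.characters = characters;
    }

    // Stores the current word location.
    MarkRestoreSketch mark() {
        markStart = start;
        markEnd = end;
        return this;
    }

    // Moves the cursor back to the marked location after a range check.
    MarkRestoreSketch restore() {
        if (markStart < 0 || markEnd > characters.length()) {
            throw new StringIndexOutOfBoundsException(
                "Marked location (" + markStart + "," + markEnd + ") is out of range");
        }
        start = markStart;
        end = markEnd;
        return this;
    }

    // Whitespace-delimited next word, scanning forward from 'end'.
    String nextWord() {
        int n = characters.length();
        start = end;
        while (start < n && Character.isWhitespace(characters.charAt(start))) {
            start++;
        }
        end = start;
        while (end < n && !Character.isWhitespace(characters.charAt(end))) {
            end++;
        }
        return characters.substring(start, end);
    }
}
```

Marking after reading "one" from "one two three", reading two more words, and then restoring makes the next call return "two" again.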
public Words.Word_Index Next_Location()
Beginning at the current End_Index, all delimiter characters are skipped to find the new Start_Index. If the end of the characters string is reached without finding a non-delimiter character then there are no more words available. In this case both the Start_Index and End_Index will be equal to the character string length and nothing more will be done.
N.B.: Any character starting a special sequence should not be included as one of the delimiter characters. If they are then special sequence recognition will be effectively disabled.
The character at the Start_Index is checked to see if it starts a special sequence. If quoted words is enabled, either a single (') or double (") quote character will be recognized and set as the end of sequence marker character. If parenthesized words is enabled, an opening parenthesis ('(') character will be recognized and the end of sequence marker character will be set to the closing parenthesis (')') character. A special sequence start character is included as part of the word.
When delimit at quote is enabled in addition to quoted words being enabled, quoted strings are delimited as separate words even if a contiguous non-delimiter character precedes and/or follows the enclosing quotes. When delimit at quote is disabled, the contiguous non-delimiter characters are treated as part of the word that includes the quoted string.
The word contains all characters up to and including an unescaped end of sequence marker character. For a parenthesized sequence the marker character must be at parenthesis level zero to end the sequence; unescaped nested parentheses increase the parenthesis level. Note that a special sequence may include what would otherwise be considered delimiter characters, and the enclosing characters - quotes or parentheses - are included as part of the word. If the end of the characters string is reached before the expected marker character is found the resulting word will be "unbalanced"; the character at End_Index - 1 will not be the marker character.
If no end of sequence marker character has been set, then the word will end when any unescaped delimiter character is found or the end of the characters string is reached. If quoted words are enabled a quote character will be recognized as a delimiter character. If parenthesized words are enabled an opening parenthesis character will be recognized as a delimiter character. The index of the delimiter character becomes the new End_Index; it is not included as part of the word.
Any character preceded by a backslash ('\') character is escaped from any special treatment. All escaped characters are taken to be part of the word, the backslash character included.
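The scan described in the paragraphs above can be written out as a short routine. This is a sketch under stated assumptions rather than the PIRL source: the class NextLocationSketch and its methods are hypothetical, the default whitespace delimiters are used, and quoted words, parenthesized words and delimit at quote are all treated as enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of PIRL.Strings.
final class NextLocationSketch {
    static final String DELIMITERS = " \n\r\t";

    // Returns {start, end} of the next word at or after 'from';
    // both equal s.length() when no word remains.
    static int[] nextLocation(String s, int from) {
        int length = s.length();
        int start = from;
        while (start < length && DELIMITERS.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        if (start == length) {
            return new int[] {length, length};
        }
        // A quote or '(' at the start sets the end of sequence marker.
        char first = s.charAt(start);
        char marker = 0;
        if (first == '\'' || first == '"') {
            marker = first;
        } else if (first == '(') {
            marker = ')';
        }
        int end = (marker != 0) ? start + 1 : start;
        int level = 0;  // parenthesis nesting level
        while (end < length) {
            char c = s.charAt(end);
            if (c == '\\') {
                end = Math.min(end + 2, length);  // keep backslash and escaped char
                continue;
            }
            if (marker != 0) {
                if (marker == ')' && c == '(') {
                    level++;
                } else if (c == marker) {
                    if (level == 0) {
                        end++;   // the marker is part of the word
                        break;
                    }
                    level--;
                }
            } else if (DELIMITERS.indexOf(c) >= 0
                    || c == '\'' || c == '"' || c == '(') {
                break;           // the delimiter is not part of the word
            }
            end++;
        }
        return new int[] {start, end};
    }

    // Collects every word by calling nextLocation repeatedly.
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        int end = 0;
        while (true) {
            int[] location = nextLocation(s, end);
            if (location[0] == s.length()) {
                return out;
            }
            out.add(s.substring(location[0], location[1]));
            end = location[1];
        }
    }
}
```

Applied to the text alpha "two words" (a (b) c) x\ y, the sketch produces four words: alpha, "two words", (a (b) c), and x\ y. An unbalanced input such as "open runs to the end of the string, so the character at end - 1 is not the marker.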
public String Next_Word()
The Start_Index will be moved forward from the current End_Index over any Delimiters. Then the End_Index will be moved forward from the Start_Index until any Delimiters are found or the end of the string is reached.
Next_Location()
public Vector<String> Split(int limit)
Beginning with the next word, words are collected into a Vector in the order they occur in the string.
If the limit is 0 all available words will be returned; no delimiters will be included in any word that is returned. If the limit is positive (> 0) no more than limit words will be returned; the last "word" will contain all characters, including any delimiters, following the start of the last word (delimiters preceding the last word will not be included). A negative limit acts the same as a positive limit except the last "word" will contain all characters following the end of the previous word (delimiters preceding the last word will be included). Note that a limit of -1 will return all characters from the current End_Index to the end of the characters string.
Fewer than limit words may be returned. No empty words will be returned.
limit
- The word limit to return.
Next_Location()
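The three limit cases can be compared side by side with a small stand-alone routine. This is an illustrative sketch, not the library method: SplitLimitSketch is a hypothetical class, it returns a List rather than a Vector, it always starts at index 0, and it splits on whitespace without any special sequence handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of PIRL.Strings.
final class SplitLimitSketch {
    static boolean isDelimiter(char c) {
        return " \n\r\t".indexOf(c) >= 0;
    }

    static List<String> split(String s, int limit) {
        List<String> out = new ArrayList<>();
        int n = s.length();
        int max = Math.abs(limit);
        int end = 0;  // end of the previous word
        while (true) {
            int start = end;
            while (start < n && isDelimiter(s.charAt(start))) {
                start++;
            }
            if (start == n) {
                break;  // no more words, so no empty word is added
            }
            if (limit != 0 && out.size() == max - 1) {
                // Last permitted "word": the rest of the string, taken from
                // the word start (positive limit) or from the end of the
                // previous word (negative limit).
                out.add(s.substring(limit > 0 ? start : end));
                break;
            }
            end = start;
            while (end < n && !isDelimiter(s.charAt(end))) {
                end++;
            }
            out.add(s.substring(start, end));
        }
        return out;
    }
}
```

For the input "one  two three" (two spaces after one), a limit of 0 gives one, two, three; a limit of 2 gives one and "two three"; a limit of -2 gives one and "  two three" with the leading delimiters kept; and a limit of -1 returns the whole string as a single entry.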
public Vector<String> Split()
Beginning with the next word, words are collected into a Vector in the order they occur in the string.
Split(int)
public Words Mask(String mask)
mask
- The mask String. This may be null.
Mask(Vector)
public String Mask()
Mask(Vector)
public Words Mask(Vector<String> names)
The words are searched for matches with the names. When a match is found, the following word is replaced with the mask String. If the mask String is null the preceding name as well as its word is deleted.
N.B.: The mask string may be one of the names. The mask substitution is never compared against the names list.
names
- A Vector of names to find.
Mask(String)
,
Next_Word()
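The masking rule can be illustrated on a list of words that has already been split. This sketch makes several assumptions that the documentation does not state: MaskSketch is a hypothetical class, names are matched exactly (case sensitive), a name with no following word is left unchanged, and the mask value "****" used in the example is only a placeholder, since the value of DEFAULT_MASK is not given here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of PIRL.Strings.
final class MaskSketch {
    static List<String> mask(List<String> words, List<String> names, String mask) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            boolean hasFollower = i + 1 < words.size();
            if (names.contains(word) && hasFollower) {
                if (mask != null) {
                    out.add(word);  // keep the name
                    out.add(mask);  // replace the following word
                }
                // With a null mask both the name and its word are dropped.
                i++;  // skip the masked word; the mask itself is never re-checked
            } else {
                out.add(word);
            }
        }
        return out;
    }
}
```

Given the words login, -password, secret, -user, bob and the single name -password, a mask of "****" yields login, -password, ****, -user, bob, while a null mask yields login, -user, bob.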