The Java™ Web Services Tutorial

Handling Lexical Events

You saw earlier that if you are writing text out as XML, you need to know if you are in a CDATA section. If you are, then angle brackets (<) and ampersands (&) should be output unchanged. But if you're not in a CDATA section, they should be replaced by the predefined entities &lt; and &amp;. But how do you know if you're processing a CDATA section?

Then again, if you are filtering XML in some way, you would want to pass comments along. Normally the parser ignores comments. How can you get comments so that you can echo them?

Finally, there are the parsed entity definitions. If an XML-filtering app sees &myEntity; it needs to echo the same string--not the text that is inserted in its place. How do you go about doing that?

This section of the tutorial answers those questions. It shows you how to use org.xml.sax.ext.LexicalHandler to identify comments, CDATA sections, and references to parsed entities.

Comments, CDATA tags, and references to parsed entities constitute lexical information--that is, information that concerns the text of the XML itself, rather than the XML's information content. Most applications, of course, are concerned only with the content of an XML document. Such apps will not use the LexicalHandler API. But apps that output XML text will find it invaluable.


Note: Lexical event handling is an optional parser feature. Parser implementations are not required to support it. (The reference implementation does so.) This discussion assumes that the parser you are using does so, as well.
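Because support is optional, it can be worth probing for it before you rely on lexical events. The following is a minimal sketch rather than part of the Echo program: the class name LexicalSupportCheck and the helper supportsLexicalHandler are hypothetical names chosen for illustration, and the sketch assumes a JAXP parser is available through SAXParserFactory. It relies on the SAX convention that setProperty() throws SAXNotRecognizedException or SAXNotSupportedException when a reader cannot honor a property.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalSupportCheck {
    // Standard SAX2 property name for registering a LexicalHandler.
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    // Hypothetical helper: returns true if the reader accepts a
    // LexicalHandler, false if it refuses the property.
    static boolean supportsLexicalHandler(XMLReader reader) {
        try {
            // DefaultHandler2 is a do-nothing LexicalHandler implementation.
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(
            "Lexical events supported: " + supportsLexicalHandler(reader));
    }
}
```

The probe leaves a no-op handler registered, so an application would normally call setProperty() again with its real handler before parsing.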

How the LexicalHandler Works

To be informed when the SAX parser sees lexical information, you configure the XMLReader that underlies the parser with a LexicalHandler. The LexicalHandler interface defines these event-handling methods:

comment(char[] ch, int start, int length)

Passes comments to the application.

startCDATA(), endCDATA()

Tells when a CDATA section is starting and ending, which tells your application what kind of characters to expect the next time characters() is called.

startEntity(String name), endEntity(String name)

Tells when the expansion of a parsed entity is starting and ending, and gives the name of the entity.

startDTD(String name, String publicId, String systemId), endDTD()

Tells when a DTD is being processed, and identifies it.
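To make the CDATA callbacks concrete, here is a small sketch of how startCDATA() and endCDATA() can drive the escaping decision described at the start of this section. It is an illustration under stated assumptions, not the Echo program: the class CdataAwareEcho and the helpers escape and echoText are hypothetical names, only character data is echoed (element tags are ignored), and the handler extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the LexicalHandler methods.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CdataAwareEcho extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private boolean inCdata = false;

    @Override
    public void startCDATA() {
        inCdata = true;
        out.append("<![CDATA[");
    }

    @Override
    public void endCDATA() {
        out.append("]]>");
        inCdata = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        // Inside a CDATA section the text is written unchanged;
        // elsewhere < and & are replaced by predefined entities.
        out.append(inCdata ? text : escape(text));
    }

    // Hypothetical helper: escapes & first so that the & in &lt;
    // is not escaped a second time.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Hypothetical helper: parses the XML and returns the echoed text.
    static String echoText(String xml) throws Exception {
        CdataAwareEcho handler = new CdataAwareEcho();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            echoText("<a>1 &lt; 2 <![CDATA[x < y & z]]></a>"));
    }
}
```

The single inCdata flag is enough here because CDATA sections cannot nest.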

Working with a LexicalHandler

In the remainder of this section, you'll convert the Echo app into a lexical handler and play with its features.


Note: The code shown in this section is in Echo11.java. The output is shown in Echo11-09.

To start, add the code highlighted below to implement the LexicalHandler interface and add the appropriate methods.

import org.xml.sax.ext.LexicalHandler;

public class Echo extends DefaultHandler
   implements LexicalHandler
{
   public static void main(String argv[])
      {
         ...
         // Use an instance of ourselves as the SAX event handler.
         // The variable is now declared as Echo instead of
         // DefaultHandler, so it can serve as both handler types.
         Echo handler = new Echo();
         ...
 

At this point, the Echo class extends one class and implements an additional interface. You changed the class of the handler variable accordingly, so you can use the same instance as either a DefaultHandler or a LexicalHandler, as appropriate.

Next, add the code highlighted below to get the XMLReader that the parser delegates to, and configure it to send lexical events to your lexical handler:

public static void main(String argv[])	
{	
   ...	
   try {	
      ...	
      // Parse the input	
      SAXParser saxParser = factory.newSAXParser();	
      XMLReader xmlReader = saxParser.getXMLReader();	
      xmlReader.setProperty(	
         "http://xml.org/sax/properties/lexical-handler",	
         handler	
         ); 	
      saxParser.parse( new File(argv[0]), handler);	
   } catch (SAXParseException spe) {	
      ...
 

Here, you configured the XMLReader using the setProperty() method defined in the XMLReader interface. The property name, defined as part of the SAX standard, is the URL http://xml.org/sax/properties/lexical-handler.
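For reference, the registration steps can be collapsed into one small, self-contained program. This is only a sketch under a few assumptions: the class name RegisterLexicalHandler and its parse helper are hypothetical, the input comes from a string rather than from argv[0], and events are collected in a list instead of being written with emit().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

public class RegisterLexicalHandler extends DefaultHandler
        implements LexicalHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        events.add("ELEMENT: " + qName);
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("COMMENT: " + new String(ch, start, length));
    }

    // The remaining LexicalHandler callbacks are left empty in this sketch.
    @Override public void startCDATA() {}
    @Override public void endCDATA() {}
    @Override public void startEntity(String name) {}
    @Override public void endEntity(String name) {}
    @Override public void startDTD(String name, String publicId,
                                   String systemId) {}
    @Override public void endDTD() {}

    // Hypothetical helper: parses the XML and returns the recorded events.
    static List<String> parse(String xml) throws Exception {
        RegisterLexicalHandler handler = new RegisterLexicalHandler();
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        // Register the handler for lexical events on the underlying reader...
        saxParser.getXMLReader().setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        // ...and pass the same object as the content handler.
        saxParser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        parse("<doc><!-- hi --><item/></doc>").forEach(System.out::println);
    }
}
```

Because one object plays both roles, content events and lexical events arrive in document order in the same list.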

Finally, add the code highlighted below to define the appropriate methods that implement the interface.

public void processingInstruction(String target, String data)	
   ...	
}
 
public void comment(char[] ch, int start, int length)
throws SAXException
{
}
 
public void startCDATA()	
throws SAXException	
{	
}	
	
public void endCDATA()	
throws SAXException	
{	
}
 
public void startEntity(String name)	
throws SAXException	
{	
}
 
public void endEntity(String name)	
throws SAXException	
{	
}
 
public void startDTD(String name, String publicId, String systemId)
throws SAXException
{
}
 
public void endDTD()	
throws SAXException	
{ 	
}
 
private void emit(String s)	
   ...
 

You have now turned the Echo class into a lexical handler. In the next section, you'll start experimenting with lexical events.

Echoing Comments

The next step is to do something with one of the new methods. Add the code highlighted below to echo comments in the XML file:

public void comment(char[] ch, int start, int length)	
   throws SAXException	
{	
   String text = new String(ch, start, length);	
   nl(); emit("COMMENT: "+text);	
}
 

When you compile the Echo program and run it on your XML file, the result looks something like this:

COMMENT:   A SAMPLE set of slides 	
COMMENT:  FOR WALLY / WALLIES 	
COMMENT: 	
   DTD for a simple "slide show".
 
COMMENT:  Defines the %inline; declaration 	
COMMENT:  ...
 

The line endings in the comments are passed as part of the comment string, once again normalized to newlines (\n). You can also see that comments in the DTD are echoed along with comments from the file. (That can pose problems when you want to echo only comments that are in the data file. To get around that problem, you can use the startDTD and endDTD methods to keep track of whether the parser is currently inside the DTD.)
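One way to apply the startDTD and endDTD suggestion is to keep a flag that is set while the DTD is being processed and to ignore comments that arrive while it is set. The sketch below assumes that approach. The class DocumentCommentsOnly and the helper documentComments are hypothetical names, and the sample document uses an internal DTD subset so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class DocumentCommentsOnly extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();
    private boolean inDtd = false;

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        inDtd = true;
    }

    @Override
    public void endDTD() {
        inDtd = false;
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        // Keep only comments that are reported outside the DTD.
        if (!inDtd) {
            comments.add(new String(ch, start, length));
        }
    }

    // Hypothetical helper: returns the comments found outside the DTD.
    static List<String> documentComments(String xml) throws Exception {
        DocumentCommentsOnly handler = new DocumentCommentsOnly();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE doc [\n"
            + "  <!-- DTD comment -->\n"
            + "  <!ELEMENT doc (#PCDATA)>\n"
            + "]>\n"
            + "<doc><!-- body comment -->text</doc>";
        System.out.println(documentComments(xml));
    }
}
```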

Echoing Other Lexical Information

To finish up this section, you'll exercise the remaining LexicalHandler methods.


Note: The code shown in this section is in Echo12.java. The file it operates on is slideSample10.xml. (The browsable version is slideSample10-xml.html.) The results of processing are in Echo12-10.

Make the changes highlighted below to remove the comment echo (you don't need that any more) and echo the other events:

public void comment(char[] ch, int start, int length)
throws SAXException
{
}
 
public void startCDATA()	
throws SAXException	
{	
   nl(); emit("START CDATA SECTION");	
}
 
public void endCDATA()	
throws SAXException	
{	
   nl(); emit("END CDATA SECTION");	
}
 
public void startEntity(String name)	
throws SAXException	
{	
   nl(); emit("START ENTITY: "+name);	
}
 
public void endEntity(String name)	
throws SAXException	
{	
   nl(); emit("END ENTITY: "+name);	
}
 
public void startDTD(String name, String publicId, String systemId)
throws SAXException
{
   nl(); emit("START DTD: "+name
      +"\n         publicId=" + publicId
      +"\n         systemId=" + systemId);
}
 
public void endDTD()	
throws SAXException	
{ 	
   nl(); emit("END DTD"); 	
}
 

Here is what you see when the DTD is processed:

START DTD: slideshow	
         publicId=null	
         systemId=file:/..../samples/slideshow3.dtd	
END DTD
 

Note: To see events that occur while the DTD is being processed, use org.xml.sax.ext.DeclHandler.
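As a brief illustration of that note, a DeclHandler is registered in much the same way as a LexicalHandler, using the standard SAX property name http://xml.org/sax/properties/declaration-handler. The sketch below is an assumption-laden example rather than part of the Echo program: the class DtdDeclarations and the helper declarationsIn are hypothetical names, only element and internal entity declarations are recorded, and the exact text of the content model string may vary from parser to parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class DtdDeclarations extends DefaultHandler2 {
    final List<String> declarations = new ArrayList<>();

    @Override
    public void elementDecl(String name, String model) {
        declarations.add("ELEMENT DECL: " + name + " " + model);
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        declarations.add("ENTITY DECL: " + name + " = " + value);
    }

    // Hypothetical helper: returns the declarations seen in the DTD.
    static List<String> declarationsIn(String xml) throws Exception {
        DtdDeclarations handler = new DtdDeclarations();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // DefaultHandler2 also implements DeclHandler, so the same
        // object can be registered for declaration events.
        reader.setProperty(
            "http://xml.org/sax/properties/declaration-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.declarations;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE doc [\n"
            + "  <!ELEMENT doc (#PCDATA)>\n"
            + "  <!ENTITY products \"WonderWidgets\">\n"
            + "]>\n"
            + "<doc/>";
        declarationsIn(xml).forEach(System.out::println);
    }
}
```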

Here is what happens when the internally defined products entity is processed with the latest version of the program:

ELEMENT: <slide-title>
 
CHARS:   Wake up to 	
START ENTITY: products	
CHARS:   WonderWidgets	
END ENTITY: products
CHARS:   !	
END_ELM: </slide-title> 
 

And here is the result of processing the external copyright entity:

   START ENTITY: copyright	
   CHARS: 	
This is the standard copyright message ...	
   END ENTITY: copyright
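These entity events are what let a filtering app write back the original reference instead of its replacement text, which was the third question posed at the start of this section. The following sketch shows one possible approach, with hypothetical names (EntityPreservingEcho, echoText): it emits &name; when a general entity starts and suppresses the characters reported until the matching endEntity. It assumes the entity contains only text, skips parameter entities and the "[dtd]" pseudo-entity, and leaves out escaping to stay short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityPreservingEcho extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private int entityDepth = 0;  // > 0 while inside an entity expansion

    // Parameter entities start with '%'; "[dtd]" names the external subset.
    private static boolean isGeneralEntity(String name) {
        return !name.startsWith("%") && !name.equals("[dtd]");
    }

    @Override
    public void startEntity(String name) {
        if (isGeneralEntity(name)) {
            if (entityDepth == 0) {
                out.append('&').append(name).append(';');
            }
            entityDepth++;
        }
    }

    @Override
    public void endEntity(String name) {
        if (isGeneralEntity(name)) {
            entityDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Drop the replacement text; the reference was already written.
        if (entityDepth == 0) {
            out.append(ch, start, length);
        }
    }

    // Hypothetical helper: parses the XML and returns the echoed text.
    static String echoText(String xml) throws Exception {
        EntityPreservingEcho handler = new EntityPreservingEcho();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE t [<!ENTITY products \"WonderWidgets\">]>"
            + "<t>Wake up to &products;!</t>";
        System.out.println(echoText(xml));
    }
}
```

A depth counter is used instead of a boolean because an entity's replacement text can itself contain references to other entities.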
 

Finally, you get output like this for the CDATA section:

START CDATA SECTION
CHARS:   Diagram:
         frobmorten <------------ fuznaten
   |            <3>        ^
   | <1>                   |   <1> = fozzle
   V                       |   <2> = framboze    
staten --------------------+   <3> = frenzle
            <2>
END CDATA SECTION
 

In summary, the LexicalHandler gives you the event notifications you need to produce an accurate reflection of the original XML text.
