The Java™ Web Services Tutorial

Generating XML from an Arbitrary Data Structure

In this section, you'll use an XSLT transformer to convert an arbitrary data structure to XML.

In general outline, then, you're going to:

  1. Modify an existing program that reads the data so that it generates SAX events. (Whether that program is a real parser or simply a data filter of some kind is irrelevant for the moment.)
  2. Use the SAX "parser" to construct a SAXSource for the transformation.
  3. Use the same StreamResult object you created in the last exercise, so you can see the results. (But note that you could just as easily create a DOMResult object to create a DOM in memory.)
  4. Wire the source to the result, using the XSLT transformer object to make the conversion.

For starters, you need a data set you want to convert and some program which is capable of reading the data. In the next two sections, you'll create a simple data file and a program that reads it.

Creating a Simple File

We'll start by creating a data set for an address book. You can duplicate the process, if you like, or simply make use of the data stored in PersonalAddressBook.ldif.

The file shown below was produced by creating a new address book in Netscape messenger, giving it some dummy data (one address card) and then exporting it in LDIF format. Figure 1 shows the address book entry that was created.

Figure 1 Address Book Entry

Exporting the address book produces a file like the one shown below. The parts of the file that we care about are the lines from xmozillanickname through cellphone.

dn: cn=Fred Flinstone,mail=fred@barneys.house	
modifytimestamp: 20010409210816Z	
cn: Fred Flinstone	
xmozillanickname: Fred	
mail: Fred@barneys.house	
xmozillausehtmlmail: TRUE	
givenname: Fred	
sn: Flinstone	
telephonenumber: 999-Quarry	
homephone: 999-BedrockLane	
facsimiletelephonenumber: 888-Squawk	
pagerphone: 777-pager	
cellphone: 555-cell	
xmozillaanyphone: 999-Quarry	
objectclass: top	
objectclass: person
 

Note that each line of the file contains a variable name, a colon, and a space followed by a value for the variable. The "sn" variable contains the person's surname (last name) and, for some reason, the variable "cn" contains the DisplayName field from the address book entry.
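
The "name, colon, space, value" layout described above can be sketched in a few lines of Java. This is only an illustration, not the tutorial's parser: the class name LdifEntrySketch and the readEntry method are made up for this example, and the sketch assumes every field fits on one line. It keeps the first value seen for a repeated name such as objectclass, and trims trailing whitespace from each value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class LdifEntrySketch {

    /** Reads "name: value" lines into a map, keeping the first value seen for each name. */
    static Map<String, String> readEntry(Reader in) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            int sep = line.indexOf(": ");
            if (sep <= 0) {
                continue; // blank line, or a line that is not "name: value"
            }
            String name = line.substring(0, sep);
            String value = line.substring(sep + 2).trim(); // tolerate trailing whitespace
            fields.putIfAbsent(name, value);
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String sample = "cn: Fred Flinstone\nsn: Flinstone\nobjectclass: top\nobjectclass: person\n";
        System.out.println(readEntry(new StringReader(sample)));
    }
}
```

Looking fields up by name like this would make a reader independent of the order of the lines in the file, at the cost of holding the whole entry in memory.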


Note: LDIF stands for LDAP Data Interchange Format, according to the Netscape pages. And LDAP, in turn, stands for Lightweight Directory Access Protocol. I prefer to think of LDIF as the "Line Delimited Interchange Format", since that is pretty much what it is.

Creating a Simple Parser

The next step is to create a program that parses the data. Again, you can follow the process to write your own if you like, or simply make a copy of the program so you can use it to do the XSLT-related exercises that follow.


Note: The code discussed in this section is in AddressBookReader01.java. The output is in AddressBookReaderLog01.

The text for the program is shown below. It's an absurdly simple program that doesn't even loop for multiple entries because, after all, it's just a demo!

import java.io.*;
 
public class AddressBookReader01 	
{  	
	
   public static void main(String argv[])	
   {	
      // Check the arguments	
      if (argv.length != 1) {	
         System.err.println (	
            "Usage: java AddressBookReader filename");	
         System.exit (1);	
      }	
      String filename = argv[0];	
      File f = new File(filename);	
      AddressBookReader01 reader = new AddressBookReader01();	
      reader.parse(f);	
   }	
	
   /** Parse the input */	
   public void parse(File f) 	
   {	
      try {	
         // Get an efficient reader for the file	
         FileReader r = new FileReader(f);	
         BufferedReader br = new BufferedReader(r);	
	
         // Read the file and display its contents.
         String line = br.readLine();	
         while (null != (line = br.readLine())) {	
            if (line.startsWith("xmozillanickname: "))	
               break;	
         }	
         output("nickname", "xmozillanickname", line);	
         line = br.readLine();	
         output("email",    "mail",             line);	
         line = br.readLine();	
         output("html",     "xmozillausehtmlmail", line);	
         line = br.readLine();	
         output("firstname","givenname",        line);	
         line = br.readLine();	
         output("lastname", "sn",               line);	
         line = br.readLine();	
         output("work",     "telephonenumber",  line);	
         line = br.readLine();	
         output("home",     "homephone",        line);	
         line = br.readLine();	
         output("fax",      "facsimiletelephonenumber",	
            line);	
         line = br.readLine();	
         output("pager",    "pagerphone",       line);	
         line = br.readLine();	
         output("cell",     "cellphone",        line);	
      	
      }	
      catch (Exception e) {	
         e.printStackTrace();	
      }	
   } 	
	
   void output(String name, String prefix, String line) 	
   {	
      int startIndex = prefix.length() + 2;  	
      // 2=length of ": "	
      String text = line.substring(startIndex);	
      System.out.println(name + ": " + text); 	
   } 	
}
 

This program contains 3 methods:

main

The main method gets the name of the file from the command line, creates an instance of the parser, and sets it to work parsing the file. This method will be going away when we convert the program into a SAX parser. (That's one reason for putting the parsing code into a separate method.)

parse

This method operates on the File object sent to it by the main routine. As you can see, it's about as simple as it can get! The only nod to efficiency is the use of a BufferedReader, which can become important when you start operating on large files.

output

The output method contains the smarts about the structure of a line. It takes three arguments. The first argument gives the method a name to display, so we can output "html" as a variable name, instead of "xmozillausehtmlmail". The second argument gives the variable name stored in the file (xmozillausehtmlmail). The third argument gives the line containing the data. The routine then strips the variable name and the ": " separator from the start of the line and outputs the desired name, plus the data.
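
To make the prefix-stripping step concrete, here is a small standalone variant of the same idea. It is a sketch rather than the tutorial's code: the class OutputSketch and the format method are hypothetical names, and unlike the original it returns the text instead of printing it and checks that the line really starts with the expected variable name.

```java
public class OutputSketch {

    /** Strips "prefix: " from the start of the line and returns "name: value". */
    static String format(String name, String prefix, String line) {
        String expected = prefix + ": ";
        if (line == null || !line.startsWith(expected)) {
            throw new IllegalArgumentException(
                "Expected a line starting with \"" + expected + "\" but got: " + line);
        }
        return name + ": " + line.substring(expected.length());
    }

    public static void main(String[] args) {
        // prints "html: TRUE"
        System.out.println(format("html", "xmozillausehtmlmail", "xmozillausehtmlmail: TRUE"));
    }
}
```

The explicit check matters because the demo parser relies on the fields appearing in a fixed order; a file with a missing or reordered line would otherwise produce silently mangled values.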

Running this program on the address book file produces this output:

nickname: Fred	
email: Fred@barneys.house	
html: TRUE	
firstname: Fred	
lastname: Flinstone
work: 999-Quarry	
home: 999-BedrockLane	
fax: 888-Squawk	
pager: 777-pager	
cell: 555-cell
 

I think we can all agree that's a bit more readable!

Modifying the Parser to Generate SAX Events

The next step is to modify the parser to generate SAX events, so you can use it as the basis for a SAXSource object in an XSLT transform.


Note: The code discussed in this section is in AddressBookReader02.java.

Start by importing the additional classes you're going to need:

import java.io.*;
 
import org.xml.sax.*;	
import org.xml.sax.helpers.AttributesImpl;
 

Next, modify the application so that it implements XMLReader. That converts the app into a parser that generates the appropriate SAX events.

public class AddressBookReader02 	
   implements XMLReader	
{  
 

Now, remove the main method, shown below. You won't be needing it any more.

public static void main(String argv[])	
{	
   // Check the arguments	
   if (argv.length != 1) {	
      System.err.println (
         "Usage: java AddressBookReader filename");
      System.exit (1);	
   }	
   String filename = argv[0];	
   File f = new File(filename);	
   AddressBookReader02 reader = new AddressBookReader02();	
   reader.parse(f);	
}
 

Add some instance variables that will come in handy in a few minutes:

ContentHandler handler;
 
// We're not doing namespaces, and we have no	
// attributes on our elements. 	
String nsu = "";  // NamespaceURI	
Attributes atts = new AttributesImpl();	
String rootElement = "addressbook";
 
String indent = "\n   "; // newline plus indent, for readability!
 

The SAX ContentHandler is the thing that is going to get the SAX events the parser generates. To make the app into an XMLReader, you'll be defining a setContentHandler method. The handler variable will hold the result of that configuration step.

And, when the parser generates SAX element events, it will need to supply namespace and attribute information. Since this is a simple application, you're defining empty values for both of those: an empty namespace URI string and an AttributesImpl object with no attributes in it.

You're also defining a root element for the data structure (addressbook), and setting up an indent string to improve the readability of the output.
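
If you'd like to see what a ContentHandler actually receives when it is called with an empty namespace URI and an empty attribute list, the sketch below may help. It is not part of the tutorial's code: EventLogSketch and RecordingHandler are names invented for this example, and the handler simply writes each event into a string.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class EventLogSketch {

    /** A ContentHandler that records the element and character events it receives. */
    static class RecordingHandler extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append("start:").append(qName)
               .append(" attrs=").append(atts.getLength()).append('\n');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append("text:").append(ch, start, length).append('\n');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("end:").append(qName).append('\n');
        }
    }

    /** Sends the events for one element, using an empty namespace URI and no attributes. */
    static String demo() throws SAXException {
        String nsu = "";                        // no namespace
        Attributes atts = new AttributesImpl(); // no attributes
        RecordingHandler handler = new RecordingHandler();
        char[] text = "Fred".toCharArray();

        handler.startDocument();
        handler.startElement(nsu, "nickname", "nickname", atts);
        handler.characters(text, 0, text.length);
        handler.endElement(nsu, "nickname", "nickname");
        handler.endDocument();
        return handler.log.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.print(demo());
    }
}
```

In the real application the handler is supplied by the transformer, so you never write one yourself; a recording handler like this is mainly useful for checking the events your parser generates.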

Next, modify the parse method so that it takes an InputSource as an argument, rather than a File, and account for the exceptions it can generate:

public void parse(InputSource input) 	
throws IOException, SAXException 
 

Now make the changes shown below to get the reader encapsulated by the InputSource object:

try {
   // Get an efficient reader for the data
   // (this replaces: FileReader r = new FileReader(f);)
   java.io.Reader r = input.getCharacterStream();
   BufferedReader br = new BufferedReader(r);
 

Note: In the next section, you'll create the input source object and what you put in it will, in fact, be a buffered reader. But the AddressBookReader could be used by someone else, somewhere down the line. This step makes sure that the processing will be efficient, regardless of the reader you are given.
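
One thing the step above takes for granted is that the InputSource was built around a character stream. An InputSource can also wrap a byte stream, in which case getCharacterStream returns null. The following is a minimal sketch of a more defensive version. The class InputSourceReaderSketch and the open method are hypothetical, and treating a byte stream as UTF-8 is an assumption made for this example, not something the tutorial specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

public class InputSourceReaderSketch {

    /** Returns a buffered reader for the input source, whichever kind of stream it wraps. */
    static BufferedReader open(InputSource input) throws IOException {
        Reader r = input.getCharacterStream();
        if (r == null) {
            InputStream bytes = input.getByteStream();
            if (bytes == null) {
                throw new IOException(
                    "InputSource has neither a character stream nor a byte stream");
            }
            // Assumption for this sketch: byte streams are UTF-8 text.
            r = new InputStreamReader(bytes, StandardCharsets.UTF_8);
        }
        // Avoid double-buffering when the caller already supplied a BufferedReader.
        return (r instanceof BufferedReader) ? (BufferedReader) r : new BufferedReader(r);
    }

    public static void main(String[] args) throws IOException {
        InputSource input = new InputSource(new StringReader("xmozillanickname: Fred\n"));
        System.out.println(open(input).readLine());
    }
}
```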

The next step is to modify the parse method to generate SAX events for the start of the document and the root element. Add the code highlighted below to do that:

/** Parse the input */	
public void parse(InputSource input) 	
...	
{	
   try {	
      ...	
      // Read the file and display its contents.	
      String line = br.readLine();	
      while (null != (line = br.readLine())) {	
         if (line.startsWith("xmozillanickname: ")) break;	
      }	
      if (handler==null) {	
         throw new SAXException("No content handler");	
      }	
      handler.startDocument(); 	
      handler.startElement(nsu, rootElement, 	
         rootElement, atts);      	
      output("nickname", "xmozillanickname", line);	
      ...	
      output("cell",     "cellphone",        line);	
      handler.ignorableWhitespace("\n".toCharArray(), 	
                  0, // start index	
                  1  // length	
                  ); 	
      handler.endElement(nsu, rootElement, rootElement);	
      handler.endDocument(); 	
   }	
   catch (Exception e) {	
   ...
 

Here, you first checked to make sure that the parser was properly configured with a ContentHandler. (For this app, we don't care about anything else.) You then generated the events for the start of the document and the root element, and finished by sending the end-event for the root element and the end-event for the document.

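One item is noteworthy at this point: the code sends an ignorableWhitespace event just before the end of the root element. That event is optional, but it improves the readability of the output. The whitespace is passed the same way that characters are passed to the characters method: as a character array, a start index, and a length.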

Now that SAX events are being generated for the document and the root element, the next step is to modify the output method to generate the appropriate element events for each data item. Make the changes shown below to do that:

void output(String name, String prefix, String line) 	
throws SAXException 	
{	
   int startIndex = prefix.length() + 2; // 2=length of ": "	
   // The text is no longer printed with System.out.println;
   // it is sent to the handler as SAX events instead.
   int textLength = line.length() - startIndex;	
   handler.ignorableWhitespace(indent.toCharArray(), 	
                  0, // start index	
                  indent.length() 	
                  );	
   handler.startElement(nsu, name, name /*"qName"*/, atts);	
   handler.characters(line.toCharArray(), 	
               startIndex,	
               textLength);	
   handler.endElement(nsu, name, name);	
}
 

Since the ContentHandler methods can send SAXExceptions back to the parser, the parser has to be prepared to deal with them. In this case, we don't expect any, so we'll simply allow the app to fall on its sword and die if any occur.

You then calculate the length of the data, and once again generate some ignorable whitespace for readability. In this case, there is only one level of data, so we can use a fixed indent string. (If the data were more structured, we would have to calculate how much space to indent, depending on the nesting of the data.)


Note: The indent string makes no difference to the data, but will make the output a lot easier to read. Once everything is working, try generating the result without that string! All of the elements will wind up concatenated end to end, like this: <addressbook><nickname>Fred</nickname><email>...

Next, add the method that configures the parser with the ContentHandler that is to receive the events it generates:

/** Allow an application to register a content event handler. */	
public void setContentHandler(ContentHandler handler) {	
   this.handler = handler;	
} 
 
/** Return the current content handler. */	
public ContentHandler getContentHandler() {	
   return this.handler;	
} 
 

There are several more methods that must be implemented in order to satisfy the XMLReader interface. For the purpose of this exercise, we'll generate null methods for all of them. For a production application, you may want to consider implementing the error handler methods to produce a more robust app. For now, add the code below to generate null methods for them:

/** Allow an application to register an error event handler. */	
public void setErrorHandler(ErrorHandler handler)	
{ }
 
/** Return the current error handler. */	
public ErrorHandler getErrorHandler()	
{ return null; }
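
For comparison, here is roughly what non-null versions of the error handler methods might look like in a production reader. Treat it as a sketch under stated assumptions: ErrorHandlerSketch, reportFatal, and demo are names invented for this example, and the policy shown (notify the registered handler, then stop by throwing) is one reasonable choice rather than something the tutorial prescribes.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorHandlerSketch {
    private ErrorHandler errorHandler;

    /** Allow an application to register an error event handler. */
    public void setErrorHandler(ErrorHandler handler) { this.errorHandler = handler; }

    /** Return the current error handler. */
    public ErrorHandler getErrorHandler() { return errorHandler; }

    /** Hypothetical helper: tell the handler about an unrecoverable problem, then stop. */
    void reportFatal(String message) throws SAXException {
        SAXParseException e = new SAXParseException(message, (Locator) null);
        if (errorHandler != null) {
            errorHandler.fatalError(e); // the handler may throw on its own
        }
        throw e; // stop parsing even if the handler returned normally
    }

    /** Registers a handler, triggers an error, and reports what happened. */
    static String demo() {
        ErrorHandlerSketch reader = new ErrorHandlerSketch();
        StringBuilder seen = new StringBuilder();
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void fatalError(SAXParseException e) {
                seen.append("handler saw: ").append(e.getMessage());
            }
        });
        try {
            reader.reportFatal("missing xmozillanickname line");
        } catch (SAXException e) {
            seen.append("; parse stopped");
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the address book reader, a natural place to call such a helper would be the point where the loop ends without finding the xmozillanickname line.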
 

Finally, add the code below to generate null methods for the remainder of the XMLReader interface. (Most of them are of value to a real SAX parser, but have little bearing on a data-conversion application like this one.)

/** Parse an XML document from a system identifier (URI). */	
public void parse(String systemId)	
throws IOException, SAXException 	
{ }
 
/** Return the current DTD handler. */	
public DTDHandler getDTDHandler()	
{ return null; }
 
/** Return the current entity resolver. */	
public EntityResolver getEntityResolver()	
{ return null; }
 
/** Allow an application to register an entity resolver. */	
public void setEntityResolver(EntityResolver resolver)	
{ }
 
/** Allow an application to register a DTD event handler. */	
public void setDTDHandler(DTDHandler handler)	
{ }
 
/** Look up the value of a property. */	
public Object getProperty(java.lang.String name)	
{ return null; }
 
/** Set the value of a property. */	
public void setProperty(java.lang.String name,
                        java.lang.Object value)	
{ } 
 
/** Set the state of a feature. */	
public void setFeature(java.lang.String name, boolean value)	
{ }
 
/** Look up the value of a feature. */	
public boolean getFeature(java.lang.String name)	
{ return false; }  
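
As an aside, the stub methods above could also be inherited rather than written out by hand. The sketch below is an alternative to the tutorial's approach, not part of it: org.xml.sax.helpers.XMLFilterImpl already implements XMLReader, so a subclass only has to override parse(InputSource). The class name FilterBasedReaderSketch and the demo method are invented for this example, and only the document and root element events are generated. One difference to be aware of: with no parent reader set, the inherited setFeature and setProperty methods throw SAXNotRecognizedException instead of silently doing nothing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterBasedReaderSketch extends XMLFilterImpl {

    /** Generates the document and root element events; everything else is inherited. */
    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler handler = getContentHandler();
        if (handler == null) {
            throw new SAXException("No content handler");
        }
        handler.startDocument();
        handler.startElement("", "addressbook", "addressbook", new AttributesImpl());
        // ... the per-field events would be generated here ...
        handler.endElement("", "addressbook", "addressbook");
        handler.endDocument();
    }

    /** Runs the reader against a handler that records element events. */
    static List<String> demo() throws IOException, SAXException {
        List<String> events = new ArrayList<>();
        FilterBasedReaderSketch reader = new FilterBasedReaderSketch();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end " + qName);
            }
        });
        reader.parse(new InputSource(new StringReader("")));
        return events;
    }

    public static void main(String[] args) throws IOException, SAXException {
        System.out.println(demo());
    }
}
```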
 

Congratulations! You now have a parser you can use to generate SAX events. In the next section, you'll use it to construct a SAX source object that will let you transform the data into XML.

Using the Parser as a SAXSource

Given a SAX parser to use as an event source, you can (quite easily!) construct a transformer to produce a result. In this section, you'll modify the TransformationApp you've been working with to produce a stream output result, although you could just as easily produce a DOM result.
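
If you're curious what the DOM alternative looks like, here is a minimal sketch. It swaps the StreamResult for a DOMResult and reads the resulting tree. To keep the example self-contained, the input is a StreamSource over an in-memory string rather than the SAXSource this section builds; the class DomResultSketch and the toDom method are made-up names.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class DomResultSketch {

    /** Runs an identity transform whose result is an in-memory DOM. */
    static Document toDom(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // the transformer creates the Document
        transformer.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws TransformerException {
        Document doc = toDom("<addressbook><nickname>Fred</nickname></addressbook>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Only the result object changes; the source side of the transformation is unaffected by the kind of result you ask for.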


Note: The code discussed in this section is in TransformationApp04.java. The results of running it are in TransformationLog04.

Important!

Be sure to shift gears! Put the AddressBookReader aside and open up the TransformationApp. The work you do in this section affects the TransformationApp!

Start by making the changes shown below to import the classes you'll need to construct a SAXSource object. (You won't be needing the DOM classes at this point, so they are discarded here, although leaving them in doesn't do any harm.)

import org.xml.sax.SAXException; 	
import org.xml.sax.SAXParseException; 	
import org.xml.sax.ContentHandler;             // added
import org.xml.sax.InputSource;                // added
// removed: import org.w3c.dom.Document;
// removed: import org.w3c.dom.DOMException;
...	
// removed: import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;      // added
import javax.xml.transform.stream.StreamResult; 
 

Next, remove a few other holdovers from our DOM-processing days, and add the code to create an instance of the AddressBookReader:

public class TransformationApp 	
{	
   // removed: static Document document;

   public static void main(String argv[])	
   {	
      ...	
      // removed: the DocumentBuilderFactory setup
      //    (DocumentBuilderFactory.newInstance() and the
      //    commented-out setNamespaceAware/setValidating calls)

      // Create the SAX "parser" (the class from the previous section).
      AddressBookReader02 saxReader = new AddressBookReader02();	

      try {	
         File f = new File(argv[0]);	
         // removed: DocumentBuilder builder = factory.newDocumentBuilder();
         // removed: document = builder.parse(f);
 

Guess what! You're almost done. Just a couple of steps to go. Add the code highlighted below to construct a SAXSource object:

// Use a Transformer for output	
...	
Transformer transformer = tFactory.newTransformer();	
	
// Use the parser as a SAX source for input	
FileReader fr = new FileReader(f);	
BufferedReader br = new BufferedReader(fr);	
InputSource inputSource = new InputSource(br);	
SAXSource source = new SAXSource(saxReader, inputSource);	
	
StreamResult result = new StreamResult(System.out);	
transformer.transform(source, result);
 

Here, you constructed a buffered reader (as mentioned earlier) and encapsulated it in an input source object. You then created a SAXSource object, passing it the reader and the InputSource object, and passed that to the transformer.

When the app runs, the transformer will configure itself as the ContentHandler for the SAX parser (the AddressBookReader) and tell the parser to operate on the inputSource object. Events generated by the parser will then go to the transformer, which will do the appropriate thing and pass the data on to the result object.
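
The wiring described above can be tried out end to end in a single file. The sketch below is a simplified stand-in, not the tutorial's classes: LineReader is a hypothetical reader that emits one element per "name: value" line (using the name in the file as the element name), the input comes from a string instead of a file, and the result is captured in a StringWriter so it can be inspected. It assumes the JDK's built-in transformer implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;

public class SaxSourcePipelineSketch {

    /** Hypothetical stand-in for the reader: one element per "name: value" line. */
    static class LineReader implements XMLReader {
        private ContentHandler handler;

        public void parse(InputSource input) throws IOException, SAXException {
            if (handler == null) {
                throw new SAXException("No content handler");
            }
            BufferedReader br = new BufferedReader(input.getCharacterStream());
            AttributesImpl atts = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "addressbook", "addressbook", atts);
            String line;
            while ((line = br.readLine()) != null) {
                int sep = line.indexOf(": ");
                if (sep <= 0) continue; // not a "name: value" line
                String name = line.substring(0, sep);
                char[] value = line.substring(sep + 2).toCharArray();
                handler.startElement("", name, name, atts);
                handler.characters(value, 0, value.length);
                handler.endElement("", name, name);
            }
            handler.endElement("", "addressbook", "addressbook");
            handler.endDocument();
        }

        // The transformer calls this to register itself before calling parse.
        public void setContentHandler(ContentHandler h) { handler = h; }
        public ContentHandler getContentHandler() { return handler; }

        // The rest of the interface is stubbed out, as in the tutorial.
        public void parse(String systemId) { }
        public void setErrorHandler(ErrorHandler h) { }
        public ErrorHandler getErrorHandler() { return null; }
        public void setDTDHandler(DTDHandler h) { }
        public DTDHandler getDTDHandler() { return null; }
        public void setEntityResolver(EntityResolver r) { }
        public EntityResolver getEntityResolver() { return null; }
        public void setProperty(String name, Object value) { }
        public Object getProperty(String name) { return null; }
        public void setFeature(String name, boolean value) { }
        public boolean getFeature(String name) { return false; }
    }

    /** Wires reader -> SAXSource -> Transformer -> StreamResult and returns the XML text. */
    static String convert(String data) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        InputSource inputSource = new InputSource(new StringReader(data));
        SAXSource source = new SAXSource(new LineReader(), inputSource);
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(convert("nickname: Fred\ncell: 555-cell\n"));
    }
}
```

Notice that nothing in convert calls setContentHandler or parse directly. The transformer does both when transform is invoked, which is exactly the handoff the paragraph above describes.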

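Finally, remove the catch blocks shown below for SAXException and ParserConfigurationException, since the TransformationApp no longer generates those exceptions. Leave the IOException handler that follows them in place: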

} catch (SAXException sxe) {	
   // Error generated by this application	
   // (or a parser-initialization error)	
   Exception  x = sxe;	
   if (sxe.getException() != null)	
      x = sxe.getException();	
   x.printStackTrace();	
	
} catch (ParserConfigurationException pce) {	
   // Parser with specified options can't be built	
   pce.printStackTrace();	
	
} catch (IOException ioe) {
 

You're done! You have now created a transformer which will use a SAXSource as input, and produce a StreamResult as output.

Doing the Conversion

Now run the app on the address book file. Your output should look like this:

<?xml version="1.0" encoding="UTF-8"?>	
<addressbook>	
   <nickname>Fred</nickname>	
   <email>Fred@barneys.house</email>	
   <html>TRUE</html>	
   <firstname>Fred</firstname>	
   <lastname>Flinstone</lastname>	
   <work>999-Quarry</work>	
   <home>999-BedrockLane</home>	
   <fax>888-Squawk</fax>	
   <pager>777-pager</pager>	
   <cell>555-cell</cell>	
</addressbook>
 

You have now successfully converted an existing data structure to XML. And it wasn't even that hard. Congratulations!
