Transforming XML Data with XSLT
XSL Transformations (XSLT), the transformation part of the Extensible Stylesheet Language, can be used for many purposes. For example, you could generate PDF or PostScript from the XML data. But generally, XSLT is used to generate formatted HTML output, or to create an alternative XML representation of the data.
In this section of the tutorial, you'll use an XSLT transform to translate XML input data to HTML output.
Note: The XSLT specification is very large and quite complex. Rather thick books have been written on the subject. So this tutorial can only scratch the surface. It will give you enough background to get started, so you can undertake simple XSLT processing tasks. It should also give you a head start when you investigate XSLT further.
Defining an Ultra-Simple article Document Type
We'll start by defining a super simple document type that could be used for writing articles. Our <article> documents will contain these structure tags:
- <TITLE> -- The title of the article.
- <SECT> -- A section. (Consists of a heading and a body.)
- <PARA> -- A paragraph.
- <LIST> -- A list.
- <ITEM> -- An entry in a list.
- <NOTE> -- An aside, which will be offset from the main text.
The slightly unusual aspect of this structure is that we won't create a separate element tag for a section heading. Such elements are commonly created to distinguish the heading text (and any tags it contains) from the body of the section (that is, any structure elements underneath the heading).
Instead, we'll allow the heading to merge seamlessly into the body of a section. That arrangement adds some complexity to the stylesheet, but that will give us a chance to explore XSLT's template-selection mechanisms. It also matches our intuitive expectations about document structure, where the text of a heading is directly followed by structure elements, which can simplify outline-oriented editing.
Note: However, that structure is not easily validated, because XML's mixed-content model allows text anywhere in a section, whereas we want to confine text and inline elements so that they only appear before the first structure element in the body of the section. The assertion-based validator (Schematron) can do it, but most other schema mechanisms can't. So we'll dispense with defining a DTD for the document type.
In this structure, sections can be nested. The depth of the nesting will determine what kind of HTML formatting to use for the section heading (for example, h1 or h2). That's also useful with outline-oriented editing, because it lets you move sections around at will without having to worry about changing the heading tag -- or any of the other section headings that are affected by the move.
For lists, we'll use a type attribute to specify whether the list entries are unordered (bulleted), alpha (enumerated with lowercase letters), ALPHA (enumerated with uppercase letters), or numbered.
We'll also allow for some inline tags that change the appearance of the text:
- <B> -- Bold.
- <I> -- Italics.
- <U> -- Underline.
- <DEF> -- A definition.
- <LINK> -- A link to a URL.
Note: An inline tag does not generate a line break, so a style change caused by an inline tag does not affect the flow of text on the page (although it will affect the appearance of that text). A structure tag, on the other hand, demarcates a new segment of text, so at a minimum it always generates a line break, in addition to other format changes.
The <DEF> tag will help make things interesting. That tag will be used for terms that are defined in the text. Such terms will be displayed in italics, the way they ordinarily are in a document. But using a special tag in the XML will allow an index program to one day find such definitions and add them to the index, along with keywords in headings. In the Note above, for example, the definitions of inline tags and structure tags could have been marked with <DEF> tags, for future indexing.
Finally, the LINK tag serves two purposes. First, it will let us create a link to a URL without having to put the URL in twice -- so we can code <LINK>http://...</LINK> instead of <a href="http://...">http://...</a>. Of course, we'll also want to allow a form that looks like <LINK target="...">...name...</LINK>. That leads to the second reason for the <LINK> tag -- it will give us an opportunity to play with conditional expressions in XSLT.
Note: As one college professor said, the trick to defining a research project is to find something that is "large enough to be feasible... but small enough to be feasible". Although the article structure is exceedingly simple (consisting of only 11 tags), it raises enough interesting problems to keep us busy exploring XSLT for a while! Along the way, we'll get a good view of its basic capabilities. But there will still be large areas of the spec that are left untouched. The last part of this tutorial will point out the major things we missed, to give you some sense of what sorts of features await you in the specification!
Creating a Test Document
Here, you'll create a simple test document using nested <SECT> elements, a few <PARA> elements, a <NOTE> element, a <LINK>, and a <LIST type="unordered">. The idea is to create a document with one of everything, so we can explore the more interesting translation mechanisms.
Note: The sample data described here is contained in article1.xml. (The browsable version is article1-xml.html.)
To make the test document, create a file called article.xml and enter the XML data shown below.
    <?xml version="1.0"?>
    <ARTICLE>
      <TITLE>A Sample Article</TITLE>
      <SECT>The First Major Section
        <PARA>This section will introduce a subsection.</PARA>
        <SECT>The Subsection Heading
          <PARA>This is the text of the subsection.
          </PARA>
        </SECT>
      </SECT>
    </ARTICLE>
Note that in the XML file, the subsection is totally contained within the major section. (Unlike HTML, for example, where headings do not contain the body of a section.) The result is an outline structure that is harder to edit in plain-text form, like this, but much easier to edit with an outline-oriented editor.
Someday, given a tree-oriented XML editor that understands inline tags like <B> and <I>, it should be possible to edit an article of this kind in outline form, without requiring a complicated stylesheet. (Thereby allowing the writer to focus on the structure of the article, leaving layout until much later in the process.) In such an editor, the article fragment above would look something like this:
    <ARTICLE>
      <TITLE> A Sample Article
      <SECT> The First Major Section
        <PARA> This section will introduce a subsection.
        <SECT> The Subheading
          <PARA> This is the text of the subsection. Note that ...
At the moment, tree-structured editors exist, but they treat inline tags like <B> and <I> the same way that they treat other structure tags, which can make the "outline" a bit difficult to read. But hopefully, that situation will improve one day. Meanwhile, we'll press on...
Writing an XSLT Transform
In this part of the tutorial, you'll begin writing an XSLT transform that will convert the XML article and render it in HTML.
Note: The transform described in this section is contained in article1a.xsl. (The browsable version is article1a-xsl.html.)
Start by creating a normal XML document:
    <?xml version="1.0" encoding="ISO-8859-1"?>
Then add the lines shown below to create an XSL stylesheet:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0"
      >
    </xsl:stylesheet>
Now, set it up to produce HTML-compatible output:
    <xsl:stylesheet ... >
      <xsl:output method="html"/>
      ...
    </xsl:stylesheet>
We'll get into the detailed reasons for that entry later on in this section. But for now, note that if you want to output anything besides well-formed XML, then you'll need an <xsl:output> tag like the one shown, specifying either "text" or "html". (The default value is "xml".)
Note: When you specify XML output, you can add the indent attribute to produce nicely indented XML output. The specification looks like this: <xsl:output method="xml" indent="yes"/>.
Processing the Basic Structure Elements
You'll start filling in the stylesheet by processing the elements that go into creating a table of contents -- the root element, the title element, and headings. You'll also process the PARA element defined in the test document.
Note: If on first reading you skipped the section of this tutorial that discusses the XPath addressing mechanisms, now is a good time to go back and review that section!
Begin by adding the main instruction that processes the root element:
    <xsl:stylesheet ... >
      <xsl:template match="/">
        <html><body>
          <xsl:apply-templates/>
        </body></html>
      </xsl:template>
    </xsl:stylesheet>
The XSL commands are the tags defined in the "xsl" namespace. The instruction <xsl:apply-templates> processes the children of the current node. In this case, the current node is the root node.
Despite its simplicity, this example illustrates a number of important ideas, so it's worth understanding thoroughly. The first concept is that a stylesheet contains a number of templates, defined with the <xsl:template> tag. Each template contains a match attribute, which selects the elements that the template will be applied to, using the XPath addressing mechanisms.
Within the template, tags that do not start with the xsl: namespace prefix are simply copied. The newlines and whitespace that follow them are also copied, which helps to make the resulting output readable.
Note: When a newline is not present, whitespace generally seems to be ignored. To include whitespace in the output in such cases, or to include other text, you can use the <xsl:text> tag. Basically, an XSLT stylesheet expects to process tags. So everything it sees needs to be either an <xsl:..> tag, some other tag, or whitespace.
In this case, the non-xsl tags are HTML tags. So when the root tag is matched, XSLT outputs the HTML start-tags, processes any templates that apply to children of the root, and then outputs the HTML end-tags.
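As a rough way to see the root template at work, here is a minimal Java sketch that applies a one-template stylesheet to a tiny article. The class name RootTemplateDemo and the inline sample strings are assumptions made for illustration; the sketch relies only on the standard javax.xml.transform classes shipped with the JDK. Because no template matches ARTICLE or TITLE yet, XSLT's built-in rules walk through those elements and copy just their text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the tutorial's sample files.
class RootTemplateDemo {
    // The root template from the text, as a one-template stylesheet.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<html><body><xsl:apply-templates/></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet text to the XML text and returns the output. */
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ARTICLE><TITLE>A Sample Article</TITLE></ARTICLE>";
        // No template matches ARTICLE or TITLE, so the built-in rules
        // descend through them and only the title text lands in <body>.
        System.out.println(transform(XSL, xml));
    }
}
```

The exact line breaks in the printed HTML depend on the XSLT processor, so the sketch makes no claim about them.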
Process the <TITLE> Element
Next, add a template to process the article title:
    <xsl:template match="/ARTICLE/TITLE">
      <h1 align="center"> <xsl:apply-templates/> </h1>
    </xsl:template>
    </xsl:stylesheet>
In this case, you specified a complete path to the TITLE element, and output some HTML to make the text of the title into a large, centered heading. The apply-templates tag ensures that if the title contains any inline tags like italics, links, or underlining, they will be processed as well.
More importantly, the apply-templates instruction causes the text of the title to be processed. Like the DOM data model, the XSLT data model is based on the concept of text nodes hanging off of element nodes (which, in turn, can hang off other element nodes, and so on). That hierarchical structure constitutes the source tree. There is also a result tree, which contains the output.
XSLT works by transforming the source tree into the result tree. To visualize the result of XSLT operations, it is helpful to understand the structure of those trees, and their contents. (For more on this subject, see the sidebar on The XSLT/XPath Data Model later in this section.)
Process Headings
To continue processing the basic structure elements, add a template to process the top-level headings:
    <xsl:template match="/ARTICLE/SECT">
      <h1>
        <xsl:apply-templates select="text()|B|I|U|DEF|LINK"/>
      </h1>
      <xsl:apply-templates select="SECT|PARA|LIST|NOTE"/>
    </xsl:template>
    </xsl:stylesheet>
Here, you've specified the path to the topmost SECT elements. But this time, you've applied templates in two stages, using the select attribute. For the first stage, you selected text nodes using the XPath text() function, as well as inline tags like bold and italics. (The vertical pipe (|) is used to match multiple items -- text, or a bold tag, or an italics tag, etc.) In the second stage, you selected the other structure elements contained in the file, for sections, paragraphs, lists, and notes.
Using the select attribute lets you put the text and inline elements between the <h1>...</h1> tags, while making sure that all of the structure tags in the section are processed afterwards. In other words, you made sure that the nesting of the headings in the XML document is not reflected in the HTML formatting, which is important for HTML output.
In general, the select clause lets you apply all templates to a selected subset of the information available at the current context. As another example, this template selects all attributes of the current node:
    <attributes>
      <xsl:apply-templates select="@*"/>
    </attributes>
Next, add the virtually identical template to process the second-level headings:
    <xsl:template match="/ARTICLE/SECT/SECT">
      <h2>
        <xsl:apply-templates select="text()|B|I|U|DEF|LINK"/>
      </h2>
      <xsl:apply-templates select="SECT|PARA|LIST|NOTE"/>
    </xsl:template>
    </xsl:stylesheet>
Generate a Runtime Message
You could add templates for deeper headings, too, but at some point you have to stop, if only because HTML defines just six heading levels (h1 through h6). For this example, you'll stop at two levels of section headings. If the XML input happens to contain a third level, you'll want to deliver an error message to the user. This section shows you how to do that.
Note: We could continue processing SECT elements that are further down, by selecting them with the expression /ARTICLE/SECT/SECT//SECT. The // selects any SECT elements, at any "depth", as defined by the XPath addressing mechanism. But we'll take the opportunity to play with messaging, instead.
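To get a feel for how such paths select nodes, the following Java sketch counts the nodes matched by a few expressions, including a path such as /ARTICLE/SECT/SECT//SECT. It assumes a JDK that includes the javax.xml.xpath package; the class name SectPathDemo and the three-level sample article are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the tutorial's sample files.
class SectPathDemo {
    // A made-up article whose sections are nested three deep.
    static final String XML =
        "<ARTICLE><TITLE>T</TITLE>"
      + "<SECT>One<SECT>Two<SECT>Three</SECT></SECT></SECT>"
      + "</ARTICLE>";

    /** Counts the nodes that the given XPath expression selects in XML. */
    static int count(String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Double n = (Double) xpath.evaluate(
                "count(" + path + ")",
                new InputSource(new StringReader(XML)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/ARTICLE/SECT"));            // top-level sections
        System.out.println(count("/ARTICLE/SECT/SECT"));       // second-level sections
        System.out.println(count("/ARTICLE/SECT/SECT//SECT")); // anything deeper
        System.out.println(count("//SECT"));                   // sections at any depth
    }
}
```

For this sample, the first three expressions each select one section, and //SECT selects all three.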
Add the following template to generate an error when a section is encountered that is nested too deep:
    <xsl:template match="/ARTICLE/SECT/SECT/SECT">
      <xsl:message terminate="yes">Error: Sections can only be nested 2 deep.</xsl:message>
    </xsl:template>
    </xsl:stylesheet>
The terminate="yes" clause causes the transformation process to stop after the message is generated. Without it, processing could still go on with everything in that section being ignored.
As an exercise, expand the stylesheet to handle sections nested up to 5 sections deep, generating <h1>...<h5> tags. Generate an error on any section nested 6 levels deep.
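From the calling program's side, a terminated transformation shows up as a failure of the transform call. The sketch below is one way to observe that, under the assumption that the processor reports the termination as a TransformerException (or a runtime exception); the class name SectionDepthDemo and the method transformOrNull are hypothetical, and the heading templates are cut down to text-only versions of the ones in the text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the tutorial's sample files.
class SectionDepthDemo {
    // Simplified heading templates plus the terminating message template.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/ARTICLE/SECT'>"
      + "<h1><xsl:apply-templates select='text()'/></h1>"
      + "<xsl:apply-templates select='SECT'/>"
      + "</xsl:template>"
      + "<xsl:template match='/ARTICLE/SECT/SECT'>"
      + "<h2><xsl:apply-templates select='text()'/></h2>"
      + "<xsl:apply-templates select='SECT'/>"
      + "</xsl:template>"
      + "<xsl:template match='/ARTICLE/SECT/SECT/SECT'>"
      + "<xsl:message terminate='yes'>Error: Sections can only be nested 2 deep.</xsl:message>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns the generated HTML, or null if the transformation was aborted. */
    static String transformOrNull(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException | RuntimeException e) {
            // Assumption: terminate="yes" surfaces here as an exception.
            return null;
        }
    }

    public static void main(String[] args) {
        String twoDeep = "<ARTICLE><SECT>One<SECT>Two</SECT></SECT></ARTICLE>";
        String threeDeep =
            "<ARTICLE><SECT>One<SECT>Two<SECT>Three</SECT></SECT></SECT></ARTICLE>";
        System.out.println(transformOrNull(twoDeep));   // HTML with h1 and h2
        System.out.println(transformOrNull(threeDeep)); // null: processing stopped
    }
}
```

The message text itself is typically written to the error stream by the processor, so it is not part of the returned string.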
Finally, finish up the stylesheet by adding a template to process the PARA tag:
    <xsl:template match="PARA">
      <p><xsl:apply-templates/></p>
    </xsl:template>
    </xsl:stylesheet>
Nothing unusual here. Just another template like the ones you're used to.
Writing the Basic Program
In this part of the tutorial, you'll take the program that used XSLT to echo an XML file unchanged, and modify it so that it uses your stylesheet.
Note: The code shown in this section is contained in Stylizer.java. The result is the HTML code shown in stylizer1a.txt. (The displayable version is stylizer1a.html.)
Start by copying TransformationApp02, which parses an XML file and writes to System.out. Save it as Stylizer.java.
Next, modify occurrences of the class name and the usage section of the program:
    public class Stylizer
    {
        ...
        if (argv.length != 2) {
          System.err.println ("Usage: java Stylizer stylesheet filename");
          System.exit (1);
        }
        ...
Then modify the program to use the stylesheet when creating the Transformer object.
    ...
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    ...
    public class Stylizer
    {
      ...
      public static void main (String argv[])
      {
        ...
        try {
          // argv[0] now names the stylesheet, argv[1] the XML data file
          File stylesheet = new File(argv[0]);
          File datafile = new File(argv[1]);
          DocumentBuilder builder = factory.newDocumentBuilder();
          document = builder.parse(datafile);
          ...
          // Wrap the stylesheet file so the factory can read it
          StreamSource stylesource = new StreamSource(stylesheet);
          // tFactory is assumed to be the TransformerFactory instance
          // created earlier in the program
          Transformer transformer = tFactory.newTransformer(stylesource);
          ...
This code uses the file to create a StreamSource object, and then passes the source object to the factory class to get the transformer.
Note: You can simplify the code somewhat by eliminating the DOMSource class entirely. Instead of creating a DOMSource object for the XML file, create a StreamSource object for it, as well as for the stylesheet. (Take it on for extra credit!)
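One possible shape for that simplification is sketched below. It is not the tutorial's Stylizer.java: the class name StreamStylizer and the helper method stylize are assumptions, and error handling is reduced to a single wrapped exception. Both the stylesheet and the data file are handed to the transformer as StreamSource objects, so no DocumentBuilder or DOMSource is involved.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical variant of the Stylizer program, for illustration only.
class StreamStylizer {
    /** Transforms data with the given stylesheet and writes to result. */
    static void stylize(Source stylesheet, Source data, Result result) {
        try {
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer(stylesheet);
            transformer.transform(data, result);
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Usage: java StreamStylizer stylesheet filename");
            System.exit(1);
        }
        // The transformer parses both files itself; no DOM is built here.
        stylize(new StreamSource(new File(argv[0])),
                new StreamSource(new File(argv[1])),
                new StreamResult(System.out));
    }
}
```

The trade-off is that the program no longer holds a DOM of the input, which only matters if you want to inspect or modify the document before transforming it.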
Now compile and run the program using article1a.xsl on article1.xml. The results should look like this:
    <html>
    <body>
    <h1 align="center">A Sample Article</h1>
    <h1>The First Major Section </h1>
    <p>This section will introduce a subsection.</p>
    <h2>The Subsection Heading </h2>
    <p>This is the text of the subsection. </p>
    </body>
    </html>
At this point, there is quite a bit of excess whitespace in the output. You'll see how to eliminate most of it in the next section.
Trimming the Whitespace
If you recall, when you took a look at the structure of a DOM, there were many text nodes that contained nothing but ignorable whitespace. Most of the excess whitespace in the output came from them. Fortunately, XSL gives you a way to eliminate them. (For more about the node structure, see the sidebar: The XSLT/XPath Data Model.)
Note: The stylesheet described here is article1b.xsl. The result is the HTML code shown in stylizer1b.txt. (The displayable versions are article1b-xsl.html and stylizer1b.html.)
To remove some of the excess whitespace, add the <xsl:strip-space> line shown below to the stylesheet.
    <xsl:stylesheet ... >
      <xsl:output method="html"/>
      <xsl:strip-space elements="SECT"/>
      ...
This instruction tells XSL to remove any text nodes under SECT elements that contain nothing but whitespace. Nodes that contain text other than whitespace will not be affected, and other kinds of nodes are not affected.
Now, when you run the program, the result looks like this:
    <html>
    <body>
    <h1 align="center">A Sample Article</h1>
    <h1>The First Major Section </h1>
    <p>This section will introduce a subsection.</p>
    <h2>The Subsection Heading </h2>
    <p>This is the text of the subsection. </p>
    </body>
    </html>
That's quite an improvement. There are still newline characters and white space after the headings, but those come from the way the XML is written:
    <SECT>The First Major Section
    ____<PARA>This section will introduce a subsection.</PARA>
    ^^^^
Here, you can see that the section heading ends with a newline and indentation space, before the PARA entry starts. That's not a big worry, because the browsers that will process the HTML routinely compress and ignore the excess space. But there is still one more formatting option at our disposal.
Note: The stylesheet described here is article1c.xsl. The result is the HTML code shown in stylizer1c.txt. (The displayable versions are article1c-xsl.html and stylizer1c.html.)
To get rid of that last little bit of whitespace, add this template to the stylesheet:
    <xsl:template match="text()">
      <xsl:value-of select="normalize-space()"/>
    </xsl:template>
    </xsl:stylesheet>
The output now looks like this:
    <html>
    <body>
    <h1 align="center">A Sample Article</h1>
    <h1>The First Major Section</h1>
    <p>This section will introduce a subsection.</p>
    <h2>The Subsection Heading</h2>
    <p>This is the text of the subsection.</p>
    </body>
    </html>
That is quite a bit better. Of course, it would be nicer if it were indented, but that turns out to be somewhat harder than expected! Here are some possible avenues of attack, along with the difficulties:
- The indent="yes" option of <xsl:output> does not give you that control. Under the XSLT 1.0 specification it is already the default for HTML output, and it only permits the processor to add whitespace; it does not dictate a layout. It would not help much in any case, because HTML elements are rarely nested! Although HTML source is frequently indented to show the implied structure, the HTML tags themselves are not nested in a way that creates a real structure.
- The <xsl:text> tag lets you add any text you want, including whitespace. So, it could conceivably be used to output indentation space. The problem is to vary the amount of indentation space. XSLT variables seem like a good idea, but they don't work here. The reason is that when you assign a value to a variable in a template, the value is only known within that template (statically, at compile time). Even if the variable is defined globally, the assigned value is not stored in a way that lets it be dynamically known by other templates at runtime. Once <xsl:apply-templates/> invokes other templates, they are unaware of any variable settings made in other templates.
- Using a "parameterized template" is another way to modify a template's behavior. But determining the amount of indentation space to pass as the parameter remains the crux of the problem!
At the moment, then, there does not appear to be any good way to control the indentation of HTML-formatted output. Typically, that fact is of little consequence, since the data will usually be manipulated in its XML form, while the HTML version is only used for display in a browser. It's only inconvenient in a tutorial like this, where it would be nice to see the structure you're creating! But when you click on the link to stylizer1c.html, you see the results you expect.
Processing the Remaining Structure Elements
In this section, you'll process the LIST and NOTE elements that add additional structure to an article.
Note: The sample document described in this section is article2.xml, and the stylesheet used to manipulate it is article2.xsl. The result is the HTML code shown in stylizer2.txt. (The displayable versions are article2-xml.html, article2-xsl.html, and stylizer2.html.)
Start by adding some test data to the sample document:
    <?xml version="1.0"?>
    <ARTICLE>
      <TITLE>A Sample Article</TITLE>
      <SECT>The First Major Section
        ...
      </SECT>
      <SECT>The Second Major Section
        <PARA>This section adds a LIST and a NOTE.
          <PARA>Here is the LIST:
            <LIST type="ordered">
              <ITEM>Pears</ITEM>
              <ITEM>Grapes</ITEM>
            </LIST>
          </PARA>
          <PARA>And here is the NOTE:
            <NOTE>Don't forget to go to the hardware store on your way to the grocery!
            </NOTE>
          </PARA>
        </PARA>
      </SECT>
    </ARTICLE>
Note: Although the list and note in the XML file are contained in their respective paragraphs, it really makes no difference whether they are contained or not--the generated HTML will be the same, either way. But having them contained will make them easier to deal with in an outline-oriented editor.
Modify <PARA> handling
Next, modify the PARA template to account for the fact that we are now allowing some of the structure elements to be embedded within a paragraph. Replace the single <p><xsl:apply-templates/></p> line with the two-stage version:
    <xsl:template match="PARA">
      <p>
        <xsl:apply-templates select="text()|B|I|U|DEF|LINK"/>
      </p>
      <xsl:apply-templates select="PARA|LIST|NOTE"/>
    </xsl:template>
This modification uses the same technique you used for section headings. The only difference is that SECT elements are not expected within a paragraph.
Process <LIST> and <ITEM> elements
Now you're ready to add a template to process LIST elements:
    <xsl:template match="LIST">
      <xsl:if test="@type='ordered'">
        <ol>
          <xsl:apply-templates/>
        </ol>
      </xsl:if>
      <xsl:if test="@type='unordered'">
        <ul>
          <xsl:apply-templates/>
        </ul>
      </xsl:if>
    </xsl:template>
    </xsl:stylesheet>
The <xsl:if> tag uses the test="" attribute to specify a boolean condition. In this case, the value of the type attribute is tested, and the list that is generated changes depending on whether the value is ordered or unordered.
The two important things to note for this example are:
- There is no else clause, nor is there a return or exit statement, so it takes two <xsl:if> tags to cover the two options. (Or the <xsl:choose> tag could have been used, which provides case-statement functionality.)
- Single quotes are required around the attribute values. Otherwise, the XSLT processor attempts to interpret the word ordered as part of an XPath expression (the name of a child element), instead of as a string.
Now finish up LIST processing by handling ITEM elements. Nothing spectacular here.
    <xsl:template match="ITEM">
      <li><xsl:apply-templates/>
      </li>
    </xsl:template>
    </xsl:stylesheet>
Ordering Templates in a Stylesheet
By now, you should have the idea that templates are independent of one another, so it doesn't generally matter where they occur in a file. So from here on, we'll just show the template you need to add. (For the sake of comparison, they're always added at the end of the example stylesheet.)
Order does make a difference when two templates can apply to the same node. In that case, the one that is defined last is the one that is found and processed. For example, to change the ordering of an indented list to use lowercase alphabetics, you could specify a template pattern that looks like this: //LIST//LIST. In that template, you would use the HTML option to generate an alphabetic enumeration, instead of a numeric one.
But such an element could also be identified by the pattern //LIST. To make sure the proper processing is done, the template that specifies //LIST would have to appear before the template that specifies //LIST//LIST.
Process <NOTE> Elements
The last remaining structure element is the NOTE element. Add the template shown below to handle that.
    <xsl:template match="NOTE">
      <blockquote><b>Note:</b><br/>
        <xsl:apply-templates/>
      </blockquote>
    </xsl:template>
This code brings up an interesting issue that results from the inclusion of the <br/> tag. To be well-formed XML, the tag must be specified in the stylesheet as <br/>, but that tag is not recognized by many browsers. And while most browsers recognize the sequence <br></br>, they all treat it like a paragraph break, instead of a single line break.
In other words, the transformation must generate a <br> tag, but the stylesheet must specify <br/>. That brings us to the major reason for that special output tag we added early in the stylesheet:
    <xsl:stylesheet ... >
      <xsl:output method="html"/>
      ...
    </xsl:stylesheet>
That output specification converts empty tags like <br/> to their HTML form, <br>, on output. That conversion is important, because most browsers do not recognize the empty tags. Here is a list of the affected tags:
Table 3 Empty Tags
- area
- base
- basefont
- br
- col
- frame
- hr
- img
- input
- isindex
- link
- meta
- param
By default, XSLT produces well-formed XML on output. And since an XSL stylesheet is well-formed XML to start with, you cannot easily put a tag like <br> in the middle of it. The <xsl:output method="html"/> specification solves the problem, so you can code <br/> in the stylesheet, but get <br> in the output.
The other thing the <xsl:output> method affects is escaping. With <xsl:output method="text"/>, generated text is not escaped at all: if the stylesheet includes the &lt; entity reference, it will appear as the "<" character in the generated text. When XML is generated, on the other hand, the &lt; entity reference in the stylesheet appears as &lt; in the generated text. Under the XSLT 1.0 specification, the html method treats ordinary text the way the xml method does, but it leaves the content of <script> and <style> elements unescaped, and it does not escape "<" characters in attribute values.
Note: If you actually want the characters &lt; to be generated as part of text output, you'll need to encode them as &amp;lt; -- that sequence becomes &lt; on output, because only the &amp; is converted to an & character.
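The effect of the output method on empty tags is easy to check with a short experiment. The Java sketch below runs the same NOTE template twice, once with method="html" and once with method="xml"; the class name OutputMethodDemo, the render method, and the sample note text are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the tutorial's sample files.
class OutputMethodDemo {
    /** Renders a sample NOTE using the given xsl:output method. */
    static String render(String method) {
        String xsl =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
          + "<xsl:output method='" + method + "'/>"
          + "<xsl:template match='NOTE'>"
          + "<blockquote><b>Note:</b><br/><xsl:apply-templates/></blockquote>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        String xml = "<NOTE>Stop at the hardware store.</NOTE>";
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The stylesheet says <br/> in both runs; only the serialization differs.
        System.out.println(render("html")); // line break written as <br>
        System.out.println(render("xml"));  // line break written as <br/>
    }
}
```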
Run the Program
Here is the HTML that is generated for the second section when you run the program now:
    ...
    <h1>The Second Major Section</h1>
    <p>This section adds a LIST and a NOTE.</p>
    <p>Here is the LIST:</p>
    <ol>
    <li>Pears</li>
    <li>Grapes</li>
    </ol>
    <p>And here is the NOTE:</p>
    <blockquote>
    <b>Note:</b>
    <br>Don't forget to go to the hardware store on your way to the grocery!
    </blockquote>
Process Inline (Content) Elements
The only remaining tags in the ARTICLE type are the inline tags -- the ones that don't create a line break in the output, but which instead are integrated into the stream of text they are part of.
Inline elements are different from structure elements, in that they are part of the content of a tag. If you think of an element as a node in a document tree, then each node has both content and structure. The content is composed of the text and inline tags it contains. The structure consists of the other elements (structure elements) under the tag.
Note: The sample document described in this section is article3.xml, and the stylesheet used to manipulate it is article3.xsl. The result is the HTML code shown in stylizer3.txt. (The browser-displayable versions are article3-xml.html, article3-xsl.html, and stylizer3.html.)
Start by adding one more bit of test data to the sample document:
    <?xml version="1.0"?>
    <ARTICLE>
      <TITLE>A Sample Article</TITLE>
      <SECT>The First Major Section
        ...
      </SECT>
      <SECT>The Second Major Section
        ...
      </SECT>
      <SECT>The <I>Third</I> Major Section
        <PARA>In addition to the inline tag in the heading, this section
          defines the term <DEF>inline</DEF>, which literally means
          "no line break". It also adds a simple link to the main page
          for the Java platform (<LINK>http://java.sun.com</LINK>),
          as well as a link to the
          <LINK target="http://java.sun.com/xml">XML</LINK> page.
        </PARA>
      </SECT>
    </ARTICLE>
Now, process the inline <DEF> elements in paragraphs, renaming them to HTML italics tags:
    <xsl:template match="DEF">
      <i> <xsl:apply-templates/> </i>
    </xsl:template>
Next, comment out the text-node normalization. It has served its purpose, and now we're at the point where we need to preserve important spaces:
    <!--
    <xsl:template match="text()">
      <xsl:value-of select="normalize-space()"/>
    </xsl:template>
    -->
This modification keeps us from losing spaces before tags like <I> and <DEF>. (Try the program without this modification to see the result.)
Now, process basic inline HTML elements like <B>, <I>, <U> for bold, italics, and underlining.
    <xsl:template match="B|I|U">
      <xsl:element name="{name()}">
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
The <xsl:element> tag lets you compute the element you want to generate. Here, you generate the appropriate inline tag using the name of the current element. In particular, note the use of curly braces ({}) in the name=".." expression. Those curly braces cause the text inside the quotes to be processed as an XPath expression, instead of being interpreted as a literal string. Here, they cause the XPath name() function to return the name of the current node.
Curly braces are recognized anywhere that an "attribute value template" can occur. (Attribute value templates are defined in section 7.6.2 of the specification, and they appear in several places in the template definitions.) In such expressions, curly braces can also be used to refer to the value of an attribute, {@foo}, or to the content of an element, {foo}.
Note: You can also generate attributes using <xsl:attribute>. For more information see Section 7.1.3 of the XSLT Specification.
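To see attribute value templates in action, the Java sketch below runs the B|I|U template on a short paragraph. It also includes a LINK template that uses {@target} directly in a literal href attribute; that template is only an illustration of the curly-brace syntax, not the LINK handling this tutorial goes on to build. The class name InlineTagDemo and the sample paragraph are likewise assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the tutorial's sample files.
class InlineTagDemo {
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='PARA'><p><xsl:apply-templates/></p></xsl:template>"
        // {name()} computes the output element name from the current node.
      + "<xsl:template match='B|I|U'>"
      + "<xsl:element name='{name()}'><xsl:apply-templates/></xsl:element>"
      + "</xsl:template>"
        // {@target} copies an attribute value into a literal attribute
        // (illustration only; assumes every LINK has a target).
      + "<xsl:template match='LINK'>"
      + "<a href='{@target}'><xsl:apply-templates/></a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the XML text with XSL and returns the generated HTML. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<PARA>Some <B>bold</B> and <I>italic</I> text, plus a "
          + "<LINK target=\"http://java.sun.com/xml\">link</LINK>.</PARA>"));
    }
}
```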
The last remaining element is the
LINK
tag. The easiest way to process that tag will be to set up a named-template that we can drive with a parameter:<xsl:template name="htmLink"> <xsl:param name="dest" select="UNDEFINED"/> <xsl:element name="a"> <xsl:attribute name="href"> <xsl:value-of select="$dest"/> </xsl:attribute> <xsl:apply-templates/> </xsl:element> </xsl:template>The major difference in this template is that, instead of specifying a
match
clause, you gave the template a name with thename
="" clause. So this template only gets executed when you invoke it.Within the template, you also specified a parameter named "dest", using the
<xsl:param>
tag. For a bit of error checking, you used theselect
clause to give that parameter a default value of "UNDEFINED
". To reference the variable in the<xsl:value-of>
tag, you specified"$dest"
.
Note: Recall that an entry in quotes is interpreted as an expression, unless it is further enclosed in single quotes. That's why the single quotes were needed earlier, in"@type='ordered'"
--to make sure thatordered
was interpreted as a string.
The <xsl:element> tag generates an element. Previously, you have been able to simply specify the element you want by coding something like <html>. But here you are dynamically generating the content of the HTML anchor (<a>) in the body of the <xsl:element> tag. And you are dynamically generating the href attribute of the anchor using the <xsl:attribute> tag.

The last important part of the template is the <xsl:apply-templates> tag, which inserts the text from the text node under the LINK element. (Without it, there would be no text in the generated HTML link.)

Next, add the template for the LINK tag, and call the named template from within it:

    <xsl:template match="LINK">
      <xsl:if test="@target">
        <!--Target attribute specified.-->
        <xsl:call-template name="htmLink">
          <xsl:with-param name="dest" select="@target"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:template>

    <xsl:template name="htmLink">
    ...

The test="@target" clause returns true if the target attribute exists in the LINK tag. So this if-statement generates HTML links when the text of the link and the target defined for it are different.

The <xsl:call-template> tag invokes the named template, while <xsl:with-param> specifies a parameter using the name clause, and its value using the select clause.

As the very last step in the stylesheet construction process, add the if-clause shown below to process LINK tags that do not have a target attribute.

    <xsl:template match="LINK">
      <xsl:if test="@target">
        ...
      </xsl:if>
      <xsl:if test="not(@target)">
        <xsl:call-template name="htmLink">
          <xsl:with-param name="dest">
            <xsl:apply-templates/>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:if>
    </xsl:template>

The not(...) clause inverts the previous test (there is no else clause, remember?). So this part of the template is applied when the target attribute is not specified. This time, the parameter value comes not from a select clause, but from the contents of the <xsl:with-param> element.
Note: Just to make it explicit: variables (which we'll mention a bit later) and parameters can have their value specified either by a select clause, which lets you use XPath expressions, or by the content of the element, which lets you use XSLT tags.
The content of the parameter, in this case, is generated by the <xsl:apply-templates/> tag, which inserts the contents of the text node under the LINK element.

Run the Program
When you run the program now, the results should look like this:
    ...
    <h1>The <I>Third</I> Major Section
    </h1>
    <p>In addition to the inline tag in the heading, this section
    defines the term <i>inline</i>, which literally means "no line
    break". It also adds a simple link to the main page for the Java
    platform (<a href="http://java.sun.com">http://java.sun.com</a>),
    as well as a link to the
    <a href="http://java.sun.com/xml">XML</a> page.
    </p>

Awesome! You have now converted a rather complex XML file to HTML. (As seemingly simple as it was, it still provided a lot of opportunity for exploration.)
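To try the two LINK branches in isolation, the templates from this section can be run against a one-paragraph input. This is a minimal sketch under stated assumptions rather than the tutorial's own program: it assumes Java 15 or later (for text blocks) and the XSLT processor bundled with the JDK. The class name LinkDemo, the PARA rule, the sample input, and the xml output method (chosen only to keep the output on one line) are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LinkDemo {
    // The LINK and htmLink templates from this section, plus a hypothetical
    // PARA rule so the sample input has somewhere to live.
    static final String XSLT = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>

          <xsl:template match="PARA">
            <p><xsl:apply-templates/></p>
          </xsl:template>

          <xsl:template match="LINK">
            <!-- Target given: pass it with a select clause. -->
            <xsl:if test="@target">
              <xsl:call-template name="htmLink">
                <xsl:with-param name="dest" select="@target"/>
              </xsl:call-template>
            </xsl:if>
            <!-- No target: the link text doubles as the destination. -->
            <xsl:if test="not(@target)">
              <xsl:call-template name="htmLink">
                <xsl:with-param name="dest">
                  <xsl:apply-templates/>
                </xsl:with-param>
              </xsl:call-template>
            </xsl:if>
          </xsl:template>

          <xsl:template name="htmLink">
            <!-- Quoted so the default is a string literal. -->
            <xsl:param name="dest" select="'UNDEFINED'"/>
            <xsl:element name="a">
              <xsl:attribute name="href">
                <xsl:value-of select="$dest"/>
              </xsl:attribute>
              <xsl:apply-templates/>
            </xsl:element>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies the stylesheet above to the given XML and returns the result.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PARA>See <LINK>http://java.sun.com</LINK> or the "
                + "<LINK target='http://java.sun.com/xml'>XML</LINK> page.</PARA>";
        System.out.println(transform(xml));
    }
}
```

The first LINK has no target attribute, so its text is used for both the href and the link text. The second takes its href from the target attribute and keeps "XML" as the visible text.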
Printing the HTML
You have now converted an XML file to HTML. One day, someone will produce an HTML-aware printing engine that you'll be able to find and use through the Java Printing Service (JPS) API. At that point, you'll have the ability to print an arbitrary XML file as formatted data--all you'll have to do is set up a stylesheet!
What Else Can XSLT Do?
As lengthy as this section of the tutorial has been, it has still only scratched the surface of XSLT's capabilities. Many additional possibilities await you in the XSLT Specification. Here are a few of the things to look for:
- Use the include and import statements to modularize and combine XSLT stylesheets. The include statement simply inserts any definitions from the included file. The import statement lets you override definitions in the imported file with definitions in your own stylesheet.
- Dynamically generate numbered sections, numbered elements, and numeric literals. XSLT provides three numbering modes:
  - single: Numbers items under a single heading, like an "ordered list" in HTML.
  - multiple: Produces multi-level numbering like "A.1.3".
  - any: Consecutively numbers items wherever they appear, like the footnotes in a chapter.
- Control enumeration formatting, so you get numerics (format="1"), uppercase alphabetics (format="A"), lowercase alphabetics (format="a"), or compound numbers, like "A.1", as well as numbers and currency amounts suited for a specific international locale.
- Process an element multiple times, each time in a different "mode". You add a mode attribute to templates, and then specify <xsl:apply-templates mode="..."> to apply only the templates with a matching mode. Combine that with <xsl:apply-templates select="..."> to slice and dice the input processing, creating a matrix of elements to process and the templates to apply to them.
- Variables, like parameters, let you control a template's behavior. But they are not as valuable as you might think. The value of a variable is only known within the scope of the current template or <xsl:if> clause (for example) in which it is defined. You can't pass a value from one template to another, or even from an enclosed part of a template to another part of the same template.
- These statements are true even for a "global" variable. You can redefine it in a template, but the redefinition only applies to that template. And when the expression used to define the global variable is evaluated, that evaluation takes place in the context of the structure's root node. In other words, global variables are essentially runtime constants. Those constants can be useful to change the behavior of a template, especially when coupled with include and import statements. But variables are not a general-purpose data-management mechanism.

The XSLT/XPath Data Model
Like the DOM, the XSLT/XPath data model consists of a tree containing a variety of nodes. Under any given element node, there are text nodes, attribute nodes, element nodes, comment nodes, and processing instruction nodes.
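The node kinds listed above can be counted with a few XPath node tests. The following is a small sketch, assuming the javax.xml.xpath API that ships with the JDK; the class name NodeKindsDemo and the sample LIST document are hypothetical and exist only to put one or two nodes of each kind under a single element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NodeKindsDemo {
    // A LIST element with two attributes, one comment, one processing
    // instruction, one child element, and one text node.
    static final String XML =
        "<LIST type='ordered' id='x'><!--note--><?hint keep?><ITEM>one</ITEM>two</LIST>";

    // Counts the nodes directly under LIST that satisfy the given node test.
    static int count(String nodeTest) {
        try {
            Object n = XPathFactory.newInstance().newXPath().evaluate(
                    "count(/LIST/" + nodeTest + ")",
                    new InputSource(new StringReader(XML)),
                    XPathConstants.NUMBER);
            return ((Double) n).intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] tests = {"@*", "*", "text()", "comment()", "processing-instruction()"};
        for (String test : tests) {
            System.out.println(test + " -> " + count(test));
        }
    }
}
```

Each node test picks out one kind of node: @* for attributes, * for elements, and text(), comment(), and processing-instruction() for the remaining three.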
Once an XPath expression establishes a context, other expressions produce values that are relative to that context. For example, the expression //LIST establishes a context consisting of a LIST node. Within the XSLT template that processes such nodes, the expression @type refers to the element's type attribute. (Similarly, the expression @* refers to all of the element's attributes.)

The Trouble with Variables
It is awfully tempting to create a single template and set a variable for the destination of the link, rather than going to the trouble of setting up a parameterized template and calling it two different ways. The idea would be to set the variable to a default value (say, the text of the LINK tag) and then, if the target attribute exists, set the destination variable to the value of the target attribute.

That would be a darn good idea--if it worked. But once again, the issue is that variables are only known in the scope within which they are defined. So when you code an <xsl:if> to change the value of the variable, the value is only known within the context of the <xsl:if> tag. Once </xsl:if> is encountered, any change to the variable's setting is lost.

A similarly tempting idea is the possibility of replacing the text()|B|I|U|DEF|LINK specification with a variable ($inline). But since the value of the variable is determined by where it is defined, the value of a global inline variable consists of text nodes, <B> nodes, etc. that happen to exist at the root level. In other words, the value of such a variable, in this case, is null (an empty node-set).

Next...
The final page of the XSLT tutorial will show you how to concatenate multiple transformations together in a filter chain.