
Summary

An application example of XSLT that converts XML documents into HTML documents suitable for publication on the Internet. The example is kept deliberately simple so that the key concepts can be grasped quickly. Instructions are given so that readers can test the whole example on their own computers.

Acknowledgements

Sincere thanks to Max Froumentin of the W3C for his helpful comments.
Introduction
It is hard not to be impressed by the considerable work the W3 Consortium has devoted in recent years to the development of the Extensible Markup Language (XML) and the Extensible Stylesheet Language (XSL). Their main goal was to add flexibility to the handling of documents, and they have succeeded remarkably well in overcoming several limitations and drawbacks of the almost universally used Hypertext Markup Language (HTML). Judging by the many developments they have originated, improved clarity, simplicity, modularity and coherence were certainly not mere slogans. In the meantime a number of interesting tools related to XML and XSL have appeared which, although far from complete, allow us to take our first steps in applying them. What follows is a short case study of how the handling of the pages published on this site could be improved by introducing a specific document markup language based on XML, which among other things amounts to an indirect, de facto extension of HTML. This presentation assumes the reader has a basic knowledge of XHTML, XML, XSL and XSLT.
The problem with HTML documents
At the start, writing documents for publication on the Internet seemed rather straightforward: all you needed was a decent text editor and the HTML specification in order to add the appropriate markup. As work progressed, however, some problems appeared which led to the conclusion that mixing structural markup with style markup was not quite a blessing. The first disappointment was the discovery that pages did not consistently look as intended when displayed with different browsers and/or operating systems. A workaround for this problem was the careful choice of a compromise HTML structure and style, which is still the basis of the pages found on this site.
The observation that a lot of repetitive style and layout markup had almost submerged the content of the documents led the author to postulate a mechanism by which the desired layout and style could be derived from the markup that structures a document's content.
Until recently no viable solution to this problem was in sight, but the release of MSXML 3.0 by Microsoft Corporation, combined with the MSXSL.EXE command-line utility from the same company, has now made it possible to implement one. As an alternative you may use Instant Saxon 6.5.3, a package written by Michael Kay.
Another way to separate content from presentation is to use Cascading Style Sheets (CSS), a well-designed mechanism which unfortunately has not been implemented adequately by most current browsers and has therefore been a source of much frustration for those who have tried to use it. When we speak of a viable solution, we obviously exclude merely shifting frustration from one technology to another. This problem should gradually disappear as the latest browser releases become more widespread.
What roles do XML, DTD, and XSLT play?
The first building block of our solution is the Extensible Markup Language (XML) standard, which allows a document to be structured according to a relatively simple set of rules. As an example, let's look at the source of the XML document page_ex.xml from which the HTML page page_ex.html is generated. (We suggest opening these pages in separate windows and comparing the XML source with the HTML source of page_ex.html, using the browser's option to display the source.) Most browsers currently in use still cannot handle such a document in a useful way; therefore, in order to display page_ex.xml, we need to convert it into an HTML document.
This is where the second building block comes in: the Extensible Stylesheet Language Transformations (XSLT) standard, which lets us specify, by means of template rules, how an XSL processor is to convert the XML document into an HTML document.
From the third line of the preamble in our example we can deduce that the stylesheet applicable to this XML document is the resource page_ex.xsl. The XSL processor, in our case the MSXSL.EXE utility from Microsoft Corporation, uses this stylesheet to derive the HTML version of page_ex.xml.
The last building block of our solution is the Document Type Definition (DTD), which describes the content structure of our XML document. Although it is not required for our solution to work, it is a very valuable tool for ensuring conformance to the intended content structure. The second line of the preamble tells us that the DTD for page_ex.xml is found in page_ex.dtd. With this DTD, an XSL processor that validates its input can point out any inconsistencies and/or errors in the document. The declarations specific to the content of this document are grouped together at the end of the DTD file.
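To make the preamble concrete, here is a minimal Python sketch (standard library only) that reads an XML document's preamble and reports which DTD and which stylesheet it declares. The sample preamble is an assumption reconstructed from the description above (a `page` root element, the DTD on the second line, the `xml-stylesheet` instruction on the third); it is not the actual content of page_ex.xml, and the helper name `preamble_info` is hypothetical.

```python
import re
from xml.dom import minidom

# Hypothetical document: the preamble follows the description in the text
# (DTD on the second line, stylesheet on the third); the body is invented.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE page SYSTEM "page_ex.dtd">
<?xml-stylesheet type="text/xsl" href="page_ex.xsl"?>
<page home="index.html"><sections/></page>
"""

# Pseudo-attribute href="..." (or '...') inside the xml-stylesheet instruction.
HREF_RE = re.compile(r"""href\s*=\s*(["'])(.*?)\1""")


def preamble_info(xml_text):
    """Return (DTD system identifier, stylesheet href) declared in the preamble."""
    doc = minidom.parseString(xml_text)  # the external DTD is not fetched here
    dtd = doc.doctype.systemId if doc.doctype is not None else None
    href = None
    for node in doc.childNodes:
        # Processing instructions before the root element are document children.
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            match = HREF_RE.search(node.data)
            if match:
                href = match.group(2)
    return dtd, href


print(preamble_info(SAMPLE))  # ('page_ex.dtd', 'page_ex.xsl')
```

An XSL processor performs the same lookup before applying the stylesheet; the sketch stops at identifying the two resources.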
Some highlights of this solution
In essence, the stylesheet page_ex.xsl describes a document framework ready to be filled with content. The generation of page_ex.html can be seen as a process of merging this framework with the structured content in page_ex.xml, with the element names ensuring that each piece of text ends up in the right place. An unexpectedly powerful feature of the XSLT design is that you describe only once how a particular element is to be handled, however many times and wherever it occurs. If in a later development you need to handle an element differently depending on its context, you can modify the stylesheet without rewriting everything. To ease the transition of existing documents from HTML to XML, the DTD allows any HTML markup that is valid inside the BODY element. Note, however, that the stylesheet page_ex.xsl handles only the subset of HTML markup actually needed for our example. For instance, the home, previous and next attributes of the page element automatically control the navigation aids provided in the header. This is achieved by the following statements in page_ex.xsl:

<TD class="rheader">
  <xsl:apply-templates select="@home"/>
  <xsl:call-template name="br"/>
  <xsl:apply-templates select="@previous"/>
  <xsl:call-template name="br"/>
  <xsl:apply-templates select="@next"/>
</TD>
This could of course be enhanced further, for example by displaying an appropriate icon with an image map, as is done for the pages of this site.
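The behaviour of this excerpt can be mimicked outside XSLT. The following Python sketch is an illustration, not a port of page_ex.xsl: since the templates matching @home, @previous and @next are not shown, it assumes, hypothetically, that each emits a link labelled with the attribute name, and that the "br" template emits `<BR>`. As in the excerpt, the two line breaks are emitted unconditionally, while a missing attribute produces nothing, just as xsl:apply-templates then selects no node.

```python
import xml.etree.ElementTree as ET
from html import escape


def nav_link(name, value):
    """Hypothetical stand-in for the templates matching @home, @previous, @next."""
    return '<A href="%s">%s</A>' % (escape(value, quote=True), name)


def nav_cell(page):
    """Mimic the TD excerpt: @home, br, @previous, br, @next."""
    parts = []
    for i, name in enumerate(("home", "previous", "next")):
        if i > 0:
            parts.append("<BR>")  # call-template name="br" runs unconditionally
        value = page.get(name)
        if value is not None:  # a missing attribute selects no node: no output
            parts.append(nav_link(name, value))
    return '<TD class="rheader">' + "".join(parts) + "</TD>"


page = ET.fromstring('<page home="index.html" next="page2.html"/>')
print(nav_cell(page))
# <TD class="rheader"><A href="index.html">home</A><BR><BR><A href="page2.html">next</A></TD>
```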
If you examine the HTML markup generated for page_ex.html, you will notice that some element names are in lower case and others in upper case. This is intentional: it distinguishes what the XSLT processor has added (upper case) from what comes directly from the XML document (lower case). Since HTML element names are not case-sensitive, browsers treat both forms alike.
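The principle of describing each element's handling only once, together with the case convention just mentioned, can be illustrated by a small Python sketch of template dispatch: a table maps element names to handler functions, so each rule applies wherever its element occurs. The element names `section` and `title` and their HTML rendering are hypothetical, not taken from page_ex.dtd or page_ex.xsl. The fallback that copies unmatched elements (ordinary HTML such as p or b) unchanged is also an assumption, similar to an identity template; XSLT's own built-in rule for elements would instead process only their content and drop the tags.

```python
import xml.etree.ElementTree as ET
from html import escape


def inner(elem):
    """Process text and children in document order, like xsl:apply-templates."""
    out = [escape(elem.text or "", quote=False)]
    for child in elem:
        out.append(transform(child))
        out.append(escape(child.tail or "", quote=False))
    return "".join(out)


def copy_through(elem):
    """Assumed fallback: copy an unmatched element (e.g. plain HTML) as written."""
    attrs = "".join(' %s="%s"' % (k, escape(v, quote=True))
                    for k, v in elem.attrib.items())
    return "<%s%s>%s</%s>" % (elem.tag, attrs, inner(elem), elem.tag)


# One rule per element name, applied wherever that element occurs.
# Generated markup is upper case; copied markup keeps its lower case.
TEMPLATES = {
    "title": lambda e: "<H2>%s</H2>" % inner(e),
    "section": lambda e: "<DIV>%s</DIV>" % inner(e),
}


def transform(elem):
    return TEMPLATES.get(elem.tag, copy_through)(elem)


doc = ET.fromstring(
    "<section><title>XSLT</title><p>Templates are <b>reused</b>.</p></section>")
print(transform(doc))
# <DIV><H2>XSLT</H2><p>Templates are <b>reused</b>.</p></DIV>
```

Changing how every title is rendered means editing one entry in `TEMPLATES`, which mirrors editing one template rule in the stylesheet.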
 
The presentation can be parametrised by means of attributes with default values, to which the XSLT stylesheet can refer, as with indent and separation in the following excerpt from the "sections" template in page_ex.xsl:

<TR><TD width="{@indent}">
    <xsl:call-template name="nbsp"/></TD>
  <TD width="{@separation}">
    <xsl:call-template name="nbsp"/></TD>
  ...
</TR>
The corresponding excerpt from the DTD page_ex.dtd, from which the default values are taken, is:

<!ELEMENT sections   (section+, body?)>
<!ATTLIST sections
  %core.att;
  width         %RelativeLength;  "95%"
  height        %RelativeLength;  "80%"
  indent        %RelativeLength;  "30%"
  separation    %RelativeLength;  "2%"
>
This form of parametrisation also lets you change the values in exceptional cases, simply by specifying alternative values locally in the element's attributes.
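The interplay of DTD defaults and local overrides can be sketched in Python. When the parser reads the DTD, it supplies the declared default for any attribute the element omits, so `{@indent}` in the stylesheet always finds a value. The sketch imitates that step by extracting the defaults from the ATTLIST excerpt with a regular expression (a simplification that does not expand %core.att; or %RelativeLength; and only handles declarations written as above), then lets attributes specified on the element take precedence. The function names are hypothetical, and `&nbsp;` stands in for whatever the "nbsp" template emits.

```python
import re

# The ATTLIST excerpt from page_ex.dtd shown above.
ATTLIST = """
<!ATTLIST sections
  %core.att;
  width         %RelativeLength;  "95%"
  height        %RelativeLength;  "80%"
  indent        %RelativeLength;  "30%"
  separation    %RelativeLength;  "2%"
>
"""

# Matches lines of the form:  name  %SomeEntity;  "default"
DEFAULT_RE = re.compile(r'^\s*([\w.-]+)\s+%[\w.-]+;\s+"([^"]*)"', re.MULTILINE)


def dtd_defaults(attlist_text):
    """Return {attribute: default} for declarations written as in the excerpt."""
    return dict(DEFAULT_RE.findall(attlist_text))


def effective_attributes(defaults, specified):
    """Attributes given on the element override the DTD defaults."""
    return {**defaults, **specified}


def spacer_row(attrs):
    """Imitate the attribute value templates {@indent} and {@separation}."""
    return ('<TR><TD width="%s">&nbsp;</TD><TD width="%s">&nbsp;</TD></TR>'
            % (attrs["indent"], attrs["separation"]))


defaults = dtd_defaults(ATTLIST)
print(spacer_row(effective_attributes(defaults, {})))
# <TR><TD width="30%">&nbsp;</TD><TD width="2%">&nbsp;</TD></TR>
print(spacer_row(effective_attributes(defaults, {"indent": "10%"})))
# <TR><TD width="10%">&nbsp;</TD><TD width="2%">&nbsp;</TD></TR>
```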
Readers who take a closer look at the generated HTML markup and the corresponding declarations at the top of page_ex.html, or who submit the page to an HTML validator, will discover some discrepancies. The example is in effect a compromise between generating pure HTML 4 markup and generating XHTML 1 markup, because XML, XSL and XSLT require every element to be closed. For instance, in the stylesheet you cannot write a META element without its end tag or the empty-element form <META/> (unless you resort to a cumbersome and obscure trick involving xsl:text elements; the template named "br" is a very simple instance of this). Note that XSLT 1.0 also defines an html output method (xsl:output method="html") under which empty HTML elements such as META and BR should be written without end tags; the closing requirement applies to the stylesheet itself. Tests with various browsers have shown that closed elements, even where the HTML 4 standard does not provide for them, cause no problems.
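Whether empty elements are closed is ultimately a matter of serialization. Python's standard xml.etree.ElementTree happens to offer both an "xml" and an "html" output method, comparable to the two XSLT 1.0 output methods, and the short sketch below shows how the same META and BR elements come out under each. It only illustrates the serialization rules; it is not what MSXSL.EXE produces for page_ex.html, and `serialize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def serialize(elem, method):
    """Serialize an element with ElementTree's "xml" or "html" output method."""
    return ET.tostring(elem, encoding="unicode", method=method)


meta = ET.Element("META", {"name": "author"})
br = ET.Element("BR")

for method in ("xml", "html"):
    print(method, serialize(meta, method), serialize(br, method))
# xml <META name="author" /> <BR />
# html <META name="author"> <BR>
```

Non-empty elements such as TD keep their end tags under both methods; only the HTML empty elements (META, BR, IMG and a few others) lose them.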
The "height" attribute appearing in table elements is not part of the HTML 4 specification, but most browsers nevertheless recognise it for compatibility with earlier versions. It is used extensively to control the layout of the pages on this site.
What do we gain with XML, XSL, XSLT?
Here is a list of some decisive advantages of these new standards:
  • It is easier to focus on content with documents structured according to the XML standard.
  • With separate XSLT stylesheets we can achieve a specific presentation of the same content for different media.
  • Continuity with the existing HTML standard is ensured.
  • XSLT allows you to drive the presentation of your documents by their content structure.
  • Flexibility in setting the boundary between content structure and presentation details.
  • Ability to modify the presentation style without having to rewrite the content.
  • You need to work out only once the compromise HTML that works as expected on most browsers.
What applies to XSLT of course applies equally to XSL, a family of standards that comprises XSLT, XPath and Formatting Objects. With XSL you provide a recipe for how the content of an XML document is to be presented, without having to alter that content. Once supported by most browsers, XSL will become an interesting alternative, since it does not depend on the HTML standard.
Obviously, in order to benefit from all these advantages, we have to invest some time in setting up more consistent working procedures and, most importantly, in ensuring the coherence of the information handled.
Test the example on your computer
The following step-by-step instructions will allow you to experiment with the presented example on your computer:
  1. Download MSXML 3.0 and install it in replace mode (see Microsoft's instructions).
  2. Download the MSXSL.EXE command-line utility.
  3. Download the ZIP file page_ex.zip containing the sources of this example and unpack it into a directory of your choice. Among the sources you will find a small batch file called xml.bat, which must be edited to point to the directory where you installed the command-line utility msxsl.exe (see the comment in the batch file itself).
  4. Open Microsoft Internet Explorer (MSIE) and load the file page_ex.xml; if MSXML 3.0 has been installed correctly, the document should be displayed in its HTML form (converted internally by MSIE).
  5. Now open a DOS command window, change to the directory containing the source files and type the command
    
    xml page_ex
    
    This should generate the file page_ex.html, which can be viewed with any browser.
  6. If you would like to experiment further with XML and XSLT, you may also download Microsoft XML Notepad, for viewing and editing XML files on the fly, and the Microsoft XML Parser SDK, which contains the documentation for MSXML 3.0.
    You may also consider using a more specialised tool such as XMLwriter from Wattle Software.
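The contents of xml.bat are not reproduced here. A batch file of this kind presumably runs the processor on page_ex.xml with page_ex.xsl and writes page_ex.html, so as a hedged sketch, here is the same step in Python. The msxsl.exe path is a hypothetical placeholder that you would adapt, just as you edit xml.bat, and the command line follows the MSXSL.EXE form `msxsl source stylesheet -o output`; passing the stylesheet explicitly is an assumption about what the batch file does.

```python
import subprocess

# Hypothetical install location: adapt it, as the comment in xml.bat asks.
MSXSL = r"C:\Program Files\msxsl\msxsl.exe"


def msxsl_command(base, msxsl=MSXSL):
    """Build the command turning <base>.xml into <base>.html via <base>.xsl."""
    return [msxsl, base + ".xml", base + ".xsl", "-o", base + ".html"]


def convert(base, msxsl=MSXSL):
    """Run the conversion; raises CalledProcessError if msxsl reports failure."""
    subprocess.run(msxsl_command(base, msxsl), check=True)
```

Calling `convert("page_ex")` from the directory containing the sources corresponds to typing `xml page_ex`.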
Conclusions
In our presentation we have focused on the flexibility the new standards bring to the handling of documents. Documents should contain information rather than obscuring markup, and this is exactly what we can achieve with XML, XSL and XSLT. One might object that HTML markup still occurs in the presented example, but here we must first distinguish structural markup from presentational markup. The latter should almost disappear, but banning it outright would be unwise: however powerful XML combined with XSL and XSLT may be, there are still cases which, for now, are handled more effectively with well-known recipes. Converting the pages of this Internet publication would hardly have been feasible had the XML standard not provided the necessary flexibility during the transition. For a while these pages will still be made available to users in HTML form, but as soon as XML-enabled browsers become more widely available, their XML form will be made available directly. At that point native XSL will become decisive in managing their presentation to the public, completing the transition.

Louis JEAN-RICHARD

November 2000

Rev. October 2004

References
  1. The World Wide Web Consortium
  2. Extensible Markup Language (XML) 1.0
  3. Extensible Stylesheet Language (XSL) 1.0
  4. XSL Transformations (XSLT) 1.0
  5. Hypertext Markup Language (HTML) 4.01
  6. Cascading Style Sheets (CSS)
  7. MSXSL.EXE Command Line Utility by Microsoft Corporation
  8. MSXML 3.0 by Microsoft Corporation
  9. Microsoft XML Notepad
  10. Microsoft XML Parser SDK
  11. Instant Saxon 6.5.3
  12. page_ex.zip with sources of the examples used in this article
  13. XMLwriter by Wattle Software