This manual is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This manual is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this manual; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
Special thanks go to David Cherry, a long-time friend, who provided the drawings. Lori Bowen Ayre deserves a round of applause for providing editorial support. Infopeople are the folks who sponsored the whole thing. Roy Tennant helped with proofreading. Thank you! --ELM
For possibly more up-to-date information see the Getting Started With XML home page.
Editions:
The first public release of this document was dated Saturday, February 22, 2003.
The second edition was dated Monday, April 21, 2003 (Dingus Day).
This is the third edition of this document, Sunday, October 26, 2003 (getting ready for MCN).
This is the fourth edition of this document, Tuesday, 27, 2004 (Shining a LAMP on XML 'n Monterey).
Table of Contents
Designed for librarians and library staff, this workshop introduces participants to the extensible markup language (XML) through numerous library examples, demonstrations, and structured hands-on exercises. Through this process you will be able to evaluate the uses of XML for making your library's data and information more accessible to people as well as computers. Examples include adding value to electronic texts, creating archival finding aids, and implementing standards-compliant Web pages. By the end of the manual you will have acquired a thorough introduction to XML and be able to: 1) list seven rules governing the syntax of XML documents, 2) create your very own XML markup language, 3) write XML documents using a plain text editor and validate them using a Web browser, 4) apply page layout and typographical techniques to XML documents using cascading style sheets, 5) create simple XML documents using a number of standard XML vocabularies important to libraries such as XHTML, TEI, and EAD, and finally, 6) articulate why XML is important for libraries.
Highlights of the manual include:
Demonstrations of the use of XML in libraries to create, store, and disseminate electronic texts, archival finding aids, and Web pages
Teaching seven simple rules for creating valid XML documents
Practicing with the combined use of cascading style sheets and XML documents to display data and information in a Web browser
Practicing with the use of XHTML and learning how it can make your website more accessible to all types of people as well as Internet robots and spiders
Demonstrating how Web pages can be programmatically created using XSLT allowing libraries to transform XML documents into other types of documents
Enhancing electronic texts with the use of the TEI markup allowing libraries to add value to digitized documents
Writing archival finding aids using EAD thus enabling libraries to unambiguously share special collection information with people and other institutions
Using LAMP-esque open source software (Linux, Apache, MySQL, Perl) to manipulate and provide access to XML content.
The manual is divided into the following chapters/sections:
What is XML and why should I care?
A gentle introduction to XML markup
Creating your own markup
Rendering XML with cascading stylesheets
Transforming XML with XSL
Validating XML with DTDs
Introduction to selected XML languages: XHTML, TEI, DocBook, RDF, etc.
XML and MySQL, Perl, swish-e, and Apache.
Web services (OAI and SRU)
Selected reading list
Professionally speaking, Eric Lease Morgan is a librarian first and a computer user second. His goal is to discover new ways to use computers to provide better library services and ultimately increase useful knowledge and understanding.
During the day Eric is the Head of the Digital Access and Information Architecture Department at the University Libraries of Notre Dame. As such he and Team DAIAD help the Libraries do stuff digital. In the evening and on the weekends Eric spends much of his time doing more library work but under the rubric of Infomotions, Inc. While wearing this hat Eric writes computer programs, provides consulting services, and maintains his infomotions.com domain where his photo gallery, Alex Catalogue of Electronic Texts, and Musing on Information are the showcases.
Eric has a BA in Philosophy from Bethany College, Bethany, WV (1982). He has an MIS from Drexel University (1987). He has been writing software since 1978 and was giving it away at least a decade before the term "open source" was coined. In 1998, Eric helped develop and popularize the idea of MyLibrary, a user-driven, customizable interface to collections of library services and content -- a portal.
Recognized numerous times by his peers, Eric won the 1991 Meckler Computers in Libraries Grand Prize and was runner-up in 1990, received three Apple Library of Tomorrow (ALOT) grants, won the 2002 Bowker/Ulrich Serials Librarianship Award, was designated as one of the "Top Librarian Personalities on the Web" in a 2002 issue of Searcher Magazine, was deemed a "2002 Mover & Shaker" in Library Journal, and won the 2004 LITA HiTech Award for excellent communication and contributions to the profession.
In his copious spare time, Eric can be seen playing disc (frisbee) golf and folding defective floppy disks into intricate origami flora and fauna.
This workbook has grown larger and larger over the past couple of years. Through the process it has lost some of its coherency. I know there are spelling and grammatical errors. The whole thing is in need of a good editor. If you enjoy editing, and if you have some understanding of DocBook, then don't hesitate to "apply within".
Eric Lease Morgan ( eric_morgan@infomotions.com )

In a sentence, the eXtensible Markup Language (XML) is an open standard providing the means to share data and information between computers and computer programs as unambiguously as possible. Once transmitted, it is up to the receiving computer program to interpret the data for some useful purpose thus turning the data into information. Sometimes the data will be rendered as HTML. Other times it might be used to update and/or query a database. Originally intended as a means for Web publishing, the advantages of XML have proven useful for things never intended to be rendered as Web pages.
Think of XML as if it represented tab-delimited text files on steroids. Tab-delimited text files are very human readable. They are easy to import into word processors, databases, and spreadsheet applications. Once imported, their simple structure makes their content relatively easy to manipulate. Tab-delimited text files are even cross-platform and operating system independent (as long as you can get around the carriage-return/linefeed differences between Windows, Macintosh, and Unix computers). See the following example, Figure 1.1:
Amanda	10	dog	brown
Blake	12	dog	blue
Jack	3	cat	black
Loosey	1	cat	brown
Stop	5	pig	brown
Tilly	14	cat	silver
The problems with tab-delimited text files are twofold. First, the meaning of each tab-delimited value is not explicitly articulated. In order to know what each value is supposed to represent it is necessary to be given (or be told ahead of time) some sort of map or context for the data. Second and more importantly, tab-delimited text files can only represent a very simple data structure, a data structure analogous to a simple matrix of rows and columns. Put another way, tab-delimited text files are exactly like flat file databases. There is no easy, standardized way of representing data in a hierarchical fashion.
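The first problem can be seen in a few lines of code. The following is only a sketch, written in Python as an assumption of convenience (the manual itself does not prescribe a language). The column names in FIELDS are supplied by the programmer, because nothing in the tab-delimited data says what the columns mean.

```python
import csv
import io

# Three rows of the tab-delimited pets data.
TAB_DATA = "Amanda\t10\tdog\tbrown\nBlake\t12\tdog\tblue\nJack\t3\tcat\tblack\n"

# The "map" for the data. It lives outside the file and has to be
# agreed upon ahead of time (these names are our own assumption).
FIELDS = ["name", "age", "type", "color"]

reader = csv.DictReader(io.StringIO(TAB_DATA), fieldnames=FIELDS, delimiter="\t")
rows = list(reader)

# Without FIELDS, "10" and "brown" would be values with no stated meaning.
for row in rows:
    print(row["name"], "is a", row["color"], row["type"])
```

If the columns were ever reordered, the program would silently attach the wrong meaning to each value.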
Much like tab-delimited text files, XML files are very human readable since they are allowed to contain only Unicode characters -- a considerably extended version of the original ASCII character code set. Additionally, XML files are operating system and application independent with the added benefit of making carriage-return/linefeed sequences almost a non-issue.
Unlike tab-delimited files, XML files explicitly state the meaning of each value in the file. Very little is left up to guesswork. Each element's value is explicitly described. XML turns data into information. The tab-delimited file from Figure 1.1 is simply an organized list of words and numbers. They have no context and therefore they only represent data. On the other hand, the words and numbers in XML files are given value and context, and therefore are transformed from data to information. Furthermore, it is very easy to create hierarchical data structures using XML. Figure 1.2 illustrates these concepts. Without very much examination, it becomes apparent the data represents a list of pets, specifically, six pets, and each pet has a name, age, type, and color. Was that as apparent in the previous example?
<pets>
  <pet>
    <name>Tilly</name>
    <age>14</age>
    <type>cat</type>
    <color>silver</color>
  </pet>
  <pet>
    <name>Amanda</name>
    <age>10</age>
    <type>dog</type>
    <color>brown</color>
  </pet>
  <pet>
    <name>Jack</name>
    <age>3</age>
    <type>cat</type>
    <color>black</color>
  </pet>
  <pet>
    <name>Blake</name>
    <age>12</age>
    <type>dog</type>
    <color>blue</color>
  </pet>
  <pet>
    <name>Loosey</name>
    <age>1</age>
    <type>cat</type>
    <color>brown</color>
  </pet>
  <pet>
    <name>Stop</name>
    <age>5</age>
    <type>pig</type>
    <color>brown</color>
  </pet>
</pets>
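Because every value is labeled, a program can ask for things by name. Here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (our choice for illustration, not something the workshop requires), that reads a shortened copy of the pets data.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the pets data (three of the six pets).
PETS_XML = """
<pets>
  <pet><name>Tilly</name><age>14</age><type>cat</type><color>silver</color></pet>
  <pet><name>Amanda</name><age>10</age><type>dog</type><color>brown</color></pet>
  <pet><name>Jack</name><age>3</age><type>cat</type><color>black</color></pet>
</pets>
"""

root = ET.fromstring(PETS_XML)

# Values are retrieved by element name, not by column position.
cats = [pet.findtext("name") for pet in root.findall("pet")
        if pet.findtext("type") == "cat"]
ages = {pet.findtext("name"): int(pet.findtext("age"))
        for pet in root.findall("pet")}

print(cats)
print(ages["Amanda"])
```

No outside map of column meanings is needed; the element names carry the context along with the data.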
As the world's production economies move more and more towards service economies, the stuff of business becomes more tied to data and information. Similarly, libraries are becoming less about books and more about the ideas and concepts manifested in the books. In both of these spheres of influence there needs to be a way to move data and information around efficiently and effectively. XML data shared between computers and computer programs via the hypertext transfer protocol represents an evolving method to facilitate this sharing, a method generically called Web Services.
For example, an XML markup called RSS (Rich Site Summary) is increasingly used to syndicate lists of uniform resource locators (URLs) representing news stories found on websites. RDF (Resource Description Framework) is an XML markup used to encapsulate metadata about content found at the end of URLs. TEI (Text Encoding Initiative) and TEILite are available as both SGML and XML markups used to explicitly give value to things found in literary works. Similarly, another XML language called DocBook is increasingly used to mark up computer-related books or articles. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) uses XML to gather metadata about the content found at remote Internet sites.
As information professionals, it behooves us to learn how to exploit the capabilities of XML, because XML is a tool that makes it easy to communicate information unambiguously, and as platform-independently as possible, in a globally networked environment. Isn't that what librarianship and information science are all about?

XML documents have syntactic and semantic structures. The syntax (think spelling and punctuation) is made up of a minimum of rules such as but not limited to:
XML documents always have one and only one root element
Element names are case-sensitive
Elements are always closed
Elements must be correctly nested
Elements' attributes must always be quoted
There are only five entities defined by default (&lt;, &gt;, &amp;, &quot;, and &apos;, standing for <, >, &, ", and ')
When necessary, namespaces must be employed to eliminate vocabulary clashes.
Each of these rules is described in more detail below.
The structure of an XML document is a tree structure where there is one trunk and optionally many branches. The single trunk represents the root element of the XML document. Consider the following, overly simplified, HTML document, Figure 2.1:
<html>
  <head>
    <title>Hello, World</title>
  </head>
  <body>
    <p>Hello, World</p>
  </body>
</html>
This document structure should look familiar to you. It is a well-formed XML document, and it contains only a single root element, namely html. There are then two branches to the document, head and body.
Element names, the basic vocabulary of XML documents, are case-sensitive. In Figure 2.1 there are five elements: html, head, title, body, and p. Since each element's name is case-sensitive, the element html does not equal HTML, nor does it equal HTmL or Html. The same is true for the other elements.
Each element is denoted by opening and closing brackets, the less than sign (<) and greater than sign (>), respectively. XML elements are rarely empty; they are usually used to provide some sort of meaning or context to some data, and consequently, XML elements usually surround data. Each of the elements in Figure 2.1 is opened and closed. For example, the title of the document is denoted with the <title> and </title> tags and the only paragraph of the document is denoted with the <p> and </p> tags. An opening tag does not contain the initial forward slash, but a closing tag does.
Sometimes elements can be empty such as the break tag in XHTML. In such cases the element is opened and closed at the same time, and it is encoded like this: <br />.
Elements must be closed in the reverse of the order in which they were opened: the element opened last must be closed first. Failing to do so is called improper nesting. Take the following incorrect encoding of an XHTML paragraph:
<p>This is a test. This is a test of the <em> <strong>Emergency</em> Broadcast System.</strong></p>
In the example above the em and strong elements are opened, but the em element is closed before the strong element. Since the strong element was opened after the em element it must be closed before the em element. Here is correct markup:
<p>This is a test. This is a test of the <strong> <em>Emergency</em> Broadcast System.</strong></p>
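An XML parser will refuse to read a document that breaks these rules. The sketch below, assuming Python's standard xml.etree.ElementTree module (the function name is_well_formed is our own), feeds the two paragraphs above to a parser along with a few other small cases.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The incorrectly and correctly nested paragraphs from above.
bad = ("<p>This is a test. This is a test of the <em> <strong>Emergency</em> "
       "Broadcast System.</strong></p>")
good = ("<p>This is a test. This is a test of the <strong> <em>Emergency</em> "
        "Broadcast System.</strong></p>")

print(is_well_formed(bad))            # improperly nested
print(is_well_formed(good))           # properly nested
print(is_well_formed("<br />"))       # an empty element, opened and closed at once
print(is_well_formed("<P>Hello</p>")) # P and p are different names
print(is_well_formed("<p>Hello"))     # never closed
```

A Web browser asked to open an .xml file performs much the same check, which is why the exercises in this manual use a browser for this purpose.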
XML elements are often qualified using attributes. For example, an integer might be marked up as a length and the length element might be qualified to denote feet as the unit of measure. For example: <length unit='feet'>5</length>. The attribute is named unit, and its value is always quoted. It does not matter whether it is quoted with an apostrophe (') or a double quote (").
Certain characters in XML documents have special significance, specifically, the less than (<), greater than (>), and ampersand (&) characters. The first two characters are used to delimit the existence of element names. The ampersand is used to delimit the display of special characters commonly known as entities; the ampersand character is the "escape" character. Consequently, if you want to display any of these three characters in your XML documents, then you must express them in their entity form:
to display the & character type &amp;
to display the < character type &lt;
to display the > character type &gt;
XML processors, computer programs that render XML documents, should be able to interpret these characters without the characters being previously defined.
There are two other characters that can be represented as entity references:
to display the ' character optionally type &apos;
to display the " character optionally type &quot;
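Programs that write XML usually do this substitution for you. As a sketch, assuming Python, the standard library's xml.sax.saxutils.escape function replaces the three required characters, and a parser turns the entities back into plain characters when the document is read. The sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing two of the special characters.
raw = "2 cups of half & half, heated to < 180 degrees"

# escape() replaces &, <, and > with their entity forms.
escaped = escape(raw)
print(escaped)

# Once escaped, the text can safely sit inside an element...
item = ET.fromstring("<item>" + escaped + "</item>")

# ...and the parser hands back the original characters.
print(item.text)
```

Leaving the ampersand unescaped would make the parser stop with an error, because it would expect an entity name to follow.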
The concept of a "namespace" is used to avoid clashes in XML vocabularies.
Remember, the X in XML stands for extensible. This means you are allowed to create your own XML vocabulary. There is no centralized authority to dictate what all the valid vocabularies are and how they are used. Fine, but with so many XML vocabularies there are bound to be similarities between them. For example, it is quite likely that different vocabularies will want some sort of date element. Others will want a name or description element. In each of these vocabularies the values expected to be stored in date, name, or description elements may be different. How to tell the difference? Namespaces.
Namespaces have a one-to-one relationship with a URI (Uniform Resource Identifier), and namespace attributes defined with URIs can be inserted into XML elements to denote how an element is to be used. Namespace attributes always begin with "xmlns". They end with some sort of identifier called the "local part". These two things are delimited by a colon and finally equated with a URI. For example, the conventional declaration of the Dublin Core namespace is written as:
xmlns:dc="http://purl.org/dc/elements/1.1/"
where:
xmlns denotes a namespace
dc denotes the name of the namespace, the local part
http://purl.org/dc/elements/1.1/ is a unique identifier (URI)
This whole namespace thing becomes very useful when an XML document uses two or more XML vocabularies. For example, it is entirely possible (if not necessary) to have more than one vocabulary in RDF streams. There is one vocabulary used to describe the RDF structure, and there is another vocabulary used to describe the metadata. The example below is a case in point:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.AcronymFinder.com/">
<dc:title>Acronym Finder</dc:title>
<dc:description>The Acronym Finder is a world wide
web (WWW) searchable database of more than 169,000
abbreviations and acronyms about computers,
technology, telecommunications, and military acronyms
and abbreviations.</dc:description>
<dc:subject>
<rdf:Bag>
<rdf:li>Astronomy</rdf:li>
<rdf:li>Literature</rdf:li>
<rdf:li>Mathematics</rdf:li>
<rdf:li>Music</rdf:li>
<rdf:li>Philosophy</rdf:li>
</rdf:Bag>
</dc:subject>
</rdf:Description>
</rdf:RDF>
Here you have two vocabularies going on. One is defined as rdf and associated with the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#. The second one is defined as dc and associated with http://purl.org/dc/elements/1.1/. The namespace local parts are then associated with various elements as needed.
One final note. URIs are simply unique identifiers. They often take the shape of URLs, but need not point to anything accessible over the Web. They are just strings of text.
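To a program, it is the URI that identifies a vocabulary, not the short name in front of the colon. The sketch below, assuming Python's standard xml.etree.ElementTree module, reads a shortened copy of the RDF example. The NS dictionary is our own mapping; the short names in it only need to agree with the queries, while the URIs must match the document exactly.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the RDF example above.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.AcronymFinder.com/">
    <dc:title>Acronym Finder</dc:title>
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Astronomy</rdf:li>
        <rdf:li>Music</rdf:li>
      </rdf:Bag>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>"""

# Our own mapping of short names to namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(RDF_XML)

# The parser stores each name together with its URI.
print(root.tag)

title = root.find("rdf:Description/dc:title", NS).text
subjects = [li.text for li in root.iterfind(".//rdf:li", NS)]
print(title)
print(subjects)
```

Notice the parser never tries to fetch anything from either URI. They are compared as strings of text and nothing more.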
The semantics of an XML document (think grammar) is an articulation of what XML elements can exist in a file, their relationship(s) to each other, and their meaning. Ironically, this is the really hard part about XML and has manifested itself as a multitude of XML "languages" such as: RSS, RDF, TEILite, DocBook, XMLMARC, EAD, XSL, etc. In the following, valid, XML file there are a number of XML elements. It is these elements that give the data value and meaning:
<catalog>
  <work type='prose' date='1906'>
    <title>The Gift Of The Magi</title>
    <author>O Henry</author>
  </work>
  <work type='poem' date='1845'>
    <title>The Raven</title>
    <author>Edgar Allan Poe</author>
  </work>
  <work type='play' date='1601'>
    <title>Hamlet</title>
    <author>William Shakespeare</author>
  </work>
</catalog>
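Attributes are just as easy for a program to read as element content. Here is a small sketch, again assuming Python's standard xml.etree.ElementTree module, that reads a shortened copy of the catalog (authors omitted for brevity) and selects works by their type attribute.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the catalog; the author elements are left out.
CATALOG_XML = """<catalog>
  <work type='prose' date='1906'><title>The Gift Of The Magi</title></work>
  <work type='poem' date='1845'><title>The Raven</title></work>
  <work type='play' date='1601'><title>Hamlet</title></work>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# Attribute values are read with get(); element content with findtext().
for work in root.findall("work"):
    print(work.get("type"), work.get("date"), work.findtext("title"))

# Select only the works whose type attribute is 'poem'.
poems = [w.findtext("title") for w in root.findall("work[@type='poem']")]

# Attribute values arrive as text; convert them when numbers are needed.
dates = sorted(int(w.get("date")) for w in root.findall("work"))
print(poems)
print(dates)
```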
In this exercise you will learn to identify syntactical errors in XML files.
Examine the following file. Circle all of its syntactical errors, and write in the corrections.
<name>Oyster Soup</name>
<author>Eric Lease Morgan</author>
<copyright holder=Eric Lease Morgan>&copy; 2003</copyright>
<ingredients>
  <list>
    <item>1 stalk of celery
    <item>1 onion
    <item>2 tablespoons of butter
    <item>2 cups of oysters and their liquor
    <item>2 cups of half & half
  </list>
</ingredients>
<process>
  <P>Begin by sauteing the celery and onions in butter until soft. Add oysters, oyster liquor, and cream. Heat until the oysters float. Serve in warm bowls.</p>
  <p><i>Yummy!</p></i>
</process>
Check for one and only one root element. Is there a root element?
Check for quoted attribute values. Are the attributes quoted?
Check for invalid use of entities. There are two errors in the file.
Check for properly opened and closed element tags. Five elements are not closed.
Check for properly nested elements. Two elements are not nested correctly.
Check for case-sensitive element naming. One element is not correctly cased.

The "X" in XML stands for extensible. By this the creators of XML mean it should be easy to create one's own markup -- a vocabulary or language intended to describe a set of data/information. The key to creating an XML markup language is first to articulate what the documents will be used for, and second to specify the essential components of a document and assign them elements. The process of creating an XML markup is similar to the process of designing a database application. You must ask yourself what data you will need and create places for that data to be saved.
Creating a markup for a letter serves as an excellent example:
December 11, 2002
Melvile Dewey
Columbia University
New York, NY
Dear Melvile,
I have been reading your ideas concerning the nature of
librarianship, and I find them very intriguing. I would love the
opportunity to discuss with you the role of the card catalog in today's
libraries considering the advent of the World Wide Web. Specifically, how
are things like Google and Amazon.com changing our patrons' expectations
of library services? Mr. Cutter and I will be discussing these ideas at
the next Annual Meeting, and we are available at the following dates/times:
* Monday, 2-4
* Tuesday, 3-5
* Thursday, 1-3
We hope you can join us.
Sincerely, S. R. Ranganathan
As you read the letter you notice sections common to many letters. By analyzing these sections it is possible to create a list of XML elements. For example, the letter contains a date, a block of text describing the addressee, a greeting, one or more paragraphs of text, a list, and a closing statement. Upon closer examination, some of your sections have subsections. For example, the addressee has a name, a first address line, and a second address line. Further, the body of the letter might have some sort of emphasis.
The division into smaller and smaller subsections could go all the way down to individual words. Where to stop? Only create elements for pieces of data you are going to use. If you never need to know the city or state of your addressee, then don't create an element for them. Ask yourself, what is the purpose of the document? What sort of information do you want to highlight from its content? If you wanted to create lists of all the cities you sent letters to, then you will need to demarcate the values for city. If you need to extract each and every sentence from your document, then you will have to demarcate them as well. Otherwise, save yourself the time and energy and keep it simple.
Once you have articulated the parts of the document you want to mark up, you have to give them names. XML element names can contain standard English letters A - Z and a - z as well as digits 0 - 9. They can also contain non-English letters and three punctuation characters: underscore (_), hyphen (-), and period (.). A name may not begin with a digit, a hyphen, or a period. Element names may not contain white space (blanks, tabs, return characters), nor other punctuation marks. Play it safe. Use letters.
Now it is time to actually create a few elements. Based on the previous discussion, we could create a set of element names such as this:
letter
date
addressee
name
address_one
address_two
greeting
paragraph
italics
list
item
closing
Using these elements as a framework, it is possible to mark up the text in the following manner:
<letter>
  <date>December 11, 2002</date>
  <addressee>
    <name>Melvile Dewey</name>
    <address_one>Columbia University</address_one>
    <address_two>New York, NY</address_two>
  </addressee>
  <greeting>Dear Melvile,</greeting>
  <paragraph>
    I have been reading your ideas concerning the nature of librarianship,
    and <italics>I find them very intriguing</italics>. I would love the
    opportunity to discuss with you the role of the card catalog in today's
    libraries considering the advent of the World Wide Web. Specifically, how
    are things like Google and Amazon.com changing our patrons' expectations
    of library services? Mr. Cutter and I will be discussing these ideas at
    the next Annual Meeting, and we are available at the following dates/times:
  </paragraph>
  <list>
    <item>Monday, 2-4</item>
    <item>Tuesday, 3-5</item>
    <item>Thursday, 1-3</item>
  </list>
  <paragraph>We hope you can join us.</paragraph>
  <closing>Sincerely, S. R. Ranganathan</closing>
</letter>
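The choice of elements decides what a program can later pull out of the document. The following sketch, assuming Python's standard xml.etree.ElementTree module and a shortened copy of the letter, retrieves the parts that were given their own elements. It also shows the limit of the design: the city and state share one element, so the city alone cannot be had without guessing at the punctuation.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the marked up letter.
LETTER_XML = """<letter>
  <date>December 11, 2002</date>
  <addressee>
    <name>Melvile Dewey</name>
    <address_one>Columbia University</address_one>
    <address_two>New York, NY</address_two>
  </addressee>
  <greeting>Dear Melvile,</greeting>
  <paragraph>I have been reading your ideas, and
    <italics>I find them very intriguing</italics>.</paragraph>
  <list>
    <item>Monday, 2-4</item>
    <item>Tuesday, 3-5</item>
    <item>Thursday, 1-3</item>
  </list>
  <closing>Sincerely, S. R. Ranganathan</closing>
</letter>"""

letter = ET.fromstring(LETTER_XML)

# Anything with its own element can be asked for by name.
recipient = letter.findtext("addressee/name")
times = [item.text for item in letter.iter("item")]
emphasized = [i.text for i in letter.iter("italics")]

# City and state were not demarcated separately, so they come back together.
city_and_state = letter.findtext("addressee/address_two")

print(recipient)
print(times)
print(emphasized)
print(city_and_state)
```

If a list of cities were the goal, the markup would need a city element of its own.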
In this exercise you will create your own XML markup, a markup describing a simple letter.
Consider the following letter.
February 3, 2003
American Library Association
15 Huron Street
Chicago, IL 12304
To Whom It May Concern:
It has come to my attention that the Association no longer wants to spend money on posters of famous people advocating reading. What is wrong with you guys! Don't you know that reading is FUNdamental? These posters really get me and my patrons going. I thought they were great. Please consider re-instating the posters.
Sincerely, B. Ig Reeder
As a group, decide what elements to use to mark up the letter as an XML file.
What can our root element be?
What sections make up the letter? What element names can we give these sections?
Some of the sections, such as the address, greeting, and salutation have sub-sections. What should we call these sub-sections?
Use a pen or pencil to mark up the letter above using the elements decided upon.
Mark up the letter as an XML document, and validate its syntax using a Web browser.
Use NotePad to open the file named ala.txt on the distributed CD.
Add the root element to the beginning and ending of the file.
Mark up each section and sub-section of the letter with the element names decided upon.
Save the file with the name ala.xml.
Open ala.xml in your Web browser, and fix any errors that it may report. If there are no errors, then congratulations, you have marked up your first XML document.

Creating your own XML markup is all well and good, but if you want to share your documents with other people you will need to communicate to these other people the vocabulary your XML documents use. This is the semantic part of XML documents -- what elements do your XML files contain and how are the elements related to each other? These semantic relationships are created using Document Type Definitions (DTD) and/or XML Schemas. DTDs are legacy implementations from the SGML world. They are more commonly used than the newer, XML-based, XML Schemas. This section provides an overview for creating DTDs.
DTDs can exist inside an XML document or outside an XML document. If they reside in an XML document, then they begin with a DOCTYPE declaration followed by the name of the XML document's root element and finally a list of all the elements and how they are related to each other. Here is a simple DTD embedded in the pets.xml file itself:
<!DOCTYPE pets [
  <!ELEMENT pets (pet+)>
  <!ELEMENT pet (name, age, type, color)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
  <!ELEMENT type (#PCDATA)>
  <!ELEMENT color (#PCDATA)>
]>
<pets>
  <pet>
    <name>Tilly</name>
    <age>14</age>
    <type>cat</type>
    <color>silver</color>
  </pet>
  <pet>
    <name>Amanda</name>
    <age>10</age>
    <type>dog</type>
    <color>brown</color>
  </pet>
  <pet>
    <name>Jack</name>
    <age>3</age>
    <type>cat</type>
    <color>black</color>
  </pet>
  <pet>
    <name>Blake</name>
    <age>12</age>
    <type>dog</type>
    <color>blue</color>
  </pet>
  <pet>
    <name>Loosey</name>
    <age>1</age>
    <type>cat</type>
    <color>brown</color>
  </pet>
  <pet>
    <name>Stop</name>
    <age>5</age>
    <type>pig</type>
    <color>brown</color>
  </pet>
</pets>
More commonly, DTDs reside outside an XML document since they are intended to be used by many XML files. In this case, the DOCTYPE declaration includes a pointer to a file where the XML elements are described.
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
Whether the DTD is internal or external, a list of XML elements needs to be articulated. Each item on the list will look something like <!ELEMENT pets (pet+)> where "<!ELEMENT" denotes an element declaration, "pets" is the element being defined, and "(pet+)" is the definition. The definitions are the difficult part. There are many different types of values the definitions can include, and only a few of them are described here.
First of all, the definitions can include the names of other elements. In our example above, the first declaration defines an element called pets, and it is allowed to include just one other element, pet. Similarly, the element defined as pet is allowed to contain four other elements: name, age, type, and color. Each element is qualified by how many times it can occur in the XML document. This is done with the asterisk (*), question mark (?), and plus sign (+) symbols. Each of these symbols has a specific meaning:
asterisk (*) - The element may appear zero or more times
question mark (?) - The element may appear zero or one time, only
plus sign (+) - The element appears at least once if not more times
If an element is not qualified with one of these symbols, then the element can appear once and only once. Consequently, in the example above, since pets is defined to contain the element pet, and the pet element is qualified with a plus sign, there must be at least one pet element within the pets element.
There is another value for element definitions you need to know, #PCDATA. This stands for parsed character data, and it is used to denote content that contains only text, text without markup.
Finally, it is entirely possible that an element will contain multiple sub-elements. When strung together, this list of multiple elements is called a sequence, and the elements can be grouped together in the following ways:
comma (,) is used to denote the expected order of the elements in the XML file
parentheses (()) are used to group elements together
vertical bar (|) is used to denote a choice between the elements: one or the other, a Boolean "or".
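If you have ever used regular expressions, these symbols will look familiar, and the resemblance is a handy way to think about definitions. The sketch below is only an analogy, assuming Python; it is not a DTD validator. The patterns are translated by hand from the pets declarations, and the child_names and matches_model helpers, as well as the nickname and birthdate elements, are hypothetical names invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

def child_names(element):
    """Return the names of an element's children, each followed by a space."""
    return "".join(child.tag + " " for child in element)

# Hand-written patterns mirroring two declarations (not parsed from a DTD):
#   <!ELEMENT pets (pet+)>                   one or more pet elements
#   <!ELEMENT pet (name, age, type, color)>  exactly this sequence
MODELS = {
    "pets": re.compile(r"(pet )+"),
    "pet": re.compile(r"name age type color "),
}

def matches_model(element):
    """Compare one element's children against its hand-written pattern."""
    return MODELS[element.tag].fullmatch(child_names(element)) is not None

good = ET.fromstring(
    "<pets><pet><name>Tilly</name><age>14</age>"
    "<type>cat</type><color>silver</color></pet></pets>")
empty = ET.fromstring("<pets></pets>")
swapped = ET.fromstring(
    "<pet><age>14</age><name>Tilly</name>"
    "<type>cat</type><color>silver</color></pet>")

print(matches_model(good), matches_model(good[0]))  # both as declared
print(matches_model(empty))    # the plus sign demands at least one pet
print(matches_model(swapped))  # the commas demand name before age

# A made-up definition, (name, nickname?, (age | birthdate)), for ? and |.
HYPOTHETICAL = re.compile(r"name (nickname )?(age |birthdate )")
print(HYPOTHETICAL.fullmatch("name nickname birthdate ") is not None)
print(HYPOTHETICAL.fullmatch("name age birthdate ") is not None)
```

A real validating XML processor does this bookkeeping for every element in the document, using the declarations themselves.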
Walking through the DTD for pets.xml we see that:
The root element of the document is pets.
The root element, pets, contains at least one pet element.
Each pet element can contain one and only one name, age, type, and color element, in that order.
The elements name, age, type, and color are to contain plain text, no mark up.
Below is a DTD for the letter in a previous example.
<!ELEMENT letter (date, addressee, greeting, (paragraph+ | list+)*, closing)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT addressee (name, address_one, address_two)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address_one (#PCDATA)>
<!ELEMENT address_two (#PCDATA)>
<!ELEMENT greeting (#PCDATA)>
<!ELEMENT paragraph (#PCDATA | italics)*>
<!ELEMENT italics (#PCDATA)>
<!ELEMENT list (item+)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT closing (#PCDATA)>
This example is a bit more complicated. Walking through it we see that:
The letter element contains one date element, one addressee element, one greeting element, any number of paragraph and list elements in any order, and one closing element.
The date element contains plain old text, no markup.
The addressee element contains one and only one name, address_one, and address_two element, in that order.
The name, address_one, address_two, and greeting elements contain text, no markup.
The paragraph element can contain plain text or the italics element.
The italics element contains plain, non-marked up, text.
The list element contains at least one item element.
The item and closing elements contain plain text.
To include this DTD in our XML file, we must create a pointer to the DTD, and since the DTD is local to our environment, and not a standard, the pointer should be included in the XML document looking like this:
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
<date>
December 11, 2002
</date>
<addressee>
<name>
Melvile Dewey
</name>
<address_one>
Columbia University
</address_one>
<address_two>
New York, NY
</address_two>
</addressee>
<greeting>
Dear Melvil,
</greeting>
<paragraph>
I have been reading your ideas concerning the nature of librarianship,
and <italics>I find them very intriguing</italics>. I would love
the opportunity to discuss with you the role of the card catalog
in today's libraries considering the advent of the World Wide Web.
Specifically, how are things like Google and Amazon.com changing
our patrons' expectations of library services? Mr. Cutter and I
will be discussing these ideas at the next Annual Meeting, and we
are available at the following dates/times:
</paragraph>
<list>
<item>
Monday, 2-4
</item>
<item>
Tuesday, 3-5
</item>
<item>
Thursday, 1-3
</item>
</list>
<paragraph>
We hope you can join us.
</paragraph>
<closing>
Sincerely, S. R. Ranganathan
</closing>
</letter>
By feeding this XML to an XML processor, the XML processor should know that the element named letter is the root of the XML file, and the XML file can be validated using a local, non-standardized DTD file named letter.dtd.
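To see what a processor actually learns from the document type declaration, here is a minimal sketch using Python's standard xml.dom.minidom module. It assumes a shortened, hypothetical copy of the letter. Note that minidom only records the declaration; it does not fetch letter.dtd or validate anything.

```python
from xml.dom import minidom

# A shortened, hypothetical version of the letter shown above.
document = """<!DOCTYPE letter SYSTEM "letter.dtd">
<letter><date>December 11, 2002</date></letter>"""

dom = minidom.parseString(document)
doctype = dom.doctype

# The name given in the declaration should match the root element.
print("declared root:", doctype.name)
print("DTD location: ", doctype.systemId)
print("actual root:  ", dom.documentElement.tagName)
```

Actual validation still requires a validating tool such as the ones used in the exercises that follow.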
In this exercise your knowledge of DTDs will be sharpened by examining an existing DTD, and then you will write your own DTD.
Consider the DTD describing the content of the catalog.xml file, below, and on the back of this paper write the answers to the following questions:
<!ELEMENT catalog (caption, structure, work+)>
<!ELEMENT caption (#PCDATA)>
<!ELEMENT structure (title, author, type, date)>
<!ELEMENT work (title, author, type, date)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT date (#PCDATA)>
How many elements can the catalog element contain, and what are they?
How many works can any one catalog.xml file contain?
Can marked up text be included in the title element? Explain why or why not.
If this DTD is intended to be a locally developed DTD, and intended to be accessed from outside the XML document, how would you write the DTD declaration appearing in the XML file?
Create an internal DTD for the file ala.xml, and validate the resulting XML file.
Open ala.xml in NotePad.
Add an internal document type declaration to the top of the file, <!DOCTYPE letter [ ]>.
Between the square brackets ([]), enter the beginnings of an element declaration for each element that needs to be defined (i.e. letter, date, address, greeting, etc.). For example, type <!ELEMENT para ()> for the paragraph element between the square brackets.
For each element define its content by listing either other element names or #PCDATA, depending on how the XML file is structured. Don't forget to append either a plus sign (+), an asterisk (*), or a question mark (?) to denote the number of times an element or list of elements may appear in the XML file.
Save ala.xml.
Select and copy the entire contents of ala.xml to the clipboard.
Open your Web browser, and validate your XML file by using a validation form.
In this exercise you will use xmllint to validate an XML file against a locally defined (system) DTD. While using a remote service such as the one mentioned above is easy, it really behooves you to be more self-reliant than that.
Install xmllint. Xmllint is a validation program written against a set of C libraries called libxml2 and libxslt. Installing these libraries on Unix follows the normal installation process: download, unzip, untar, configure, make, and install. On Windows it is much easier to download the pre-compiled binaries, making sure you download the necessary .dll files. All of these files have been saved in a directory on the CD called libxml. Simply copy this directory to your C drive and/or add the libxml directory to your PATH environment variable. Once done you should be able to enter xmllint and xsltproc from the command line and see lots of help text.
Open a command prompt and change directories to the getting-started directory of the workshop's distribution.
Validate letter.xml against letter.dtd using this command: xmllint --dtdvalid letter.dtd letter.xml . If everything goes well you should see a stream of XML without any errors. If you want to suppress the stream of XML then add --noout: xmllint --noout --dtdvalid letter.dtd letter.xml .
Too good to be true? Change something about letter.xml to make it invalid, and try validating it again to demonstrate that xmllint is working correctly.
If you have not had the joy of trying to fix an XML file based on the output of xmllint, then here is your chance. In this exercise you will make an XML document validate against a DTD.
Browse the content of the directory xml-data/ead/broken/. It contains sets of well-formed XML documents that validated against an older version of an EAD (Encoded Archival Description) DTD.
Open an EAD file in your favorite text editor, say ncw.xml. Notice its structure. At first glance, especially to the uninitiated, the file seems innocuous.
Open a command prompt and change directories to the root of the workshop's distribution.
Validate ncw.xml against the latest version of the EAD DTD supplied on the CD: xmllint --noout --dtdvalid dtds/ead/ead.dtd xml-data/ead/broken/ncw.xml . You should get output looking something like this:
xml-data/ead/broken/ncw.xml:5: element eadheader: validity error : Syntax of value for attribute langencoding of eadheader is not valid
xml-data/ead/broken/ncw.xml:21: element archdesc: validity error : Element archdesc content does not follow the DTD, expecting (runner* , did , (accessrestrict | accruals | acqinfo | altformavail | appraisal | arrangement | bibliography | bioghist | controlaccess | custodhist | descgrp | fileplan | index | odd | originalsloc | otherfindaid | phystech | prefercite | processinfo | relatedmaterial | scopecontent | separatedmaterial | userestrict | dsc | dao | daogrp | note)*), got (did admininfo scopecontent bioghist controlaccess dsc)
xml-data/ead/broken/ncw.xml:21: element archdesc: validity error : No declaration for attribute langmaterial of element archdesc
xml-data/ead/broken/ncw.xml:21: element archdesc: validity error : No declaration for attribute legalstatus of element archdesc
xml-data/ead/broken/ncw.xml:35: element admininfo: validity error : No declaration for element admininfo
Document xml-data/ead/broken/ncw.xml does not validate against dtds/ead/ead.dtd
Yuck!
You can make the document validate by opening up ncw.xml in your text editor and then:
deleting the space in the langencoding attribute of the eadheader element
deleting the admininfo element and all of its children
deleting the langmaterial name/value from the archdesc element
deleting the legalstatus name/value from the archdesc element
Save your changes, and validate the document again.
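When a file produces many validity errors, it can help to group them by element before you start editing. The sketch below assumes the error lines follow the pattern seen in the output above (file, line number, element name, message). The names ERROR_LINE and summarize are invented for this example, and other versions of xmllint may format their messages differently.

```python
import re

# Pattern assumed from the sample output:
#   file:line: element NAME: validity error : MESSAGE
ERROR_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): element (?P<element>[^:]+): "
    r"validity error : (?P<message>.*)$"
)

def summarize(xmllint_output):
    """Group (line number, message) pairs by the element they mention."""
    by_element = {}
    for raw in xmllint_output.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            entry = (int(match["line"]), match["message"])
            by_element.setdefault(match["element"], []).append(entry)
    return by_element

# A few lines copied from the output shown in this exercise.
sample = """\
xml-data/ead/broken/ncw.xml:5: element eadheader: validity error : Syntax of value for attribute langencoding of eadheader is not valid
xml-data/ead/broken/ncw.xml:21: element archdesc: validity error : No declaration for attribute langmaterial of element archdesc
xml-data/ead/broken/ncw.xml:21: element archdesc: validity error : No declaration for attribute legalstatus of element archdesc
xml-data/ead/broken/ncw.xml:35: element admininfo: validity error : No declaration for element admininfo
Document xml-data/ead/broken/ncw.xml does not validate against dtds/ead/ead.dtd
"""

summary = summarize(sample)
for element, errors in summary.items():
    print(element, len(errors))
```

The final "does not validate" line is deliberately ignored because it does not describe a particular element.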
Cascading style sheets (CSS) represent a method for rendering XML files into a more human-readable presentation. CSS files exemplify a method for separating presentation from content.
CSS has three components: layout, typography, and color. By associating an XML file with a CSS file and processing them with a Web browser, it is possible to display the content of the XML file in an aesthetically pleasing manner.
CSS files are made up of sets of things called selectors and declarations. Each selector in a CSS file corresponds to an element in an XML file. Each selector is then made up of declarations -- standardized name/value pairs -- denoting how the content of XML elements is to be displayed. They look something like this: note { display: block; }.
Here is a very simple XML document describing a note:
<?xml-stylesheet type="text/css" href="note.css"?>
<note>
<para>Notes are very brief documents.</para>
<para>They do not contain very much content.</para>
</note>
The first thing you will notice about the XML document is the addition of the very first line, an XML processing instruction. This particular instruction tells the application reading the XML file to render it using a CSS file named note.css. The balance of the XML file should be familiar to you.
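If you are curious how an application notices the processing instruction, the following sketch uses Python's standard xml.dom.minidom module to list the processing instructions that appear before the root element. It is only an illustration of what a browser must do first; the variable names are invented for this example, and nothing here renders the note.

```python
from xml.dom import minidom, Node

# A shortened copy of the note document shown above.
note = """<?xml-stylesheet type="text/css" href="note.css"?>
<note><para>Notes are very brief documents.</para></note>"""

dom = minidom.parseString(note)

# Processing instructions have a target (the name) and data (the rest).
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

for target, data in instructions:
    print(target, "->", data)
```

The data portion merely looks like a pair of XML attributes. An application has to pick the type and href values out of that string for itself.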
If I wanted to display the contents of the note such that each paragraph were separated by a blank line, then the CSS file might look like this:
note { display: block; }
para { display: block; margin-bottom: 1em; }
In this CSS file there are two selectors, one for each of the elements in the XML file: note and para. Each selector is associated with one or more name/value pairs (declarations) describing how the content of the element is to be displayed. Each name is separated from its value by a colon (:), the name/value pairs are separated from each other by a semicolon (;), and all the declarations associated with a selector are grouped together with curly braces ({}).
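The punctuation rules just described are regular enough that a few lines of Python can pull a simple style sheet apart. This is a minimal sketch that assumes only the plain kind of rules used in this chapter (no comments, no nested blocks, no braces or semicolons inside values). The function name parse_simple_css is invented for the example, and real CSS parsers handle far more.

```python
import re

def parse_simple_css(css_text):
    """Turn 'selector { name: value; ... }' rules into nested dictionaries."""
    rules = {}
    # Each rule is some selector text followed by a brace-delimited body.
    for selectors, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css_text):
        declarations = {}
        for pair in body.split(";"):          # semicolons separate declarations
            if ":" in pair:
                name, value = pair.split(":", 1)  # a colon separates name and value
                declarations[name.strip()] = value.strip()
        # One rule may list several selectors separated by commas.
        for selector in selectors.split(","):
            rules.setdefault(selector.strip(), {}).update(declarations)
    return rules

note_css = """
note { display: block; }
para { display: block; margin-bottom: 1em; }
"""

rules = parse_simple_css(note_css)
for selector, declarations in rules.items():
    print(selector, declarations)
```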
Opening note.xml in a relatively modern Web browser should result in something looking like this:

Be forewarned. Not all Web browsers support CSS similarly. (What a surprise!) In general, you will get minimal performance from Netscape Navigator 4.7 and Internet Explorer 5.0. Much better implementations of CSS are built into Mozilla 1.0 and Internet Explorer 6.0. Your mileage will vary.
The key to using CSS files is knowing how to create the name/value pair declarations. For a comprehensive list of these name/value pairs see the World Wide Web Consortium's description of CSS. A number of them are described below.
The display property is used to denote whether or not an element is to be displayed and, if so, how, but only in a very general way. The most important values for display are inline, block, list-item, and none. Inline is the default value. This means no line break is added after the content of the element; the content is displayed as part of a line of text. Giving display a value of block does create line breaks before and after the content of the element. Think of blocks as if they were paragraphs. The list-item value is like block, but it also marks the element as an item in a list, typically with a bullet. The use of none means the content will not be displayed; the content is hidden. Examples include:
display: none;
display: inline;
display: block;
display: list-item;
The margin property is used to denote the size of white space surrounding blocks of text. Values can be denoted in terms of percentages (%), pixels (px), or traditional typographic conventions such as the em unit (em). When the simple margin property is given a value, the value is assigned to the top, bottom, left, and right margins simultaneously. It is possible to specify specific margins using the margin-top, margin-bottom, margin-left, and margin-right properties. Examples include:
margin: 5%;
margin: 10px;
margin-top: 2em;
margin-left: 85%;
margin-right: 50px;
margin-bottom: 1em;
Like the margin property, the text-indent property can take percentages, pixels, or typographic units for values. This property is used to denote how far the first line of a block of text is indented. For example:
text-indent: 2em;
text-indent: 3%;
Common values for text-align are right, left, center, and justify. They are used to line up the text within a block of text. These values operate in the same way your word processor aligns text. For example:
text-align: right;
text-align: left;
text-align: center;
text-align: justify;
Bulleted lists are easy to read and used frequently in today's writing styles. If you want to create a list, then you will first want to use the declaration display: list-item for the items in the list, and then something like disc, circle, square, or decimal for the list-style value. For example:
list-style: circle;
list-style: square;
list-style: disc;
list-style: decimal;
Associate font-family with a selector if you want to describe what font to render the XML in. Values include the names of fonts as well as a number of generic font families such as serif or sans-serif. Font family names containing more than one word should be enclosed in quotes. Examples:
font-family: helvetica;
font-family: times, serif;
font-family: 'cosmic cartoon', sans-serif;
The sizes of fonts can be denoted with exact point sizes as well as relative sizes such as small, x-small, or large. For example:
font-size: 12pt;
font-size: small;
font-size: x-small;
font-size: large;
font-size: xx-large;
Usual values for font-style are normal or italic, denoting how the text is displayed, as in:
font-style: normal;
font-style: italic;
The font-weight property is used to denote whether or not text is displayed in bold. Typical values for font-weight are normal and bold:
font-weight: normal;
font-weight: bold;
Below is a CSS file intended to be applied against the letter.xml file previously illustrated. Notice how each element in the XML file has a corresponding selector in the CSS file. In order to tell your Web browser to use this CSS file, you will have to add the xml-stylesheet processing instruction (<?xml-stylesheet type="text/css" href="letter.css" ?>) to the top of letter.xml.
letter {
display: block;
margin: 5%;
}
date, addressee {
display: block;
margin-bottom: 1em;
}
name, address_one, address_two { display: block; }
greeting, list {
display: block;
margin-bottom: 1em;
}
paragraph {
display: block;
margin-bottom: 1em;
text-indent: 1em;
}
italics {
display: inline;
font-style: italic;
}
list { display: block; }
item {
display: list-item;
list-style: inside;
text-indent: 2em;
}
closing {
display: block;
margin-top: 3em;
text-align: right;
}
Once rendered the resulting XML file should look something like this:

Tables are two-dimensional lists; they are a matrix of rows and columns. A very simple list of books (a catalog) lends itself to a tabled layout since each book (work) in the list has a number of qualities such as title, author, type, and date. Each work represents a row, and the title, author, type, and date represent columns.
Here is an XML file representing a simple, rudimentary catalog. Notice the XML processing instruction directing any XML processor to render the content of the file using the CSS file catalog.css:
<?xml-stylesheet href='catalog.css' type='text/css'?>
<catalog>
<caption>This is my personal catalog.</caption>
<structure>
<title>Title</title>
<author>Author</author>
<type>Type</type>
<date>Date</date>
</structure>
<work>
<title>The Gift Of The Magi</title>
<author>O Henry</author>
<type>prose</type>
<date>1906</date>
</work>
<work>
<title>The Raven</title>
<author>Edgar Allen Poe</author>
<type>prose</type>
<date>1845</date>
</work>
<work>
<title>Hamlet</title>
<author>William Shakespeare</author>
<type>prose</type>
<date>1601</date>
</work>
</catalog>
CSS provides support for tables, but again, present-day browsers do not render tables equally well. To create a table you must learn at least three new values for an element's display property:
display: table;
display: table-row;
display: table-cell;
Using the catalog example above, display: table will be associated with the catalog element, display: table-row will be associated with the work element, and display: table-cell will be associated with the title, author, type, and date elements.
Additionally, you might want to use these values to make your tables more complete as well as more accessible:
display: table-caption;
display: table-header-group;
Table-caption is used to give an overall description of the table. Table-header-group is used to denote the labels for the column headings.
In this exercise you will learn how to write a CSS file and use it to render an XML file.
Create a CSS file intended to render the file named ala.xml created in a previous exercise.
Open ala.xml in NotePad.
Add the XML processing instruction <?xml-stylesheet href="ala.css" type="text/css"?> to the top of the file. Save it.
Create a new, empty file in NotePad, and save it as ala.css.
In ala.css, list each XML element in ala.xml on a line by itself.
Assign each element a display property with a value of block (ex. para { display: block; }).
Open ala.xml in your Web browser to check your progress.
Add a blank line between each of the letter's sections by adding a margin-bottom: 1em to each section's selector (ex. para { display: block; margin-bottom: 1em; }).
Open ala.xml in your Web browser to check on your progress.
Change the display property within the salutation so its sub-element is displayed as inline text, not a block (ex. salutation { display: inline; } ).
Open ala.xml in your Web browser to check on your progress.
Indent the paragraphs by adding text-indent: 2em; to the para element. The final result should look something like this:

Besides CSS files, there is another method for transforming XML documents into something more human readable. It is called Extensible Stylesheet Language Transformations (XSLT). XSLT is a programming language implemented as an XML vocabulary. As with CSS, you first create an XML file; you then write an XSLT file and use a computer program to combine the two to make a third file. The third file can be any plain text file including another XML file, a narrative text, or even a set of sophisticated commands such as structured query language (SQL) queries intended to be applied against a relational database application.
Unlike CSS or XHTML, XSLT is a programming language. It is complete with input parameters, conditional processing, and function calls. Unlike most programming languages, XSLT is declarative and not procedural. This means parts of the computer program are executed as particular characteristics of the data are met and less in a linear top to bottom fashion. This also means it is not possible to change the value of variables once they have been defined.
There are a number of XSLT processors available for various Java, Perl, and operating-system specific platforms:
Xerces and Xalan - Java-based implementations (Xerces parses the XML; Xalan performs the transformation)
xsltproc - A binary application built using a number of C libraries; it is accompanied by a program named xmllint used to validate XML documents
Sablotron - Another binary distribution built using C++ libraries and has both a Perl and a Python API
Saxon - another Java implementation
XSLT is a programming language in the form of an XML file. Therefore, each of the commands is an XML element, and commands are qualified using XML attributes. Here is a simple list of some of those commands:
stylesheet - This is the root of all XSLT files. It requires attributes defining the XSLT namespace and version number. This is pretty much the standard XSLT stylesheet definition: <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">.
output - This is used to denote what type of text file will be created as output and whether or not it requires indentation and/or a DTD specification. For example, this use of output tells the XSLT processor to indent the output to make the resulting text easier to read: <xsl:output indent="yes" />.
template - This command is used to match/search for a particular part of an XML file. It requires an attribute named match and is used to denote what branch of the XML tree to process. For example, this use of template matches the root of the XML input file, and therefore everything in it: <xsl:template match="/">.
value-of - Used to output the result of the required attribute named select which defines exactly what to output. In this example, the XSLT processor will output the value of a letter's date element: <xsl:value-of select="/letter/date" />.
apply-templates - Searches the current XSLT file for a template named in the command's select statement or outputs the content of the current node of the XML file if there is no corresponding template. Here the apply-templates command tells the processor to find templates in the current XSLT file matching paragraph or list elements: <xsl:apply-templates select="paragraph | list" />.
Besides XSLT commands (elements), XSLT files can contain plain text and/or XML markup. When this plain text or markup is encountered, the XSLT processor is expected to simply output these values. This is what allows us to create XHTML output. The processor reads an XML file as well as the XSLT file. As it reads the XSLT file it processes the XSLT commands or outputs the values of the non-XSLT commands resulting in another XML file or some other plain text file.
In this exercise you will transform the simplest of XML documents using XSLT.
Here is a very simple XML document:
<content>Hello, World!</content>
Our goal is to transform this document into a plain text output. To do that we will use this XSLT stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- plain o' text -->
<xsl:output method='text'/>
<!-- match the root element -->
<xsl:template match="/content">
<!-- output the contents of content and a line-feed -->
<xsl:value-of select='.'/>
<xsl:text>
</xsl:text>
<!-- clean up -->
</xsl:template>
</xsl:stylesheet>
This is what the stylesheet does:
Defines itself as an XML file
Defines itself as an XSLT stylesheet
Defines the output format as plain text
Looks for the root element, content.
Outputs the value of the root element
Outputs a line-feed character
Closes the opened elements
Give the stylesheet a try:
Open a command prompt.
Change directories to the getting-started directory of the workshop's distribution.
Transform the hello-world.xml file: xsltproc hello-world.xsl hello-world.xml .
In a previous exercise you created a letter (ala.xml) using your own markup. We will now use XSLT to transform it into a plain text file.
As review, open ala.xml. Not too complicated.
To create a plain text version of this file, we will use the following stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- plain o' text -->
<xsl:output method='text'/>
<!-- let's get started -->
<xsl:template match="/letter">
<!-- date -->
<xsl:text>
</xsl:text>
<xsl:value-of select='normalize-space(date)'/>
<xsl:text>

</xsl:text>
<!-- address -->
<xsl:value-of select='normalize-space(addressee/name)'/>
<xsl:text>
</xsl:text>
<xsl:value-of select='normalize-space(addressee/address_one)'/>
<xsl:text>
</xsl:text>
<xsl:value-of select='normalize-space(addressee/address_two)'/>
<xsl:text>

</xsl:text>
<!-- greeting -->
<xsl:value-of select='normalize-space(greeting)'/>
<xsl:text>

</xsl:text>
<!-- paragraphs -->
<xsl:for-each select='paragraph'>
<xsl:value-of select='normalize-space(.)'/>
<xsl:text>

</xsl:text>
</xsl:for-each>
<!-- closing -->
<xsl:value-of select='normalize-space(closing)'/>
<xsl:text>

</xsl:text>
</xsl:template>
</xsl:stylesheet>
Here is how the stylesheet works:
Like before, the file is denoted as an XML file and specifically an XSLT stylesheet.
Like before, output is defined as plain o' text.
The root of the XML to be transformed is located.
The stylesheet outputs a line feed, outputs the value of the date element while removing extraneous white space, and outputs two more line feeds.
The stylesheet continues in this fashion for the address and greeting elements.
The stylesheet loops through each paragraph element, normalizes the content it finds, and outputs line feeds after each one.
The stylesheet finishes by reading the closing element, and then closing all opened elements.
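If the normalize-space function and the for-each loop seem mysterious, it may help to see roughly the same steps written in a procedural language. The sketch below is not XSLT and is not part of the workshop's files. It uses Python's standard library, a shortened, hypothetical letter, and a hand-written normalize_space function that only approximates the XPath function of the same name.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(text):
    """Approximate XPath normalize-space(): trim the ends and collapse
    each run of spaces, tabs, and line breaks into a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text or "").strip(" ")

# A shortened, hypothetical letter with plenty of extraneous white space.
letter = ET.fromstring("""<letter>
  <date>
    December 11, 2002
  </date>
  <paragraph>
    I <italics>do</italics> hope you
    can join us.
  </paragraph>
</letter>""")

# Like value-of on the date element, followed by a blank line.
lines = [normalize_space(letter.findtext("date")), ""]

# Like for-each select='paragraph'.
for paragraph in letter.findall("paragraph"):
    # itertext() gathers the text of child elements too, much as
    # value-of returns the text of everything inside the current node.
    lines.append(normalize_space("".join(paragraph.itertext())))
    lines.append("")

print("\n".join(lines))
```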
Try using the stylesheet:
Open a command prompt and change directories to the getting-started directory.
Transform the document: xsltproc ala2txt.xsl ala.xml .
In this exercise you will learn how to make XSLT files a bit more modular and less like CSS files.
Many of the elements of the file ala.xml were intended to be processed similarly. Computer programmers don't like to do the same thing over and over again. They want to code it once and leave it at that.
Note the following stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- plain ol' text; get rid of some white space -->
<xsl:output method='text'/>
<xsl:strip-space elements='*'/>
<!-- let's get started -->
<xsl:template match="/letter">
<!-- add a line-feed for formatting's sake -->
<xsl:text>
</xsl:text>
<!-- do the work -->
<xsl:apply-templates/>
</xsl:template>
<!-- trap all the various elements -->
<xsl:template match='date | address_two | greeting | paragraph | closing'>
<xsl:value-of select='normalize-space(.)'/>
<xsl:text>

</xsl:text>
</xsl:template>
<xsl:template match='name | address_one'>
<xsl:value-of select='normalize-space(.)'/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match='list'>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match='item'>
<!-- insert a tab, an asterisk, and a space for formatting -->
<xsl:text>	* </xsl:text>
<xsl:value-of select='normalize-space(.)'/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
It works very much like the previous examples but with a number of exceptions:
The strip-space element removes text consisting only of white space from the named elements (here, all of them).
The use of apply-templates is very important. When this element is encountered the XSLT processor traverses the XML input for elements. When it finds them the processor looks for a template matching the current element. If it finds a matching template, then processing is done within the template element. Otherwise the element's content is output.
Notice how the date, address_two, greeting, paragraph, and closing elements are all processed the same.
Notice how the name and address_one elements are processed the same.
The list and item elements are a bit tricky. The list element is first trapped and then the item element is rendered. Processing then returns to the template for list elements and a simple line feed is output.
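The way apply-templates hands each element to a matching template can be imitated with a dictionary of functions. The following Python sketch is only an analogy, assuming a shortened letter and invented names (TEMPLATES, apply_templates, and so on). In particular, its fallback for unmatched elements simply descends into their children, which is a rough stand-in for XSLT's built-in rules and ignores any text sitting directly inside an unmatched element.

```python
import re
import xml.etree.ElementTree as ET

def text_of(element):
    """All text inside the element, white space normalized."""
    text = "".join(element.itertext())
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

def block(element):        # like the date/paragraph/closing template
    return text_of(element) + "\n\n"

def line(element):         # like the name/address_one template
    return text_of(element) + "\n"

def bullet_list(element):  # like the list template
    return apply_templates(element) + "\n"

def item(element):         # like the item template
    return "\t* " + text_of(element) + "\n"

# Element name -> "template" function.
TEMPLATES = {
    "date": block, "address_two": block, "greeting": block,
    "paragraph": block, "closing": block,
    "name": line, "address_one": line,
    "list": bullet_list, "item": item,
}

def apply_templates(parent):
    """Send each child to its matching template, else descend into it."""
    output = []
    for child in parent:
        template = TEMPLATES.get(child.tag)
        if template is not None:
            output.append(template(child))
        else:
            output.append(apply_templates(child))
    return "".join(output)

letter = ET.fromstring(
    "<letter><date> December 11, 2002 </date>"
    "<addressee><name>Melvil Dewey</name>"
    "<address_one>Columbia University</address_one>"
    "<address_two>New York, NY</address_two></addressee>"
    "<list><item>Monday, 2-4</item><item>Tuesday, 3-5</item></list>"
    "<closing>Sincerely, S. R. Ranganathan</closing></letter>"
)

# The leading line feed mirrors the one output by the /letter template.
result = "\n" + apply_templates(letter)
print(result)
```

Notice that addressee has no entry in the dictionary, just as it has no template in the stylesheet, yet its children are still found and processed.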
Transform letter.xml with letter2txt.xsl:
Open a command prompt to the getting-started directory of the workbook's CD.
Transform letter.xml: xsltproc letter2txt.xsl letter.xml .
Below is our first XSLT/XHTML example. Designed to be applied against the file named letter.xml, it will output a valid XHTML file. You can see this in action by using an XSLT processor named xsltproc. Assuming all the necessary files exist in the same directory, the xsltproc command is xsltproc -o letter.html letter2html.xsl letter.xml .
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!-- letter2html.xsl; an XSL file -->
<!-- define the output as an XML file, specifically, an XHTML file -->
<xsl:output
method="xml"
omit-xml-declaration="no"
indent="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<!-- start at the root of the file, letter -->
<xsl:template match="letter">
<!-- output an XHTML root element -->
<html>
<!-- open the XHTML's head element -->
<head>
<!-- output a title element with the addressee's name -->
<title><xsl:value-of select="addressee/name"/></title>
<!-- close the head element -->
</head>
<!-- open the body tag and give it some style -->
<body style="margin: 5%">
<!-- find various templates in the XSLT file with their
associated values -->
<xsl:apply-templates select="date"/>
<xsl:apply-templates select="addressee"/>
<xsl:apply-templates select="greeting"/>
<xsl:apply-templates select="paragraph | list" />
<xsl:apply-templates select="closing"/>
<!-- close the body tag -->
</body>
<!-- close the XHTML file -->
</html>
</xsl:template>
<!-- date -->
<xsl:template match="date">
<!-- output a paragraph tag and the content of the current
node, date -->
<p><xsl:apply-templates/></p>
</xsl:template>
<!-- addressee -->
<xsl:template match="addressee">
<!-- open a paragraph -->
<p>
<!-- output the content of letter.xml's name, address_one,
and address_two elements, as well as a couple of br tags -->
<xsl:value-of select="name"/><br />
<xsl:value-of select="address_one"/><br />
<xsl:value-of select="address_two"/>
<!-- close the paragraph -->
</p>
</xsl:template>
<!-- each of the following templates operates much like the
date template -->
<!-- greeting -->
<xsl:template match="greeting">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<!-- paragraph -->
<xsl:template match="paragraph">
<p style="text-indent: 1em">
<xsl:apply-templates/>
</p>
</xsl:template>
<!-- closing -->
<xsl:template match="closing">
<p style="margin-top: 3em; text-align: right">
<xsl:apply-templates/>
</p>
</xsl:template>
<!-- italics -->
<xsl:template match='italics'>
<i>
<xsl:apply-templates/>
</i>
</xsl:template>
<!-- list -->
<xsl:template match='list'>
<ul>
<xsl:apply-templates/>
</ul>
</xsl:template>
<!-- item -->
<xsl:template match='item'>
<li>
<xsl:apply-templates/>
</li>
</xsl:template>
</xsl:stylesheet>
The end result should look something like this:

Admittedly, the example above looks rather complicated and truthfully functions exactly like our CSS files. At the same time, displaying the letter.xml file with CSS requires a modern browser. If the letter2html.xsl file were incorporated into a Web server, then Web browsers would not need to understand CSS. Given the example above, there is not a compelling reason to use XSLT, yet.
Here is yet another example of transforming an XML document into an HTML document. The XSLT file below is intended to convert a CIMI Schema document (an XML vocabulary used to describe objects in museum collections) into an HTML file. Once processed, this XSLT file will:
output an HTML declaration
find the root of the CIMI Schema document
output the beginnings of an HTML document
loop through all the object elements of the CIMI Schema document outputting an ordered list of hypertext links pointing to a set of images
output the end of an HTML document
<?xml version="1.0"?>
<!-- cimi2html.xsl - convert a CIMI Schema document into a rudimentary HTML file -->
<!-- Eric Lease Morgan (emorgan@nd.edu) - October 20, 2003 -->
<!-- lots o' credit goes to Stephen Yearl of Yale who helped with XSL weirdness! -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:c="http://www.cimi.org/wg/xml_spectrum/Schema-v1.5"
version="1.0">
<!-- output an HTML header -->
<xsl:output method='html'
doctype-public='-//W3C//DTD XHTML 1.0 Transitional//EN'
doctype-system='http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
indent='no' />
<!-- find the root of the input -->
<xsl:template match="/">
<!-- start the XHTML output -->
<html>
<body>
<h1>Water Collection</h1>
<ol>
<!-- find all the Schemas object -->
<xsl:apply-templates />
</ol>
</body>
</html>
</xsl:template>
<!-- trap the objects of the file -->
<xsl:template match="//c:object">
<!-- extract the parts of the object we desire and format them -->
<li>
<a>
<xsl:attribute name='href'>
<xsl:value-of select='./c:reproduction/c:location' />
</xsl:attribute>
<xsl:value-of select='./c:identification/c:object-title/c:title' />
</a> -
<xsl:value-of select='./c:identification/c:comments' />
(Collected by
<xsl:value-of select='./c:acquisition/c:source/c:source/c:person/c:name/c:forename' />
<xsl:value-of select='./c:acquisition/c:source/c:source/c:person/c:name/c:surname' />
on
<xsl:value-of select='./c:acquisition/c:accession-date/c:year' />
-
<xsl:value-of select='./c:acquisition/c:accession-date/c:month' />
-
<xsl:value-of select='./c:acquisition/c:accession-date/c:day' />
.)
</li>
</xsl:template>
</xsl:stylesheet>
In this exercise you will transform an XML document using XSLT.
Create a directory on your computer's desktop.
Copy all the *.dll files from the CD to your newly created directory.
Copy all the *.exe files from the CD to your newly created directory.
Copy cimi2html.xsl and water.xml from the CD to your newly created directory.
Open a new terminal window by running cmd.exe from the Start menu's Run command.
Change directories to your newly created directory.
Transform the XML document into an HTML document using this command: xsltproc -o water.html cimi2html.xsl water.xml .
Open the newly created file named water.html in your Web browser.
In this part of the exercise you will change the content of the output.
Open cimi2html.xsl in your text editor.
Add a signature as a footer; insert <p> Brought to you by [yourname]. </p> after the </ol> element of the XSLT file.
Process the XML again: xsltproc -o water.html cimi2html.xsl water.xml .
Open and/or reload the output, water.html, in your browser.
Go to Step #2 and make some other changes until you get tired.
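The footer edit from the second part of the exercise can be sketched as follows. This is only a fragment of cimi2html.xsl, showing the root template after the change; the rest of the stylesheet is assumed to stay exactly as it is on the CD.

```xml
<!-- fragment of cimi2html.xsl: the root template with a footer added -->
<xsl:template match="/">
  <html>
    <body>
      <h1>Water Collection</h1>
      <ol>
        <xsl:apply-templates />
      </ol>
      <!-- the new footer; replace [yourname] with your own name -->
      <p> Brought to you by [yourname]. </p>
    </body>
  </html>
</xsl:template>
```

Because the paragraph sits outside the ol element, it appears once at the bottom of the page rather than once per object.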
Here is another example of an XSLT file used to render an XML file. This example renders our catalog.xml file. It, too, functions very much like a plain o' CSS file. You can transform it using xsltproc like this: xsltproc -o catalog.html catalog2html.xsl catalog.xml .
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!-- catalog2html.xsl -->
<xsl:output
method="xml"
omit-xml-declaration="no"
indent="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<!-- catalog -->
<xsl:template match="catalog">
<html>
<head>
<title><xsl:value-of select="caption"/></title>
</head>
<body>
<table>
<xsl:apply-templates select="caption"/>
<xsl:apply-templates select="structure"/>
<xsl:apply-templates select="work"/>
</table>
</body>
</html>
</xsl:template>
<!-- caption -->
<xsl:template match="caption">
<caption style="text-align: center; margin-bottom: 1em">
<xsl:value-of select="."/>
</caption>
</xsl:template>
<!-- structure -->
<xsl:template match="structure">
<thead style="font-weight: bold">
<tr><xsl:apply-templates/></tr>
</thead>
</xsl:template>
<!-- work -->
<xsl:template match="work">
<tr><xsl:apply-templates/></tr>
</xsl:template>
<!-- title -->
<xsl:template match="title">
<td style="text-align: right; padding: 3px"><xsl:value-of select="."/></td>
</xsl:template>
<!-- author, type, or date -->
<xsl:template match="author | type | date">
<td><xsl:value-of select="."/></td>
</xsl:template>
</xsl:stylesheet>
Again, the end result should look something like this:

CSS files, just like the XSLT files above, process the XML input from top to bottom. This technique does not take advantage of the programmatic characteristics of XSLT. The next example does. First, it takes input, namely a value to sort by. Second, it uses the count and sum functions as well as the xsl:sort element. Herein lies an important distinction between CSS and XSLT: CSS is intended for display only, while XSLT can be used to manipulate content as well as display it.
In this example, calculations are done on our list of pets. First, a count of the number of pets is displayed as well as their average age. Second, the list of pets can be sorted by name, age, type, or color. To see this in action, try the following command: xsltproc -o pets.html --stringparam sortby age pets2html.xsl pets.xml . You can get different output by changing the sortby value to name, color, or type. What happens if an invalid sortby value is passed to the XSLT file? What happens to the output if no --stringparam values are passed? Why?
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!-- pets2html.xsl -->
<xsl:output
method="xml"
omit-xml-declaration="no"
indent="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<!-- get an input parameter and save it to the variable named
sortby; use name by default -->
<xsl:param name="sortby" select="'name'"/>
<!-- pets -->
<xsl:template match="pets">
<html>
<head>
<title>Pets</title>
</head>
<body style="margin: 5%">
<h1>Pets</h1>
<ul>
<!-- use the count function to determine the number of pets -->
<li>Total number of pets: <xsl:value-of select="count(pet)"/></li>
<!-- calculate the average age of the pets by using the sum
and count functions, as well as the div operator -->
<li>Average age of pets: <xsl:value-of select="sum(pet/age) div count(pet)"/></li>
</ul>
<p>Pets sorted by: <xsl:value-of select="$sortby"/></p>
<table>
<thead>
<tr>
<td style="text-align: right; font-weight: bold">Name</td>
<td style="text-align: right; font-weight: bold">Age</td>
<td style="font-weight: bold">Type</td>
<td style="font-weight: bold">Color</td>
</tr>
</thead>
<xsl:apply-templates select="pet">
<!-- sort the pets by a particular sub element ($sortby); tricky! -->
<xsl:sort select="*[name()=$sortby]"/>
</xsl:apply-templates>
</table>
</body>
</html>
</xsl:template>
<!-- pet -->
<xsl:template match="pet">
<tr><xsl:apply-templates/></tr>
</xsl:template>
<!-- name -->
<xsl:template match="name">
<td style="text-align: right"><xsl:value-of select="."/></td>
</xsl:template>
<!-- age -->
<xsl:template match="age">
<td style="text-align: right"><xsl:value-of select="."/></td>
</xsl:template>
<!-- type or color -->
<xsl:template match="type | color">
<td><xsl:value-of select="."/></td>
</xsl:template>
</xsl:stylesheet>
Here is some sample output:

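One detail of xsl:sort is worth a closer look: unless told otherwise, it compares values as text, so sorting by age places 10 before 3. The stylesheet below is a minimal sketch of one way around this, not a replacement for pets2html.xsl. The file name pets-sorted.xsl and the variable name sorttype are made up for illustration, and the sketch assumes pets.xml has the same pets/pet structure used above.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <!-- pets-sorted.xsl (hypothetical file name, not on the workshop CD) -->
  <xsl:output method="text" />
  <!-- same input parameter as pets2html.xsl; use name by default -->
  <xsl:param name="sortby" select="'name'"/>
  <!-- compare ages as numbers; compare everything else as text -->
  <xsl:variable name="sorttype">
    <xsl:choose>
      <xsl:when test="$sortby = 'age'">number</xsl:when>
      <xsl:otherwise>text</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:template match="pets">
    <xsl:for-each select="pet">
      <!-- data-type accepts a value computed at run time when it is
           written inside curly braces -->
      <xsl:sort select="*[name()=$sortby]" data-type="{$sorttype}"/>
      <xsl:value-of select="name"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="age"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Try it with xsltproc --stringparam sortby age pets-sorted.xsl pets.xml and compare the order of the ages with the order produced by pets2html.xsl.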
This final example uses the pets.xml file again. This time the XSLT file is used to create another type of output, namely a very simple set of SQL statements. The point of this example is to illustrate how the pets.xml file can be repurposed: once for display and once for storage. Use this command to see the result: xsltproc -o pets.sql pets2sql.xsl pets.xml . What could you do to make the output prettier?
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!-- pets2sql.xsl -->
<!-- create plain o' text output -->
<xsl:output method="text" />
<!-- find the pets element -->
<xsl:template match="pets">
<!-- loop through each pet -->
<xsl:for-each select="pet">
<!-- output an SQL INSERT statement for the pet -->
INSERT INTO pets (name, age, type, color)
VALUES ('<xsl:value-of select="name" />',
'<xsl:value-of select="age" />',
'<xsl:value-of select="type" />',
'<xsl:value-of select="color" />');
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The SQL created by the XSLT file above looks something like this:
INSERT INTO pets (name, age, type, color) VALUES ('Tilly', '14', 'cat', 'silver');
INSERT INTO pets (name, age, type, color) VALUES ('Amanda', '10', 'dog', 'brown');
INSERT INTO pets (name, age, type, color) VALUES ('Jack', '3', 'cat', 'black');
INSERT INTO pets (name, age, type, color) VALUES ('Blake', '12', 'dog', 'blue');
INSERT INTO pets (name, age, type, color) VALUES ('Loosey', '1', 'cat', 'brown');
INSERT INTO pets (name, age, type, color) VALUES ('Stop', '5', 'pig', 'brown');
This file could then be fed to a relational database program that understands SQL to populate a table with data.
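One possible answer to the question about prettier output is to wrap every piece of literal text in xsl:text. The sketch below is a hypothetical variation on pets2sql.xsl (the file name pets2sql-tidy.xsl is made up for illustration). It writes each statement on a single line using the standard SQL keyword VALUES, and it assumes the same pets/pet structure as before.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <!-- pets2sql-tidy.xsl (hypothetical file name, not on the workshop CD) -->
  <xsl:output method="text" />
  <xsl:template match="pets">
    <xsl:for-each select="pet">
      <!-- xsl:text emits exactly the characters it contains, so the
           indentation of this stylesheet never leaks into the output -->
      <xsl:text>INSERT INTO pets (name, age, type, color) VALUES ('</xsl:text>
      <xsl:value-of select="name" />
      <xsl:text>', '</xsl:text>
      <xsl:value-of select="age" />
      <xsl:text>', '</xsl:text>
      <xsl:value-of select="type" />
      <xsl:text>', '</xsl:text>
      <xsl:value-of select="color" />
      <!-- &#10; is a line feed; it ends each statement's line -->
      <xsl:text>');&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that this sketch does nothing about values that contain an apostrophe; such a value would break the quoting of the generated statement and would need to be handled before the file is loaded into a database.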
This section barely scratched the surface of XSLT. It is an entire programming language unto itself, and much of the promise of XML lies in the exploitation of XSLT to generate various types of output, be it output for Web browsers, databases, or input for other computer programs.
In this exercise you will learn how to use your Web browser to transform and display XML data using TEI files as examples. This exercise assumes you are using a rather modern browser such as a newer version of Mozilla, Firefox, or Internet Explorer since these browsers have XSLT transformation capability built-in.
Open the XSLT stylesheet xslt/tei2html.xsl in your Web browser. Notice how it initializes a valid XHTML DTD declaration as well as provides the framework for rich metadata in the head element. Notice how the content of the body is created by searching the document for div1 tags to build a table of contents. Finally, notice how the balance of the body's content is created by trapping a limited number of TEI elements and transforming them into HTML.
Open a TEI file, say xml-data/tei/machiavelli-prince-1081003648.xml in your Web browser. It should display the raw XML data.
Open xml-data/tei/machiavelli-prince-1081003648.xml in your favorite text editor and make the following text the second XML processing instruction in the file: <?xml-stylesheet type='text/xsl' href='../../xslt/tei2html.xsl'?>. In other words, insert the processing instruction after the XML declaration and before the beginning of the DTD declaration. Save the edited file.
Open, again, xml-data/tei/machiavelli-prince-1081003648.xml in your Web browser. If all goes well, then you should see nicely rendered text that is more human readable than the raw XML. This happens because your Web browser read the XML file, learned that it needed an XSLT file for transformation, retrieved the XSLT file, combined the XSLT file with the raw XML, and displayed the results.
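As a rough guide, the top of the edited TEI file should follow the pattern sketched below. The first line is an assumption standing in for whatever XML declaration the file already has, and the comment stands in for the rest of the file, which is not changed.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type='text/xsl' href='../../xslt/tei2html.xsl'?>
<!-- the file's existing DTD declaration and TEI markup continue here, unchanged -->
```

The href value is a relative path, so it is resolved from the location of the XML file: up two directories from xml-data/tei and then down into the xslt directory.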
MODS is an XML bibliographic vocabulary; it is very similar to MARC. In this exercise you will transform a set of MODS data into individual XHTML files.
Open xml-data/mods/single/catalog.xml in your favorite text editor. Examine the element names and notice how they are similar to the fields of standard bibliographic data sets.
Open xslt/mods2xhtml.xsl in your favorite text editor or Web browser. Notice how the stylesheet declaration imports a namespace called mods. Notice how the namespace prefixes all of the MODS element names. Notice how the parameter named filename is assigned a value from the MODS record. Notice the use of xsl:document element/command. It is used to define where output files will be saved. Finally, notice how the XSLT tries create rich and va