Ground Zero - My test site

Chapter 4

More on Document Type Definitions


Using Attributes in XML

So far, we have used elements to describe textual content. Towards the end of chapter two I mentioned that attributes are a central concept concerning the physical structure of a document. Attributes are name-value pairs that occur within an element's start tag. They provide additional information for the application that reads the XML document and are, as a general rule, not intended for the human viewer. So, if they are not intended for the user of the document, then why do we sometimes feel the need to use attributes in XML elements?

The answer to this question is already given, in part, in the preceding paragraph. Attributes are used when we want to add additional information to a text, without changing the text itself. This could be almost anything from a short ID-number to a lengthier description. We have already seen, in the previous chapters, that we can differentiate between year of production and year of purchase in our CD collection by using attributes. When I say that these attributes are not intended for the human user, this is not entirely true. If you use attributes that make some sort of sense, they can add a lot of extra information to the document for human readers. But in most cases, you will not see the document - only the result after it has been processed. This is where the main benefit of attributes enters into the picture - they can make it easier to process your XML document. As an example, you can tell a style sheet to display the different years for each CD in different colours, or display one and not the other. Since style sheets are something that we will not get around to in this manual, let us focus on how attributes work in XML.



How do attributes fit into a document?

First a quick recap from chapter two. Attributes are name-value pairs that are separated by an equals sign (=). They may occur inside start-tags or empty tags, but never inside end-tags. This is an example of a start-tag with an attribute:


<YEAR function="release">

We say that the element YEAR has a function attribute, which has the value "release". This value provides additional, useful information about the content of the YEAR tag. In order to use an attribute in an XML document instance, we must declare it in the DTD, just like we have already done with the elements. An attribute declaration looks fairly similar to an element declaration. The main difference between them is that where you use an <!ELEMENT> tag for element type declarations in the DTD, you use an <!ATTLIST> tag for attributes.

To illustrate this, we can go back to our CD Collection again. In the previous chapter we created a DTD that allowed the tag <YEAR> to be used in the document. The element type declaration looked like this:


<!ELEMENT YEAR (#PCDATA)>

Let's say that we want to add an attribute that gives a more accurate description of this element, like we have already done in the examples above. In the previous chapter, we learned that it is not possible to add elements without having declared them in the document type definition first. This is, of course, also true of attributes. If you add an attribute to an element without declaring it, you will not be able to validate your document. To add an attribute to our <YEAR> element, we need to add this line in the DTD:


<!ATTLIST YEAR function CDATA "release">

As with element declarations, it does not matter where in the DTD you put this line, but it may be a good idea to place it directly after the element declaration for YEAR. The reason for this is, of course, that it makes your DTD more structured and easier for other people to understand.

Let's go through the attribute declaration above and see how it works. The <!ATTLIST> tag in itself does nothing more than tell the DTD that we are starting to declare an attribute for an element in the DTD. If we go through the example above from left to right, we can see how an attribute declaration works in XML:

Directly after the <!ATTLIST> tag, you have to specify the name of the element that is supposed to include this attribute. In this example, so far, we have specified that the element YEAR should have an attribute of some sort. The next step would be to tell the DTD the name of the attribute for YEAR. This is done by typing the name of the attribute directly after the name of the element (separated by a space though). In our little example, we have stated that the element YEAR is allowed to contain an attribute called 'function'.

The final steps towards creating a valid attribute declaration deal with what type of data and which values are valid in the specified attribute. After you have specified the name of the attribute, you must decide what kind of attribute you have just created. This will be one of ten possible types, where CDATA is the most common one. CDATA means that the attribute can contain text, and nothing else. We will get back to a more in-depth description of the most common attribute types later in this chapter, but here is a list of the different types that can be used in an attribute declaration:

  • CDATA
  • Enumerated
  • NMTOKEN
  • NMTOKENS
  • ID
  • IDREF
  • IDREFS
  • ENTITY
  • ENTITIES
  • NOTATION

As I mentioned above, not all of these types are commonly used, but the ones that are useful to us will be described later. The last item in the attribute declaration deals with the value the attribute takes on if it has not been specified in the element. This is called the default value of the attribute. In our example we have told the DTD that unless anything else is specified, YEAR should mean the year the CD was actually released on the market. If you paste the example above into the document from chapter three, you should be able to validate it without any problems - even if you haven't entered any attributes yet.
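To see the default value at work, here is a minimal sketch of a complete document with an internal DTD. The stripped-down content models for COLLECTION and RECORD are assumptions made to keep the example short; they are not the full DTD from chapter three.

```xml
<?xml version="1.0"?>
<!DOCTYPE COLLECTION [
  <!ELEMENT COLLECTION (RECORD*)>
  <!ELEMENT RECORD (TITLE, YEAR)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT YEAR (#PCDATA)>
  <!-- element name, attribute name, attribute type, default value -->
  <!ATTLIST YEAR function CDATA "release">
]>
<COLLECTION>
  <RECORD>
    <TITLE>Tracks</TITLE>
    <!-- No attribute is given here, so the parser supplies function="release" -->
    <YEAR>1998</YEAR>
  </RECORD>
</COLLECTION>
```

An application reading this document through a parser that processes the DTD should see the YEAR element as if function="release" had been typed in the start-tag.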



How to work with Attributes

As we have seen above, it is relatively easy to add attributes to an existing DTD. It works in very much the same way as element declarations, and like element declarations there are a number of things you can do to modify the attributes. The example we have used so far can, in many ways, be regarded as an "average" attribute. It contains regular text and we have one default value that can be used. Not all attributes will fit neatly into this pattern, so over the next few pages we will have a look at how attribute declarations can be changed to fit our needs.

One of the more immediate questions that come to mind when we work with attributes is whether or not it is possible to include several attributes inside the same element. The answer to this is quite simple - yes it is possible. This is done in more or less exactly the same way as with one attribute. To illustrate this, let's use our standard example again. We have stated in our DTD that each individual disc contains any number of tracks, and that these tracks can be described according to name and time. This was done to simplify our work in the previous chapters, but at this stage we realise that more information can be added to the TRACK element. We can for instance add which number it is on the CD or the size (if we have a digital copy). Let's say that we would like to add this information to the collection example, but as attributes to the TRACK element - not as new sub elements. This can be done in two different ways:

  1. Creating two new attribute declarations.

    <!ELEMENT TRACK (NAME*,TIME?)>
    <!ATTLIST TRACK number CDATA #IMPLIED>
    <!ATTLIST TRACK size CDATA #IMPLIED>

  2. Including both attributes in the same declaration.

    <!ELEMENT TRACK (NAME*,TIME?)>
    <!ATTLIST TRACK number CDATA #IMPLIED
                 size CDATA #IMPLIED>

Both of these solutions are perfectly valid XML. Which one of them you decide to use depends solely on personal taste, in terms of what you feel gives the best document structure. As far as the parser is concerned, however, these two are identical. In some cases, depending on which parser you use, you may get a warning when you try to validate the first option. (See the illustration below). This is not because anything is wrong with your document, but it is a reminder that you have two attribute declarations for the same element. The reason why the parser returns this message is simply that in XML you are allowed to declare the same attribute more than once. If this happens, the first declaration takes precedence over the other.
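As a small sketch of the last point, assume that someone declares the 'number' attribute a second time further down in the DTD. The second declaration, with its default value of "1", is made up for this illustration:

```xml
<!ELEMENT TRACK (NAME*,TIME?)>
<!ATTLIST TRACK number CDATA #IMPLIED>
<!-- Second declaration of 'number': allowed, but ignored by the parser -->
<!ATTLIST TRACK number CDATA "1">
```

A parser reading this DTD will treat 'number' as an optional attribute without a default value, since only the first declaration counts.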

In the example above you may have noticed that I have used the value #IMPLIED where the default value of the attributes should have been. This brings us to our next item on the agenda: how to deal with default values. In the example where we went through the process of creating an attribute declaration, we specified that "release" was to be used as default value for the 'function' attribute. The problem with this approach is that the author of a document may not always have a particular value that can be used as default. Instead of specifying a default value yourself, you can use certain keywords to do one of three things:

  • Require the author to specify a value (any value)
  • Allow the value to be omitted
  • Force the use of a given value

This is done by using one of these keywords: #REQUIRED, #IMPLIED or #FIXED. The first two take the place of the default value, whereas #FIXED is written in front of it.


[Illustration: screendump of the parser warning. Even if the parser returns a warning - this is still valid XML.]

#REQUIRED: This keyword is used in situations where the author of the DTD wants to force the users to provide a value for a particular attribute. If we go back to our first attribute example again, we can try to replace the default value ("release") with the #REQUIRED keyword. If we make this change to our existing document and try to parse it again, the program should return an error message like this:


line 26, http://129.177.24.81/xml/testing.xml:
error (1201): required attribute missing: YEAR (missing "function")

Why? Quite simply because we have taken away the default value of the 'function' attribute and told the DTD that this must be provided by the users for the document to be valid. In our example there is only one YEAR element, so all we have to do in this case is to add a function attribute to it. The content of this attribute can be whatever you feel best describes the element, as long as it is plain text. Add this inside the element tag and validate it again: function="whatever". Your document should be valid again.
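The change can be sketched like this. The two YEAR elements are fragments from a document instance and are only meant to show the difference between a missing and a supplied attribute:

```xml
<!-- In the DTD -->
<!ATTLIST YEAR function CDATA #REQUIRED>

<!-- In the document instance -->
<YEAR>1998</YEAR>                      <!-- not valid: 'function' is missing -->
<YEAR function="release">1998</YEAR>   <!-- valid -->
```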


#IMPLIED: This keyword is used in situations where the author of the DTD wants to provide the users with the possibility of adding an attribute value, without forcing them to do so. This is what we did in the second example, with the two new attributes of the TRACK element. The 'number' and 'size' attributes are allowed within the element, but the user is not required to supply these attributes, nor is a specific default value given by the author.


#FIXED: The final keyword we are going to discuss is also the one that is least used. This option is used when you want to provide a specific default value and you don't want it to be changed by anyone. If we look at our example, there are not really any of the elements where we would want to use a fixed attribute. But for demonstration purposes, let's say that we want to create an attribute in the RECORD element that identifies the owner of each individual CD in the collection. To do this, we add the following attribute declaration to the DTD:


<!ATTLIST RECORD owner CDATA "Vemund">

At this point we have allowed the RECORD element to hold information on the owner of the CDs in the collection, and stated that this is me by default. People will, however, be able to change the value of the 'owner' attribute if they wish to. To prevent this, we can insert the #FIXED keyword in the attribute declaration. Like this:


<!ATTLIST RECORD owner CDATA #FIXED "Vemund">

The DTD will now understand that "Vemund" is the default value of the 'owner' attribute, and that this cannot be changed. If we add this to our example and then try to give the 'owner' attribute a different value than the one we have specified in the DTD, the parser will return an error message when it tries to validate the document:


line 22, http://129.177.24.81/xml/testing.xml:
error (1200): attribute value doesn't match fixed default: owner="whoever" (default "Vemund")

This kind of solution may be useful in situations where you would like to insert some sort of standard information to each document you create, like a copyright-line for example.
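To sum up how a fixed attribute behaves, here is a short sketch. Only the RECORD start-tags are shown, and the value "whoever" is the same made-up value as in the error message above:

```xml
<!-- In the DTD -->
<!ATTLIST RECORD owner CDATA #FIXED "Vemund">

<!-- In the document instance (start-tags only) -->
<RECORD>                   <!-- valid: the parser supplies owner="Vemund" -->
<RECORD owner="Vemund">    <!-- valid: the value matches the fixed default -->
<RECORD owner="whoever">   <!-- not valid: the value differs from the fixed default -->
```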



Different types of attributes

So far, all the attributes we have used have been of the type CDATA, which basically means plain text. CDATA is the most commonly used attribute type, but nine other attribute types are allowed in XML. Some of these are more commonly used than others, since each type is designed to serve a particular purpose in XML. In the following overview I will therefore give certain attribute types more weight than others, simply because I feel that these are more important to our particular needs. Furthermore, I will at this point only discuss the first six types from the list below. IDREFS is simply the plural form of IDREF, in the same way that NMTOKENS is the plural form of NMTOKEN, and the last three types require some knowledge about entities and entity references. This is something we will get back to later, so we will wait a little while before we go into detail about these attribute types.

Currently, these ten attribute types are allowed in XML:

  • CDATA
  • Enumerated
  • NMTOKEN
  • NMTOKENS
  • ID
  • IDREF
  • IDREFS
  • ENTITY
  • ENTITIES
  • NOTATION


CDATA

As we have already seen, this is the most common and general attribute type. An attribute of this type can contain any string of text, save for a few rules and exceptions. A CDATA attribute can not contain the following characters:

  • Less than sign ( < )
  • Ampersand ( & )
  • Quotation mark ( " )

These signs can, however, be used if they are replaced by their usual entity references ( &lt; , &amp; and &quot; , respectively). Double quotes can actually be used without resorting to entity references, but then you will have to surround the attribute value by single quotes. These two examples are interpreted in the same way by the parser, and are both valid XML:


length="7&quot;" or length='7"'

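If your attribute value contains both single and double quotes, the kind that you also use to surround the value must be escaped by its entity reference (&apos; for single quotes and &quot; for double quotes). The other kind may be escaped as well, but it does not have to be.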
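The quoting rules can be sketched as follows. The SINGLE element and the 'note' attribute are hypothetical and are not part of our collection DTD; they are only there to give the attribute values somewhere to live:

```xml
<!-- A double quote inside the value: escape it, or surround the value with single quotes -->
<SINGLE length="7&quot;"/>
<SINGLE length='7"'/>

<!-- Escaping both kinds of quotes is always safe, whichever kind surrounds the value -->
<SINGLE note="7&quot; single from Bruce&apos;s collection"/>
```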



Enumerated

Unlike the other attribute types we will be discussing here, 'Enumerated' is not used as a keyword in XML. 'Enumerated' is used to provide a list of possible attribute values in the DTD. These values must be written inside parentheses and be separated by vertical bars. We can use the YEAR element to illustrate this:


<!ATTLIST YEAR function (released | purchased) "released">

This looks fairly similar to what we have already done. We have just replaced CDATA with (released | purchased). The effect of this is that the attribute 'function' must contain one of the two predefined values, and that the value is assumed to be 'released' unless otherwise stated.
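A short sketch of how a validating parser would treat different values under this declaration. The years and the value "borrowed" are made up for the illustration:

```xml
<!-- In the DTD -->
<!ELEMENT YEAR (#PCDATA)>
<!ATTLIST YEAR function (released | purchased) "released">

<!-- In the document instance -->
<YEAR>1998</YEAR>                        <!-- valid: function defaults to "released" -->
<YEAR function="purchased">1999</YEAR>   <!-- valid: the value is in the list -->
<YEAR function="borrowed">2000</YEAR>    <!-- not valid: "borrowed" is not in the list -->
```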



NMTOKEN

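The NMTOKEN keyword is used when you want the attribute value to be a name token. A name token is built from the same characters as a valid XML name: letters, digits, underscores, hyphens, periods and colons. Like XML names, name tokens may never contain white space. The difference lies in the first character. A valid XML name must conform to the same rules as the element names described at the start of chapter two, which means that it must begin with a letter or an underscore ( _ ). (A colon is also permitted as the first character, but it is best avoided.) A name token, on the other hand, may begin with any of the characters listed above, so a value like 2nd is a legal NMTOKEN even though it is not a legal XML name.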

The NMTOKEN keyword is used mainly when you want to manipulate your XML data with other programming languages. Since the name restrictions in XML are more or less the same as in Java and JavaScript, you can use NMTOKEN to associate particular Java classes with XML elements. Since we will not go into Java, or any other programming language, in this manual this is not an attribute type that we will have to use either.


NMTOKENS

This is the plural form of NMTOKEN. It is very rarely used, but it can be used to create an attribute value from several name tokens, separated by white space.
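Although we will not need these two types in this manual, a small sketch may make the difference between them clearer. The attributes 'genre' and 'tags' are hypothetical and do not appear anywhere else in our collection DTD:

```xml
<!-- In the DTD: 'genre' and 'tags' are made-up attributes -->
<!ATTLIST TRACK genre NMTOKEN  #IMPLIED
                tags  NMTOKENS #IMPLIED>

<!-- In the document instance (start-tags only) -->
<TRACK genre="rock">                <!-- valid: a single token -->
<TRACK genre="rock and roll">       <!-- not valid: NMTOKEN allows no white space -->
<TRACK tags="live acoustic demo">   <!-- valid: three tokens separated by spaces -->
```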



ID

The ID type is used to uniquely identify elements in an XML document instance. An attribute value of this type must be a valid XML name, as described above in the NMTOKEN section. Since this type is a unique identifier, a particular value may not be used as an ID attribute more than once. Furthermore, each individual element may not contain more than one attribute of the ID type. For obvious reasons, the ID attribute type cannot be used together with #FIXED. Fixed attributes must always have the same value, whereas ID attributes must always have different values.

To illustrate this with an example, we can use our CD Collection again. In a CD archive, each individual disc has a unique Id number. This Id number can, among other things, be used to retrieve information about the CD from databases on the Internet. We can therefore decide to include this Id number in the DISC element. Before we go ahead with the example, let me just specify that this only works if you do not have two copies of the same CD in the collection. Unless you are creating a list for a record store, this is probably not the case, so let's go ahead with the example.

In the example we have been using so far, we have entered information on the first track from each of the four CDs in Bruce Springsteen's "Tracks" collection. This means that we have four DISC elements that can be given unique ID numbers. Let's decide to call the attribute "discid" and create an attribute declaration for it in the DTD. It would look like this:


<!ATTLIST DISC discid ID #REQUIRED>

The ID type does not necessarily have to be #REQUIRED; it could also be #IMPLIED (but never #FIXED). Before this will work, we need to assign the Id numbers to each of the elements. This is done by adding one of the following four attributes to the start-tag of each DISC element:


discid="a410410c"
discid="a50b910c"
discid="a50b911c"
discid="ad0bf70d"

It is important to notice that since an ID value must conform to the rules regarding XML names, it cannot begin with a digit. If the parser encounters an illegal ID attribute value, it will return an error message:


line 30, http://129.177.24.81/xml/testing.xml:
error (1221): character in attribute value is illegal according to declaration: 4

Similarly, if a unique ID value is encountered more than once in the same document, the parser should return an error message:


line 43, http://129.177.24.81/xml/testing.xml:
error (661): duplicate value for ID attribute: discid
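The two mistakes can be sketched like this. Only the DISC start-tags are shown, and the value "410410c" is simply the first Id number with its leading letter removed:

```xml
<!-- In the DTD -->
<!ATTLIST DISC discid ID #REQUIRED>

<!-- In the document instance (start-tags only) -->
<DISC discid="a410410c">   <!-- valid -->
<DISC discid="410410c">    <!-- not valid: the value begins with a digit -->
<DISC discid="a410410c">   <!-- not valid: this value has already been used above -->
```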


IDREF

As you have probably guessed, the IDREF type is closely associated with the ID type. The IDREF attribute effectively allows an element elsewhere in the XML document instance to be the value of an attribute. This means that the value of the IDREF attribute must be identical to the ID value of another element. If we elaborate a little bit on the example we used above, this is how it works in practice:

We have already seen that each individual Compact Disc in our collection has its own unique ID number. If we want to make sure that the tracks on these CDs are linked to this unique number, this can be done by using the IDREF attribute type. We have already made sure that the TRACK element can contain the attributes 'number' and 'size', so let us allow another attribute called 'parent', like this:


<!ATTLIST TRACK number CDATA #IMPLIED
                size CDATA #IMPLIED
                parent IDREF #IMPLIED>

Now that we have allowed this attribute to be used we can use this attribute to associate the individual tracks with their parent element:


<DISC discid="a410410c">
  <TRACK parent="a410410c">
    <NAME>Mary Queen of Arkansas</NAME>
    <TIME>3:26</TIME>
  </TRACK>
</DISC>

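The ID/IDREF combination is normally used when you want to establish a relationship, like the parent - child relationship above, between elements within a document. It is important to remember, however, that a validating parser only checks that each IDREF value is a valid XML name and that it matches the value of an ID attribute somewhere in the same document. It is not able to check whether or not your IDREF value refers to the correct ID. This means that I could have written parent="a50b910c" in the example above, pointing the track at the wrong disc, and the document would still be valid as long as another DISC element carries that ID. A value like parent="abcdef", which does not match any ID in the document, would on the other hand be reported as an error.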

Before we start to explain the remaining attribute types, we need to have a closer look at how entities and entity references are dealt with in XML.



Entities in XML

Towards the end of chapter two, we briefly discussed the concept of entities in XML. In the next chapter, we will go on to a more in-depth look at how entities work, but now that we are more familiar with XML, it might be a good idea to repeat some of the most important points from chapter two.

As we stated in chapter two, an entity is an 'item' that holds data, like a database record, an image file or a text document. The purpose of an entity is, in other words, to hold content. The example we have been using so far is an entity that contains textual content. Furthermore, there are two types of entities: external and internal. Internal entities are defined completely within one document, whereas external entities get their content from another source through a URL.


In addition to this, entities may be parsed or unparsed. Parsed entities are made up of text that follows the rules of XML. The content of unparsed entities, on the other hand, is binary data or text that does not adhere to the XML rules. The rest of this chapter will deal specifically with one type of entity that was also covered to some extent in chapter two: general entity references. More specifically, general entity references are used to merge text into already existing documents. I compared them to macros in word processing, and this is not very far off the mark - they simply substitute one portion of text for another. Like all other things in XML, general entity references must obey certain rules:

  • General Entity references must always begin with an ampersand ( & )
  • General Entity references must always end with a semicolon ( ; )
  • General Entity references are case-sensitive
  • General Entity references are composed of alphanumeric characters
  • General Entity references must be declared in a DTD, unless you are using one of the five pre-defined entity references listed below.

These five entity references are pre-defined in XML:

  • Ampersand ( & ) - &amp;
  • Less than sign ( < ) - &lt;
  • Greater than sign ( > ) - &gt;
  • Apostrophe ( ' ) - &apos;
  • Quotation marks ( " ) - &quot;

To create your own entity references in a document, you will need to declare them in the DTD with the <!ENTITY> tag. We can illustrate this with an example from the collection we have created. Let's assume that we have not only one, but all of Bruce Springsteen's albums in our collection. Instead of writing "Bruce Springsteen and the E Street Band" in all the appropriate places, we decide to save ourselves some time by creating an entity reference for this artist. First we have to declare the entity reference in the DTD:


<!ENTITY bs "Bruce Springsteen and the E Street Band">

With this change in the DTD, we can now use the entity reference &bs; in our document. This example hides the real significance of entity references to a certain degree. After all - Bruce Springsteen hasn't released that many albums. Entity references are used when you have repeated occurrences of one particular string of text. As an example we could imagine a large archive, consisting of thousands of XML documents that share the same DTD. These documents all have the e-mail address of the owner as a footer. If this person's e-mail address changes for some reason, it would be easier to change this once in the DTD than to go through each individual document and change it.
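Here is a minimal sketch of the &bs; entity in use. The small RECORD document type is an assumption made to keep the example short, and is not the full collection DTD:

```xml
<?xml version="1.0"?>
<!DOCTYPE RECORD [
  <!ELEMENT RECORD (ARTIST, TITLE)>
  <!ELEMENT ARTIST (#PCDATA)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ENTITY bs "Bruce Springsteen and the E Street Band">
]>
<RECORD>
  <!-- The parser replaces the reference below with the text declared in the DTD -->
  <ARTIST>&bs;</ARTIST>
  <TITLE>Tracks</TITLE>
</RECORD>
```

An application that reads this document will see the full artist name as the content of the ARTIST element.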

Before we end this chapter, we will have a brief look at a few special points regarding entity references. All of these points deal with how you can use entity references inside the DTD of a document. If we have a look at the example we used for our general entity reference, we see that the 'and' in the artist name might be replaced by an ampersand. As we have seen, the ampersand is one of the five predefined entity references in XML. This means that we don't have to create a new declaration for this entity reference, but can use it directly with the one we have already created, like this:


<!ENTITY bs "Bruce Springsteen &amp; the E Street Band">


You can also use your own entity references inside the DTD, but they are subject to a couple of restrictions:

  • Unlike element and attribute declarations, the position of an entity declaration matters in XML. Entities must be declared before they are referenced.
  • General entity references cannot insert text that will be a part of the DTD rather than the document content.

This last point means that you can not use general entity references to replace XML keywords like #PCDATA for example. So far, we have been discussing so-called general entity references. These can only be used to merge text into the document content, so if you want to use entity references to replace parts of the DTD itself, you will have to use something called parameter entity references. Parameter entity references are very similar to General entity references, with these two very important distinctions:

  • Parameter entity references begin with a percent sign ( % )
  • Parameter entity references can only occur inside the DTD

Apart from this, parameter entity references are used and declared in the same way as general entity references. To demonstrate this, let us use a parameter entity reference to substitute all occurrences of the keyword #PCDATA in our DTD. The only difference between this entity declaration and the general one above is that we have to insert a percent sign, with a space on either side, before the entity name, like this:

<!ENTITY % pc "(#PCDATA)">

If we insert this line into the DTD, and then replace all occurrences of (#PCDATA) with %pc;, we would expect the document to remain valid XML. There is, however, a problem with this use of parameter entities. According to the XML 1.0 specification, parameter entity references are not allowed inside declarations when the DTD is internal, that is, written inside the XML document itself. In an internal DTD they may only appear between declarations. In other words, for this example to work, we need to separate the DTD from the rest of the XML document. So, before we proceed with entities, we will start the next chapter by exporting our DTD. To make sure that we all do the same thing, this is the file that failed to validate in our last attempt:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE COLLECTION [
  <!ELEMENT COLLECTION (RECORD*)>
  <!ELEMENT RECORD (ARTIST, TITLE, YEAR?, LABEL?, TIME?, RATING?, COMMENT?, DISC+)>
  <!ATTLIST RECORD owner CDATA #FIXED "Vemund">
  <!ENTITY % pc "(#PCDATA)">
  <!ELEMENT ARTIST %pc;>
  <!ELEMENT TITLE %pc;>
  <!ELEMENT YEAR %pc;>
  <!ATTLIST YEAR function CDATA #REQUIRED>
  <!ELEMENT LABEL %pc;>
  <!ELEMENT RATING %pc;>
  <!ELEMENT COMMENT %pc;>
  <!ELEMENT DISC (TRACK+)>
  <!ATTLIST DISC discid ID #REQUIRED>
  <!ELEMENT TRACK (NAME*,TIME?)>
  <!ATTLIST TRACK number CDATA #IMPLIED
   size CDATA #IMPLIED
   parent IDREF #IMPLIED>
  <!ELEMENT NAME %pc;>
  <!ELEMENT TIME %pc;>
  <!ENTITY bs "Bruce Springsteen &amp; the E Street Band">
]>

<COLLECTION>
  <RECORD>
    <ARTIST>&bs;</ARTIST>
    <TITLE>Tracks</TITLE>
    <YEAR function="release">1998</YEAR>
    <LABEL>Columbia Records</LABEL>
    <TIME>250 minutes</TIME>
    <RATING>8/10</RATING>
    <DISC discid="a410410c">
      <TRACK parent="a410410c">
        <NAME>Mary Queen of Arkansas</NAME>
        <TIME>3:26</TIME>
      </TRACK>
    </DISC>
    <DISC discid="a50b910c">
      <TRACK parent="a50b910c">
        <NAME>Restless nights</NAME>
        <TIME>4:05</TIME>
      </TRACK>
    </DISC>
    <DISC discid="a50b911c">
      <TRACK parent="a50b911c">
        <NAME>Cynthia</NAME>
        <TIME>4:26</TIME>
      </TRACK>
    </DISC>
    <DISC discid="ad0bf70d">
      <TRACK parent="ad0bf70d">
        <NAME>Leavin' train</NAME>
        <TIME>3:46</TIME>
      </TRACK>
    </DISC>
  </RECORD>
</COLLECTION>

