XML Background

This topic provides a quick summary of XML terms and concepts, as a refresher for users working with the InterSystems IRIS® data platform %XML classes. It is assumed that the reader is familiar with the basic syntax of XML, so this topic is not intended as a primer on XML. It does not, for example, describe the syntax or list reserved characters. However, it does provide a list of the terms with short definitions, as a reference.

attribute

A name-value pair of the following form:

ID="QD5690"

Attributes reside within elements, as shown below, and an element can have any number of attributes.

<Patient ID="QD5690">Cromley,Marcia N.</Patient>
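As a quick illustration, the following Python sketch reads the attribute from the element above. It assumes the standard library module xml.etree.ElementTree, which is used here only for demonstration and is not part of the InterSystems IRIS %XML classes.

```python
import xml.etree.ElementTree as ET

# Parse the sample element shown above.
patient = ET.fromstring('<Patient ID="QD5690">Cromley,Marcia N.</Patient>')

# Attributes are exposed as a dictionary of name-value pairs.
patient_id = patient.get("ID")
print(patient.attrib)  # {'ID': 'QD5690'}
print(patient_id)      # QD5690
```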

CDATA section

Denotes text that the parser should treat as literal character data rather than as markup, as follows:

<myelementname><![CDATA[
Literal character data goes here.
You can even have stray "<" or ">" symbols in it.
]]></myelementname>

A CDATA (character data) section cannot contain the string ]]> because this string marks the end of the section. This also means that CDATA sections cannot be nested.

Note that the contents of a CDATA section must conform to the encoding specified for the XML document, as does the rest of the XML document.
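The following Python sketch shows how a parser hands back CDATA content unchanged. It assumes the standard library modules xml.dom.minidom and xml.etree.ElementTree. The helper to_cdata is a hypothetical name introduced for this example; it uses the common workaround of splitting any embedded ]]> across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

xml_text = "<myelementname><![CDATA[1 < 2 && 3 > 2]]></myelementname>"

# minidom keeps the CDATA section as its own node type.
node = minidom.parseString(xml_text).documentElement.firstChild
is_cdata = node.nodeType == node.CDATA_SECTION_NODE
cdata_text = node.data

# ElementTree simply reports the same characters as element text.
plain_text = ET.fromstring(xml_text).text


def to_cdata(text):
    """Wrap text in CDATA, splitting any embedded ']]>' (hypothetical helper)."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


round_trip = ET.fromstring("<x>" + to_cdata("a ]]> b") + "</x>").text
```

Because adjacent CDATA sections are read as one run of character data, round_trip contains the original string, including the ]]> sequence.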

comment

A parenthetical note that is not part of the main data of an XML document. A comment looks like this:

<!--Output for the class: GXML.PersonNS7-->

content model

An abstract description of the possible contents of an XML element. The possible content models are as follows:

  • Empty content model (no child elements or text nodes are permitted)

  • Simple content model (only text nodes are permitted)

  • Complex content model (only child elements are permitted)

  • Mixed content model (both child elements and text nodes are permitted)

In all cases, the element may or may not have attributes; the phrase content model does not refer to the presence or absence of attributes in the element.
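A content model describes what a schema permits, but it can help to see the four categories applied to concrete elements. The following Python sketch classifies what an element instance actually contains. The function describe_content is a hypothetical helper, and treating whitespace-only text as "no text" is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def describe_content(element):
    """Classify the contents of an element instance (attributes are ignored)."""
    has_children = len(element) > 0
    # Text can appear before the first child (text) or after any child (tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "complex"
    if has_text:
        return "simple"
    return "empty"


samples = {
    "empty": '<EndDate/>',
    "simple": '<Name>Barnes,Gerry</Name>',
    "complex": '<Student level="undergraduate"><Name>Barnes,Gerry</Name></Student>',
    "mixed": '<Note>Call <b>now</b> please</Note>',
}
results = {label: describe_content(ET.fromstring(xml)) for label, xml in samples.items()}
```

Note that the level attribute on <Student> has no effect on the result, which matches the statement above that the content model is independent of attributes.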

default namespace

The namespace to which any unqualified elements belong, in a given context. A default namespace is added without a prefix. For example:

<Person xmlns="http://www.person.org">
  <Name>Isaacs,Rob G.</Name>
  <DOB>1981-01-29</DOB>
</Person>

Because this namespace declaration does not use a prefix, the <Person>, <Name>, and <DOB> elements all belong to this namespace.

Note that the following XML, which does not use a default namespace, is effectively equivalent to the preceding example:

<s01:Person xmlns:s01="http://www.person.org">
  <s01:Name>Isaacs,Rob G.</s01:Name>
  <s01:DOB>1981-01-29</s01:DOB>
</s01:Person>
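One way to check the equivalence is to parse both forms and compare the namespace-qualified names that the parser reports. The following Python sketch assumes the standard library module xml.etree.ElementTree, which represents a qualified name as {namespace}localname. The prefixed form is written with an xmlns:s01 declaration.

```python
import xml.etree.ElementTree as ET

with_default = ET.fromstring(
    '<Person xmlns="http://www.person.org">'
    "<Name>Isaacs,Rob G.</Name><DOB>1981-01-29</DOB>"
    "</Person>"
)
with_prefix = ET.fromstring(
    '<s01:Person xmlns:s01="http://www.person.org">'
    "<s01:Name>Isaacs,Rob G.</s01:Name><s01:DOB>1981-01-29</s01:DOB>"
    "</s01:Person>"
)

# Collect the fully qualified tag of every element in each document.
default_tags = [el.tag for el in with_default.iter()]
prefix_tags = [el.tag for el in with_prefix.iter()]
same_names = default_tags == prefix_tags
print(default_tags[0])  # {http://www.person.org}Person
```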

DOM

Document Object Model (DOM) is an object model for representing XML and related formats.
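For a sense of what working with a DOM looks like, here is a minimal Python sketch. It assumes the standard library module xml.dom.minidom, a small DOM implementation that is unrelated to the InterSystems IRIS %XML classes.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<Student level="undergraduate">'
    "<Name>Barnes,Gerry</Name><DOB>1981-04-23</DOB>"
    "</Student>"
)

# The document object gives access to the root element and its node tree.
root = doc.documentElement
child_names = [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
name_text = root.getElementsByTagName("Name")[0].firstChild.data
level = root.getAttribute("level")
```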

DTD (document type definition)

A series of text directives contained within an XML document or in an external file. It defines all the valid elements and attributes that can be used within a document. DTDs do not themselves use XML syntax.

element

An element typically consists of two tags (a start tag and an end tag), possibly surrounding text and other elements. The content of the element is everything between these two tags, including text and any child elements. The following is a complete XML element, with start tag, text content, and end tag:

<Patient>Cromley,Marcia N.</Patient>

An element can have any number of attributes and any number of child elements.

An empty element can either include a start tag and an end tag, or just a single tag. The following examples are equivalent:

<EndDate></EndDate>

<EndDate/>
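The equivalence of the two forms can be confirmed with a parser. This Python sketch assumes the standard library module xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<EndDate></EndDate>")
short_form = ET.fromstring("<EndDate/>")

# Both forms yield an element with the same name, no text, and no children.
same_tag = long_form.tag == short_form.tag
both_empty = (
    long_form.text is None
    and short_form.text is None
    and len(long_form) == 0
    and len(short_form) == 0
)
same_output = ET.tostring(long_form) == ET.tostring(short_form)
```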

In practice, elements are likely to refer to different parts of data records, such as:

<Student level="undergraduate">
   <Name>Barnes,Gerry</Name>
   <DOB>1981-04-23</DOB>
</Student>

entity

A unit of text (within an XML file) that represents one or more characters. An entity has the following structure:

&characters;
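For example, XML predefines the entities &lt;, &gt;, &amp;, &apos;, and &quot;. The following Python sketch, which assumes the standard library modules xml.etree.ElementTree and xml.sax.saxutils, shows a parser expanding entity references and a helper producing them:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces each entity reference with the character it represents.
note = ET.fromstring("<Note>Fish &amp; chips cost &lt; 10</Note>")
expanded = note.text

# Going the other way, escape() replaces &, <, and > with entity references.
escaped = escape("a < b & c")
print(expanded)  # Fish & chips cost < 10
print(escaped)   # a &lt; b &amp; c
```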

global element

The concepts of global and local elements apply to documents that use namespaces. The names of global elements are placed in a separate symbol space from those of local elements. A global element is an element whose type has global scope, that is, an element whose type is defined at the top level in the corresponding XML schema. Element declarations that appear as children of the <xs:schema> element are considered to be global declarations. Any other element declaration is a local element, unless it references a global declaration through the ref attribute, which effectively makes it a global element.

Attributes are global or local in the same way.

local element

An XML element that is not global. Local elements do not belong explicitly to any namespace, unless elements are qualified. See qualified and global element.

namespace

A namespace is a unique string that defines a domain for identifiers so that XML-based applications do not confuse one type of document with another. It is typically given as a URI (uniform resource identifier) in the form of a URL (uniform resource locator), which may or may not correspond to an actual web address. For example, "http://www.w3.org" is a namespace.

You include a namespace declaration with one of the following syntaxes:

xmlns="your_namespace_here"
xmlns:pre="your_namespace_here"

In either case, the namespace is used only within the context where you inserted the namespace declaration. In the latter case, the namespace is associated with the given prefix (pre). Then an element or attribute belongs to this namespace if and only if the element or attribute also has this prefix. For example:

<s01:Person xmlns:s01="http://www.person.com">
   <Name>Ravazzolo,Roberta X.</Name>
   <DOB>1943-10-24</DOB>
</s01:Person>

The namespace declaration uses the s01 prefix. The <Person> element uses this prefix as well, so this element belongs to this namespace. The <Name> and <DOB> elements, however, do not explicitly belong to any namespace.
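To see which elements a parser assigns to the namespace, consider the following Python sketch. It assumes the standard library module xml.etree.ElementTree, which reports a qualified element as {namespace}localname. The function split_tag is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    '<s01:Person xmlns:s01="http://www.person.com">'
    "<Name>Ravazzolo,Roberta X.</Name><DOB>1943-10-24</DOB>"
    "</s01:Person>"
)


def split_tag(tag):
    """Return (namespace, local name); namespace is None if there is none."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


# Map each element's local name to the namespace it belongs to.
namespaces = {}
for element in person.iter():
    namespace, local = split_tag(element.tag)
    namespaces[local] = namespace
```

Only Person maps to http://www.person.com; Name and DOB map to None because they carry no prefix and no default namespace is in scope.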

processing instructions (PI)

An instruction (within the prolog) intended to tell an application how to use an XML document or what to do with it. An example follows; this associates a stylesheet with the document.

<?xml-stylesheet type="text/css" href="mystyles.css"?>
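A DOM parser exposes a processing instruction as a node with a target and a data string. The following Python sketch assumes the standard library module xml.dom.minidom; the <Root/> element is a placeholder added so that the document is complete.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="mystyles.css"?><Root/>'
)

# The processing instruction precedes the root element in the document.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
print(pi.target)  # xml-stylesheet
print(pi.data)    # type="text/css" href="mystyles.css"
```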

prolog

The part of the XML document that precedes the root element. The prolog starts with the XML declaration (which indicates the XML version used) and then may include a DTD declaration or schema declaration, as well as processing instructions. (Technically you do not need a DTD nor a schema. Also, technically, you can have both in the same file.)

root, root element, document element

Each XML document is required to have exactly one element at the outermost level. This is known as the root, root element, or document element. The root element follows the prolog.

qualified

An element or attribute is qualified if it is explicitly assigned to a namespace. Consider the following example, in which the elements and attribute of <Person> are not qualified:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <s01:Person xmlns:s01="http://www.person.com" GroupID="J1151">
      <Name>Frost,Sally O.</Name>
      <DOB>1957-03-11</DOB>
   </s01:Person>
</Root>

Here, the namespace declaration uses the s01 prefix. There is no default namespace. The <Person> element uses this prefix as well, so that element belongs to this namespace. There is no prefix for the <Name> and <DOB> elements or the GroupID attribute, so these do not explicitly belong to any namespace.

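In contrast, consider the following case where the child elements of <Person> are qualified:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <Person xmlns="http://www.person.com" GroupID="J1151">
      <Name>Frost,Sally O.</Name>
      <DOB>1957-03-11</DOB>
   </Person>
</Root>

In this case, the <Person> element declares a default namespace, which applies to <Person> itself and to its child elements. Strictly speaking, a default namespace declaration does not apply to attribute names, so the unprefixed GroupID attribute still does not belong to any namespace; to qualify an attribute, you must give it a prefix.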

Note:

The XML schema attributes elementFormDefault and attributeFormDefault control whether elements and attributes are qualified in a given schema. In InterSystems IRIS XML support, you use a class parameter to specify whether elements are qualified.

schema

A document that specifies meta-information for a set of XML documents, as an alternative to a DTD. As with a DTD, you can use a schema to validate the contents of specific XML documents. XML schemas offer several advantages over DTDs for certain applications, including these:

  • An XML schema is a valid XML document, making it easier to develop tools that operate on schemas.

  • An XML schema can specify a richer set of features and includes type information for values.

Formally, a schema document is an XML document that complies with the W3C XML Schema specification (https://www.w3.org/XML/Schema). It obeys the rules of XML and uses some additional syntax. Typically, the file extension is .xsd.

style sheet

A document written in XSLT that describes how to transform a given XML document into another XML document or into some other human-readable document.

text node

One or more characters included between the start tag of an element and the corresponding end tag. For example:

<SampleElement>
sample text node
</SampleElement>
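Note that the text node in this example includes the line breaks around the words. The following Python sketch, assuming the standard library module xml.etree.ElementTree, makes this visible:

```python
import xml.etree.ElementTree as ET

sample = ET.fromstring("<SampleElement>\nsample text node\n</SampleElement>")

# The parser keeps the surrounding line breaks as part of the text.
raw_text = sample.text
trimmed_text = raw_text.strip()
print(repr(raw_text))  # '\nsample text node\n'
print(trimmed_text)    # sample text node
```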

type

A constraint placed upon the interpretation of data. In an XML schema, the definition of each element and attribute corresponds to a type.

Types are either simple or complex.

Each attribute has a simple type. Simple types also represent elements that have no attributes and no child elements (only text nodes). Complex types represent other elements.

The following fragment of a schema shows some type definitions:

<s:complexType name="Person">
    <s:sequence>
        <s:element name="Name" type="s:string" minOccurs="0" />
        <s:element name="DOB" type="s:date" minOccurs="0" />
        <s:element name="Address" type="s_Address" minOccurs="0" />
    </s:sequence>
    <s:attribute name="GroupID" type="s:string" />
</s:complexType>
<s:complexType name="s_Address">
    <s:sequence>
        <s:element name="City" type="s:string" minOccurs="0" />
        <s:element name="Zip" type="s:string" minOccurs="0" />
    </s:sequence>
</s:complexType>

unqualified

An element or attribute is unqualified if it is not explicitly assigned to a namespace. See qualified.

well-formed XML

An XML document or fragment that obeys the rules of XML, such as having an end tag to match a start tag.
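A simple way to test for well-formedness is to hand the text to a parser and see whether it reports an error. The following Python sketch assumes the standard library module xml.etree.ElementTree; is_well_formed is a hypothetical helper name. It checks complete documents only, so it also rejects input that has more than one outermost element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


matched = is_well_formed("<Patient>Cromley,Marcia N.</Patient>")
mismatched = is_well_formed("<Patient>Cromley,Marcia N.</Person>")
two_roots = is_well_formed("<Name/><DOB/>")
```

Here matched is True, while mismatched and two_roots are False. Well-formedness is a weaker condition than validity: this check says nothing about whether the document conforms to a DTD or schema.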

XML declaration

A statement that indicates which XML version (and, optionally, which character encoding) is used in a given document. If included, it must appear at the very start of the document, before any other content, including whitespace. An example follows:

<?xml version="1.0" encoding="UTF-8"?>
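The placement rule is enforced by parsers. In the following Python sketch, which assumes the standard library module xml.etree.ElementTree, the same document parses when the declaration comes first and fails when a line break precedes it. The <Root/> element is a placeholder for this example.

```python
import xml.etree.ElementTree as ET

declaration = b'<?xml version="1.0" encoding="UTF-8"?>'

# Declaration at the very start of the document: accepted.
root = ET.fromstring(declaration + b"<Root/>")

# Declaration preceded by a line break: rejected by the parser.
try:
    ET.fromstring(b"\n" + declaration + b"<Root/>")
    misplaced_ok = True
except ET.ParseError:
    misplaced_ok = False
```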

XPath

XPath (XML Path Language) is an expression language for obtaining data from an XML document. XPath expressions are not themselves written in XML syntax. The result can be either a scalar value or an XML subtree of the original document.
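As a brief illustration, the following Python sketch evaluates two path expressions. It assumes the standard library module xml.etree.ElementTree, which supports only a limited subset of XPath and returns elements rather than scalar values. The second <Person> record and its GroupID value K2040 are hypothetical data added for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Root>"
    '<Person GroupID="J1151"><Name>Frost,Sally O.</Name><DOB>1957-03-11</DOB></Person>'
    '<Person GroupID="K2040"><Name>Isaacs,Rob G.</Name><DOB>1981-01-29</DOB></Person>'
    "</Root>"
)

# Select every <Name> that is a child of a <Person> under the root.
names = [el.text for el in root.findall("./Person/Name")]

# Select the <DOB> of the <Person> whose GroupID attribute has a given value.
dob = root.find("./Person[@GroupID='K2040']/DOB").text
```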

XSLT

XSLT (Extensible Stylesheet Language Transformations) is an XML-based language that you use to describe how to transform a given XML document into another XML document or into some other human-readable document.