Using Caché XML Tools
Using %XML.TextReader

The %XML.TextReader class offers a simple way to read arbitrary XML documents that may or may not map directly to Caché objects. Specifically, this class provides ways to navigate a well-formed XML document and view the information in it (elements, attributes, comments, namespace URIs, and so on). This class also provides complete document validation, based on either a DTD or an XML schema. Unlike %XML.Reader, however, %XML.TextReader does not provide a way to return a DOM. If you require a DOM, see the chapter “Importing XML into Caché Objects,” earlier in this book.

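This chapter discusses the following topics: creating a text reader method, node types, node properties, argument lists for the parse methods, navigating the document, performing validation, and examples of namespace reporting.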
Note:
The XML declaration of any XML document that you use should indicate the character encoding of that document, and the document should be encoded as declared. If the character encoding is not declared, Caché uses the defaults described in “Character Encoding of Input and Output,” earlier in this book. If these defaults are not correct, modify the XML declaration so that it specifies the character set actually used.
Creating a Text Reader Method
To read an arbitrary XML document that does not necessarily have any relationship to a Caché object class, you invoke methods of the %XML.TextReader class, which opens the document and loads it into temporary storage as a text reader object. The text reader object contains a navigable tree of nodes, each of which contains information about the source document. Your method can then navigate the document and find out information about it. The properties of the object give you information that depends on your current location within the document. If there are validation errors, those errors are also available as nodes in the tree.
Overall Structure
Your method should do some or all of the following:
  1. Specify a document source, via the first argument of one of the following methods:
    Method          First Argument
    ParseFile()     A file name, with complete path
    ParseStream()   A stream
    ParseString()   A string
    ParseURL()      A URL
    In any case, the source document must be a well-formed XML document; that is, it must obey the basic rules of XML syntax. Each of these methods returns a status ($$$OK or a failure code) to indicate whether the result was successful. You can test the status with the usual mechanisms; in particular, you can use $System.Status.DisplayError(status) to see the text of the error message.
    For each of these methods, if the method returns $$$OK, it returns by reference (its second argument) the text reader object that contains the information in the XML document.
    Additional arguments let you control entity resolution, validation, which items are found, and so on. These are described later in this chapter, in Argument Lists for the Parse Methods.
  2. Check the status returned by the parse method and quit if appropriate.
    If the parse method returned $$$OK, you have a text reader object that corresponds to the source XML document. You can navigate this object.
    Your document is likely to contain nodes such as "element", "endelement", "startprefixmapping", and so on. The node types are listed in Node Types, later in this chapter.
    Important:
    In the case of any validation errors, your document contains "error" or "warning" nodes. Your code should check for such nodes. See Performing Validation.
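  3. Use one of the following instance methods to start reading the document: Read() (moves to the first node), ReadStartElement() (moves to the first occurrence of a specific element), or MoveToContent() (moves to the first node with content).
    See Navigating the Document, later in this chapter.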
  4. Get the values of the properties of interest for this node, if any. Available properties include Name, Value, Depth, and so on. See “Node Properties,” later in this chapter.
  5. Continue to navigate through the document as needed and get property values.
    If the current node is an element, you can use the MoveToAttributeIndex() or MoveToAttributeName() methods to move the focus to attributes of that element. To return to the element, if applicable, use MoveToElement().
  6. If needed, use the Rewind() method to return to the start of the document (before the first node). This is the only method that can go backward in the source.
After your method runs, the text reader object is destroyed and all related temporary storage is cleaned up.
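As a compact illustration of the steps above, the following sketch parses a string rather than a file and tallies how many nodes of each type the reader reports. The method name TallyNodeTypes and the local array tally are hypothetical names chosen for this illustration; only ParseString(), Read(), and NodeType come from this chapter.

```objectscript
ClassMethod TallyNodeTypes(xml As %String)
{
    // Step 1: specify the source (here, a string) and get the reader by reference
    set status=##class(%XML.TextReader).ParseString(xml,.textreader)
    // Step 2: check the status and quit on failure
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    // Steps 3 to 5: read node by node and inspect a property of each node
    while textreader.Read() {
        set type=textreader.NodeType
        set tally(type)=$get(tally(type))+1
    }
    // report the counts, one line per node type (tally is a hypothetical local array)
    set type=""
    for {
        set type=$order(tally(type))
        quit:type=""
        write !,type,": ",tally(type)
    }
}
```

Because any "error" or "warning" nodes are counted like other nodes, a tally of this kind is also a quick way to see whether validation reported problems.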
Example 1
Here is a simple method that reads any XML file and shows the sequence number, type, name, and value of every node:
ClassMethod WriteNodes(myfile As %String)
{
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader)
    //check status
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    //iterate through document, node by node
    while textreader.Read()
    {
        Write !, "Node ", textreader.seq, " is a(n) "
        Write textreader.NodeType," "
        If textreader.Name'=""
        {
            Write "named: ", textreader.Name
        }
        Else
        {
            Write "and has no name"
        }
        Write !, "    path: ",textreader.Path
        If textreader.Value'=""
        {
            Write !, "    value: ", textreader.Value
        }
    }
}
This example does the following:
  1. It calls the ParseFile() class method. This reads the source file, creates a text reader object, and returns that in the variable textreader by reference.
  2. If ParseFile() is successful, the method then invokes the Read() method to find each successive node within the document.
  3. For each node, the method writes output lines that contain the sequence number of the node, the node type, the node name (if any), the node path, and the node value (if any). Output is written to the current device.
Consider the following example source document:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="mystyles.css"?>
<Root>
   <s01:Person xmlns:s01="http://www.root.org">
      <Name attr="xyz">Willeke,Clint B.</Name>
      <DOB>1925-10-01</DOB>
   </s01:Person>
</Root>
For this source document, the preceding method generates the following output:
Node 1 is a(n) processinginstruction named: xml-stylesheet
    path:
    value: type="text/css" href="mystyles.css"
Node 2 is a(n) element named: Root
    path: /Root
Node 3 is a(n) startprefixmapping named: s01
    path: /Root
    value: s01 http://www.root.org
Node 4 is a(n) element named: s01:Person
    path: /Root/s01:Person
Node 5 is a(n) element named: Name
    path: /Root/s01:Person/Name
Node 6 is a(n) chars and has no name
    path: /Root/s01:Person/Name
    value: Willeke,Clint B.
Node 7 is a(n) endelement named: Name
    path: /Root/s01:Person/Name
Node 8 is a(n) element named: DOB
    path: /Root/s01:Person/DOB
Node 9 is a(n) chars and has no name
    path: /Root/s01:Person/DOB
    value: 1925-10-01
Node 10 is a(n) endelement named: DOB
    path: /Root/s01:Person/DOB
Node 11 is a(n) endelement named: s01:Person
    path: /Root/s01:Person
Node 12 is a(n) endprefixmapping named: s01
    path: /Root
    value: s01
Node 13 is a(n) endelement named: Root
    path: /Root
Note that, by default, the %XML.TextReader class ignores comments, so any comments in a source document do not appear in this kind of output. For information on changing this, see “Argument Lists for the Parse Methods,” later in this chapter.
Example 2
The following example reads an XML file and lists every element in it:
ClassMethod ShowElements(myfile As %String)
{
    set status = ##class(%XML.TextReader).ParseFile(myfile,.textreader)
    //check status
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    //iterate through document, node by node
    while textreader.Read()
    {
        if (textreader.NodeType = "element")
        {
            write textreader.Name,!
        }
    }
}
This method checks the type of each node, by using the NodeType property. If the node is an element, the method prints its name to the current device. For the XML source document shown earlier, this method generates the following output:
Root
s01:Person
Name
DOB
Node Types
Each node of a document is one of the following types:
Node Types in a Text Reader Document
Type                      Description
"attribute"               An XML attribute.
"chars"                   A set of characters (such as content of an element). The %XML.TextReader class recognizes other node types ("CDATA", "EntityReference", and "EndEntity") but automatically converts them to "chars".
"comment"                 An XML comment.
"element"                 The start of an XML element.
"endelement"              The end of an XML element.
"endprefixmapping"        End of the context where a namespace is declared.
"entity"                  An XML entity.
"error"                   A validation error found by the parser. See Performing Validation.
"ignorablewhitespace"     The white space between markup in a mixed content model.
"processinginstruction"   An XML processing instruction.
"startprefixmapping"      An XML namespace declaration, which may or may not include a namespace.
"warning"                 A validation warning found by the parser. See Performing Validation.
Notice that an XML element consists of multiple nodes. For example, consider the following XML fragment:
<Person>
   <Name>Willeke,Clint B.</Name>
   <DOB>1925-10-01</DOB>
</Person>
The SAX parser views this XML as the following set of nodes:
Example of Document Nodes
Node Number   Type of Node   Name of Node, If Any   Value of Node, If Any
1             element        Person
2             element        Name
3             chars                                 Willeke,Clint B.
4             endelement     Name
5             element        DOB
6             chars                                 1925-10-01
7             endelement     DOB
8             endelement     Person
For example, notice that the <DOB> element is considered to be three nodes: an element node, a chars node, and an endelement node. Also notice that the contents of this element are available only as the value of the chars node.
Node Properties
As mentioned earlier, the %XML.TextReader class parses an XML document and creates a text reader object that consists of a set of nodes that correspond to the components of the document; the node types are described in “Node Types,” earlier in this chapter.
When you change focus to a different node, the properties of the text reader object are updated to contain information about the node that you are currently examining. This section describes all the properties of the %XML.TextReader class.
AttributeCount
If the current node is an element or an attribute, this property indicates the number of attributes of the element. Within a given element, the first attribute is numbered 1.
For any other type of node, this property is 0.
Depth
Indicates the depth of the current node within the document. The root element is at depth 1; items outside the root element are at depth 0. Note that an attribute is at the same depth as the element to which it belongs. Similarly, an error or warning is at the same depth as the item that caused the error or warning.
EOF
True if the reader has reached the end of the source document; false otherwise.
HasAttributes
If the current node is an element, this property is true if that element has attributes (or false if it does not). If the current node is an attribute, this property is true.
For any other type of node, this property is false.
HasValue
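True if the current node is a type of node that has a value (even if that value is null). Otherwise this property is false. Specifically, this property is true for the following types of nodes: attribute, chars, comment, entity, ignorablewhitespace, processinginstruction, and startprefixmapping.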
Note that HasValue is false for nodes of type error and warning, even though those node types have values.
IsEmptyElement
True if the current node is an element and is empty. Otherwise this property is false.
LocalName
For nodes of type attribute, element, or endelement, this is the name of the current element or attribute, without the namespace prefix. For all other types of nodes, this property is null.
Name
Fully qualified name of the current node, as appropriate for the type of node. The following table gives the details:
Names for Nodes, by Type
Node Type                                Name and Example
attribute                                The name of the attribute. For example, if an attribute is attr="xyz", then Name is attr.
element or endelement                    The name of the element. For example, if an element is <s01:Person xmlns:s01="http://www.root.org">, then Name is s01:Person.
entity                                   The name of the entity.
startprefixmapping or endprefixmapping   The prefix, if any. For example, if a namespace declaration is xmlns:s01="http://www.root.org", then Name is s01. For another example, if a namespace declaration is xmlns="http://www.root.org", then Name is null.
processinginstruction                    The target of the processing instruction. For example, if a processing instruction is <?xml-stylesheet type="text/css" href="mystyles.css"?>, then Name is xml-stylesheet.
all other types                          null
NamespaceUri
For nodes of type attribute, element, or endelement, this is the namespace to which attribute or element belongs, if any. For all other types of nodes, this property is null.
NodeType
Type of the current node. See “Node Types,” earlier in this chapter.
Path
Path to the element. For example, consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="mystyles.css"?>
<s01:Root xmlns:s01="http://www.root.org" xmlns="www.default.org">
   <Person>
      <Name>Willeke,Clint B.</Name>
      <DOB>1925-10-01</DOB>
      <GroupID>U3577</GroupID>
      <Address xmlns="www.address.org">
         <City>Newton</City>
         <Zip>56762</Zip>
      </Address>
   </Person>
</s01:Root>
For the City element, the Path property is /s01:Root/Person/Address/City. Other elements are treated similarly.
ReadState
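Indicates the overall state of the text reader object, one of the following: "Initial" (the Read() method has not yet been called), "Interactive" (the Read() method has been called at least once), or "EndOfFile" (the end of the file has been reached).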
Value
Value, if any, of the current node, as appropriate for the type of node. The following table gives the details:
Values for Nodes, by Type
Node Type                          Value and Example
attribute                          The value of the attribute. For example, if an attribute is attr="xyz", then Value is xyz.
chars                              The content of the text node. For example, if an element is <DOB>1925-10-01</DOB>, then for the chars node, Value is 1925-10-01.
comment                            The content of the comment.
entity                             The definition of the entity.
error                              The error message. For an example, see “Performing Validation,” later in this chapter.
ignorablewhitespace                The content of the white space.
processinginstruction              The entire content of the processing instruction, excluding the target. For example, if a processing instruction is <?xml-stylesheet type="text/css" href="mystyles.css"?>, then Value is type="text/css" href="mystyles.css".
startprefixmapping                 The prefix, followed by a space, followed by the URI. For example, if a namespace declaration is xmlns:s01="http://www.root.org", then Value is s01 http://www.root.org.
warning                            The warning message. For an example, see “Performing Validation,” later in this chapter.
all other types (including element)   null
seq
The sequence number of this node within the document. The first node is numbered 1. Note that an attribute has the same sequence number as the element to which it belongs.
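To show how several of these properties work together, here is a minimal sketch that prints an indented outline of a document. It indents each line according to Depth and reports AttributeCount for elements that have attributes. The method name ShowOutline is hypothetical, and the two-space indent is an arbitrary choice for this illustration.

```objectscript
ClassMethod ShowOutline(myfile As %String)
{
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader)
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    while textreader.Read() {
        // indent two spaces per level, based on the Depth reported for the node
        set indent=$justify("",2*textreader.Depth)
        if (textreader.NodeType="element") {
            write !,indent,textreader.Name
            if textreader.HasAttributes {
                write " (",textreader.AttributeCount," attribute(s))"
            }
        }
        elseif (textreader.NodeType="chars") {
            // element content is available only as the value of the chars node
            write !,indent,"text: ",textreader.Value
        }
    }
}
```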
Argument Lists for the Parse Methods
To specify a document source, you use the ParseFile(), ParseStream(), ParseString(), or ParseURL() class method of %XML.TextReader. In any case, the source document must be a well-formed XML document; that is, it must obey the basic rules of XML syntax. For these methods, only the first two arguments are required. For reference, these methods have the following arguments, in order:
  1. Filename, Stream, String, or URL — Document source.
  2. TextReader — Text reader object, returned as an output parameter if the method returns $$$OK.
  3. Resolver — An entity resolver to use when parsing the source. See Performing Custom Entity Resolution in the chapter Customizing How the SAX Parser Is Used.
  4. Flags — A flag or combination of flags to control the validation and processing performed by the SAX parser. See Setting the Parser Flags in the chapter Customizing How the SAX Parser Is Used.
  5. Mask — A mask to specify which items are of interest in the XML source. See Specifying the Event Mask in the chapter Customizing How the SAX Parser Is Used.
    Tip:
    For the parsing methods of %XML.TextReader, the default mask is $$$SAXCONTENTEVENTS. Note that this ignores comments. To parse all possible types of nodes, use $$$SAXALLEVENTS for this argument. Note that these macros are defined in the %occSAX.inc include file.
  6. SchemaSpec — A schema specification, against which to validate the document source. This argument is a string that contains a comma-separated list of namespace/URL pairs:
    "namespace URL,namespace URL"
    Here namespace is the XML namespace used for the schema and URL is a URL that gives the location of the schema document. There is a single space character between the namespace and URL values.
  7. KeepWhiteSpace — An option to keep white space or not.
  8. pHttpRequest — (For the ParseURL() method only) A request for the web server, as an instance of %Net.HttpRequest. By default, Caché creates a new instance of %Net.HttpRequest and uses that, but you can instead make a request with a different instance of %Net.HttpRequest. This is useful in the case where you have a pre-existing %Net.HttpRequest with proxy and other properties already set. This option applies only to URLs of type http (not file or ftp, for example).
    For details on %Net.HttpRequest, see the book Using Caché Internet Utilities. Or see the class documentation for %Net.HttpRequest.
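As an illustration of the Mask argument, the following sketch passes $$$SAXALLEVENTS as the fifth argument so that comment nodes are included, and then lists them. The method name ShowComments is hypothetical, and the sketch assumes that the containing class includes %occSAX (for example, via an Include %occSAX directive) so that the macro is defined.

```objectscript
ClassMethod ShowComments(myfile As %String)
{
    // Arguments 3 and 4 (Resolver and Flags) are left empty so that their defaults apply;
    // argument 5 (Mask) requests all node types, including comments
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader,,,$$$SAXALLEVENTS)
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    while textreader.Read() {
        if (textreader.NodeType="comment") {
            write !,"Comment: ",textreader.Value
        }
    }
}
```

With the default mask ($$$SAXCONTENTEVENTS), the same loop would find no comment nodes.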
Navigating the Document
To navigate through the document, you use the following methods of your text reader: Read(), ReadStartElement(), MoveToAttributeIndex(), MoveToAttributeName(), MoveToElement(), MoveToContent(), and Rewind().
Navigating to the Next Node
To move to the next node in a document, use the Read() method. The Read() method returns a true value until there are no more nodes to read (that is, until the end of the document is reached). The previous examples used this method in a loop like the following:
 While (textreader.Read()) {

...

 }
Navigating to the First Occurrence of a Specific Element
You can move to the first occurrence of a specific element within a document. To do so, use the ReadStartElement() method. This method returns true if the element is found. If the element is not found, the method returns false and the reader is left at the end of the file.
The ReadStartElement() method takes two arguments: the name of the element and (optionally) the namespace URI. Note that the %XML.TextReader class does not do any processing of namespace prefixes. Therefore the ReadStartElement() method regards the following two elements as having different names:
<Person xmlns="http://www.person.org">Smith,Ellen W.</Person>

<s01:Person xmlns:s01="http://www.person.org">Smith,Ellen W.</s01:Person>
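For instance, a method along the following lines could locate the first element with a given name and print its text content. This is a sketch only: the method name ShowFirst and its elementname argument are hypothetical, and it assumes that the text content, if any, is the node immediately after the element node.

```objectscript
ClassMethod ShowFirst(myfile As %String, elementname As %String)
{
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader)
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    // pass the name exactly as it appears in the document, including any prefix
    if textreader.ReadStartElement(elementname) {
        write !,"Found ",textreader.Name," at ",textreader.Path
        // the content of the element, if any, is carried by a following chars node
        if textreader.Read() && (textreader.NodeType="chars") {
            write !,"    value: ",textreader.Value
        }
    }
    else {
        write !,"No element named ",elementname
    }
}
```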
Navigating to an Attribute
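When you navigate to an element, if that element has attributes, you can navigate to them in either of two ways: use the MoveToAttributeIndex() method to move to a specific attribute by index (its ordinal position within the element; the AttributeCount property gives the number of attributes), or use the MoveToAttributeName() method to move to a specific attribute by name.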
When you are finished with the attributes for the current element, you can move to the next element in the document by invoking one of the navigation methods such as Read(). Alternatively, you can invoke the MoveToElement() method to return to the element that contains the current attribute.
For example, the following code lists all the attributes for the current node by index number:
 If (textreader.NodeType = "element") {
     // list attributes for this node
     For a = 1:1:textreader.AttributeCount {
         Do textreader.MoveToAttributeIndex(a)
         Write textreader.LocalName," = ",textreader.Value,!
     }
 }
The following code finds the value of the color attribute for the current node:
 If (textreader.NodeType = "element") {
     // find color attribute for this node
     If (textreader.MoveToAttributeName("color")) {
         Write "color = ",textreader.Value,!
     }
 }
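The fragments above can be combined into a complete method. The following sketch lists the attributes of every element in a file and then calls MoveToElement() to return the focus to the owning element before reading further. The method name ShowAttributes is hypothetical.

```objectscript
ClassMethod ShowAttributes(myfile As %String)
{
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader)
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    while textreader.Read() {
        if (textreader.NodeType="element") && textreader.HasAttributes {
            // attributes are numbered starting at 1
            for a=1:1:textreader.AttributeCount {
                do textreader.MoveToAttributeIndex(a)
                write !,textreader.Name," = ",textreader.Value
            }
            // return the focus to the element that owns these attributes
            do textreader.MoveToElement()
            write !,"    (attributes of ",textreader.Name,")"
        }
    }
}
```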
Navigating to the Next Node with Content
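The MoveToContent() method helps you find content. Specifically, if the current node is not a content node, this method moves forward to the next node that has content (or to the end of the document if there is none).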
Rewinding
All the methods described here go forward in a document, except for the Rewind() method, which navigates to the start of the document and resets all properties.
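One possible use of Rewind() is to make two passes over the same document without parsing it again. The following sketch counts the elements in a first pass and lists their paths in a second pass. The method name TwoPass is hypothetical.

```objectscript
ClassMethod TwoPass(myfile As %String)
{
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader)
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}
    // first pass: count the element nodes
    set count=0
    while textreader.Read() {
        if (textreader.NodeType="element") {set count=count+1}
    }
    write !,"Elements found: ",count
    // return to the start of the document (before the first node)
    do textreader.Rewind()
    // second pass: list the path of each element
    while textreader.Read() {
        if (textreader.NodeType="element") {write !,textreader.Path}
    }
}
```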
Performing Validation
By default, the source document is validated against any DTD or schema document provided. If the document includes a DTD section, the document is validated against that DTD. To validate against a schema document instead, specify the schema within the argument list for ParseFile(), ParseStream(), ParseString(), or ParseURL(), as described in “Argument Lists for the Parse Methods.”
Most types of validation issues are nonfatal and cause either an error or a warning. Specifically, nodes of type "error" or "warning" are automatically added to the document tree, at the location where the error occurred. You can navigate to and inspect these nodes in the same way as any other type of node.
For example, consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Root [
  <!ELEMENT Root (Person)>
  <!ELEMENT Person (#PCDATA)>
]>
<Root>
   <Person>Smith,Joe C.</Person>
</Root>
In this case, we do not expect any validation errors. Recall the example method WriteNodes() shown earlier in this chapter. If we used a variation of that method that reports the value of each node rather than its path, the output would be as follows:
Node 1 is a(n) element named: Root
    and has no value
Node 2 is a(n) ignorablewhitespace and has no name
    with value:
 
Node 3 is a(n) element named: Person
    and has no value
Node 4 is a(n) chars and has no name
    with value: Smith,Joe C.
Node 5 is a(n) endelement named: Person
    and has no value
Node 6 is a(n) ignorablewhitespace and has no name
    with value:
 
Node 7 is a(n) endelement named: Root
    and has no value
In contrast, suppose that the file looked like this instead:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Root [
  <!ELEMENT Root (Person)>
  <!ELEMENT Person (#PCDATA)>
]>
<Root>
   <Employee>Smith,Joe C.</Employee>
</Root>
In this case, we expect errors because the <Employee> element is not declared in the DTD section. Here, if we use the same variation of the example method WriteNodes() (reporting the value of each node rather than its path) to read this document, the output would be as follows:
Node 1 is a(n) element named: Root
    and has no value
Node 2 is a(n) ignorablewhitespace and has no name
    with value:
 
Node 3 is a(n) error and has no name
    with value: Unknown element 'Employee' 
while processing c:/TextReader/docwdtd2.txt at line 7 offset 14
Node 4 is a(n) element named: Employee
    and has no value
Node 5 is a(n) chars and has no name
    with value: Smith,Joe C.
Node 6 is a(n) endelement named: Employee
    and has no value
Node 7 is a(n) ignorablewhitespace and has no name
    with value:
 
Node 8 is a(n) error and has no name
    with value: Element 'Employee' is not valid for content model: '(Person)' 
while processing c:/TextReader/docwdtd2.txt at line 8 offset 8
Node 9 is a(n) endelement named: Root
    and has no value
Also see Setting the Parser Flags in the chapter Customizing How the SAX Parser Is Used.
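Because validation problems appear as ordinary nodes, code that relies on a valid document can scan for them explicitly. The following is a minimal sketch of such a check. The method name CheckDocument and its convention of returning the number of problems found (or -1 if parsing failed) are assumptions made for this illustration.

```objectscript
ClassMethod CheckDocument(myfile As %String) As %Integer
{
    set status=##class(%XML.TextReader).ParseFile(myfile,.textreader)
    // -1 is a hypothetical convention meaning that the document could not be parsed
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit -1}
    set problems=0
    while textreader.Read() {
        if (textreader.NodeType="error")||(textreader.NodeType="warning") {
            set problems=problems+1
            // for these node types, Value contains the message text
            write !,textreader.NodeType," at node ",textreader.seq,": ",textreader.Value
        }
    }
    quit problems
}
```

A caller could treat a return value of 0 as meaning that no validation errors or warnings were reported.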
Examples: Namespace Reporting
The following example method reads an arbitrary XML file and indicates the namespaces to which each element and attribute belongs:
ClassMethod ShowNamespacesInFile(filename As %String)
{
  Set status = ##class(%XML.TextReader).ParseFile(filename,.textreader)

  //check status
  If $$$ISERR(status) {do $System.Status.DisplayError(status) quit}

  //iterate through document, node by node
  While textreader.Read()
  {
    If (textreader.NodeType = "element")
    {
       Write !,"The element ",textreader.LocalName
       Write " is in the namespace ",textreader.NamespaceUri
    }
    If (textreader.NodeType = "attribute")
    {
       Write !,"The attribute ",textreader.LocalName
       Write " is in the namespace ",textreader.NamespaceUri
    }
  }
}
When used in the Terminal, this method produces output like the following:
 
The element Person is in the namespace www://www.person.com
The element Name is in the namespace www://www.person.com
The following variation accepts an XML-enabled object, writes it to a stream, and then uses that stream to generate the same type of report:
ClassMethod ShowNamespacesInObject(obj)
{
  set writer=##class(%XML.Writer).%New()

  //direct the writer's output to a stream
  set str=##class(%GlobalCharacterStream).%New()
  set status=writer.OutputToStream(str)
  if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}

  //write to the stream
  set status=writer.RootObject(obj)
  if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}

  //parse the stream that was just written
  Set status = ##class(%XML.TextReader).ParseStream(str,.textreader)

  //check status
  If $$$ISERR(status) {do $System.Status.DisplayError(status) quit}

  //iterate through document, node by node
  While textreader.Read()
  {
    If (textreader.NodeType = "element")
    {
       Write !,"The element ",textreader.LocalName
       Write " is in the namespace ",textreader.NamespaceUri
    }
    If (textreader.NodeType = "attribute")
    {
       Write !,"The attribute ",textreader.LocalName
       Write " is in the namespace ",textreader.NamespaceUri
    }
  }
}