Introduction to Caché XML Tools

This book describes how to use Caché XML tools.

Caché brings the power of objects to XML processing — you can use objects as a direct representation of XML documents and vice versa. Because Caché includes a native object database, you can use such objects directly with a database. Furthermore, Caché provides tools for working with XML documents and DOMs (Document Object Model), even if these are not related to any Caché classes.

Representing Object Data in XML

Some of the Caché XML tools are intended for use mainly with XML-enabled classes. To XML-enable a class, you add %XML.Adaptor to its superclass list. The %XML.Adaptor class enables you to represent instances of that class as XML documents. You add class parameters and property parameters to fine-tune the projection. Because there are so many options, an entire book is devoted to them: Projecting Objects to XML.

For an XML-enabled class, your data can be available in all the following forms:

Contained in class instances. If the class is persistent, the data can also be saved to disk, where it is available in all the same ways as the data of any other persistent class.
Contained in XML documents, which could be files, streams, or other documents.
Contained in a DOM (Document Object Model).

The following figure provides an overview of the tools that you use to convert your data among these forms:

The %XML.Writer class enables you to create XML documents. The output destination is typically a file or a stream. You identify the objects to include in the output, and the system generates output based on the rules established within the class definitions. For information, see “Writing XML Output from Caché Objects” and “Serving XML from Caché.”
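
The idea of projecting an object as XML can be sketched outside Caché. The following is a minimal, language-neutral illustration written with the Python standard library; it is not the %XML.Writer API, and the `Person` class and `person_to_xml` function are hypothetical names chosen for this example. Each property becomes one child element, which is roughly what a default projection does.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical stand-in for an XML-enabled class."""
    name: str
    dob: str


def person_to_xml(person: Person) -> str:
    """Project one object as an element with one child element per property."""
    root = ET.Element("Person")
    ET.SubElement(root, "Name").text = person.name
    ET.SubElement(root, "DOB").text = person.dob
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


xml_text = person_to_xml(Person("Klingman,Julie G.", "1946-07-21"))
print(xml_text)
```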

The %XML.Reader class enables you to import a suitable XML document into a class instance. The source is typically a file or a stream. To use this class, you specify a correlation between a class name and an element contained in the XML document. The given element must have the structure that is expected by the corresponding class. Then you read the document, node by node. When you do so, the system creates in-memory instances of that class, containing the data found in the XML document. For information, see “Importing XML into Caché Objects.”
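
The correlation step can be pictured as a mapping from an element name to a factory that builds an object. The sketch below uses the Python standard library to show the concept only; it is not the %XML.Reader API. The names `read_correlated` and `person_from_element` are hypothetical, and the second person's date of birth is made-up sample data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical stand-in for an XML-enabled class."""
    name: str
    dob: str


DOCUMENT = """<Root>
  <Person><Name>Klingman,Julie G.</Name><DOB>1946-07-21</DOB></Person>
  <Person><Name>Jung,Kirsten K.</Name><DOB>1950-01-15</DOB></Person>
</Root>"""


def person_from_element(element):
    """Build one in-memory instance from a matching element."""
    return Person(name=element.findtext("Name"), dob=element.findtext("DOB"))


def read_correlated(xml_text, element_name, factory):
    """Yield one object for each element whose name matches the correlation."""
    root = ET.fromstring(xml_text)
    for element in root.iter(element_name):
        yield factory(element)


people = list(read_correlated(DOCUMENT, "Person", person_from_element))
print(people)
```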

A DOM is also a useful way to work with an XML document. You can use the %XML.Reader class to read an XML document and create a DOM that represents it. In this representation, the DOM is a series of nodes, and you navigate among them as needed. Specifically, you create an instance of %XML.Document, which represents the document itself and which contains the nodes. Then you use %XML.Node to examine and manipulate the nodes. You can use %XML.Writer to write the XML document again, if needed. For information, see “Representing an XML Document as a DOM.”
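
For readers who have not worked with a DOM before, the following sketch shows the same read, navigate, modify, and rewrite cycle using `xml.dom.minidom` from the Python standard library. It only illustrates the DOM concept; the Caché classes %XML.Document and %XML.Node have their own properties and methods, which are not shown here.

```python
from xml.dom import minidom

# Parse a document into a DOM: a document object that contains nodes.
document = minidom.parseString(
    "<Person><Name>Klingman,Julie G.</Name><DOB>1946-07-21</DOB></Person>"
)
root = document.documentElement

# Navigate: collect the names of the child elements of the root node.
child_names = [
    node.tagName for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

# Modify the DOM, then serialize it back to XML text.
root.setAttribute("GroupID", "W897")
rewritten = root.toxml()
print(child_names)
print(rewritten)
```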

The Caché XML tools provide many ways to access and modify data in both XML documents and DOMs.

Creating Arbitrary XML

You can also use Caché XML tools to create and work with arbitrary XML, that is, XML that does not map to any Caché class. To create an arbitrary XML document, use %XML.Writer. This class provides methods for adding elements, adding attributes, adding namespace declarations, and so on.

To create an arbitrary DOM, use %XML.Document. This class provides a class method that returns a DOM with a single empty node. Then use instance methods of that class to add nodes as needed.

Or use %XML.Reader to read an arbitrary XML document and then create a DOM from that document.
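
As a rough analogy for building a DOM from scratch, the sketch below starts from a document with a single empty root node and then adds a child element, an attribute, and text. It uses `xml.dom.minidom` from the Python standard library, not %XML.Document; the element and attribute names are invented for the example.

```python
from xml.dom import minidom

# Start with a document that holds only an empty root element.
document = minidom.getDOMImplementation().createDocument(None, "Root", None)
root = document.documentElement

# Add nodes as needed: an element, an attribute, and a text node.
item = document.createElement("Item")
item.setAttribute("id", "1")
item.appendChild(document.createTextNode("first"))
root.appendChild(item)

result = root.toxml()
print(result)
```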

Accessing Data

The Caché XML tools provide several ways to access data in XML form. The following figure shows a summary:

For any well-formed XML document, you can use the following classes to work with data in that document:

%XML.TextReader: use this class to read and parse a document, node by node. See “Using %XML.TextReader.”
%XML.XPATH.Document: use this class to obtain data by using an XPath expression that refers to a specific node in the document. See “Evaluating XPath Expressions.”
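
The two access styles above differ in kind: one walks the document as a stream of nodes, the other jumps straight to the nodes that an expression selects. The following sketch contrasts them using `xml.etree.ElementTree` from the Python standard library, which supports only a subset of XPath 1.0. It is an illustration of the two styles, not of %XML.TextReader or %XML.XPATH.Document, and the second `GroupID` value is made-up sample data.

```python
import io
import xml.etree.ElementTree as ET

XML = (
    b"<Root>"
    b"<Person GroupID='W897'><Name>Klingman,Julie G.</Name></Person>"
    b"<Person GroupID='R100'><Name>Jung,Kirsten K.</Name></Person>"
    b"</Root>"
)

# Style 1: read node by node, receiving an event as each element starts or ends.
events = [
    (event, element.tag)
    for event, element in ET.iterparse(io.BytesIO(XML), events=("start", "end"))
]

# Style 2: evaluate a path expression that selects one specific node.
name = ET.fromstring(XML).findtext(".//Person[@GroupID='W897']/Name")

print(events[:3])
print(name)
```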

In Caché, a DOM is an instance of %XML.Document. This instance represents the document itself and contains the nodes. You use the properties and methods of this class to retrieve values from the DOM. You use %XML.Node to examine and manipulate the nodes. For information, see “Representing an XML Document as a DOM.”

Modifying XML

The Caché XML tools also provide ways to modify data in XML form. The following figure shows a summary:

For an XML document, you can use class methods in %XML.XSLT.Transformer to perform XSLT transformations and obtain a modified version of the document. See “Performing XSLT Transformations.”

For a DOM, you can use methods of %XML.Document to modify the DOM. For example, you can add or remove elements or attributes.

The SAX Parser

Caché XML Tools use the Caché SAX (Simple API for XML) Parser. This is a built-in SAX XML validating parser using the standard Xerces library. SAX is a parsing engine that provides complete XML validation and document parsing. Caché SAX communicates with a Caché process using a high-performance, in-process call-in mechanism. Using this parser, you can process XML documents using either the built-in Caché XML support or by providing your own custom SAX interface classes within Caché.
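
A SAX parser reports a document as a sequence of callbacks to a handler object, which is what makes custom SAX interface classes possible. The sketch below shows that callback pattern with `xml.sax` from the Python standard library. Note the assumptions: Python's SAX driver is built on expat, which is non-validating, so this illustrates only the event model and not the validation that the Xerces-based Caché parser performs. The `ElementCounter` class is a hypothetical handler.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Custom content handler that counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is encountered.
        self.counts[name] = self.counts.get(name, 0) + 1


handler = ElementCounter()
xml.sax.parseString(b"<Root><Person/><Person/></Root>", handler)
print(handler.counts)
```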

For special applications, you can customize the Caché XML support. This includes an easy way to create custom XML server code; see “Serving XML from Caché.” You can also create custom entity resolvers and content handlers; see “Customizing How the SAX Parser Is Used.”

You can validate any incoming XML using industry-standard XML DTD or schema validation. You can also specify which XML items to parse. See “Customizing How the SAX Parser Is Used.”

Additional XML Tools

Caché XML support includes the following additional tools:

The XML Schema Wizard reads an XML schema document and generates a set of XML-enabled classes that correspond to the types defined in the schema. You specify a package to contain the classes, as well as various options that control the details of the class definitions. See “Generating Classes from XML Schemas.”
The %XML.Schema class enables you to generate an XML schema from a set of XML-enabled classes. See “Generating XML Schemas from Classes.”
The %XML.Namespaces class enables you to examine the XML namespaces and the classes in them, for a Caché namespace. See “Examining Namespaces and Classes.”
The %XML.Security.EncryptedData class and other classes enable you to encrypt XML documents, as well as decrypt encrypted documents. See “Encrypting XML Documents.”
The %XML.Security.Signature class and other classes enable you to digitally sign XML documents, as well as validate digital signatures. See “Signing XML Documents.”

Considerations When Using the XML Tools

When you work with XML tools of any kind, there are at least three general points to consider:

Character Encoding of Input and Output

When you export an XML document, you can specify the character encoding to use; otherwise, Caché chooses the encoding, depending on the destination:

If the output destination is a file or a binary stream, the default is "UTF-8".
If the output destination is a string or a character stream, the default depends on the Caché system:
- On a Unicode Caché system, the default is "UTF-16".
- On an 8–bit Caché system, the default is the default character set of the locale.

For any XML document read by Caché, the XML declaration of the document should indicate the character encoding of that file, and the document should be encoded as declared. For example:

<?xml version="1.0" encoding="UTF-16"?>

However, if the character encoding is not declared in the document, Caché assumes the following:

If the document is a file or a binary stream, Caché assumes that the character set is "UTF-8".
If the document is a string or a character stream, then:
- On a Unicode Caché system, Caché assumes the character set is "UTF-16".
- On an 8–bit Caché system, Caché assumes the character set is the default character set of the locale.
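
The practical consequence of these rules is that a document must be encoded as declared, and that an undeclared binary document is read as UTF-8. The sketch below demonstrates this with the expat-based parser in the Python standard library, which happens to apply the same UTF-8 default to undeclared byte input. It is an illustration of the principle rather than of Caché behavior, and the city name is sample data.

```python
import xml.etree.ElementTree as ET

CITY = "Orl\u00e9ans"  # contains one non-ASCII character

# Declared as ISO-8859-1 and actually encoded that way: parses correctly.
declared = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><City>%s</City>' % CITY
).encode("iso-8859-1")
city_declared = ET.fromstring(declared).text

# No declaration, encoded as UTF-8: the parser's UTF-8 assumption holds.
undeclared_utf8 = ("<City>%s</City>" % CITY).encode("utf-8")
city_default = ET.fromstring(undeclared_utf8).text

# No declaration, but encoded as ISO-8859-1: the UTF-8 assumption is wrong,
# and the bytes are not valid UTF-8, so parsing fails.
undeclared_latin1 = ("<City>%s</City>" % CITY).encode("iso-8859-1")
try:
    ET.fromstring(undeclared_latin1)
    mismatch_detected = False
except ET.ParseError:
    mismatch_detected = True

print(city_declared, city_default, mismatch_detected)
```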

For background information on character translation in Caché, see “Localization Support” in the Caché Programming Orientation Guide.

Choosing a Document Format

When you work with an XML document, you must know the format to use when mapping the document to Caché classes. Similarly, when you create an XML document, you specify the document format to use when writing the document. The XML document formats are as follows:

Literal means that the document is a literal copy of the object instance. In most cases, you use literal format, even when working with SOAP.
Except where otherwise noted, the examples in the documentation use literal format.
Encoded means encoded as described in the SOAP 1.1 standard or the SOAP 1.2 standard. For links to these standards, see “Standards Supported in Caché,” later in this topic.
The details are slightly different for SOAP 1.1 and SOAP 1.2.

The following subsections show the differences between these document formats.

Literal Format

The following sample shows an XML document in literal format:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <Person GroupID="W897">
      <Name>Klingman,Julie G.</Name>
      <DOB>1946-07-21</DOB>
      <Address>
         <City>Bensonhurst</City>
         <Zip>60302</Zip>
      </Address>
      <Doctors>
         <DoctorClass>
            <Name>Chadwick,Mark L.</Name>
         </DoctorClass>
         <DoctorClass>
            <Name>Jung,Kirsten K.</Name>
         </DoctorClass>
         <DoctorClass>
            <Name>Quixote,Umberto D.</Name>
         </DoctorClass>
      </Doctors>
   </Person>
</Root>

Encoded Format

In contrast, the following example shows the same data in encoded format:

<?xml version="1.0" encoding="UTF-8"?>
<Root xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" 
xmlns:s="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
   <DoctorClass id="id2" xsi:type="DoctorClass">
      <Name>Jung,Kirsten K.</Name>
   </DoctorClass>
...
   <DoctorClass id="id3" xsi:type="DoctorClass">
      <Name>Quixote,Umberto D.</Name>
   </DoctorClass>
...
   <DoctorClass id="id8" xsi:type="DoctorClass">
      <Name>Chadwick,Mark L.</Name>
   </DoctorClass>
...
   <Person>
      <Name>Klingman,Julie G.</Name>
      <DOB>1946-07-21</DOB>
      <GroupID>W897</GroupID>
      <Address href="#id17" />
      <Doctors SOAP-ENC:arrayType="DoctorClass[3]">
         <DoctorClass href="#id8" />
         <DoctorClass href="#id2" />
         <DoctorClass href="#id3" />
      </Doctors>
   </Person>
   <AddressClass id="id17" xsi:type="s_AddressClass">
      <City>Bensonhurst</City>
      <Zip>60302</Zip>
   </AddressClass>
...
</Root>

Note the following differences in the encoded version:

The root element of the output includes declarations for the SOAP encoding namespace and other standard namespaces.
This document includes person, address, and doctor elements all at the same level. The address and doctor elements are listed with unique IDs that are used by the person elements that refer to them. Each object-valued property is treated this way.
The top-level address and doctor elements are named after their respective classes, rather than after the property that refers to them.
Encoded format does not project any properties as attributes. The GroupID property is mapped as an attribute in the Person class. In literal format, this property is projected as an attribute. In the encoded version, however, the property is projected as an element.
Collections are treated differently. For example, the list element has the attribute SOAP-ENC:arrayType.
Each element has a value for the xsi:type attribute.
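
A consumer of encoded format has to follow the href references back to the elements that carry the matching id. The following sketch shows one way to do that with the Python standard library; it is an illustration of the reference mechanism, not part of the Caché tools. The document is a trimmed copy of the encoded sample, with namespace declarations and xsi:type attributes left out for brevity, and `resolve` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ENCODED = """<Root>
  <DoctorClass id="id2"><Name>Jung,Kirsten K.</Name></DoctorClass>
  <DoctorClass id="id3"><Name>Quixote,Umberto D.</Name></DoctorClass>
  <DoctorClass id="id8"><Name>Chadwick,Mark L.</Name></DoctorClass>
  <Person>
    <Name>Klingman,Julie G.</Name>
    <Address href="#id17" />
    <Doctors>
      <DoctorClass href="#id8" />
      <DoctorClass href="#id2" />
      <DoctorClass href="#id3" />
    </Doctors>
  </Person>
  <AddressClass id="id17"><City>Bensonhurst</City><Zip>60302</Zip></AddressClass>
</Root>"""

root = ET.fromstring(ENCODED)

# Index every element that carries an id so that references can be looked up.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}


def resolve(element):
    """Return the element that an href points to, or the element itself."""
    href = element.get("href")
    return by_id[href[1:]] if href else element  # drop the leading '#'


person = root.find("Person")
doctor_names = [resolve(d).findtext("Name") for d in person.find("Doctors")]
city = resolve(person.find("Address")).findtext("City")
print(doctor_names, city)
```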

Note:

For SOAP 1.2, the encoded version is slightly different. To easily distinguish the versions, check the declaration for the SOAP encoding namespace:

For SOAP 1.1, the SOAP encoding namespace is "http://schemas.xmlsoap.org/soap/encoding/"
For SOAP 1.2, the SOAP encoding namespace is "http://www.w3.org/2003/05/soap-encoding"

Parser Behavior

The Caché SAX Parser is used whenever Caché reads an XML document, so it is useful to know its default behavior. Among other tasks, the parser does the following:

It verifies whether the XML document is well-formed.
It attempts to validate the document, using the given schema or DTD.
Here it is useful to remember that a schema can contain <import> and <include> elements that refer to other schemas. For example:
```
<xsd:import namespace="target-namespace-of-the-imported-schema"
                  schemaLocation="uri-of-the-schema"/>

<xsd:include schemaLocation="uri-of-the-schema"/>
```
The validation fails unless these other schemas are available to the parser. Especially with WSDL documents, it is sometimes necessary to download all the schemas and edit the primary schema to use the corrected locations.
It attempts to resolve all entities, including all external entities. (Other XML parsers do this as well.) This process can be time-consuming, depending on their locations. In particular, Xerces uses a network accessor to resolve some URLs, and the implementation uses blocking I/O. Consequently, there is no timeout and network fetches can hang in error conditions, which have been rare in practice.
Also, Xerces does not support https; that is, it cannot resolve entities that are at https locations.
If needed, you can create custom entity resolvers and you can disable entity resolution; see “Customizing How the SAX Parser Is Used.”
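
To make the idea of disabling entity resolution concrete, the sketch below turns off external general entities in the SAX parser of the Python standard library and then parses a document that references one. This is an analogy only: the Caché mechanism is described in “Customizing How the SAX Parser Is Used” and is not shown here. The `ElementRecorder` class is hypothetical, and the entity URL uses the reserved `.invalid` domain so that nothing could be fetched even if resolution were enabled.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementRecorder(ContentHandler):
    """Records element names so we can confirm that parsing completed."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


DOCUMENT = (
    b'<!DOCTYPE Root [<!ENTITY ext SYSTEM "http://example.invalid/ext.xml">]>'
    b"<Root>&ext;</Root>"
)

recorder = ElementRecorder()
parser = xml.sax.make_parser()
# Do not fetch external general entities; the reference is skipped instead.
parser.setFeature(feature_external_ges, False)
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(DOCUMENT))

external_entities_enabled = parser.getFeature(feature_external_ges)
print(recorder.names, external_entities_enabled)
```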

Standards Supported in Caché

Caché XML support follows these standards:

XML 1.0 (https://www.w3.org/TR/REC-xml/)
Namespaces in XML 1.0 (https://www.w3.org/TR/REC-xml-names/)
XML Schema 1.0 (https://www.w3.org/TR/xmlschema-0/, https://www.w3.org/TR/xmlschema-1/, https://www.w3.org/TR/xmlschema-2/)
XPath 1.0 as specified by https://www.w3.org/TR/xpath
SOAP 1.1 encoding as specified by section 5 of the SOAP 1.1 standard.
SOAP 1.2 encoding as specified by section 3 of Part 2: Adjuncts (https://www.w3.org/TR/soap12-part2/) of the SOAP 1.2 standard.
For more information on SOAP, see the W3 web site (for example, https://www.w3.org/TR/2003/REC-soap12-part1-20030624/).
XML Canonicalization Version 1.0 (also known as inclusive canonicalization), as specified by https://www.w3.org/TR/xml-c14n.
XML Exclusive Canonicalization Version 1.0 as specified by https://www.w3.org/TR/xml-exc-c14n/, including the InclusiveNamespaces PrefixList feature (https://www.w3.org/TR/xml-exc-c14n/#def-InclusiveNamespaces-PrefixList)
XML Encryption (https://www.w3.org/TR/xmlenc-core/)
Caché supports key encryption using RSA-OAEP or RSA-1.5 and data encryption of the message body using AES-128, AES-192, or AES-256.
XML Signature using Exclusive XML Canonicalization and RSA SHA-1 (https://www.w3.org/TR/xmldsig-core/)

The Caché SAX Parser uses the standard Xerces-C++ library, which complies with the XML 1.0 recommendation. For a list of the standards that Xerces-C++ supports, see http://xml.apache.org/xerces-c/.

Caché provides two XSLT processors:

The Xalan processor supports XSLT 1.0.
The Saxon processor supports XSLT 2.0.

For information on additional standards related to web services and clients, see Creating Web Services and Web Clients in Caché and Securing Caché Web Services.

For information on the character sets expected in XML, see the W3 web site (https://www.w3.org/TR/2006/REC-xml-20060816/#charsets).

Note:

If you have enabled support for long string operations, an attribute can be larger than 32 KB. Otherwise, each attribute must be less than 32 KB. Also, Caché XML does not support, within one element, multiple attributes with the same name, each in a different namespace.