About XML

XML (eXtensible Markup Language) provides a set of rules for defining semantic tags that can describe virtually any type of data in a text file. Data stored in XML-format files is both human- and machine-readable, and is often relatively easy to interpret either visually or programmatically. The structure of data stored in an XML file is described by either a Document Type Definition (DTD) or an XML schema, which can either be included in the file itself or referenced from an external network location.
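As an illustration of a document that carries its own DTD, the following minimal sketch uses Python's standard library rather than IDL. The `note` document, its element names, and the variable names are all hypothetical. Note that the parser used here reads the DTD but does not validate the document against it.

```python
from xml.dom import minidom

# A hypothetical XML document whose DTD is included in the file itself
# (an "internal subset") rather than referenced from an external location.
NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <body>XML is plain text.</body>
</note>"""

# Parse the text; the underlying parser is non-validating, so the DTD
# is read but the content is not checked against it.
doc = minidom.parseString(NOTE_XML)

doctype_name = doc.doctype.name            # name declared in <!DOCTYPE ...>
root_name = doc.documentElement.tagName    # name of the root element

print(doctype_name, root_name)
```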

The IDL parsers support the following encodings: UTF-8, USASCII, ISO8859-1, UTF-16, UTF-16BE, UTF-16LE, UCS-4, UCS-4BE, UCS-4LE, WINDOWS-1252, IBM1140, IBM037, and IBM1047.

Note
IDL can parse XML documents that are stored using any of the above encodings. When an IDL application reads string data from the XML document using either the SAX or DOM parser, the string data is transcoded from the document's encoding into the encoding appropriate for IDL string variables. In order to read the string data correctly, the XML string data must be mappable into an IDL string. Since IDL strings use 1-byte characters, the XML strings must be transcodable into strings that use 1 byte per character. Further, they must be transcodable into strings that use the current character encoding. The IDL XML parsers may return an empty string if the XML string data cannot be converted into an IDL string.
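The one-byte-per-character constraint can be illustrated outside IDL. The following is a minimal Python sketch, not IDL code: the helper name `to_single_byte` and the choice of ISO-8859-1 as the target encoding are assumptions made for illustration. It returns an empty string when the text cannot be represented in the target encoding, which mirrors the behavior described in the note above.

```python
def to_single_byte(text, encoding="iso-8859-1"):
    """Return text unchanged if it can be encoded in the given
    single-byte encoding; otherwise return an empty string.

    The caller is assumed to pass a 1-byte-per-character encoding.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # At least one character has no representation in the target
        # encoding, so the string cannot be mapped.
        return ""
    return text


# "café" fits in ISO-8859-1; the Japanese text does not.
print(repr(to_single_byte("café")))
print(repr(to_single_byte("日本語")))
```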

It is beyond the scope of this manual to describe XML in detail. Numerous third-party books and electronic resources are available.

About XML Parsers

There are two basic types of parsers for XML data:

Tree-Based Parsers

Tree-based parsers map an XML document into a tree structure in memory, allowing you to select elements by navigating through the tree. This type of parser is generally based on the Document Object Model (DOM) and the tree is often referred to as a DOM tree. The IDLffXMLDOM object classes implement a tree-based parser; for more information, see Using the XML DOM Object Classes.

Tree-based parsers are especially useful when the XML data file being parsed is relatively small. Having access to the entire data set at one time is convenient and makes it easy to process data based on multiple values stored in the tree. However, if the tree structure is too large to fit in physical memory, or if the data must be converted into a new (local) data structure before use, then tree-based parsers can be slow and cumbersome.
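The tree-based approach can be sketched in general terms with Python's standard DOM implementation. This is not the IDLffXMLDOM API; the `planets` document and the function name `moons_by_planet` are hypothetical. The whole document is loaded into memory first, and values are then selected by navigating the tree.

```python
from xml.dom import minidom

# Hypothetical sample data, small enough to hold in memory as a tree.
PLANETS_XML = """<?xml version="1.0"?>
<planets>
  <planet name="Mercury"><moons>0</moons></planet>
  <planet name="Earth"><moons>1</moons></planet>
  <planet name="Mars"><moons>2</moons></planet>
</planets>"""


def moons_by_planet(xml_text):
    """Build the full DOM tree, then walk it to collect moon counts."""
    doc = minidom.parseString(xml_text)  # entire document is now in memory
    result = {}
    for planet in doc.getElementsByTagName("planet"):
        # Navigate from each <planet> node to its <moons> child.
        moons = planet.getElementsByTagName("moons")[0]
        result[planet.getAttribute("name")] = int(moons.firstChild.data)
    return result


print(moons_by_planet(PLANETS_XML))
```

Because the tree stays available after parsing, the same document can be queried repeatedly without reading the file again, at the cost of holding all of it in memory.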

Event-Based Parsers

Event-based parsers read the XML document sequentially and report parsing events (such as the start or end of an element) as they occur, without building an internal representation of the data structure. The most common event-based XML parsers use the Simple API for XML (SAX) and are often referred to as SAX parsers.

Event-based parsers allow the programmer to write callback routines that perform an appropriate action in response to an event reported by the parser. Using an event-based parser, you can parse very large data files and create application-specific data structures. The IDLffXMLSAX object class implements an event-based parser based on the SAX version 2 API.
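The callback pattern can be sketched in general terms with Python's standard SAX module. This is not the IDLffXMLSAX API; the class name `PlanetHandler`, the function `summarize`, and the sample data are hypothetical. The handler builds only the application-specific results it needs (a list of names and a running total), and no tree is ever constructed.

```python
import xml.sax

# Hypothetical sample data, passed to the parser as bytes.
PLANETS_XML = b"""<?xml version="1.0"?>
<planets>
  <planet name="Mercury"><moons>0</moons></planet>
  <planet name="Earth"><moons>1</moons></planet>
  <planet name="Mars"><moons>2</moons></planet>
</planets>"""


class PlanetHandler(xml.sax.ContentHandler):
    """Callbacks invoked by the parser as it reads the document."""

    def __init__(self):
        super().__init__()
        self.names = []          # application-specific result
        self.total_moons = 0     # application-specific result
        self._in_moons = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "planet":
            self.names.append(attrs.getValue("name"))
        elif name == "moons":
            self._in_moons = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_moons:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "moons":
            self.total_moons += int("".join(self._buffer))
            self._in_moons = False


def summarize(xml_bytes):
    """Parse sequentially and return (planet names, total moon count)."""
    handler = PlanetHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names, handler.total_moons


print(summarize(PLANETS_XML))
```

Because each event is handled and then discarded, memory use depends on the results the handler keeps rather than on the size of the document.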