Example: Reading Data Into an Array

This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_array. The xml_to_array object class is designed to read numerical values from an XML file with the following structure:

<array>   
  <number>0</number> 
  <number>1</number> 
  ... 
</array> 

and place those values into an IDL array variable.

Note
This example is deliberately simple. It is designed to illustrate how an event-based XML parser is constructed using the IDLffXMLSAX object class. An application that reads real data from an XML file will most likely be considerably more complicated.

Creating the xml_to_array Object Class

In order to read the XML file and return an array variable, we will need to create an object class definition that inherits from the IDLffXMLSAX object class, and override the following superclass methods: Init, Cleanup, StartDocument, Characters, StartElement, and EndElement. Since this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override those methods. In addition, we will create a new method that allows us to retrieve the array data from the object instance data.

Example Code
This example is included in the file xml_to_array__define.pro in the examples/doc/file_io subdirectory of the IDL distribution. Run the example procedure by entering xml_to_array__define at the IDL command prompt or view the file in an IDL Editor window by entering .EDIT xml_to_array__define.pro.

Object Class Definition

The following routine is the definition of the xml_to_array object class:

PRO xml_to_array__define 
 
void = {xml_to_array, $ 
   INHERITS IDLffXMLSAX, $ 
   charBuffer:'', $ 
   pArray:PTR_NEW()}  
END 

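Note the following about this class structure definition. The INHERITS keyword gives the xml_to_array class the instance data and methods of the IDLffXMLSAX object class. The charBuffer field is defined as a string, and holds the character data of the element currently being parsed. The pArray field is defined as a pointer (initially a null pointer) that will refer to the data array.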

Why do we store the array data in a pointer variable? Because the fields of a named structure (xml_to_array, in this case) must always contain the same type of data as when that structure was defined. Since we want to be able to add values to the data array as we parse the XML file, we will need to extend the array with each new value. If we began by defining the size of the array in the structure variable, we would not be able to extend the array. By holding the data array in a pointer, we can extend the array without changing the format of the xml_to_array object class structure.
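The point about pointers can be tried at the IDL command line. The following is a minimal sketch, not part of the example file; the structure name demo_holder and the field name pData are hypothetical and used only for illustration. The structure definition never changes, but the heap variable that the pointer refers to grows with each assignment.

```idl
; Hypothetical named structure (demo_holder) with a single pointer field.
holder = {demo_holder, pData: PTR_NEW(/ALLOCATE_HEAP)}

*holder.pData = 0                    ; heap variable holds a scalar integer
*holder.pData = [*holder.pData, 1]   ; now a two-element array
*holder.pData = [*holder.pData, 2]   ; now a three-element array

HELP, *holder.pData                  ; reports an integer array of 3 elements
PTR_FREE, holder.pData               ; release the heap variable when done
```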

Note
Although we describe this routine first here, the xml_to_array__define routine must be the last routine in the xml_to_array__define.pro file.

Init Method

The Init method is called when an xml_to_array parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:

FUNCTION xml_to_array::Init 
  self.pArray = PTR_NEW(/ALLOCATE_HEAP) 
  RETURN, self->IDLffxmlsax::Init() 
END 

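We do two things in this method. First, we set the pArray field to a pointer to a new, undefined heap variable, created with the ALLOCATE_HEAP keyword to PTR_NEW. Second, we call the superclass's Init method and return its result as the result of our own Init method.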

Note
The initialization task (setting the value of the pArray field) is performed before calling the superclass's Init method.

See "IDLffXMLSAX::Init" (IDL Reference Guide) for details on the method we are overriding.

Cleanup Method

The Cleanup method is called when the xml_to_array parser object is destroyed by a call to OBJ_DESTROY. The following routine is the definition of the Cleanup method:

PRO xml_to_array::Cleanup 
 
IF (PTR_VALID(self.pArray)) THEN PTR_FREE, self.pArray 
 
self->IDLffXMLSAX::Cleanup 
 
END 

Here, we release the pArray pointer, if it exists, and call the superclass cleanup method.

See "IDLffXMLSAX::Cleanup" (IDL Reference Guide) for details on the method we are overriding.

Characters Method

The Characters method is called when the xml_to_array parser encounters character data inside an element. The following routine is the definition of the Characters method:

PRO xml_to_array::characters, data 
 
self.charBuffer = self.charBuffer + data 
 
END 

As it parses the character data in an element, the parser reads characters until it reaches the end of the text section, and may deliver that text in more than one call to the Characters method. Here, we simply append the current characters to the charBuffer field of the object's instance data structure.

See "IDLffXMLSAX::Characters" (IDL Reference Guide) for details on the method we are overriding.

StartDocument Method

The StartDocument method is called when the xml_to_array parser encounters the beginning of the XML document. The following routine is the definition of the StartDocument method:

PRO xml_to_array::StartDocument 
 
IF (N_ELEMENTS(*self.pArray) GT 0) THEN $ 
   void = TEMPORARY(*self.pArray) 
 
END 

Here, we check to see if the array pointed at by the pArray pointer contains any data. Since we are just beginning to parse the XML document at this point, it should not contain any data. If data is present, we reinitialize the array using the TEMPORARY function.

Note
Since pArray is a pointer, we must use dereferencing syntax to refer to the array.
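The effect of TEMPORARY on a dereferenced pointer can be checked at the command line. This is a minimal sketch with a hypothetical pointer variable p. The heap variable becomes undefined, so N_ELEMENTS reports zero, while the pointer itself remains valid and can be assigned to again.

```idl
p = PTR_NEW([10, 20, 30])   ; heap variable holding a three-element array
PRINT, N_ELEMENTS(*p)       ; prints 3
void = TEMPORARY(*p)        ; move the contents out; *p becomes undefined
PRINT, N_ELEMENTS(*p)       ; prints 0
PRINT, PTR_VALID(p)         ; prints 1: the pointer itself is still valid
PTR_FREE, p                 ; release the heap variable when done
```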

See "IDLffXMLSAX::StartDocument" (IDL Reference Guide) for details on the method we are overriding.

StartElement Method

The StartElement method is called when the xml_to_array parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:

PRO xml_to_array::startElement, URI, local, strName, attr, value 
 
CASE strName OF 
   "array": BEGIN 
      IF (N_ELEMENTS(*self.pArray) GT 0) THEN $ 
         void = TEMPORARY(*self.pArray) ; clear out memory
   END 
   "number" : BEGIN 
      self.charBuffer = '' 
   END 
ENDCASE 
 
END 

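Here, we check the name of the element we have encountered and use a CASE statement to branch on it. If the element is an <array> element, we clear any existing data from the array pointed at by pArray. If the element is a <number> element, we reset the charBuffer field to an empty string so that it is ready to receive the element's character data.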

See "IDLffXMLSAX::StartElement" (IDL Reference Guide) for details on the method we are overriding.

EndElement Method

The EndElement method is called when the xml_to_array parser encounters the end of an XML element. The following routine is the definition of the EndElement method:

PRO xml_to_array::EndElement, URI, Local, strName 
 
CASE strName OF 
   "array": 
   "number": BEGIN 
      iData = FIX(self.charBuffer)
      IF (N_ELEMENTS(*self.pArray) EQ 0) THEN $ 
         *self.pArray = iData $ 
      ELSE $ 
         *self.pArray = [*self.pArray,iData] 
   END 
ENDCASE  
 
END 

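As with the StartElement method, we check the name of the element we have encountered and use a CASE statement to branch on it. If the element is an <array> element, we do nothing. If the element is a <number> element, we convert the string in charBuffer to an integer with the FIX function and append it to the array pointed at by pArray, creating the array if it does not yet contain any data.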

See "IDLffXMLSAX::EndElement" (IDL Reference Guide) for details on the method we are overriding.
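The interplay between the Characters and EndElement methods can be sketched with ordinary variables at the command line. The variable names charBuffer, iData, and values here are plain local variables chosen for illustration, and the split of the text into two chunks is an assumption about how a parser might deliver it.

```idl
charBuffer = ''                   ; reset, as StartElement does for <number>
charBuffer = charBuffer + '12'    ; first chunk passed to Characters
charBuffer = charBuffer + '3'     ; second chunk passed to Characters

iData = FIX(charBuffer)           ; convert the complete text to an integer
HELP, iData                       ; reports an INT with the value 123

; Append to the result, creating it on the first value.
IF (N_ELEMENTS(values) EQ 0) THEN values = iData $
ELSE values = [values, iData]
```

Converting only in EndElement means the conversion always sees the full text of the element, however many calls to Characters it took to deliver it.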

Note
In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema (in this case, the only elements are <array> and <number>). We do not need an ELSE clause in the CASE statement. If an unknown element is found in the XML file, the parser will report a validation error.
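If a parser of this kind might be used on files that are not validated, an ELSE clause is a cheap safeguard, because an IDL CASE statement with no matching clause and no ELSE clause stops with an error. The following is a minimal sketch using a hypothetical stand-alone function, element_kind, which is not part of the example file; it must be saved in a .pro file or compiled before use.

```idl
; Hypothetical helper: classify an element name, tolerating unknown names.
FUNCTION element_kind, strName
   CASE strName OF
      'array':  RETURN, 'container'
      'number': RETURN, 'value'
      ELSE:     RETURN, 'ignored'   ; any element not listed above
   ENDCASE
END
```

For example, element_kind('number') returns 'value', and element_kind('comment') returns 'ignored' rather than halting execution.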

GetArray Method

The GetArray method allows us to retrieve the array data stored in the pArray pointer variable. The following routine is the definition of the GetArray method:

FUNCTION xml_to_array::GetArray 
 
IF (N_ELEMENTS(*self.pArray) GT 0) THEN $ 
   RETURN, *self.pArray $ 
ELSE RETURN , -1 
 
END 

Here, we check to see whether the array pointed at by pArray contains any data. If it does contain data, we return the array. If the array contains no data, we return the value -1.

Using the xml_to_array Parser

To see the xml_to_array parser in action, you can parse the file num_array.xml, found in the examples/data subdirectory of the IDL distribution. The num_array.xml file contains XML like the fragment shown at the beginning of this section, with a total of 20 <number> elements, and also includes a DTD describing the structure of the file.

Enter the following statements at the IDL command line:

xmlObj = OBJ_NEW('xml_to_array') 
xmlFile = FILEPATH('num_array.xml', $ 
   SUBDIRECTORY = ['examples', 'data']) 
xmlObj->ParseFile, xmlFile 
myArray = xmlObj->GetArray() 
OBJ_DESTROY, xmlObj 
HELP, myArray 
PRINT, myArray 

IDL prints:

MYARRAY         INT       = Array[20] 
 0   1   2   3   4   5   6   7   8   9   10   11 
12  13  14  15  16  17  18  19
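The command-line steps above could also be gathered into a small wrapper function. This is a sketch only: the function name read_number_array is hypothetical and not part of the IDL distribution, and returning -1 for an unreadable file is an assumption chosen to match the value GetArray returns when no data was read.

```idl
; Hypothetical wrapper around the xml_to_array parser.
FUNCTION read_number_array, xmlFile
   ; Return -1 if the file is missing or unreadable.
   IF (FILE_TEST(xmlFile, /READ) EQ 0) THEN RETURN, -1

   xmlObj = OBJ_NEW('xml_to_array')
   xmlObj->ParseFile, xmlFile
   result = xmlObj->GetArray()   ; copy the data before destroying the object
   OBJ_DESTROY, xmlObj           ; Cleanup frees the pArray heap variable
   RETURN, result
END
```

Once the function is compiled, it could be called as myArray = read_number_array(FILEPATH('num_array.xml', SUBDIRECTORY = ['examples', 'data'])).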