Example: Reading Data Into an Array
This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_array. The xml_to_array object class is designed to read numerical values from an XML file in which an <array> element contains a series of <number> elements, and place those values into an IDL array variable.
Note
This example is deliberately simple. It is designed to illustrate how an event-based XML parser is constructed using the IDLffXMLSAX object class. An application that reads real data from an XML file will most likely be considerably more complicated.
Creating the xml_to_array Object Class
In order to read the XML file and return an array variable, we will need to create an object class definition that inherits from the IDLffXMLSAX object class, and override the following superclass methods: Init, Cleanup, StartDocument, Characters, StartElement, and EndElement. Since this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override those methods. In addition, we will create a new method that allows us to retrieve the array data from the object instance data.
Example Code
This example is included in the file xml_to_array__define.pro in the examples/doc/file_io subdirectory of the IDL distribution. Run the example procedure by entering xml_to_array__define at the IDL command prompt or view the file in an IDL Editor window by entering .EDIT xml_to_array__define.pro.
Object Class Definition
The following routine is the definition of the xml_to_array object class:
PRO xml_to_array__define
   void = {xml_to_array, $
           INHERITS IDLffXMLSAX, $
           charBuffer:'', $
           pArray:PTR_NEW()}
END
The following items should be considered when defining this class structure:
- The structure definition uses the INHERITS keyword to inherit the object class structure and methods of the IDLffXMLSAX object.
- The charBuffer structure field is set equal to an empty string.
- The pArray structure field is set equal to an IDL pointer. We will use this pointer to store the numerical array data we retrieve.
- The routine name is created by adding the string "__define" (note the two underscore characters) to the class name.
Why do we store the array data in a pointer variable? Because the fields of a named structure (xml_to_array, in this case) must always contain the same type of data as when that structure was defined. Since we want to be able to add values to the data array as we parse the XML file, we will need to extend the array with each new value. If we began by defining the size of the array in the structure variable, we would not be able to extend the array. By holding the data array in a pointer, we can extend the array without changing the format of the xml_to_array object class structure.
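The point can be checked at the IDL command line. The following is a minimal sketch, not part of the distributed example: the structure name demo_holder and the field name pData are hypothetical, chosen only for illustration. The structure definition never changes, yet the heap variable the pointer refers to grows from undefined to a three-element array.

```idl
; Hypothetical named structure with a single pointer field.
s = {demo_holder, pData: PTR_NEW(/ALLOCATE_HEAP)}

; The heap variable starts out undefined, so N_ELEMENTS reports 0.
PRINT, N_ELEMENTS(*s.pData)

; Assign a first value, then extend the array by concatenation.
*s.pData = 5
*s.pData = [*s.pData, 7]
*s.pData = [*s.pData, 9]

; The field is still a pointer; only the heap variable has changed.
HELP, *s.pData
PTR_FREE, s.pData
```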
Note
Although we describe this routine first here, the xml_to_array__define routine must be the last routine in the xml_to_array__define.pro file.
Init Method
The Init method is called when an xml_to_array parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:
FUNCTION xml_to_array::Init
   self.pArray = PTR_NEW(/ALLOCATE_HEAP)
   RETURN, self->IDLffXMLSAX::Init()
END
We do two things in this method:
- We initialize the pointer in the pArray field of the class structure variable.
- The return value from this function is the return value of the superclass's Init method, called on the self object reference.
Note
Within a method, we can refer to the class structure variable with the implicit parameter self. Remember that self is actually a reference to the xml_to_array object instance.
Note
The initialization task (setting the value of the pArray field) is performed before calling the superclass's Init method.
See "IDLffXMLSAX::Init" (IDL Reference Guide) for details on the method we are overriding.
Cleanup Method
The Cleanup method is called when the xml_to_array parser object is destroyed by a call to OBJ_DESTROY. The following routine is the definition of the Cleanup method:
PRO xml_to_array::Cleanup
   IF (PTR_VALID(self.pArray)) THEN PTR_FREE, self.pArray
   self->IDLffXMLSAX::Cleanup
END
Here, we release the pArray pointer, if it exists, and call the superclass cleanup method.
See "IDLffXMLSAX::Cleanup" (IDL Reference Guide) for details on the method we are overriding.
Characters Method
The Characters method is called when the xml_to_array parser encounters character data inside an element. The following routine is the definition of the Characters method:
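A minimal sketch of such a method, consistent with the description in this section, is shown here. The argument name data is an assumption; the parser passes the characters it has just read as the method's argument.

```idl
PRO xml_to_array::Characters, data
   ; Append the newly reported characters to whatever is already buffered.
   ; "data" is an assumed argument name for the string supplied by the parser.
   self.charBuffer = self.charBuffer + data
END
```

Appending rather than assigning matters because a parser may deliver the text of a single element in more than one call.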
As it parses the character data in an element, the parser will read characters until it reaches the end of the text section. Here, we simply add the current characters to the charBuffer field of the object's instance data structure.
See "IDLffXMLSAX::Characters" (IDL Reference Guide) for details on the method we are overriding.
StartDocument Method
The StartDocument method is called when the xml_to_array parser encounters the beginning of the XML document. The following routine is the definition of the StartDocument method:
PRO xml_to_array::StartDocument
   IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
      void = TEMPORARY(*self.pArray)
END
Here, we check to see if the array pointed at by the pArray pointer contains any data. Since we are just beginning to parse the XML document at this point, it should not contain any data. If data is present, we reinitialize the array using the TEMPORARY function.
Note
Since pArray is a pointer, we must use dereferencing syntax to refer to the array.
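The effect of TEMPORARY on a dereferenced pointer can be tried in isolation. This is a sketch with a hypothetical variable name p, separate from the class itself: the heap variable becomes undefined, while the pointer remains valid and can be reused.

```idl
; Hypothetical pointer used only for this illustration.
p = PTR_NEW([1, 2, 3])
PRINT, N_ELEMENTS(*p)     ; the heap variable holds three elements

; Moving the contents into a throwaway variable leaves *p undefined.
void = TEMPORARY(*p)
PRINT, N_ELEMENTS(*p)     ; an undefined variable has zero elements
PRINT, PTR_VALID(p)       ; the pointer itself is still valid

PTR_FREE, p
```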
See "IDLffXMLSAX::StartDocument" (IDL Reference Guide) for details on the method we are overriding.
StartElement Method
The StartElement method is called when the xml_to_array parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:
PRO xml_to_array::StartElement, URI, local, strName, attr, value
   CASE strName OF
      "array": BEGIN
         IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
            void = TEMPORARY(*self.pArray) ; clear out memory
      END
      "number": BEGIN
         self.charBuffer = ''
      END
   ENDCASE
END
Here, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
- If the element is an <array> element, we check to see if the array pointed at by the pArray pointer is empty. Since we are just beginning to read the array data at this point, there should be no data. If data already exists, we reinitialize the array using the TEMPORARY function.
- If the element is a <number> element, we reinitialize the charBuffer field. Since we are just beginning to read the number data, nothing should be in the buffer.
See "IDLffXMLSAX::StartElement" (IDL Reference Guide) for details on the method we are overriding.
EndElement Method
The EndElement method is called when the xml_to_array parser encounters the end of an XML element. The following routine is the definition of the EndElement method:
PRO xml_to_array::EndElement, URI, Local, strName
   CASE strName OF
      "array":
      "number": BEGIN
         iData = FIX(self.charBuffer)
         IF (N_ELEMENTS(*self.pArray) EQ 0) THEN $
            *self.pArray = iData $
         ELSE $
            *self.pArray = [*self.pArray, iData]
      END
   ENDCASE
END
As with the StartElement method, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
- If the element is an <array> element, we do nothing.
- If the element is a <number> element, we must get the data stored in the charBuffer field of the instance data structure and place it in the array:
  - First, we convert the string data in the charBuffer into an IDL integer.
  - Next, we check to see if the array pointed at by pArray is empty. If it is empty, we simply set the array equal to the data value we retrieved from the charBuffer.
  - If the array pointed at by pArray is not empty, we redefine the array to include the new data retrieved from the charBuffer.
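The append logic in the <number> branch can be exercised on its own, outside the parser. The sketch below assumes a hypothetical procedure name demo_append and made-up input strings; it applies the same three steps to each string in turn.

```idl
PRO demo_append
   ; Hypothetical stand-ins for the text of three <number> elements.
   buffers = ['3', '14', '15']
   pArray = PTR_NEW(/ALLOCATE_HEAP)

   FOR i = 0, N_ELEMENTS(buffers) - 1 DO BEGIN
      ; Step 1: convert the buffered string to an IDL integer.
      iData = FIX(buffers[i])
      ; Steps 2 and 3: start the array, or extend it by concatenation.
      IF (N_ELEMENTS(*pArray) EQ 0) THEN $
         *pArray = iData $
      ELSE $
         *pArray = [*pArray, iData]
   ENDFOR

   HELP, *pArray
   PTR_FREE, pArray
END
```

FIX produces a 16-bit integer by default, so values outside that range would need a different conversion, such as LONG.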
See "IDLffXMLSAX::EndElement" (IDL Reference Guide) for details on the method we are overriding.
Note
In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema (in this case, the only elements are <array> and <number>). We do not need an ELSE clause in the CASE statement. If an unknown element is found in the XML file, the parser will report a validation error.
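This reliance on validation matters because an IDL CASE statement that finds no matching clause and has no ELSE clause stops with an error. If the class were ever used on files that are not validated, an ELSE clause would be one way to skip unexpected elements. The following is a sketch only, using a hypothetical stand-alone procedure named demo_case rather than the class methods:

```idl
PRO demo_case, strName
   CASE strName OF
      'array':  PRINT, 'array element'
      'number': PRINT, 'number element'
      ; Any other element name is reported and skipped instead of
      ; stopping execution with a "no match" error.
      ELSE:     PRINT, 'ignoring element: ' + strName
   ENDCASE
END
```

Calling demo_case, 'number' takes the second branch, while a name such as 'comment' (an assumed example of an unexpected element) falls through to the ELSE clause.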
GetArray Method
The GetArray method allows us to retrieve the array data stored in the pArray pointer variable. The following routine is the definition of the GetArray method:
FUNCTION xml_to_array::GetArray
   IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
      RETURN, *self.pArray $
   ELSE RETURN, -1
END
Here, we check to see whether the array pointed at by pArray contains any data. If it does contain data, we return the array. If the array contains no data, we return the value -1.
Using the xml_to_array Parser
To see the xml_to_array parser in action, you can parse the file num_array.xml, found in the examples/data subdirectory of the IDL distribution. The num_array.xml file contains an XML fragment like the one shown at the beginning of this section, with 20 additional <number> elements. The file also includes a DTD describing its structure.
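If the distributed file is not at hand, a file of the same shape can be written from IDL and parsed in the same way. The sketch below rests on several assumptions: the procedure name demo_parse_numbers, the file name demo_num_array.xml, and the three values are all hypothetical; the file carries no DTD, so the parser's default settings are assumed to accept an unvalidated file; and the xml_to_array class is assumed to be compiled or on the IDL path.

```idl
PRO demo_parse_numbers
   ; Write a small XML file to the temporary directory (assumed content).
   xmlFile = FILEPATH('demo_num_array.xml', /TMP)
   OPENW, lun, xmlFile, /GET_LUN
   PRINTF, lun, '<?xml version="1.0"?>'
   PRINTF, lun, '<array>'
   PRINTF, lun, '  <number>0</number>'
   PRINTF, lun, '  <number>1</number>'
   PRINTF, lun, '  <number>2</number>'
   PRINTF, lun, '</array>'
   FREE_LUN, lun

   ; Parse the file and retrieve the values through GetArray.
   xmlObj = OBJ_NEW('xml_to_array')
   xmlObj->ParseFile, xmlFile
   myArray = xmlObj->GetArray()
   OBJ_DESTROY, xmlObj

   ; Inspect the result, then remove the temporary file.
   HELP, myArray
   FILE_DELETE, xmlFile
END
```

The distributed num_array.xml file is parsed with the same sequence of calls.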
Enter the following statements at the IDL command line:
xmlObj = OBJ_NEW('xml_to_array')
xmlFile = FILEPATH('num_array.xml', $
   SUBDIRECTORY = ['examples', 'data'])
xmlObj->ParseFile, xmlFile
myArray = xmlObj->GetArray()
OBJ_DESTROY, xmlObj
HELP, myArray
PRINT, myArray
IDL prints: