Example: Reading Data Into Structures
This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_struct. The xml_to_struct object class is designed to read data from an XML file with the following structure:
<Solar_System>
  <Planet NAME='Mercury'>
    <Orbit UNITS='kilometers' TYPE='ulong64'>579100000</Orbit>
    <Period UNITS='days' TYPE='float'>87.97</Period>
    <Satellites TYPE='int'>0</Satellites>
  </Planet>
  ...
</Solar_System>
and place those values into an IDL array containing one structure variable for each <Planet> element. We use a structure variable for each <Planet> element so we can capture data of several data types in a single place.
Note
While this example is more complicated than the previous example, it is still rather simple. It is designed to illustrate a method whereby more complex XML data structures can be represented in IDL.
Creating the xml_to_struct Object Class
To read the XML file and return a structure variable, we will need to create an object class definition that inherits from the IDLffXMLSAX object class, and override the following superclass methods: Init, Characters, StartElement, and EndElement. Since this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override those methods. In addition, we will create a new method that allows us to retrieve the structure data from the object instance data.
Notice that the elements of the XML data file include attributes. While we will retrieve and use some of the attribute data from the file, we will ignore some of it.
Note
When parsing an XML data file, you can pick and choose the data you wish to pull into IDL. This ability to selectively retrieve data from the XML file is one of the great advantages of an event-based parser over a tree-based parser.
Example Code
This example is included in the file xml_to_struct__define.pro in the examples/doc/file_io subdirectory of the IDL distribution. Run the example procedure by entering xml_to_struct__define at the IDL command prompt or view the file in an IDL Editor window by entering .EDIT xml_to_struct__define.pro.
Object Class Definition
The following routine is the definition of the xml_to_struct object class:
PRO xml_to_struct__define
   void = {PLANET, NAME: "", Orbit: 0ull, period:0.0, Moons:0}
   void = {xml_to_struct, $
           INHERITS IDLffXMLSAX, $
           CharBuffer:"", $
           planetNum:0, $
           currentPlanet:{PLANET}, $
           Planets : MAKE_ARRAY(9, VALUE = {PLANET})}
END
The following items should be considered when defining this class structure:
- Before creating the object class structure, we define a structure named PLANET. We will use the PLANET structure to store data from the <Planet> elements of the XML file.
- The object class structure definition uses the INHERITS keyword to inherit the object class structure and methods of the IDLffXMLSAX object.
- The charBuffer structure field is set equal to a string value. We will use this field to accumulate character data stored in XML elements.
- The planetNum structure field is set equal to an integer value. We will use this field to keep track of which array element we are currently populating.
- The currentPlanet structure field is set equal to a PLANET structure.
- The Planets structure field is set equal to a nine-element array of PLANET structures.
- The routine name is created by adding the string "__define" (note the two underscore characters) to the class name.
We have explicitly defined our Planets structure field as a nine-element array of PLANET structures, which we can do because we know exactly how many <Planet> elements will be read from our XML file. Specifying the exact size of the data array in the class structure definition is very efficient (since we create the array only once) and eliminates the need to free the pointer in the Cleanup method. However, it has the following consequences:
- We must explicitly keep track of the index of the array element we are populating, and increment it after we have finished with a given element (see the EndElement method below).
- We must know in advance how many elements the array will hold. If the size of the final array is unknown, it is more efficient to use a pointer to an array, as we did in the previous example, and allow the array to grow as elements are added. See Building Complex Data Structures for additional discussion of ways to configure the instance data structure.
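The pointer-based alternative mentioned above can be sketched as follows. This is only an illustration, not code from the IDL distribution: the class name xml_to_struct_ptr, the field pPlanets, and the method AddPlanet are hypothetical, and the Init, Characters, StartElement, and EndElement methods are omitted.

```idl
; Hypothetical helper method: append the finished planet to a growing
; array instead of writing to a fixed index.
PRO xml_to_struct_ptr::AddPlanet
   IF PTR_VALID(self.pPlanets) THEN $
      *self.pPlanets = [*self.pPlanets, self.currentPlanet] $
   ELSE self.pPlanets = PTR_NEW(self.currentPlanet)
END

; Release the heap variable when the object is destroyed, then let
; the superclass clean up.
PRO xml_to_struct_ptr::Cleanup
   IF PTR_VALID(self.pPlanets) THEN PTR_FREE, self.pPlanets
   self->IDLffXMLSAX::Cleanup
END

; Class definition (kept last in the file). pPlanets starts as a null
; pointer and is allocated when the first planet is stored.
PRO xml_to_struct_ptr__define
   void = {PLANET, NAME: "", Orbit: 0ull, Period: 0.0, Moons: 0}
   void = {xml_to_struct_ptr, $
           INHERITS IDLffXMLSAX, $
           charBuffer: "", $
           currentPlanet: {PLANET}, $
           pPlanets: PTR_NEW()}
END
```

In this variant no planetNum counter is needed, at the cost of reallocating the array each time a planet is appended and of having to free the pointer in Cleanup.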
Note
Although we describe this routine here first, the xml_to_struct__define routine must be the last routine in the xml_to_struct__define.pro file.
Init Method
The Init method is called when an xml_to_struct parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:
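A minimal sketch of that routine, reconstructed from the description in this section; the shipped xml_to_struct__define.pro may differ in detail:

```idl
FUNCTION xml_to_struct::Init
   ; Start populating the Planets array at index zero.
   self.planetNum = 0
   ; Call the superclass Init method on self and pass its result back.
   RETURN, self->IDLffXMLSAX::Init()
END
```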
We do two things in this method:
- We initialize the planetNum field with the value of zero. We will increment this value as we populate the Planets array.
- The return value from this function is the return value of the superclass's Init method, called on the self object reference.
Note
Within a method, we can refer to the class structure variable with the implicit parameter self. Remember self is actually a reference to the xml_to_struct object instance.
Note
We perform our own initialization task (setting the value of the planetNum field) before calling the superclass's Init method.
See "IDLffXMLSAX::Init" (IDL Reference Guide) for details on the method we are overriding.
Characters Method
The Characters method is called when the xml_to_struct parser encounters character data inside an element. The following routine is the definition of the Characters method:
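A minimal sketch of that routine, reconstructed from the description that follows; the argument name data is an assumption, and the shipped file may differ in detail:

```idl
PRO xml_to_struct::Characters, data
   ; Append this chunk of character data to the buffer. The buffer is
   ; reset in StartElement and consumed in EndElement.
   self.charBuffer = self.charBuffer + data
END
```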
As it parses the character data in an element, the parser will read characters until it reaches the end of the text section. Here, we simply add the current characters to the charBuffer field of the object's instance data structure.
See "IDLffXMLSAX::Characters" (IDL Reference Guide) for details on the method we are overriding.
StartElement Method
The StartElement method is called when the xml_to_struct parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:
PRO xml_to_struct::startElement, URI, local, strName, attrName, attrValue
   CASE strName OF
      "Solar_System": ; Do nothing
      "Planet" : BEGIN
         self.currentPlanet = {PLANET, "", 0ull, 0.0, 0}
         self.currentPlanet.Name = attrValue[0]
      END
      "Orbit" : self.charBuffer = ''
      "Period" : self.charBuffer = ''
      "Moons" : self.charBuffer = ''
   ENDCASE
END
Here, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
- If the element is a <Solar_System> element, we do nothing.
- If the element is a <Planet> element, we do the following things:
  - Set the value of the currentPlanet field of the self instance data structure equal to a PLANET structure, setting the values of the structure fields to zero values.
  - Set the value of the Name field of the PLANET structure held in the currentPlanet field equal to the value of the Name attribute of the element. This field contains the name of the planet whose data we are reading.
- If the element is an <Orbit>, <Period>, or <Moons> element, we reinitialize the value of the charBuffer field of the self instance data structure.
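The method above reads the planet name from attrValue[0], which relies on NAME being the first attribute of the <Planet> element. If attribute order cannot be assumed, the value could instead be looked up by name. The helper below is a hypothetical sketch (the function xml_attr_value is not part of the example file or of IDL):

```idl
; Hypothetical helper: return the value of the attribute called
; "wanted", or an empty string if the element has no such attribute.
FUNCTION xml_attr_value, attrName, attrValue, wanted
   ; attrName may be undefined when an element carries no attributes.
   IF N_ELEMENTS(attrName) EQ 0 THEN RETURN, ''
   ; XML attribute names are case sensitive, so compare exactly.
   idx = WHERE(attrName EQ wanted, count)
   IF count EQ 0 THEN RETURN, ''
   RETURN, attrValue[idx[0]]
END
```

Inside StartElement it might be used as self.currentPlanet.Name = xml_attr_value(attrName, attrValue, 'NAME').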
See "IDLffXMLSAX::StartElement" (IDL Reference Guide) for details on the method we are overriding.
EndElement Method
The EndElement method is called when the xml_to_struct parser encounters the end of an XML element. The following routine is the definition of the EndElement method:
PRO xml_to_struct::EndElement, URI, Local, strName
   CASE strName of
      "Solar_System":
      "Planet": BEGIN
         self.Planets[self.planetNum] = self.currentPlanet
         self.planetNum = self.planetNum + 1
      END
      "Orbit" : self.currentPlanet.Orbit = self.charBuffer
      "Period" : self.currentPlanet.Period = self.charBuffer
      "Moons" : self.currentPlanet.Moons= self.charBuffer
   ENDCASE
END
As with the StartElement method, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
- If the element is a <Solar_System> element, we do nothing.
- If the element is a <Planet> element, we set the element of the Planets array specified by planetNum equal to the PLANET structure contained in currentPlanet. Then, we increment the planetNum counter.
- If the element is an <Orbit>, <Period>, or <Moons> element, we place the value in the charBuffer field into the appropriate field within the PLANET structure contained in currentPlanet.
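Note that charBuffer is a string, while the Orbit, Period, and Moons fields are numeric. No explicit conversion is needed because IDL converts a value to the existing type of a structure field when it is assigned. A small stand-alone sketch of this behavior (the variable p is only for illustration):

```idl
; Create a PLANET structure with the same definition the class uses.
p = {PLANET, NAME: "", Orbit: 0ull, Period: 0.0, Moons: 0}
; Assigning strings to existing numeric fields converts each string
; to the type of the field it is stored in.
p.Orbit = '579100000'
p.Period = '87.97'
p.Moons = '0'
; The fields keep their ULONG64, FLOAT, and INT types.
HELP, p, /STRUCTURE
```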
See "IDLffXMLSAX::EndElement" (IDL Reference Guide) for details on the method we are overriding.
Note
In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema. We do not need an ELSE clause in the CASE statement. If an unknown element is found in the XML file, the parser will report a validation error.
GetArray Method
The GetArray method allows us to retrieve the array of structures stored in the Planets variable. The following routine is the definition of the GetArray method:
FUNCTION xml_to_struct::GetArray
   IF (self.planetNum EQ 0) THEN $
      RETURN, -1 $
   ELSE RETURN, self.Planets[0:self.planetNum-1]
END
Here, we check whether the planetNum counter has been incremented. If it has, we return the number of array elements specified by the counter, that is, elements 0 through planetNum-1 of the Planets array. If the counter has not been incremented (indicating that no data has been stored in the array), we return the value -1.
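Because GetArray can return either an array of structures or the scalar -1, calling code may want to test the result before using it. One possible check, assuming xmlObj is an xml_to_struct object that has already parsed a file:

```idl
planets = xmlObj->GetArray()
; SIZE(..., /TYPE) returns 8 for a structure; the scalar -1 returned
; when nothing was stored is an INT (type code 2).
IF SIZE(planets, /TYPE) EQ 8 THEN $
   PRINT, N_ELEMENTS(planets), ' planets read' $
ELSE PRINT, 'No planet data was read.'
```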
Using the xml_to_struct Parser
To see the xml_to_struct parser in action, you can parse the file planets.xml, found in the examples/data subdirectory of the IDL distribution. The planets.xml file contains XML like the fragment shown at the beginning of this section, and includes a <Planet> element for each planet in the solar system. The planets.xml file also includes a DTD describing the structure of the file.
Enter the following statements at the IDL command line:
xmlObj = OBJ_NEW('xml_to_struct')
xmlFile = FILEPATH('planets.xml', $
SUBDIRECTORY = ['examples', 'data'])
xmlObj->ParseFile, xmlFile
planets = xmlObj->GetArray()
OBJ_DESTROY, xmlObj
The variable planets now holds an array of PLANET structures, one for each planet. To print the number of moons for each planet, you could use the following IDL statement:
FOR i = 0, (N_ELEMENTS(planets.Name) - 1) DO $
   PRINT, planets[i].Name, planets[i].Moons, $
   FORMAT = '(A7, " has ", I2, " moons")'
IDL prints:
Mercury has  0 moons
  Venus has  0 moons
  Earth has  1 moons
   Mars has  2 moons
Jupiter has 16 moons
 Saturn has 18 moons
 Uranus has 21 moons
Neptune has  8 moons
  Pluto has  1 moons
To view all the information about the planet Mars, you could use the following IDL statement:
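One statement that produces output of this form, assuming Mars is the fourth element (index 3) of the planets array as in the listing above:

```idl
; Display the tags, types, and values of the fourth PLANET structure.
HELP, planets[3], /STRUCTURE
```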
IDL prints:
** Structure PLANET, 4 tags, length=32, data length=26:
   NAME            STRING    'Mars'
   ORBIT           ULONG64          227940000
   PERIOD          FLOAT           686.980
   MOONS           INT              2