Example: Reading Data Into Structures

This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_struct. The xml_to_struct object class is designed to read data from an XML file with the following structure:

<Solar_System>   
   <Planet NAME='Mercury'> 
      <Orbit UNITS='kilometers' TYPE='ulong64'>579100000</Orbit> 
      <Period UNITS='days' TYPE='float'>87.97</Period> 
      <Moons TYPE='int'>0</Moons> 
   </Planet> 
  ... 
</Solar_System> 

and place those values into an IDL array containing one structure variable for each <Planet> element. We use a structure variable for each <Planet> element so we can capture data of several data types in a single place.

Note
While this example is more complicated than the previous example, it is still rather simple. It is designed to illustrate a method whereby more complex XML data structures can be represented in IDL.

Creating the xml_to_struct Object Class

To read the XML file and return a structure variable, we will need to create an object class definition that inherits from the IDLffXMLSAX object class, and override the following superclass methods: Init, Characters, StartElement, and EndElement. Since this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override those methods. In addition, we will create a new method that allows us to retrieve the structure data from the object instance data.

Notice that the elements of the XML data file include attributes. While we will retrieve and use some of the attribute data from the file, we will ignore some of it.

Note
When parsing an XML data file, you can pick and choose the data you wish to pull into IDL. This ability to selectively retrieve data from the XML file is one of the great advantages of an event-based parser over a tree-based parser.

Example Code
This example is included in the file xml_to_struct__define.pro in the examples/doc/file_io subdirectory of the IDL distribution. Run the example procedure by entering xml_to_struct__define at the IDL command prompt or view the file in an IDL Editor window by entering .EDIT xml_to_struct__define.pro.

Object Class Definition

The following routine is the definition of the xml_to_struct object class:

PRO xml_to_struct__define 
 
void = {PLANET, NAME: "", Orbit: 0ull, period:0.0, Moons:0} 
void = {xml_to_struct, $ 
   INHERITS IDLffXMLSAX, $ 
   CharBuffer:"", $ 
   planetNum:0, $ 
   currentPlanet:{PLANET}, $ 
   Planets : MAKE_ARRAY(9, VALUE = {PLANET})} 
 
END 

Note the following when defining this class structure:

We have explicitly defined the Planets field as a nine-element array of PLANET structures, which we can do because we know exactly how many <Planet> elements will be read from our XML file. Fixing the size of the data array in the class structure definition is efficient, since the array is created only once, and it avoids storing the data behind a pointer that would have to be freed in a Cleanup method. However, it has consequences: the object cannot store more than nine <Planet> elements, and if fewer than nine are read, the unused trailing elements must be excluded when the data is returned (as the GetArray method described later in this section does).
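
For comparison, a pointer-based variant of the class might look like the following minimal sketch. The class name xml_to_struct_ptr, the pPlanets field, and the AddPlanet helper are hypothetical names chosen for illustration; they are not part of the example file in the IDL distribution. The sketch shows only the parts that differ: the array grows as planets are added, and a Cleanup method is needed to free the heap variable.

```idl
; Hypothetical variant of the example class (illustrative names only)

; Append the current planet to the heap array. An EndElement method
; could call this when it reaches the end of a <Planet> element.
PRO xml_to_struct_ptr::AddPlanet

IF PTR_VALID(self.pPlanets) THEN $
   *self.pPlanets = [*self.pPlanets, self.currentPlanet] $
ELSE $
   self.pPlanets = PTR_NEW(self.currentPlanet)

END

; Free the heap variable when the object is destroyed
PRO xml_to_struct_ptr::Cleanup

PTR_FREE, self.pPlanets
self->IDLffXMLSAX::Cleanup

END

PRO xml_to_struct_ptr__define

void = {PLANET, NAME: "", Orbit: 0ull, Period: 0.0, Moons: 0}
void = {xml_to_struct_ptr, $
   INHERITS IDLffXMLSAX, $
   charBuffer: "", $
   currentPlanet: {PLANET}, $
   pPlanets: PTR_NEW()}   ; Null pointer until the first planet is stored

END
```

This variant accepts any number of <Planet> elements, at the cost of reallocating the array on each addition and of the extra Cleanup method.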

Init Method

The Init method is called when an xml_to_struct parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:

FUNCTION xml_to_struct::Init 
 
self.planetNum = 0 
RETURN, self->IDLffXMLSAX::Init() 
 
END 

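We do two things in this method: we set the planetNum field to 0, and we call the superclass's Init method, returning its result as the return value of our own Init method.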

Note
We perform our own initialization task (setting the value of the planetNum field) before calling the superclass's Init method.

See "IDLffXMLSAX::Init" (IDL Reference Guide) for details on the method we are overriding.

Characters Method

The Characters method is called when the xml_to_struct parser encounters character data inside an element. The following routine is the definition of the Characters method:

PRO xml_to_struct::characters, data 
 
self.charBuffer = self.charBuffer + data 
 
END 

As it parses the character data in an element, the parser will read characters until it reaches the end of the text section. Here, we simply add the current characters to the charBuffer field of the object's instance data structure.

See "IDLffXMLSAX::Characters" (IDL Reference Guide) for details on the method we are overriding.

StartElement Method

The StartElement method is called when the xml_to_struct parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:

PRO xml_to_struct::startElement, URI, local, strName, attrName, $ 
   attrValue 
 
CASE strName OF 
   "Solar_System":   ; Do nothing 
   "Planet" : BEGIN 
      self.currentPlanet = {PLANET, "", 0ull, 0.0, 0} 
      self.currentPlanet.Name = attrValue[0] 
   END 
   "Orbit" : self.charBuffer = '' 
   "Period" : self.charBuffer = '' 
   "Moons" : self.charBuffer = '' 
ENDCASE 
 
END 

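Here, we use a CASE statement to branch based on the name of the element we have encountered. For the <Solar_System> element, we do nothing. For a <Planet> element, we reset the currentPlanet field to an empty PLANET structure and set its Name field from the first attribute value (the NAME attribute). For the Orbit, Period, and Moons elements, we clear the charBuffer field so that it is ready to accumulate the element's character data. The UNITS and TYPE attributes of these elements are ignored.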

See "IDLffXMLSAX::StartElement" (IDL Reference Guide) for details on the method we are overriding.

EndElement Method

The EndElement method is called when the xml_to_struct parser encounters the end of an XML element. The following routine is the definition of the EndElement method:

PRO xml_to_struct::EndElement, URI, Local, strName 
 
CASE strName of 
   "Solar_System": 
   "Planet": BEGIN 
      self.Planets[self.planetNum] = self.currentPlanet 
      self.planetNum = self.planetNum + 1 
   END 
   "Orbit" : self.currentPlanet.Orbit = self.charBuffer 
   "Period" : self.currentPlanet.Period = self.charBuffer 
   "Moons" : self.currentPlanet.Moons= self.charBuffer 
ENDCASE 
 
END 

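As with the StartElement method, we use a CASE statement to branch based on the element name. For the <Solar_System> element, we do nothing. For a <Planet> element, we copy the currentPlanet structure into the Planets array at the index given by planetNum, then increment planetNum. For the Orbit, Period, and Moons elements, we assign the contents of charBuffer to the corresponding field of currentPlanet; IDL converts the string to the data type of that field.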

See "IDLffXMLSAX::EndElement" (IDL Reference Guide) for details on the method we are overriding.

Note
In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema. We do not need an ELSE clause in the CASE statement. If an unknown element is found in the XML file, the parser will report a validation error.
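
If you cannot rely on validation (for example, when a file has no DTD), an ELSE clause keeps the CASE statements from stopping with an error on an unexpected element name. The following is a minimal sketch under that assumption. The subclass name xml_to_struct_lenient is hypothetical and is not part of the IDL distribution; the methods repeat the logic shown above and add only the ELSE clauses and a guard for a <Planet> element without attributes.

```idl
; Hypothetical subclass (illustrative name) that ignores unknown elements

PRO xml_to_struct_lenient::StartElement, URI, local, strName, $
   attrName, attrValue

CASE strName OF
   "Planet" : BEGIN
      self.currentPlanet = {PLANET, "", 0ull, 0.0, 0}
      ; Read the name only if an attribute value was supplied
      IF (N_ELEMENTS(attrValue) GT 0) THEN $
         self.currentPlanet.Name = attrValue[0]
   END
   "Orbit" : self.charBuffer = ''
   "Period" : self.charBuffer = ''
   "Moons" : self.charBuffer = ''
   ELSE:   ; <Solar_System> or an unrecognized element: do nothing
ENDCASE

END

PRO xml_to_struct_lenient::EndElement, URI, local, strName

CASE strName OF
   "Planet" : BEGIN
      self.Planets[self.planetNum] = self.currentPlanet
      self.planetNum = self.planetNum + 1
   END
   "Orbit" : self.currentPlanet.Orbit = self.charBuffer
   "Period" : self.currentPlanet.Period = self.charBuffer
   "Moons" : self.currentPlanet.Moons = self.charBuffer
   ELSE:   ; <Solar_System> or an unrecognized element: do nothing
ENDCASE

END

PRO xml_to_struct_lenient__define

; All instance data is inherited from xml_to_struct
void = {xml_to_struct_lenient, INHERITS xml_to_struct}

END
```

Character data inside an ignored element is still appended to charBuffer by the inherited Characters method, but it is discarded the next time a recognized element clears the buffer.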

GetArray Method

The GetArray method allows us to retrieve the array of structures stored in the Planets variable. The following routine is the definition of the GetArray method:

FUNCTION xml_to_struct::GetArray 
 
IF (self.planetNum EQ 0) THEN $ 
   RETURN, -1 $ 
ELSE RETURN, self.Planets[0:self.planetNum-1] 
 
END 

Here, we check whether the planetNum counter has been incremented. If it has, we return as many array elements as the counter specifies. If it has not (indicating that no data has been stored in the array), we return the value -1.
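
Because GetArray can return either an array of structures or the integer -1, calling code may want to test the result before using it. The following command-line sketch shows one way to do this with the SIZE function; the variable names are illustrative. No file has been parsed here, so GetArray returns -1.

```idl
xmlObj = OBJ_NEW('xml_to_struct')
result = xmlObj->GetArray()   ; Nothing has been parsed yet

; SIZE(..., /TYPE) returns 8 for a structure and 2 for the integer -1
IF (SIZE(result, /TYPE) EQ 8) THEN $
   PRINT, N_ELEMENTS(result), ' planets read' $
ELSE $
   PRINT, 'No planet data available'

OBJ_DESTROY, xmlObj
```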

Using the xml_to_struct Parser

To see the xml_to_struct parser in action, you can parse the file planets.xml, found in the examples/data subdirectory of the IDL distribution. The planets.xml file contains XML like the fragment shown at the beginning of this section, and includes a <Planet> element for each planet in the solar system. The planets.xml file also includes a DTD describing the structure of the file.

Enter the following statements at the IDL command line:

xmlObj = OBJ_NEW('xml_to_struct') 
xmlFile = FILEPATH('planets.xml', $ 
   SUBDIRECTORY = ['examples', 'data']) 
xmlObj->ParseFile, xmlFile 
planets = xmlObj->GetArray() 
OBJ_DESTROY, xmlObj 

The variable planets now holds an array of PLANET structures, one for each planet. To print the number of moons for each planet, you could use the following IDL statement:

FOR i = 0, (N_ELEMENTS(planets.Name) - 1) DO $ 
   PRINT, planets[i].Name, planets[i].Moons, $ 
   FORMAT = '(A-7, " has ", I2, " moons")' 

IDL prints:

Mercury has  0 moons 
Venus   has  0 moons 
Earth   has  1 moons 
Mars    has  2 moons 
Jupiter has 16 moons 
Saturn  has 18 moons 
Uranus  has 21 moons 
Neptune has  8 moons 
Pluto   has  1 moons 

To view all the information about the planet Mars, you could use the following IDL statement:

HELP, planets[3], /STRUCTURE 

IDL prints:

** Structure PLANET, 4 tags, length=32, data length=26: 
   NAME       STRING      'Mars' 
   ORBIT      ULONG64     227940000 
   PERIOD     FLOAT       686.980 
   MOONS      INT         2
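
Because the result is an ordinary IDL array of structures, you can also operate on a field across all elements at once. The following sketch assumes that the variable planets holds the array returned by GetArray above; the threshold of 10 moons and the rounded kilometers-per-AU constant are chosen for illustration only.

```idl
; Find the planets with more than 10 moons
idx = WHERE(planets.Moons GT 10, count)
IF (count GT 0) THEN PRINT, planets[idx].Name

; Convert every orbit from kilometers to astronomical units
; (1 AU is approximately 1.496e8 km)
PRINT, planets.Orbit / 1.496d8
```

With the moon counts printed earlier, the first statement would select Jupiter, Saturn, and Uranus.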