Associated Input/Output
Unformatted data stored in files often consists of a repetitive series of arrays or structures. A common example is a series of images. IDL associated file variables offer a convenient and efficient way to access such data.
An associated variable is a variable that maps the structure of an IDL array or structure variable onto the contents of a file. The file is treated as an array of these repeating units of data. The first array or structure in the file has an index of zero, the second has index one, and so on. Such variables do not keep data in memory like a normal variable. Instead, when an associated variable is subscripted with the index of the desired array or structure within the file, IDL performs the input/output operation required to access the data.
When their use is appropriate (the file consists of a sequence of identical arrays or structures), associated file variables offer the following advantages over READU and WRITEU for unformatted input/output:
- Input/output occurs when an associated file variable is subscripted. Thus, it is possible to perform input/output within an expression without a separate input/output statement.
- The size of the data set is limited primarily by the maximum possible size of the file containing the data instead of the maximum memory available. Data sets too large for memory can be accessed.
- There is no need to declare the maximum number of arrays or structures contained in the file.
- Associated variables offer transparent access to data. Direct access to any element in the file is rapid and simple—there is no need to calculate offsets into the file and/or position the file pointer prior to performing the input/output operation.
An associated file variable is created by assigning the result of the ASSOC function to a variable. See "ASSOC" (IDL Reference Guide) for details.
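The difference between the two approaches can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes a file named data.dat that contains only 10 x 20 floating-point arrays, and the variable names lun, rec, a, and same are chosen for the example.

```idl
; Sketch only: 'data.dat' is assumed to hold 10 x 20 floating-point arrays.
OPENR, lun, 'data.dat', /GET_LUN

; READU approach: compute the byte offset of the fourth array by hand
; (4 bytes per FLOAT element) and position the file pointer first.
rec = FLTARR(10, 20)
POINT_LUN, lun, 3L * 10 * 20 * 4
READU, lun, rec

; Associated-variable approach: subscripting performs the same read.
a = ASSOC(lun, FLTARR(10, 20, /NOZERO))
same = a[3]

; Both approaches should return identical data.
PRINT, ARRAY_EQUAL(rec, same)

FREE_LUN, lun
```

With the associated variable, the offset arithmetic and the call to POINT_LUN disappear; the subscript alone identifies the record.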
Example of Using Associated Input/Output
Assume that a file named data.dat exists, and that this file contains a series of 10 x 20 arrays of floating-point data. The following two IDL statements open the file and create an associated file variable mapped to the file:
    ;Open the file.
    OPENU, 1, 'data.dat'
    ;Make a file variable. Using the NOZERO keyword with FLTARR
    ;increases efficiency.
    A = ASSOC(1, FLTARR(10, 20, /NOZERO))
The order of these two statements is not important—it would be equally valid to call ASSOC first, and then open the file. This is because the association is between the variable and the logical file unit, not the file itself. It is also legitimate to close the file, open a new file using the same LUN, and then use the associated variable without first executing a new ASSOC. Naturally, an error occurs if the file is not open when the file variable is subscripted in an expression or if the file is open for the wrong type of access (for example, trying to assign to an associated file variable linked with a file opened for read-only access).
As a result of executing the two statements above, the variable A is now an associated file variable. Executing the statement,
    HELP, A
gives a response similar to the following:
    A               FLOAT     = File<data.dat> Array[10, 20]
The associated variable A maps the structure of a 10 x 20, floating-point array onto the contents of the file data.dat. Thus, the response from the HELP procedure shows it as having the structure of a two-dimensional array.
An associated file variable only performs input/output to the file when it is subscripted. Thus, the following two IDL statements do not cause input/output to happen:
    B = A
This assignment does not transfer data from the file to variable B because A is not subscripted. Instead, B becomes an associated file variable with the same structure, and to the same logical file unit, as A.
    B = 23
This assignment does not result in the value 23 being transferred to the file because variable B (which became a file variable in the previous statement) is not subscripted. Instead, B becomes a scalar integer variable containing the value 23. It is no longer an associated file variable.
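Because the association is with the logical unit number rather than with a particular file, the same associated variable can serve more than one file. The following sketch illustrates this under stated assumptions: other.dat is a hypothetical second file that also holds 10 x 20 floating-point arrays, and Z and Y are names chosen for the example.

```idl
; Sketch only: 'other.dat' is a hypothetical second file with the same layout.
; ASSOC may be called before the unit is opened.
A = ASSOC(1, FLTARR(10, 20, /NOZERO))

OPENR, 1, 'data.dat'
Z = A[0]            ; first array of data.dat
CLOSE, 1

; Reuse unit 1 for a different file. No new ASSOC call is needed.
OPENR, 1, 'other.dat'
Y = A[0]            ; first array of other.dat
CLOSE, 1
```

Subscripting A after the final CLOSE would produce an error, because no file is open on unit 1 at that point.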
Reading Data from Associated Files
Once a variable has been associated with a file, data are read from the file whenever the associated variable appears in an expression with a subscript. The position of the array or structure read from the file is given by the value of the subscript. The following IDL statements assume that the associated file variable A is defined as in the previous section, and give some examples of using file variables:
    ;Copy the contents of the first array into normal variable Z. Z is
    ;now a 10 x 20, floating-point array.
    Z = A[0]
    ;Form the sum of the first 10 arrays. (Z was initialized in the
    ;previous statement to the value of the first array. This statement
    ;adds the following nine to it.) Note the use of the compound
    ;operator += to avoid creating a new copy of Z each time we add a
    ;new array.
    FOR I = 1, 9 DO Z += A[I]
    ;Read fourth array and plot it.
    PLOT, A[3]
    ;Subtract the array at index 4 from the array at index 5, and plot
    ;the result. The result of the subtraction is then discarded.
    PLOT, A[5] - A[4]
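When the number of arrays in the file is not known in advance, it can be derived from the file length. The following is a minimal sketch, assuming data.dat contains only 10 x 20 floating-point arrays with no header; the names lun, info, nrec, total_arr, and avg are chosen for the example.

```idl
; Sketch only: assumes 'data.dat' holds only 10 x 20 FLOAT arrays, no header.
OPENR, lun, 'data.dat', /GET_LUN
a = ASSOC(lun, FLTARR(10, 20, /NOZERO))

; FSTAT reports the file length in bytes; each record is 10*20*4 bytes.
info = FSTAT(lun)
nrec = info.size / (10L * 20 * 4)

; Accumulate the arrays one record at a time, then divide by the count.
total_arr = FLTARR(10, 20)
FOR i = 0L, nrec - 1 DO total_arr += a[i]
IF nrec GT 0 THEN avg = total_arr / nrec

FREE_LUN, lun
```

Only one record is held in memory at a time in addition to the running sum, so the same loop works for files far larger than available memory.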
Writing Data to Associated Files
When a subscripted associated variable appears on the left side of an assignment statement, the expression on the right side is written into the file at the given array position:
    ;Sets sixth record to zero.
    A[5] = FLTARR(10, 20)
    ;Write ARR into sixth record after any necessary type conversions.
    A[5] = ARR
    ;Averages records J and J+1, and writes the result into record J.
    A[J] = (A[J] + A[J + 1])/2
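A new file can also be built record by record through an associated variable. The following sketch assumes a hypothetical output file named ramp.dat; the names lun, a, and i are chosen for the example.

```idl
; Sketch only: 'ramp.dat' is a hypothetical output file name.
OPENW, lun, 'ramp.dat', /GET_LUN
a = ASSOC(lun, FLTARR(10, 20, /NOZERO))

; Write five records in order; record i is filled with the value i.
FOR i = 0, 4 DO a[i] = REPLICATE(FLOAT(i), 10, 20)

FREE_LUN, lun
```

Each record occupies 10 x 20 x 4 = 800 bytes, so the resulting file should be 4,000 bytes long.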
Multiple Subscripts With Associated File Variables
Usually, when subscripts are used with associated file variables, only a single subscript is present, specifying an array within the associated file. This is the most efficient way to access associated file variables. However, IDL allows you to specify individual elements within the selected array using multiple subscripts. When multiple subscripts are present with an associated file variable, the rightmost subscript selects the array within the file, and the other subscripts specify the specific element within that array.
For example, consider the following statement using the variable A defined above:
    Z = A[0, 0, 1]
This statement assigns the value of element [0,0] of the second array within the file to the variable Z. The rightmost subscript is interpreted as the subscript of the array within the file, causing IDL to read the entire array into memory. This resulting array expression is then further subscripted by the remaining subscripts.
Similarly, the statement:
    A[2, 3, 4] = 45
assigns the value 45 to element [2,3] of the fifth array within the file. When a file variable is referenced, the last (and possibly only) subscript, which selects the array within the file, must be a simple subscript. Other subscripts and subscript ranges, except the last, have the same meaning as when used with normal array variables.
An implicit extraction of an element or subarray in a data record can also be performed. For example:
    ; Variable A associates the file open on unit 1 with the records of
    ; 200-element, floating-point vectors.
    A = ASSOC(1, FLTARR(200))
    ; Then, X is set to the first 100 points of record number 2, the
    ; third record of the file.
    X = A[0:99, 2]
    ; Set the 24th point of record 16 to 12.
    A[23, 16] = 12
    ; Increment points 10 to 199 of record 12. Points 0 to 9 of the
    ; record remain unchanged.
    A[10, 12] = A[10:*, 12]+1
Note
Although the ability to directly refer to array elements within an associated file can be convenient, it can also be very slow because every access to an array element causes the entire array to be transferred to or from memory. Unless only one operation on the array is required, it is faster to assign the contents of the array to a normal variable by subscripting the file variable with a single subscript, and then access the individual array elements within the normal variable as needed. If you make changes to the value of the normal variable that should be reflected in the file, a final assignment to the associated variable, indexed with a single subscript, can be used to update the file and complete the operation.
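The read, modify, and write-back pattern described in this note can be sketched as follows. It assumes an associated variable A mapped to 10 x 20 floating-point arrays in a file opened for update; rec is a name chosen for the example.

```idl
; Sketch only: assumes A = ASSOC(unit, FLTARR(10, 20)) on a file opened
; with OPENU, so that both reading and writing are permitted.
rec = A[4]          ; one read of the fifth array
rec[2, 3] = 45      ; in-memory changes, no file access
rec[0:4, 0] = 0
A[4] = rec          ; one write updates the file
```

The file is accessed twice in total, regardless of how many elements of rec are changed in between.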
Files with Multiple Structures
The same file may be associated with a number of different structures. Assume that a file open on unit 1 contains a number of 128 x 128-byte images. The statement,
    ROW = ASSOC(1, BYTARR(128))
maps the file into rows of 128 bytes each. ROW[3] is the fourth row of the first image, while ROW[128] is the first row of the second image. The statement,
    IMAGE = ASSOC(1, BYTARR(128, 128))
maps the file into entire images; IMAGE[4] is the fifth image.
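The relationship between the two mappings is simple index arithmetic: row r of image i is record i*128 + r of the row mapping. The following sketch checks this for one row. It assumes that unit 1 is open on a file of 128 x 128-byte images; the names i, r, one_row, and img are chosen for the example.

```idl
; Sketch only: assumes unit 1 is open on a file of 128 x 128 byte images.
ROW = ASSOC(1, BYTARR(128))
IMAGE = ASSOC(1, BYTARR(128, 128))

; Row r of image i is record i*128 + r of the row mapping.
i = 1 & r = 0
one_row = ROW[i * 128L + r]
img = IMAGE[i]

; The two mappings should return the same 128 bytes.
PRINT, ARRAY_EQUAL(one_row, img[*, r])
```

Reading through ROW transfers 128 bytes, while reading through IMAGE transfers the whole 16,384-byte image, so the row mapping is the cheaper choice when only a few rows are needed.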
Offset Parameter
The Offset parameter to ASSOC specifies the position in the file, in bytes, at which the first array starts. This parameter is useful when a file contains a header followed by data records. For example, if a file open on unit 1 uses its first 1,024 bytes to contain header information, followed by 512 x 512-byte images, the statement,
    IMAGE = ASSOC(1, BYTARR(512, 512), 1024)
sets the variable IMAGE to access the images while skipping the header.
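A fuller sketch of the header case follows. It assumes a hypothetical file named images.dat with a 1,024-byte header followed only by 512 x 512-byte images; the names lun, header, first, info, and nimages are chosen for the example.

```idl
; Sketch only: 'images.dat' is a hypothetical file with a 1,024-byte header.
OPENR, lun, 'images.dat', /GET_LUN

; Read the header separately with READU.
header = BYTARR(1024)
READU, lun, header

; Map the images, skipping the header with the Offset argument.
IMAGE = ASSOC(lun, BYTARR(512, 512), 1024)
first = IMAGE[0]

; The number of images follows from the file length minus the header.
info = FSTAT(lun)
nimages = (info.size - 1024) / (512L * 512)

FREE_LUN, lun
```

The header is read once with READU, while the images remain available for random access through IMAGE.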
Efficiency
Arrays are accessed most efficiently if their length is an integer multiple of the block size of the filesystem holding the file. Common block sizes are powers of 2, such as 512, 2K (2048), 4K (4096), or 8K (8192) bytes. For example, on a disk with 512-byte blocks, a benchmark program that read a 512 x 512-byte image starting and ending on a block boundary took approximately one-eighth of the time taken by a similar program that read an image not aligned to block boundaries.
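Whether a given layout is aligned can be checked with a little integer arithmetic. The following sketch is illustrative only: the 512-byte block size and the 1,000-byte header length are assumptions made for the example, as are the variable names.

```idl
; Sketch only: the 512-byte block size and 1,000-byte header are assumptions.
block = 512L
header_bytes = 1000L

; Round the header length up to the next block boundary.
offset = ((header_bytes + block - 1) / block) * block

; A 512 x 512 byte image is a whole number of blocks if the remainder is 0.
rec_bytes = 512L * 512
aligned = (rec_bytes MOD block) EQ 0

PRINT, offset, aligned
```

With these numbers the rounded offset is 1,024 bytes and the record length is an exact multiple of the block size, so padding the header to 1,024 bytes would keep every image on block boundaries.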
Each time a subscripted associated variable is referenced, one or more records are read from or written to the file. Therefore, if a record is to be accessed more than a few times, it is more efficient to read the entire record into a variable. After making the required changes to the in-memory variable, it can be written back to the file if necessary.
Unformatted Data from UNIX FORTRAN Programs
Unformatted data files generated by FORTRAN programs under UNIX contain an extra longword before and after each logical record in the file. ASSOC does not interpret these extra bytes but considers them to be part of the data. This is true even if the F77_UNFORMATTED keyword is specified when the file is opened. Therefore, ASSOC should not be used with such files. Instead, such files should be processed using READU and WRITEU. An example of using IDL to read such data is given in Using Unformatted Input/Output.
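As a rough illustration of the READU alternative, the following sketch reads such a file sequentially. It assumes a hypothetical file named fortran.dat in which each FORTRAN logical record holds one 10 x 20 REAL array; the procedure name read_f77_sketch and the variable names are chosen for the example.

```idl
; Sketch only: 'fortran.dat' is a hypothetical file in which each FORTRAN
; logical record holds one 10 x 20 REAL array.
PRO read_f77_sketch
  ; F77_UNFORMATTED makes READU skip the record-length words.
  OPENR, lun, 'fortran.dat', /GET_LUN, /F77_UNFORMATTED

  rec = FLTARR(10, 20)
  nrec = 0L

  ; Read one logical record per READU call until end of file.
  WHILE ~EOF(lun) DO BEGIN
    READU, lun, rec
    nrec += 1
  ENDWHILE

  FREE_LUN, lun
  PRINT, 'Records read: ', nrec
END
```

Access here is sequential rather than random, which is the trade-off for letting IDL handle the record markers.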