IMSL_DISCR_ANALYSIS
The IMSL_DISCR_ANALYSIS procedure performs a linear or a quadratic discriminant function analysis among several known groups.
Note
This routine requires an IDL Advanced Math and Stats license. For more information, contact your ITT Visual Information Solutions sales or technical support representative.
Syntax
IMSL_DISCR_ANALYSIS, x, n_groups [, CLASS_MEMBER=variable] [, CLASS_TABLE=variable] [, COEFFICIENTS=variable] [, COVARIANCES=variable] [, /DOUBLE] [, GROUP_COUNTS=variable] [, IDX_COLS=array] [, IDX_VARS=array] [, METHOD=value] [, /PRIOR_EQUAL] [, PRIOR_INPUT=array] [, PRIOR_OUTPUT=variable] [, /PRIOR_PROP] [, MAHALANOBIS=variable] [, MEANS=variable] [, NMISSING=variable] [, PROB=variable] [, STATS=variable]
Arguments
n_groups
Number of groups in the data.
x
Two-dimensional array of size n_rows by n_variables + 1 containing the data where n_rows = N_ELEMENTS(x(*,0)), the number of rows to be processed and n_variables = number of variables to be used in the discrimination. The first n_variables columns correspond to the variables, and the last column contains the group numbers. The groups must be numbered 1, 2, ..., n_groups.
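As a minimal sketch of the default layout, the fragment below assembles x from per-variable vectors and a group vector. The names sepal, petal, and group, and their values, are hypothetical and serve only as illustration.

```idl
; Hypothetical data: four observations, two variables, two groups.
sepal = [5.1, 4.9, 7.0, 6.4]
petal = [1.4, 1.4, 4.7, 4.5]
group = [1, 1, 2, 2]          ; group numbers must be 1, 2, ..., n_groups

; Concatenating along the second dimension gives an n_rows by
; (n_variables + 1) array with the group numbers in the last column.
x = [[sepal], [petal], [FLOAT(group)]]

n_rows      = N_ELEMENTS(x(*, 0))        ; number of observations
n_variables = N_ELEMENTS(x(0, *)) - 1    ; last column holds the groups
n_groups    = MAX(group)
PRINT, n_rows, n_variables, n_groups
```

With this layout the default values of Idx_Cols and Idx_Vars apply, so neither keyword needs to be supplied.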
Keywords
CLASS_MEMBER
Named variable into which a one-dimensional integer array of length n_rows is stored. Element i contains the group to which observation i was classified.
If an observation has an invalid group number, frequency, or weight when the leaving-out-one method has been specified, then the observation is not classified and the corresponding elements of Class_Member (and Prob, see Prob below) are set to zero.
CLASS_TABLE
Named variable into which a two-dimensional array of size n_groups by n_groups containing the classification table is stored. Each observation that is classified and has a group number 1.0, 2.0, ..., n_groups is entered into the table. The rows of the table correspond to the known group membership. The columns refer to the group to which the observation was classified.
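The table can be summarized with ordinary array operations. The following sketch hard-codes the classification table printed in the Example section and computes the fraction of correctly classified observations; the variable names are illustrative only.

```idl
; Classification table copied from the Example section
; (first index: known group, second index: assigned group).
table = [[50, 0, 0], [0, 48, 1], [0, 2, 49]]   ; table(1, 2) = 2, table(2, 1) = 1
n_groups = 3
diag = INDGEN(n_groups)

n_correct = TOTAL(table(diag, diag))   ; sum of the diagonal entries
n_total   = TOTAL(table)
PRINT, 'Fraction correctly classified: ', n_correct / n_total
PRINT, 'Observations per known group:  ', TOTAL(table, 2)
```

For the example data, 147 of the 150 observations lie on the diagonal, so the fraction is 0.98.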
COEFFICIENTS
Named variable into which a two-dimensional array of size n_groups by (n_variables + 1) containing the linear discriminant coefficients is stored. The first column of Coefficients contains the constant term, and the remaining columns contain the variable coefficients. Row i - 1 of Coefficients corresponds to group i, for i = 1, 2, ..., n_groups. Coefficients always contains the linear discriminant function coefficients, even when quadratic discrimination is specified.
COVARIANCES
Named variable into which a three-dimensional array of size g by n_variables by n_variables containing the covariance results is stored, where g = n_groups + 1 when the within-group covariance matrices are computed (Method 1, 2, 4, or 5) and g = 1 otherwise. When they are computed, the within-group covariance matrices are the first g - 1 matrices, and the pooled covariance matrix is the g-th matrix.
DOUBLE
If present and nonzero, double precision is used.
GROUP_COUNTS
Named variable into which a one-dimensional integer array of length n_groups containing the number of observations in each group is stored.
IDX_COLS
One-dimensional array containing the indices of the variables to be used in the analysis.
IDX_VARS
Three element array indicating the column numbers of x in which particular types of data are stored. Columns are numbered 0 ... N_ELEMENTS(Idx_Cols) - 1.
Idx_Vars(0) contains the index for the column of x in which the group numbers are stored.
Idx_Vars(1) and Idx_Vars(2) contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set Idx_Vars(1) = -1 if there will be no column for frequencies. Set Idx_Vars(2) = -1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.
Defaults: Idx_Cols = 0, 1, ..., n_variables – 1,
Idx_Vars(0) = n_variables,
Idx_Vars(1) = -1, and
Idx_Vars(2) = -1
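When the group numbers are not in the last column, both keywords are needed. The sketch below assumes the layout of the iris data used in the Example section (group number in column 0, the four measurements in columns 1 through 4); the variable names idx_cols and idx_vars are illustrative.

```idl
; Assumed layout: column 0 = group number, columns 1..4 = measurements.
x = IMSL_STATDATA(3)
idx_cols = [1, 2, 3, 4]     ; columns holding the classification variables
idx_vars = [0, -1, -1]      ; group column; no frequency or weight columns

IMSL_DISCR_ANALYSIS, x, 3, IDX_COLS = idx_cols, IDX_VARS = idx_vars, $
   GROUP_COUNTS = counts
PRINT, counts
```

Idx_Cols lists as many columns as there are classification variables, while Idx_Vars always has exactly three elements.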
METHOD
Method of discrimination. The method chosen determines whether linear or quadratic discrimination is used, whether the group covariance matrices are computed (the pooled covariance matrix is always computed), and whether the leaving-out-one or the reclassification method is used to classify each observation. The Method values are shown in Table 21-1.
Table 21-1: Method values
Method   Discrimination   Covariances computed   Classification
1        Linear           Pooled, group          Reclassification
2        Quadratic        Pooled, group          Reclassification
3        Linear           Pooled                 Reclassification
4        Linear           Pooled, group          Leaving-out-one
5        Quadratic        Pooled, group          Leaving-out-one
6        Linear           Pooled                 Leaving-out-one
In the leaving-out-one method of classification, the posterior probabilities are adjusted so as to eliminate the effect of the observation from the sample statistics prior to its classification. In the reclassification method, the effect of the observation is not eliminated from the classification function.
Default: Method = 1
PRIOR_EQUAL
By default (or if Prior_Equal is set), equal prior probabilities are used, calculated as 1.0/n_groups. Keywords Prior_Equal, Prior_Prop, and Prior_Input must not be used together.
PRIOR_INPUT
If present, an array of length n_groups containing the prior probabilities for each group, such that the sum of all prior probabilities is equal to 1.0. Keywords Prior_Input, Prior_Equal, and Prior_Prop must not be used together.
PRIOR_OUTPUT
Named variable into which a one-dimensional array of length n_groups containing the most recently calculated or input prior probabilities is stored.
PRIOR_PROP
If present, prior probabilities are calculated to be proportional to the sample size in each group. Keywords Prior_Prop, Prior_Equal, and Prior_Input must not be used together.
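To make the constraint on Prior_Input concrete, the following sketch builds a prior array by hand from hypothetical group sizes and checks that it sums to 1.0. It only illustrates proportional priors; it is not a statement of how Prior_Prop is computed internally.

```idl
; Hypothetical group sizes for a three-group problem.
counts = [50, 30, 20]

; Priors proportional to group size. TOTAL returns a floating-point
; value, so the division is carried out in floating point.
prior = counts / TOTAL(counts)
PRINT, prior            ; approximately 0.5, 0.3, 0.2
PRINT, TOTAL(prior)     ; approximately 1.0, as Prior_Input requires

; The array could then be supplied explicitly, for example:
; IMSL_DISCR_ANALYSIS, x, 3, PRIOR_INPUT = prior, PRIOR_OUTPUT = prior_out
```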
MAHALANOBIS
Named variable into which a two-dimensional array of size n_groups by n_groups containing the Mahalanobis distances $D_{ij}^2$ between the group means is stored.
For linear discrimination, the Mahalanobis distance is computed using the pooled covariance matrix. Otherwise, the Mahalanobis distance $D_{ij}^2$ between group means i and j is computed using the within covariance matrix for group i in place of the pooled covariance matrix.
MEANS
Named variable into which a two-dimensional array of size n_groups by n_variables containing the variable means is stored. The i-th row of Means contains the group i variable means.
NMISSING
Named variable into which the number of rows of data encountered containing missing values (NaN) for the classification, group, weight, and/or frequency variables is stored. If a row of data contains a missing value (NaN) for any of these variables, that row is excluded from the computations.
PROB
Named variable into which a two-dimensional array of size n_rows by n_groups containing the posterior probabilities for each observation is stored.
STATS
Named variable into which a one-dimensional array of length 4 + 2 * (n_groups + 1) containing various statistics of interest is stored. The first element of Stats is the sum of the degrees of freedom for the within-covariance matrices. The second, third, and fourth elements of Stats correspond to the chi-squared statistic, its degrees of freedom, and the probability of a greater chi-squared, respectively, of a test of the homogeneity of the within-covariance matrices (not computed if Method is equal to 3 or 6). The next n_groups + 1 elements (the fifth element through element 5 + n_groups) contain the log of the determinant of each group's covariance matrix (not computed if Method is equal to 3 or 6), followed by the log of the determinant of the pooled covariance matrix (zero-based index 4 + n_groups). Finally, the last n_groups + 1 elements of Stats contain the sum of the weights within each group and, in the last position, the sum of the weights in all groups.
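The layout of Stats is easier to follow with zero-based indices. The sketch below unpacks the Stats values printed in the Example section (Method = 3, n_groups = 3); the variable names on the left are illustrative, not part of the interface.

```idl
; Stats values copied from the Example section (Method = 3, n_groups = 3).
nan = !VALUES.F_NAN
n_groups = 3
stats = [147.0, nan, nan, nan, nan, nan, nan, -9.95854, $
         50.0, 50.0, 50.0, 150.0]

df_within    = stats(0)                      ; sum of within-group degrees of freedom
chi_squared  = stats(1)                      ; homogeneity test statistic
chi_df       = stats(2)                      ; its degrees of freedom
chi_p_value  = stats(3)                      ; probability of a greater chi-squared
log_det_grp  = stats(4:3 + n_groups)         ; one value per group
log_det_pool = stats(4 + n_groups)           ; pooled covariance matrix
weight_grp   = stats(5 + n_groups:4 + 2 * n_groups)
weight_total = stats(5 + 2 * n_groups)

PRINT, N_ELEMENTS(stats) EQ 4 + 2 * (n_groups + 1)   ; length check
PRINT, log_det_pool, weight_total
```

Because Method = 3 does not compute the group covariance matrices, the test statistic and the per-group log determinants are NaN in this example, while the pooled log determinant and the weight sums are populated.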
Discussion
IMSL_DISCR_ANALYSIS performs discriminant function analysis using either linear or quadratic discrimination. The output includes a measure of distance between the groups, a table summarizing the classification results, a matrix containing the posterior probabilities of group membership for each observation, and the within-sample means and covariance matrices. Linear discriminant function coefficients are also computed.
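Covariance matrices are defined as follows: Let $N_i$ denote the sum of the frequencies of the observations in group i and $M_i$ denote the number of observations in group i. Then, if $S_i$ denotes the within-group i covariance matrix:
$$S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j \, (x_j - \bar{x}_i)(x_j - \bar{x}_i)^T$$
where $w_j$ is the weight of the j-th observation in group i, $f_j$ is the frequency, $x_j$ is the j-th observation column vector (in group i), and $\bar{x}_i$ denotes the mean vector of the observations in group i. The mean vectors are computed as:
$$\bar{x}_i = \frac{1}{W_i} \sum_{j=1}^{M_i} w_j f_j \, x_j, \qquad W_i = \sum_{j=1}^{M_i} w_j f_j$$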
Given the means and the covariance matrices, the linear discriminant function for group i is computed as:
$$z_i = \ln(p_i) - 0.5 \, \bar{x}_i^T S_p^{-1} \bar{x}_i + x^T S_p^{-1} \bar{x}_i$$
where $\ln(p_i)$ is the natural log of the prior probability for the i-th group, x is the observation to be classified, and $S_p$ denotes the pooled covariance matrix.
Let S denote either the pooled covariance matrix or one of the within-group covariance matrices $S_i$. (S will be the pooled covariance matrix in linear discrimination, and $S_i$ otherwise.) The Mahalanobis distance between group i and group j is computed as:
$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j)$$
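As a rough numerical check, the distance between groups 1 and 2 can be recomputed from the means and the pooled covariance matrix printed in the Example section, assuming the usual quadratic form d^T S^{-1} d with d the difference of the two mean vectors. This is a sketch using the standard IDL function INVERT, not the routine's internal algorithm, and the variable names are illustrative.

```idl
; Group means and pooled covariance matrix copied (rounded) from the
; Example section.
mean1 = [5.006, 3.428, 1.462, 0.246]
mean2 = [5.936, 2.770, 4.260, 1.326]
s_pooled = [[0.2650, 0.0927, 0.1675, 0.0384], $
            [0.0927, 0.1154, 0.0552, 0.0327], $
            [0.1675, 0.0552, 0.1852, 0.0427], $
            [0.0384, 0.0327, 0.0427, 0.0419]]

d = mean1 - mean2
s_inv = INVERT(s_pooled, status, /DOUBLE)   ; status = 0 indicates success

; Quadratic form d^T S^{-1} d. S is symmetric, so the orientation of the
; matrix product does not affect the result.
d2 = TOTAL(d * REFORM(s_inv # d))
PRINT, status, d2
```

The result can be compared with the value 89.9 reported for groups 1 and 2 in the D2 table of the example; because the inputs here are rounded to a few decimals, small differences in the last digits are to be expected.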
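Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, p. 252):
$$\gamma = C^{-1} \left\{ \sum_{i=1}^{k} n_i \left[ \ln |S_p| - \ln |S_i| \right] \right\}$$
where $n_i$ is the number of degrees of freedom in the i-th sample covariance matrix, k is the number of groups, and:
$$C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)} \left( \sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{j=1}^{k} n_j} \right)$$
where p is the number of variables.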
The estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is:
$$\hat{q}_i(x) = \frac{e^{-\frac{1}{2} D_i^2(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2} D_j^2(x)}}$$
where:
$$D_i^2(x) = \begin{cases} (x - \bar{x}_i)^T S_i^{-1} (x - \bar{x}_i) + \ln |S_i| - 2 \ln(p_i) & \text{quadratic discrimination} \\ (x - \bar{x}_i)^T S_p^{-1} (x - \bar{x}_i) - 2 \ln(p_i) & \text{linear discrimination} \end{cases}$$
For the leaving-out-one method of classification (Method equal to 4, 5, or 6), the sample mean vector and sample covariance matrices in the formula for $D_i^2$ are adjusted so as to remove the observation x from their computation. For linear discrimination (Method equal to 1, 3, 4, or 6), the linear discriminant function coefficients are actually used to compute the same posterior probabilities.
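To illustrate how the linear discriminant coefficients lead to posterior probabilities, the sketch below scores one observation with the Coefficients array printed in the Example section and normalizes the exponentiated scores. It assumes reclassification with the constant term already containing the log prior, as in the linear discriminant function above; the names z, post, and x_new are illustrative.

```idl
; Coefficients copied (rounded) from the Example section:
; coef(i, 0) is the constant term for group i + 1, and coef(i, 1:4)
; holds the variable coefficients.
coef = [[-86.3, -72.9, -104.4], $
        [ 23.5,  15.7,   12.4], $
        [ 23.6,   7.1,    3.7], $
        [-16.4,   5.2,   12.8], $
        [-17.4,   6.4,   21.1]]

; Observation to classify; here the group 1 mean vector from the example.
x_new = [5.006, 3.428, 1.462, 0.246]

; Linear discriminant scores: constant term plus the weighted variables.
z = DOUBLE(coef(*, 0))
FOR j = 0, 3 DO z = z + coef(*, j + 1) * x_new(j)

; Posterior probabilities as normalized exponentials of the scores
; (the maximum is subtracted first for numerical stability).
post = EXP(z - MAX(z))
post = post / TOTAL(post)

best = MAX(post, idx)
PRINT, 'Scores:         ', z
PRINT, 'Assigned group: ', idx + 1
```

For this observation the scores are roughly 84.0, 39.2, and -5.7, so it is assigned to group 1 with a posterior probability that is essentially 1.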
Using the posterior probabilities, each observation in x is classified into a group; the result is tabulated in the array Class_Table and saved in the array Class_Member. Array Class_Table is not altered at this stage if x(i, Idx_Vars(0)) contains a group number that is out of range. If the reclassification method is specified, then all observations with no missing values in the n_variables classification variables are classified. When the leaving-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, 1 is added to (or subtracted from) Class_Table for each row of x that is classified and contains a valid group number.
When Method > 3, an adjustment is made to the posterior probabilities to remove the effect of the observation in the classification rule. In this adjustment, each observation is presumed to have a weight of x(i, Idx_Vars(2)) if Idx_Vars(2) > -1 (and a weight of 1.0 if Idx_Vars(2) = -1), and a frequency of 1.0. See Lachenbruch (1975, p. 36) for the required adjustment.
The covariance matrices are computed from their LU factorizations.
Example
The following example uses linear discrimination with equal prior probabilities on Fisher's (1936) iris data.
.RUN
PRO print_results, counts, table, d2, prior_out, coef, means, $
   cov, stats, nrmiss
   ; Print each result array with row and column labels.
   num = INDGEN(3)
   PRINT, ' Counts'
   PRINT, num + 1, FORMAT = '(3I5)'
   PRINT, counts, FORMAT = '(3I5)'
   PRINT
   PRINT, ' Table'
   PRINT, num + 1, FORMAT = '(2X, 3I5)'
   FOR i = 0, 2 DO $
      PRINT, num(i) + 1, table(i, *), FORMAT = '(I2, 3I5)'
   PRINT
   PRINT, ' D2'
   PRINT, num + 1, FORMAT = '(3I7)'
   FOR i = 0, 2 DO $
      PRINT, num(i) + 1, d2(i, *), FORMAT = '(I2, 3F7.1)'
   PRINT
   PRINT, ' Prior OUT'
   PRINT, num + 1, FORMAT = '(3I10)'
   PRINT, prior_out, FORMAT = '(3F10.4)'
   PRINT
   num = INDGEN(5)
   PRINT, ' Coef'
   PRINT, num + 1, FORMAT = '(1X, 5I10)'
   FOR i = 0, 2 DO $
      PRINT, num(i) + 1, coef(i, *), FORMAT = '(I2, 5F10.1)'
   PRINT
   num = INDGEN(4)
   PRINT, ' Means'
   PRINT, num + 1, FORMAT = '(4I10)'
   FOR i = 0, 2 DO $
      PRINT, num(i) + 1, means(i, *), FORMAT = '(I2, 4F10.3)'
   PRINT
   PRINT, ' Covariance'
   PRINT, num + 1, FORMAT = '(4I10)'
   FOR i = 0, 3 DO $
      PRINT, num(i) + 1, cov(0, *, i), FORMAT = '(I2, 4F10.4)'
   PRINT
   num = INDGEN(12)
   PRINT, ' Stats'
   FOR i = 0, 11 DO $
      PRINT, num(i) + 1, stats(i)
   PRINT
   PRINT, 'nrmiss = ', nrmiss
END

; Idx_Vars: group numbers in column 0, no frequency or weight columns.
idxv = [0, -1, -1]
; Idx_Cols: the four measurements are in columns 1 through 4.
idxc = [1, 2, 3, 4]
n_groups = 3
method = 3
; Retrieve the Fisher Iris Data Set
x = IMSL_STATDATA(3)
IMSL_DISCR_ANALYSIS, x, n_groups, Idx_Vars = idxv, $
   Idx_Cols = idxc, Method = method, /Prior_Equal, $
   Prior_Output = prior_out, Group_Counts = counts, $
   Means = means, Covariances = cov, $
   Coefficients = coef, Class_Member = cm, $
   Class_Table = table, Prob = prob, $
   Mahalanobis = d2, Stats = stats, Nmissing = nrmiss
print_results, counts, table, d2, prior_out, coef, means, $
   cov, stats, nrmiss

 Counts
    1    2    3
   50   50   50

 Table
      1    2    3
 1   50    0    0
 2    0   48    2
 3    0    1   49

 D2
      1      2      3
 1    0.0   89.9  179.4
 2   89.9    0.0   17.2
 3  179.4   17.2    0.0

 Prior OUT
         1         2         3
    0.3333    0.3333    0.3333

 Coef
          1         2         3         4         5
 1     -86.3      23.5      23.6     -16.4     -17.4
 2     -72.9      15.7       7.1       5.2       6.4
 3    -104.4      12.4       3.7      12.8      21.1

 Means
         1         2         3         4
 1     5.006     3.428     1.462     0.246
 2     5.936     2.770     4.260     1.326
 3     6.588     2.974     5.552     2.026

 Covariance
         1         2         3         4
 1    0.2650    0.0927    0.1675    0.0384
 2    0.0927    0.1154    0.0552    0.0327
 3    0.1675    0.0552    0.1852    0.0427
 4    0.0384    0.0327    0.0427    0.0419

 Stats
       1      147.000
       2          NaN
       3          NaN
       4          NaN
       5          NaN
       6          NaN
       7          NaN
       8     -9.95854
       9      50.0000
      10      50.0000
      11      50.0000
      12      150.000

nrmiss =        0
Errors
Warning Errors
STAT_BAD_OBS_1—In call #, row # of the data matrix, "x", has group number = #. The group number must be an integer between 1.0 and "n_groups" = #, inclusively. This observation will be ignored.
STAT_BAD_OBS_2—The leaving-out-one method is specified but this observation does not have a valid group number (Its group number is #.). This observation (row #) is ignored.
STAT_BAD_OBS_3—The leaving-out-one method is specified but this observation does not have a valid weight or it does not have a valid frequency. This observation (row #) is ignored.
STAT_COV_SINGULAR_3—The group # covariance matrix is singular. "Stats(1)" cannot be computed. "Stats(1)" and "Stats(3)" are set to the missing value code (NaN).
Fatal Errors
STAT_COV_SINGULAR_1—The variance-covariance matrix for population number # is singular. The computations cannot continue.
STAT_COV_SINGULAR_2—The pooled variance-covariance matrix is singular. The computations cannot continue.
STAT_COV_SINGULAR_4—A variance-covariance matrix is singular. The index of the first zero element is equal to #.
Version History