IMSL_EXACT_NETWORK

The IMSL_EXACT_NETWORK function computes Fisher exact probabilities and a hybrid approximation of the Fisher exact method for a two-way contingency table using the network algorithm.

Note
This routine requires an IDL Advanced Math and Stats license. For more information, contact your ITT Visual Information Solutions sales or technical support representative.

Syntax

Result = IMSL_EXACT_NETWORK(table [, APPROX_PARAMS=array]
[, /DOUBLE] [, /NO_APPROX] [, P_VALUE=variable] [, PROB_TABLE=variable] [, WK_PARAMS=array])

Return Value

The p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is taken in the Neyman-Pearson sense. The p-value is "two-sided".

Arguments

table

Two-dimensional array containing the observed counts in the contingency table.

Keywords

APPROX_PARAMS

One-dimensional array of size 3. Approx_Params(0) is the expected value used in the hybrid approximation to Fisher's exact test algorithm for deciding when to use asymptotic probabilities when computing path lengths. Approx_Params(1) is the percentage of remaining cells that must have estimated expected values greater than Approx_Params(0) before asymptotic probabilities can be used in computing path lengths. Approx_Params(2) is the minimum cell estimated value allowed for asymptotic chi-squared probabilities to be used.

Asymptotic probabilities are used in computing path lengths whenever Approx_Params(1) percent or more of the cells in the table have estimated expected values of Approx_Params(0) or more, with no cell having an expected value less than Approx_Params(2). See the Discussion section for details.

Defaults: Approx_Params(0) = 5.0

Approx_Params(1) = 80.0

Approx_Params(2) = 1.0

Note
These defaults correspond to the "Cochran" condition.
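
As a rough illustration of these thresholds, the following sketch evaluates the default rule against the estimated expected counts of a whole table and then passes the same values through APPROX_PARAMS. It assumes the usual independence estimate for the expected counts (product of the two marginals divided by the grand total). The routine applies the rule internally while computing path lengths, so this whole-table check is a simplification, and the variable names are illustrative only.

```idl
; Illustrative only: apply the APPROX_PARAMS rule to the full table.
; This is not a reimplementation of the routine's internal logic.
table = TRANSPOSE([[20, 20, 0, 0, 0], [10, 10, 2, 2, 1], $
   [20, 20, 0, 0, 0]])
approx = [5.0, 80.0, 1.0]   ; documented defaults ("Cochran" condition)

; Estimated expected counts under independence (assumed formula):
; (marginal along dimension 1) * (marginal along dimension 2) / grand total
m1 = TOTAL(table, 2)
m2 = TOTAL(table, 1)
expected = (m1 # m2) / TOTAL(table)

; Percentage of cells whose expected count reaches Approx_Params(0)
pct_large = 100.0 * TOTAL(expected GE approx[0]) / N_ELEMENTS(expected)
PRINT, 'Percent of cells with expected value >= Approx_Params(0):', pct_large
PRINT, 'Smallest expected value:', MIN(expected)

; Pass the same thresholds explicitly to the routine.
p = IMSL_EXACT_NETWORK(table, APPROX_PARAMS = approx)
PRINT, 'p-value =', p
```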

DOUBLE

If present and nonzero, double precision is used.

NO_APPROX

If present and nonzero, the Fisher exact test is used and Approx_Params is ignored.

P_VALUE

Named variable into which the p-value for independence of rows and columns is stored. The p-value represents the probability of a more extreme table, where "extreme" is taken in the Neyman-Pearson sense. The p-value is "two-sided". The p-value is also returned as the function result (see Return Value).

A table is more extreme if its probability (for fixed marginals) is less than or equal to Prob_Table.

PROB_TABLE

Named variable into which the probability of the observed table occurring given that the null hypothesis of independent rows and columns is true is stored.
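
A minimal sketch of retrieving both output keywords in one call follows. The 2 x 2 counts are hypothetical, chosen only to keep the call small; pv and pt are illustrative variable names.

```idl
; Hypothetical 2 x 2 table of observed counts
table = [[8, 2], [1, 5]]

; pv receives the two-sided p-value, pt the probability of the observed table
p = IMSL_EXACT_NETWORK(table, /NO_APPROX, P_VALUE = pv, PROB_TABLE = pt)

PRINT, 'Returned p-value     =', p
PRINT, 'P_VALUE keyword      =', pv
PRINT, 'Probability of table =', pt
```

The function result and the P_VALUE variable are documented to hold the same quantity, so printing both is only a consistency check.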

WK_PARAMS

One-dimensional array of size 3. The network algorithm requires a large amount of workspace. Some of the workspace requirements are well-defined, while most of the workspace requirements can only be estimated. The estimate is based primarily on table size.

The IMSL_EXACT_NETWORK function allocates a default amount of workspace suitable for small problems. If the algorithm determines that this initial allocation is inadequate, the memory is freed, a larger amount is allocated (twice as much as the previous allocation), and the network algorithm is restarted. Up to Wk_Params(2) attempts are made to complete the algorithm.

Because each attempt takes computer time, it is suggested that Wk_Params(0) and Wk_Params(1) be set to large values (such as 1,000 and 30,000) if the problem to be solved is large. It is suggested that Wk_Params(1) be 30 times larger than Wk_Params(0). Although IMSL_EXACT_NETWORK will eventually work its way up to a large enough memory allocation, it is quicker to allocate enough memory initially.

The known (well-defined) workspace requirements are as follows. Let f·· = ΣΣ fij be the sum of all cell frequencies in the observed table, and define:

nt = f·· + 1

mx = max(n_rows, n_columns)

mn = min(n_rows, n_columns)

t1 = max(800 + 7mx, (5 + 2mx)(n_rows + n_columns + 1))

t2 = max(400 + mx + 1, n_rows + n_columns + 1)

where n_rows = N_ELEMENTS(table(*,0)) and n_columns = N_ELEMENTS(table(0,*)).

The following amount of integer workspace is allocated: 3mx + 2mn + t1.

The following amount of real workspace is allocated: nt + t2.

The remainder of the required workspace must be estimated and is allocated based on Wk_Params(0) and Wk_Params(1). The amount of integer workspace allocated is 6n(Wk_Params(0) + Wk_Params(1)). The amount of real workspace allocated is n(6*Wk_Params(0) + 2*Wk_Params(1)). Variable n is the index of the attempt, 1 ≤ n ≤ Wk_Params(2).

Defaults: Wk_Params(0) = 100

Wk_Params(1) = 3000

Wk_Params(2) = 10
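
The workspace arithmetic described for this keyword can be sketched as follows. This is only a bookkeeping aid under stated assumptions: t2 is read as max(400 + mx + 1, n_rows + n_columns + 1), the variable names are illustrative, and the sizes are element counts rather than bytes. Long integers are used because IDL's default 16-bit integers would overflow for the larger settings.

```idl
table = TRANSPOSE([[20, 20, 0, 0, 0], [10, 10, 2, 2, 1], $
   [20, 20, 0, 0, 0]])
wk = [1000L, 30000L, 10L]   ; larger-than-default values suggested in the text

dims = SIZE(table, /DIMENSIONS)
n_rows    = dims[0]         ; N_ELEMENTS(table(*,0))
n_columns = dims[1]         ; N_ELEMENTS(table(0,*))

nt = LONG(TOTAL(table)) + 1
mx = n_rows > n_columns     ; IDL maximum operator
mn = n_rows < n_columns     ; IDL minimum operator
t1 = (800 + 7*mx) > ((5 + 2*mx) * (n_rows + n_columns + 1))
t2 = (400 + mx + 1) > (n_rows + n_columns + 1)   ; assumed reading of t2

; Known requirements
PRINT, 'Known integer workspace:', 3*mx + 2*mn + t1
PRINT, 'Known real workspace:   ', nt + t2

; Estimated requirements for the first attempt (n = 1)
n = 1L
PRINT, 'Estimated integer workspace:', 6*n*(wk[0] + wk[1])
PRINT, 'Estimated real workspace:   ', n*(6*wk[0] + 2*wk[1])

; Request the larger initial allocation explicitly.
p = IMSL_EXACT_NETWORK(table, WK_PARAMS = wk)
PRINT, 'p-value =', p
```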

Discussion

The IMSL_EXACT_NETWORK function computes Fisher exact probabilities, or a hybrid algorithm approximation to Fisher exact probabilities, for an r by c contingency table with fixed row and column marginals (a marginal is the number of counts in a row or column), where r = n_rows and c = n_columns. Let fij denote the count in row i and column j of a table, and let fi· and f·j denote the row and column marginals. Under the hypothesis of independence, the (conditional) probability of the observed table, given the fixed marginals, is:

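$$P_f = \frac{\prod_{i=1}^{r} f_{i\cdot}! \; \prod_{j=1}^{c} f_{\cdot j}!}{f_{\cdot\cdot}! \; \prod_{i=1}^{r} \prod_{j=1}^{c} f_{ij}!}$$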

where f·· is the total number of counts in the table. Pf corresponds to output keyword Prob_Table.

A "more extreme" table X is defined in the probabilistic sense as more extreme than the observed table if the conditional probability computed for table X (for the same marginal sums) is less than the conditional probability computed for the observed table. Note that this definition can be considered "two-sided" in the cell counts.
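
The conditional probability of a single table can be evaluated directly, which gives a simple cross-check on Prob_Table. The sketch below assumes the standard form of that probability for fixed marginals (product of the factorials of the row and column marginals, divided by the factorial of the grand total times the product of the cell-count factorials) and works in logarithms to avoid overflow. The function name table_prob and the 2 x 2 counts are hypothetical.

```idl
.RUN
FUNCTION table_prob, table
   ; Log of the table probability, using LNGAMMA(x + 1) = ln(x!)
   t = DOUBLE(table)
   log_p = TOTAL(LNGAMMA(TOTAL(t, 1) + 1)) $   ; marginals along dimension 1
         + TOTAL(LNGAMMA(TOTAL(t, 2) + 1)) $   ; marginals along dimension 2
         - LNGAMMA(TOTAL(t) + 1) $             ; grand total
         - TOTAL(LNGAMMA(t + 1))               ; individual cell counts
   RETURN, EXP(log_p)
END

table = [[8, 2], [1, 5]]   ; hypothetical counts
PRINT, 'Direct evaluation =', table_prob(table)

p = IMSL_EXACT_NETWORK(table, /NO_APPROX, PROB_TABLE = pt)
PRINT, 'PROB_TABLE        =', pt
```

If the assumed formula matches the one used by the routine, the two printed values should agree up to rounding.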

Example

This example compares the accuracy of several methods of computing the p-value for a contingency table. As the output shows, the Fisher exact probability and the usual asymptotic chi-squared probability (generated using IMSL_CONTINGENCY) can differ noticeably.

.RUN 
PRO print_results, p, p2, p3, p4 
   PRINT, 'Asymptotic Chi-Squared p-value' 
   PRINT, 'p-value =', p 
   PRINT, 'Network Algorithm with Approximation' 
   PRINT, 'p-value =', p2 
   PRINT, 'Network Algorithm without Approximation' 
   PRINT, 'p-value =', p3 
   PRINT, 'Total Enumeration Method' 
   PRINT, 'p-value =', p4 
END 
 
table = TRANSPOSE([[20, 20, 0, 0, 0], [10, 10, 2, 2, 1], $ 
   [20, 20, 0, 0, 0]]) 
p  = IMSL_CONTINGENCY(table) 
p2 = IMSL_EXACT_NETWORK(table) 
p3 = IMSL_EXACT_NETWORK(table, /NO_APPROX) 
p4 = IMSL_EXACT_ENUM(table) 
print_results, p, p2, p3, p4 
 
Asymptotic Chi-Squared p-value 
p-value =    0.0322604 
Network Algorithm with Approximation 
p-value =    0.0601165 
Network Algorithm without Approximation 
p-value =    0.0598085 
Total Enumeration Method 
p-value =    0.0597294 

Errors

Warning Errors

STAT_HASH_TABLE_ERROR_2

The value "ldkey" = # is too small. "ldkey" is calculated as Wk_Params(0)*pow(10, N_Attempts-1) ending this execution attempt.

STAT_HASH_TABLE_ERROR_3

The value "ldstp" = # is too small. "ldstp" is calculated as Wk_Params(1)*pow(10, N_Attempts-1) ending this execution attempt.

Fatal Errors

STAT_HASH_TABLE_ERROR_1

The hash table key cannot be computed because the largest key is larger than the largest representable integer. The algorithm cannot proceed.

Version History

6.4

Introduced