CLUSTER_TREE
The CLUSTER_TREE function computes the hierarchical clustering for a set of m items in an n-dimensional space. The CLUSTER_TREE function is designed to be used with the DENDROGRAM or DENDRO_PLOT procedures.
This routine is written in the IDL language. Its source code can be found in the file cluster_tree.pro in the lib subdirectory of the IDL distribution.
Syntax
Result = CLUSTER_TREE( Pairdistance, Linkdistance [, LINKAGE = value] )
or for LINKAGE = 3 (centroid):
Result = CLUSTER_TREE( Pairdistance, Linkdistance, LINKAGE = 3 [, DATA = array] [, MEASURE=value] [, POWER_MEASURE=value] )
Return Value
The Result is a 2-by-(m-1) integer array containing the cluster indices. Each row of Result contains the indices of the two items that were clustered together. The distance between the two items is contained in the corresponding element of the Linkdistance output argument.
Note
The original m items are given indices 0...m-1, while each newly-created cluster is given a new index starting at m and incrementing.
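The numbering rule can be made concrete with a short sketch. The following Python snippet (not IDL, and not part of the CLUSTER_TREE source) walks a hypothetical Result-style list of merges for m = 4 items and reports which original items each new cluster contains. The helper name `cluster_members` and the sample pairs are illustrative assumptions.

```python
def cluster_members(pairs, m):
    """Map each cluster index to the sorted list of original items it contains.

    pairs: list of (a, b) index pairs, one per merge, in merge order
           (the rows of a Result-style array).
    m:     number of original items, which carry indices 0..m-1.
    """
    # Every original item starts as a cluster containing only itself.
    members = {i: [i] for i in range(m)}
    for step, (a, b) in enumerate(pairs):
        # The cluster created by merge number `step` gets index m + step.
        members[m + step] = sorted(members[a] + members[b])
    return members


# Hypothetical merge history for m = 4 items (not output from IDL).
pairs = [(3, 1), (0, 2), (4, 5)]
members = cluster_members(pairs, 4)
for index in range(4, 7):
    print(index, members[index])
```

Here the first merge creates cluster 4 from items 3 and 1, the second creates cluster 5 from items 0 and 2, and the last merge joins clusters 4 and 5 into cluster 6, which contains all four items.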
Arguments
Pairdistance
An input array containing the pairwise distances as either a compact vector or as a symmetric matrix, usually created by the DISTANCE_MEASURE function. For the compact vector form, Pairdistance should be an m*(m-1)/2 element vector, ordered as: $[D_{0,1}, D_{0,2}, \ldots, D_{0,m-1}, D_{1,2}, \ldots, D_{m-2,m-1}]$, where $D_{i,j}$ denotes the distance between items i and j. For the matrix form, Pairdistance should be an m-by-m symmetric matrix with zeroes down the diagonal.
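The relationship between the two accepted forms can be sketched in a few lines. The Python snippet below (an illustration only, not IDL code) computes where the distance for a pair (i, j) sits in the compact vector and expands the vector into the equivalent symmetric matrix. The function names `compact_index` and `compact_to_matrix` and the sample distances are assumptions made for this sketch.

```python
def compact_index(i, j, m):
    """Position of the distance between items i and j in the compact vector."""
    if i == j:
        raise ValueError("the compact form stores no diagonal entries")
    if i > j:
        i, j = j, i  # distances are symmetric, so order the pair
    # Rows 0..i-1 contribute (m-1) + (m-2) + ... + (m-i) entries.
    return i * m - i * (i + 1) // 2 + (j - i - 1)


def compact_to_matrix(vector, m):
    """Expand an m*(m-1)/2 element compact vector into an m-by-m matrix."""
    if len(vector) != m * (m - 1) // 2:
        raise ValueError("vector length must be m*(m-1)/2")
    matrix = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d = vector[compact_index(i, j, m)]
            matrix[i][j] = d
            matrix[j][i] = d  # mirror across the zero diagonal
    return matrix


# Made-up distances for m = 4, ordered D01, D02, D03, D12, D13, D23.
vector = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
matrix = compact_to_matrix(vector, 4)
```

For m = 4 the pair (1, 3) maps to position 4 of the vector, so `matrix[1][3]` and `matrix[3][1]` both hold 5.0.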
Linkdistance
Set this argument to a named variable in which the cluster distances will be returned as an (m-1)-element single or double-precision vector. Each element of Linkdistance corresponds to the distance between the two items of the corresponding row in Result. If Pairdistance is double-precision then Linkdistance will be double-precision, otherwise Linkdistance will be single-precision.
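To show how the two outputs relate, here is a naive agglomerative clustering sketch in Python. It is not the algorithm in cluster_tree.pro. It uses nearest-neighbor (single) linkage purely for illustration, takes the matrix form of the distances, and returns one pair of indices and one link distance per merge. The function name `single_linkage_tree` and the sample matrix are assumptions.

```python
def single_linkage_tree(matrix):
    """Naive agglomerative clustering on an m-by-m symmetric distance matrix.

    Returns (pairs, link): pairs[k] holds the two cluster indices joined
    at step k, and link[k] is the distance at which they were joined.
    """
    m = len(matrix)
    # Distances between active clusters, keyed by (smaller, larger) index.
    dist = {(i, j): matrix[i][j] for i in range(m) for j in range(i + 1, m)}
    active = list(range(m))
    pairs, link = [], []
    for step in range(m - 1):
        # Find the closest pair of active clusters.
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        new = m + step  # new clusters are numbered m, m+1, ...
        pairs.append((a, b))
        link.append(d)
        active.remove(a)
        active.remove(b)
        # Single linkage: the distance to the new cluster is the smaller of
        # the distances to the two clusters it replaces.
        for c in active:
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            dist[(c, new)] = min(dac, dbc)
        del dist[(a, b)]
        active.append(new)
    return pairs, link


# Four points on a line at positions 0, 1, 3 and 7 (made-up data).
matrix = [
    [0.0, 1.0, 3.0, 7.0],
    [1.0, 0.0, 2.0, 6.0],
    [3.0, 2.0, 0.0, 4.0],
    [7.0, 6.0, 4.0, 0.0],
]
pairs, link = single_linkage_tree(matrix)
```

With this input the merges are (0, 1) at distance 1.0, then (2, 4) at 2.0, then (3, 5) at 4.0, giving m-1 = 3 rows and 3 link distances. The order of the two indices within each pair is an artifact of this sketch and need not match what CLUSTER_TREE returns.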
Keywords
DATA
If the LINKAGE keyword is set equal to 3 (centroid), then the DATA keyword must be set to the same array of original data that was passed to the DISTANCE_MEASURE function. The data array is necessary for computing the centroid of newly-created clusters.
Note
DATA does not need to be supplied if LINKAGE is not equal to 3.
LINKAGE
Set this keyword to an integer giving the method used for linking clusters together. Possible values are:
Note
If the LINKAGE keyword is equal to 3, the distance between two clusters may be less than the distance between items within one of the clusters. In a dendrogram plot this will cause the node lines to overlap.
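A three-point case shows how this can happen. The Python sketch below (illustrative only, using made-up coordinates and an unweighted centroid of two single items) merges the closest pair, replaces it with its centroid, and then measures the distance to the remaining point.

```python
import math


def centroid(points):
    """Average position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


# Made-up 2-D points forming a nearly equilateral triangle.
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)

# a and b are the closest pair, so they are merged first.
first_link = math.dist(a, b)

# The merged cluster is then represented by its centroid.
ab = centroid([a, b])
second_link = math.dist(ab, c)

print(first_link, second_link)
```

The first merge happens at distance 2.0, but the centroid (1.0, 0.0) is only about 1.8 away from the third point, so the second link distance is smaller than the first.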
MEASURE
If the LINKAGE keyword is equal to 3 (centroid), set this keyword to an integer giving the distance measure (the metric) to use. Possible values are:
| Value | Type |
|---|---|
| 0 | (Default) Euclidean distance |
| 1 | CityBlock (Manhattan) distance |
| 2 | Chebyshev distance |
| 3 | Correlative distance |
| 4 | Percent disagreement |
For consistent results, the MEASURE value should match the value used in the original call to DISTANCE_MEASURE. This keyword is ignored if LINKAGE is not equal to 3, or if POWER_MEASURE is set.
Note
See DISTANCE_MEASURE for a detailed description of the various metrics.
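For orientation, the first three metrics have standard textbook definitions, sketched below in Python. These are the usual formulas rather than code taken from the IDL library, and the function names are assumptions. The correlative and percent-disagreement metrics are omitted because their exact definitions should be taken from the DISTANCE_MEASURE documentation.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def city_block(x, y):
    """Sum of the absolute coordinate differences (Manhattan distance)."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))


# Made-up pair of 2-D points.
x, y = (0.0, 0.0), (3.0, 4.0)
print(euclidean(x, y), city_block(x, y), chebyshev(x, y))
```

For this pair the three metrics give 5.0, 7.0 and 4.0 respectively, which illustrates why mixing metrics between DISTANCE_MEASURE and CLUSTER_TREE would produce inconsistent centroid distances.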
POWER_MEASURE
If the LINKAGE keyword is equal to 3 (centroid), set this keyword to a scalar or a two-element vector giving the parameters p and r to be used in the power distance metric. If POWER_MEASURE is a scalar then the same value is used for both p and r. For consistent results, the POWER_MEASURE value should match the value used in the original call to DISTANCE_MEASURE. This keyword is ignored if LINKAGE is not equal to 3.
Note
See DISTANCE_MEASURE for a detailed description of the power distance metric.
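As a rough guide to how p and r interact, the Python sketch below assumes the common form of the power distance, $d = \left(\sum_i |x_i - y_i|^p\right)^{1/r}$. This formula is an assumption here and should be confirmed against the DISTANCE_MEASURE documentation. The function name `power_distance` is hypothetical.

```python
def power_distance(x, y, power_measure):
    """Power distance under the assumed form (sum |xi - yi|**p) ** (1/r).

    power_measure: a scalar (used for both p and r) or a (p, r) pair.
    """
    if isinstance(power_measure, (int, float)):
        p = r = power_measure
    else:
        p, r = power_measure
    total = sum(abs(xi - yi) ** p for xi, yi in zip(x, y))
    return total ** (1.0 / r)


# Made-up pair of 2-D points.
x, y = (0.0, 0.0), (3.0, 4.0)
same_as_euclidean = power_distance(x, y, 2)       # p = r = 2
same_as_city_block = power_distance(x, y, 1)      # p = r = 1
squared_euclidean = power_distance(x, y, (2, 1))  # p = 2, r = 1
```

Under this assumed form, a scalar value of 2 reproduces the Euclidean distance (5.0), a scalar value of 1 reproduces the CityBlock distance (7.0), and the pair (2, 1) gives the squared Euclidean distance (25.0).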
Example
When this code is run, IDL prints:
Items 5 and 3 are joined to create a new cluster, which is given the item number of 7. Items 2 and 1 are joined to create a cluster with item number 8. The process continues until all items have been joined together. A graphical representation is shown below (for clarity the last cluster, between items 9 and 11, has been omitted):
Version History
See Also
DENDRO_PLOT, DENDROGRAM, DISTANCE_MEASURE
