Minimal Spanning Tree and Multivariate Planing

DESCRIPTION:
Returns a vector or a list which gives the minimal spanning tree and, optionally, multivariate planing information.

USAGE:
mstree(x, plane=T)

REQUIRED ARGUMENTS:
x:
matrix of data where rows correspond to observations, columns to variables. This should be scaled so that values on all variables are roughly comparable (the algorithm computes Euclidean distances from one observation to another). Missing values are not accepted.

OPTIONAL ARGUMENTS:
plane:
logical flag: should multivariate planing and lining information be returned?

VALUE:
if plane=FALSE, a vector as described by mst.

Otherwise, a list (with components named x, y, mst and order) describing the planing, minimal spanning tree, and two versions of lining.

x,y:
coordinates of the observations computed by the Friedman-Rafsky algorithm. That is, a two-dimensional representation of higher dimensional data.
mst:
vector of length nrow(x)-1 describing the edges in the minimal spanning tree. The ith value in this vector is an observation number, indicating that this observation and the ith observation should be linked in the minimal spanning tree.
order:
matrix, of size nrow(x) by 2, giving two types of ordering: The first column presents the standard ordering from one extreme of the minimal spanning tree to the other. This starts on one end of a diameter and numbers the points in a certain order so that points close together in Euclidean space tend to be close in the sequence. The second column presents the radial ordering, based on distance from the center of the minimal spanning tree. These can be used to detect clustering. See below for graph theory definitions.

DETAILS:
Multivariate planing uses the minimal spanning tree to represent multivariate data in two dimensions. The technique can also be used to explore the ability to discriminate between groups. See Friedman and Rafsky (1981) for more details. Results are currently computed to single-precision accuracy only.

BACKGROUND:
A tree in graph theory is a set of n points and n-1 edges (lines) connecting points such that there is a path along edges between any given points. Because there are only n-1 edges, such a path is unique (there are no cycles). A diameter of a tree is a path between two points with a maximal number of edges; a center of a tree is a point as close to the middle of a diameter as possible.

A minimal spanning tree is a tree that has the minimal sum of the lengths of the edges. Such trees are useful in a variety of situations.


REFERENCES:
Friedman, J. H. and Rafsky, L. C. (1981). Graphics for the Multivariate Two-Sample Problem. Journal of the American Statistical Association 76, 277-287.

Reingold, E. M., Nievergelt, J. and Deo, N. (1977). Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, New Jersey.


SEE ALSO:
cmdscale , hclust , discr , napsack .

EXAMPLES:
plot(x, y)   # plot original data
mst <- mstree(cbind(x, y), plane=F)   # minimal spanning tree
   # show tree on plot
segments(x[seq(mst)], y[seq(mst)], x[mst], y[mst])

i <- rbind(iris[,,1], iris[,,2], iris[,,3]) tree <- mstree(i) # multivariate planing plot(tree, type="n") # plot data in plane text(tree, label=c(rep(1, 50), rep(2, 50), rep(3, 50))) # identify points

# get the absolute value stress e distp <- dist(i) dist2 <- dist(cbind(tree$x, tree$y)) e <- sum(abs(distp - dist2))/sum(distp)

# a function for multivariate planing of two samples mult.plane <- function(samp1, samp2, all = T) { if(ncol(samp1)!=ncol(samp2)) stop("Number of columns (variables) must match in mult.plane") tree <- mstree(rbind(samp1, samp2)) rowbreak <- nrow(samp1) plpoin <- c(tree$mst[1:rowbreak] > rowbreak, tree$mst[ - seq(rowbreak)] < rowbreak) plot(tree, type = "n") points(tree$x[seq(rowbreak)], tree$y[seq(rowbreak)], col = 2) points(tree$x[ - seq(rowbreak)], tree$y[ - seq(rowbreak)], col = 3) if(all) segments(tree$x[seq(tree$mst)], tree$y[seq(tree$mst)], tree$x[ tree$mst], tree$y[tree$mst], col = 4) segments(tree$x[plpoin], tree$y[plpoin], tree$x[tree$mst[plpoin]], tree$y[tree$mst[plpoin]]) }

state.ms <- mstree(cbind(state.center$x, state.center$y), plane=F) usa() segments(state.center$x[seq(state.ms)], state.center$y[seq(state.ms)], state.center$x[state.ms], state.center$y[state.ms]) points(state.center, pch=18) title(main="Minimal Spanning Tree of State Centers")