Multivariate clustering & classification

Normal mixture models
Several codes characterize multivariate datasets as mixtures of Gaussian populations via likelihood methods, often using Bayesian principles. They include EMMIX by G. McLachlan, MCLUST by C. Fraley and A. Raftery, AutoClass C by P. Cheeseman, and Snob by D. Dowe. (P)
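As a rough illustration of the likelihood approach these codes share, the sketch below fits a two-component Gaussian mixture to one-dimensional data with the EM algorithm. It is not taken from any of the packages named above, which handle multivariate data and model selection; the function names `normal_pdf` and `em_two_gaussians`, the initialization at the data extremes, and the variance floor are all assumptions made for this example.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, iters=200, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only).

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(xs)
    mean = sum(xs) / n
    overall_var = max(sum((x - mean) ** 2 for x in xs) / n, var_floor)

    # Assumed initialization: means at the data extremes, equal weights.
    mu = [min(xs), max(xs)]
    var = [overall_var, overall_var]
    w = [0.5, 0.5]

    for _ in range(iters):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, var_floor)

    return w, mu, var
```

For two well-separated groups of values, the fitted means settle on the group averages and the weights on the group proportions. Poorly separated or degenerate data would need a more careful initialization and a check for empty components.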
Clustering algorithm based on dynamic altering of hierarchies. (P)
Fast Algorithm for Classification Trees
Tree-structured classification similar to CART. (P)
Machine Learning Library in C++ (MLC++)
Data mining and multivariate classification package including data manipulation, a variety of categorizers (on attributes, thresholds, nearest neighbor, perceptron, decision tree), induction algorithms, and tools for visualizing data and trees. (P)
R Package
Package in Pascal developed for ecological spatio-temporal multivariate datasets, based on the monograph by L. & P. Legendre (1983). Functionalities include autocorrelation using correlograms (Moran's I and Geary's c indices), hierarchical agglomerative clustering, k-means clustering, chronological clustering for multivariate time series, analysis of variance, geometrical connectors (nearest neighbor, Gabriel's connection, Delaunay triangulation), Mantel's two-sample statistic, multidimensional scaling by principal coordinates analysis, and univariate periodogram. (P)
Library of several dozen subroutines from NIST for the multivariate clustering algorithms in the 1975 monograph by J. A. Hartigan.
Multivariate data analysis software
Collection of subroutines for principal components analysis, partitioning, hierarchical clustering, discriminant analyses (linear, multiple, k-nearest neighbors), correspondence analysis, multidimensional scaling, Sammon mapping, and Kohonen self-organizing feature map.
Cluster analysis
Six programs computing dissimilarities, partitioning using medoids, k-medoid clustering, fuzzy clustering, agglomerative and divisive hierarchical clustering, clustering of binary data.
Average-linkage hierarchical clustering.
Agglomerative hierarchical clustering with a variety of cluster shape criteria.
Random Forest
Advancement on CART that separates objects into classes under a wide range of circumstances: unknown number of classes, non-Gaussian shapes, redundant variables. Includes density estimation, variable importance, and measure of outliers. From Leo Breiman, UC Berkeley.
Hierarchical clustering
Algorithm for agglomerative clustering using various criteria (Ward's minimum variance, single linkage, average linkage, complete linkage, McQuitty's method, median method, centroid method).
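The linkage criteria differ only in how the distance between two clusters is defined. A minimal sketch, assuming Euclidean distance and a naive search over all cluster pairs, is given below for three of the criteria (single, complete, average). The function name `agglomerate` and its parameters are hypothetical and do not reflect the interface of the listed code; Ward's, McQuitty's, median, and centroid methods are not covered.

```python
import math
from itertools import combinations


def agglomerate(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering (illustrative only).

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until n_clusters remain. Returns a list of
    clusters, each a list of indices into points.
    """
    # How pairwise point distances are reduced to one cluster distance.
    reducers = {
        "single": min,                             # nearest pair of members
        "complete": max,                           # farthest pair of members
        "average": lambda ds: sum(ds) / len(ds),   # mean over all pairs
    }
    reduce_fn = reducers[linkage]
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return reduce_fn([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i (i < j, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return clusters
```

This recomputes every cluster distance at each step, so it is only suitable for small datasets; production codes typically update distances incrementally instead.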
AS 15
Algorithm for single-linkage and minimum intra-cluster variance clustering.
AS 58
Algorithm for single-linkage and minimum intra-cluster variance clustering.
k-means clustering
k-means clustering minimizing intra-cluster variance.
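For reference, the quantity being minimized is the within-cluster sum of squared distances to the cluster centroids. The sketch below uses Lloyd's iteration (alternating assignment and centroid update), which is one common way to do this and is not necessarily the method used by the listed code; the names `kmeans` and `wcss` and the random choice of starting centroids are assumptions for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm (illustrative only). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None

    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so centroids are already up to date
        labels = new_labels

        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

    return centroids, labels


def wcss(points, centroids, labels):
    """Within-cluster sum of squares, the objective k-means minimizes."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
```

Lloyd's iteration only reaches a local minimum of this objective, so in practice it is usually rerun from several starting configurations and the result with the lowest `wcss` is kept.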
Classification Society of North America (CSNA)
Metasite with many links to classification meetings, journals, discussion groups, commercial and on-line software.
Software for clustering and multivariate analysis
Metasite with descriptions of on-line programs and packages.
