This class is a base class for implementations of different cluster analysis algorithms.
It defines storage for clustering models and provides methods to retrieve information about data and clustering.
Cluster analysis or clustering is the assignment of a set of observations into subsets
(called clusters) so that observations in the same cluster are similar in some sense.
Clustering is a method of unsupervised learning, and a common technique for statistical
data analysis used in many fields, including machine learning, data mining, pattern recognition,
image analysis, information retrieval, and bioinformatics.
By default, model data is stored in ^CacheTemp globals.
Returns an object that can calculate an index used in cluster validation
and in determining the optimal number of clusters.
This method returns the Pearson-Gamma index, which is the correlation coefficient
between the distance between two points and a binary function indicating whether they
belong to the same cluster. This index is useful when clustering is used
for dimension reduction, i.e. the process of reducing the number of
random variables under consideration.
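The definition above can be illustrated with a minimal Python sketch. It is not the class's implementation: the function name `pearson_gamma`, the use of Euclidean distance, and the sign convention (the binary function is 1 when two points are in different clusters and 0 when they share one) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson_gamma(points, labels):
    """Pearson correlation between pairwise distance and a 0/1 flag.

    Assumption: the flag is 1 when the two points are in different
    clusters and 0 when they are in the same cluster, so a good
    clustering gives a value close to 1.
    """
    dists, flags = [], []
    for i, j in combinations(range(len(points)), 2):
        dists.append(math.dist(points[i], points[j]))  # Euclidean distance (assumed)
        flags.append(0.0 if labels[i] == labels[j] else 1.0)

    n = len(dists)
    mean_d = sum(dists) / n
    mean_f = sum(flags) / n
    cov = sum((d - mean_d) * (f - mean_f) for d, f in zip(dists, flags))
    var_d = sum((d - mean_d) ** 2 for d in dists)
    var_f = sum((f - mean_f) ** 2 for f in flags)
    if var_d == 0 or var_f == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_d * var_f)


# Two well-separated groups of two points each.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = pearson_gamma(points, [1, 1, 2, 2])  # labels match the groups
bad = pearson_gamma(points, [1, 2, 1, 2])   # labels cut across the groups
```

Under these assumptions, the labeling that matches the two groups scores higher than the one that cuts across them, which is how such an index can be used to compare candidate clusterings.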
method GlobalCentroid(Output z)
Returns the coordinates of the centroid of the whole dataset.
Coordinates are returned as a multidimensional value: z(1), z(2), ..., z(dim)
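As a rough illustration of what this method computes, the following Python sketch averages each coordinate over all points. The function name `global_centroid` and the dictionary keyed 1 through dim (standing in for the subscripted output z(1), ..., z(dim)) are hypothetical, not part of the class.

```python
def global_centroid(rows, dim):
    """Return the per-coordinate mean of all rows, keyed 1..dim.

    The 1-based keys mimic the z(1), ..., z(dim) output described above;
    this is an illustrative sketch, not the class's implementation.
    """
    sums = [0.0] * dim
    count = 0
    for row in rows:
        for k in range(dim):
            sums[k] += row[k]
        count += 1
    if count == 0:
        raise ValueError("cannot compute the centroid of an empty dataset")
    return {k + 1: sums[k] / count for k in range(dim)}


z = global_centroid([(0, 0), (2, 4)], dim=2)
```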
Sets the data to be associated with this model. The method takes three arguments:
rs - A result set that provides the data. The first column returned by the result set
is assumed to be a unique Id of the record. It is not used by any clustering algorithm but can be retrieved
by the application to identify the record. It can be a database %ID or any other value that
makes sense to the application. The other columns provide the numerical coordinate values of the record
that are used by the clustering algorithms.
The result set must contain at least dim + 1 columns.
dim - The dimensionality of the model, i.e. the number of coordinates
consumed by the clustering algorithm.
nullReplacement - Optional. If specified, this is a numeric replacement for empty values.
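The expected row layout can be sketched in Python as follows. This only illustrates the contract described above (Id in the first column, then dim coordinates, with an optional replacement for empty values). The function name `load_rows` is hypothetical, and raising an error when an empty value has no replacement is an assumption; the source does not say how the class handles that case.

```python
def load_rows(rows, dim, null_replacement=None):
    """Split rows into record Ids and coordinate lists.

    Each row must hold at least dim + 1 values: the Id first, then the
    coordinates. Empty values (None or "") are replaced by
    null_replacement when one is given.
    """
    ids, coords = [], []
    for row in rows:
        if len(row) < dim + 1:
            raise ValueError("each row needs at least dim + 1 columns")
        ids.append(row[0])  # the Id is kept for the application, not clustered on
        point = []
        for value in row[1:dim + 1]:
            if value is None or value == "":
                if null_replacement is None:
                    # Assumed behavior; the real class may differ.
                    raise ValueError("empty value and no replacement given")
                value = null_replacement
            point.append(float(value))
        coords.append(point)
    return ids, coords


ids, coords = load_rows([("a", 1, None), ("b", "", 4)], dim=2, null_replacement=0)
```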