weka.clusterers
Class ClusterEvaluation
java.lang.Object
weka.clusterers.ClusterEvaluation
- All Implemented Interfaces:
- java.io.Serializable
- public class ClusterEvaluation
- extends java.lang.Object
- implements java.io.Serializable
Class for evaluating clustering models.
Valid options are:
-t <training file>
Specify the training file.
-T <test file>
Specify the test file to apply the clusterer to.
-d <output file>
Specify the file to which the clusterer built from the training data is saved.
-l <input file>
Specify the file from which a previously saved clusterer is loaded.
-p <attribute range>
Output predictions. Predictions are for the training file if only the
training file is specified; otherwise they are for the test file. The range
specifies the attribute values to be output with the predictions.
Use '-p 0' for none.
-x <number of folds>
Set the number of folds for a cross-validation of the training data.
Cross-validation can only be done for distribution clusterers and is
performed if the test file is missing.
-s <random number seed>
Set the random number seed used for cross-validation.
-c <class attribute>
Set the class attribute. If set, a class-based evaluation of the
clustering is performed.
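The following Java sketch illustrates how these options interact. It is a hypothetical stand-in (EvalOptionsSketch and its methods are not part of Weka) that only reports which evaluation would run: a train/test evaluation when "-T" is present, otherwise cross-validation on the training data with the "-x" folds, defaulting to the ten folds described under evaluateClusterer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the option logic described on this page; it does not call Weka.
// Assumes every option is a "-flag value" pair, which holds for the options listed above.
final class EvalOptionsSketch {

    /** Collects "-flag value" pairs into a map. */
    static Map<String, String> parse(String[] options) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < options.length; i += 2) {
            map.put(options[i], options[i + 1]);
        }
        return map;
    }

    /** Describes which evaluation the options would trigger. */
    static String plan(String[] options) {
        Map<String, String> opts = parse(options);
        String train = opts.get("-t");
        if (train == null) {
            // Simplification: -t is treated as mandatory in this sketch.
            throw new IllegalArgumentException("no training file given with -t");
        }
        String test = opts.get("-T");
        if (test != null) {
            return "train on " + train + ", evaluate on " + test;
        }
        // No test file: cross-validate on the training data (distribution clusterers only).
        int folds = Integer.parseInt(opts.getOrDefault("-x", "10"));
        String seed = opts.containsKey("-s") ? " (seed " + opts.get("-s") + ")" : "";
        return folds + "-fold cross-validation on " + train + seed;
    }
}
```

For example, plan(new String[] {"-t", "train.arff", "-x", "5"}) describes a five-fold cross-validation because no "-T" file is given.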
- Version:
- $Revision: 1.27 $
- Author:
- Mark Hall (mhall@cs.waikato.ac.nz)
- See Also:
- Serialized Form
Method Summary
Return type | Method | Description
java.lang.String | clusterResultsToString() | Return the results of clustering.
static double | crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, java.util.Random random) | Perform a cross-validation for a DensityBasedClusterer on a set of instances.
static java.lang.String | crossValidateModel(java.lang.String clustererString, Instances data, int numFolds, java.lang.String[] options, java.util.Random random) | Perform a cross-validation for a DensityBasedClusterer on a set of instances.
static java.lang.String | evaluateClusterer(Clusterer clusterer, java.lang.String[] options) | Evaluate a clusterer with the options given in an array of strings.
void | evaluateClusterer(Instances test) | Evaluate the clusterer on a set of instances.
int[] | getClassesToClusters() | Return the array (ordered by cluster number) of minimum-error class-to-cluster mappings.
double[] | getClusterAssignments() | Return an array of cluster assignments corresponding to the most recent set of instances clustered.
double | getLogLikelihood() | Return the log likelihood corresponding to the most recent set of instances clustered.
int | getNumClusters() | Return the number of clusters found for the most recent call to evaluateClusterer.
static void | main(java.lang.String[] args) | Main method for testing this class.
void | setClusterer(Clusterer clusterer) | Set the clusterer.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ClusterEvaluation
public ClusterEvaluation()
- Constructor. Sets defaults for each member variable. The default clusterer
is EM.
setClusterer
public void setClusterer(Clusterer clusterer)
- Set the clusterer to evaluate.
- Parameters:
clusterer
- the clusterer to use
clusterResultsToString
public java.lang.String clusterResultsToString()
- Return the results of clustering.
- Returns:
- a string detailing the results of clustering a data set
getNumClusters
public int getNumClusters()
- Return the number of clusters found for the most recent call to
evaluateClusterer.
- Returns:
- the number of clusters found
getClusterAssignments
public double[] getClusterAssignments()
- Return an array of cluster assignments corresponding to the most
recent set of instances clustered.
- Returns:
- an array of cluster assignments
getClassesToClusters
public int[] getClassesToClusters()
- Return the array (ordered by cluster number) of minimum-error
class-to-cluster mappings.
- Returns:
- an array of class to cluster mappings
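One way to compute a minimum-error class-to-cluster mapping is an exhaustive search over assignments, sketched below in Java. The counts matrix layout, the use of -1 for a cluster that receives no class, and the class name ClassToClusterSketch are assumptions for illustration; this page does not say which search ClusterEvaluation itself uses.

```java
// Hypothetical sketch: map each cluster to a class (or -1 for "no class") so that
// every class is used at most once and the number of misassigned instances is minimal.
// Exhaustive search, so only practical for a handful of clusters and classes.
final class ClassToClusterSketch {

    /** counts[cluster][cls] = number of instances of class cls placed in that cluster. */
    static int[] bestMapping(int[][] counts, int numClasses) {
        int numClusters = counts.length;
        int[] current = new int[numClusters];
        int[] best = new int[numClusters];
        boolean[] used = new boolean[numClasses];
        int[] bestCorrect = {-1};
        search(0, counts, numClasses, current, used, 0, best, bestCorrect);
        return best;
    }

    private static void search(int cluster, int[][] counts, int numClasses, int[] current,
                               boolean[] used, int correct, int[] best, int[] bestCorrect) {
        if (cluster == counts.length) {
            // Maximizing correctly mapped instances is the same as minimizing errors.
            if (correct > bestCorrect[0]) {
                bestCorrect[0] = correct;
                System.arraycopy(current, 0, best, 0, current.length);
            }
            return;
        }
        // Option 1: leave this cluster without a class.
        current[cluster] = -1;
        search(cluster + 1, counts, numClasses, current, used, correct, best, bestCorrect);
        // Option 2: give it any class not yet taken by an earlier cluster.
        for (int cls = 0; cls < numClasses; cls++) {
            if (!used[cls]) {
                used[cls] = true;
                current[cluster] = cls;
                search(cluster + 1, counts, numClasses, current, used,
                       correct + counts[cluster][cls], best, bestCorrect);
                used[cls] = false;
            }
        }
    }

    /** Number of instances whose class does not match their cluster's mapped class. */
    static int errors(int[][] counts, int[] mapping) {
        int total = 0;
        int correct = 0;
        for (int c = 0; c < counts.length; c++) {
            for (int n : counts[c]) {
                total += n;
            }
            if (mapping[c] >= 0) {
                correct += counts[c][mapping[c]];
            }
        }
        return total - correct;
    }
}
```

The search grows factorially with the number of clusters, which is acceptable for the small cluster counts typical of a classes-to-clusters evaluation.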
getLogLikelihood
public double getLogLikelihood()
- Return the log likelihood corresponding to the most recent
set of instances clustered.
- Returns:
- the log likelihood, as a double value
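For density-based clusterers such as EM, the log likelihood of a data set is built from each instance's log density under the fitted model. The Java sketch below computes the mean log density under a one-dimensional Gaussian mixture with given weights, means and standard deviations. The mixture parameters and the class MixtureLogLikelihoodSketch are hypothetical, and whether getLogLikelihood reports a mean or a sum is not stated on this page.

```java
// Hypothetical sketch: average log density of 1-D data under a Gaussian mixture.
// log p(x) = log sum_k w_k * N(x | mu_k, sigma_k), evaluated with log-sum-exp for stability.
final class MixtureLogLikelihoodSketch {

    /** Log of the normal density N(x | mu, sigma). */
    static double logNormal(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
    }

    /** Log density of x under the mixture. */
    static double logDensity(double x, double[] weights, double[] means, double[] sds) {
        double[] terms = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < weights.length; k++) {
            terms[k] = Math.log(weights[k]) + logNormal(x, means[k], sds[k]);
            max = Math.max(max, terms[k]);
        }
        double sum = 0.0;
        for (double t : terms) {
            sum += Math.exp(t - max); // shift by the max so exp() cannot underflow to all zeros
        }
        return max + Math.log(sum);
    }

    /** Mean of the per-instance log densities. */
    static double averageLogLikelihood(double[] data, double[] weights, double[] means, double[] sds) {
        double total = 0.0;
        for (double x : data) {
            total += logDensity(x, weights, means, sds);
        }
        return total / data.length;
    }
}
```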
evaluateClusterer
public void evaluateClusterer(Instances test)
throws java.lang.Exception
- Evaluate the clusterer on a set of instances. Calculates clustering
statistics and stores cluster assignments for the instances in
m_clusterAssignments.
- Parameters:
test
- the set of instances to cluster
- Throws:
java.lang.Exception
- if something goes wrong
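The bookkeeping described for evaluateClusterer (assign every instance to a cluster, keep the assignments, tally the clusters) can be sketched in plain Java. The nearest-centroid stand-in clusterer and the AssignmentSketch class are hypothetical; assignments are stored as double values only to mirror the return type of getClusterAssignments.

```java
import java.util.function.ToIntFunction;

// Hypothetical sketch of an evaluation pass: ask the clusterer for each instance's
// cluster, record the assignment, and count how many instances land in each cluster.
final class AssignmentSketch {

    final double[] assignments; // cluster index per instance, stored as double
    final int[] clusterSizes;   // number of instances per cluster

    AssignmentSketch(double[][] instances, int numClusters, ToIntFunction<double[]> clusterer) {
        assignments = new double[instances.length];
        clusterSizes = new int[numClusters];
        for (int i = 0; i < instances.length; i++) {
            int c = clusterer.applyAsInt(instances[i]);
            assignments[i] = c;
            clusterSizes[c]++;
        }
    }

    /** Hypothetical stand-in clusterer: index of the nearest centroid (squared Euclidean distance). */
    static ToIntFunction<double[]> nearestCentroid(double[][] centroids) {
        return x -> {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.length; k++) {
                double d = 0.0;
                for (int j = 0; j < x.length; j++) {
                    double diff = x[j] - centroids[k][j];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = k;
                }
            }
            return best;
        };
    }
}
```

Passing the clusterer as a function keeps the sketch independent of any particular clustering algorithm.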
evaluateClusterer
public static java.lang.String evaluateClusterer(Clusterer clusterer,
java.lang.String[] options)
throws java.lang.Exception
- Evaluates a clusterer with the options given in an array of
strings. It takes the file named by "-t" as the training file and the
file named by "-T" as the test file.
If the test file is missing, a stratified ten-fold
cross-validation is performed (distribution clusterers only).
Using "-x" you can change the number of
folds to be used, and using "-s" the random seed.
If the "-p" option is present, it outputs the predictions for
each test instance. If you provide the name of an object file using
"-l", a clusterer will be loaded from the given file. If you provide the
name of an object file using "-d", the clusterer built from the
training data will be saved to the given file.
- Parameters:
clusterer
- machine learning clusterer
options
- the array of strings containing the options
- Returns:
- a string describing the results
- Throws:
java.lang.Exception
- if model could not be evaluated successfully
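The "-d" and "-l" options save a built clusterer to an object file and load one back. Assuming those files are ordinary Java serialization streams (the page itself only states that ClusterEvaluation implements java.io.Serializable), the round trip looks roughly like this. ModelFileSketch and its nested CentroidModel are hypothetical stand-ins for a clusterer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch: save and reload a model with standard Java serialization.
final class ModelFileSketch {

    /** Hypothetical serializable model: a set of 1-D cluster centres. */
    static final class CentroidModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] centroids;

        CentroidModel(double[] centroids) {
            this.centroids = centroids.clone();
        }
    }

    /** Writes the model to the given file (the role "-d" plays). */
    static void save(Object model, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a model back from the given file (the role "-l" plays). */
    static Object load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("model class not found", e);
        }
    }

    /** Saves the model to a temporary file and loads it again. */
    static Object roundTrip(Object model) {
        try {
            File tmp = File.createTempFile("model", ".ser");
            tmp.deleteOnExit();
            save(model, tmp);
            return load(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Checked I/O exceptions are rethrown as unchecked ones so that callers can use the helpers without throws clauses.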
crossValidateModel
public static double crossValidateModel(DensityBasedClusterer clusterer,
Instances data,
int numFolds,
java.util.Random random)
throws java.lang.Exception
- Perform a cross-validation for a DensityBasedClusterer on a set of instances.
- Parameters:
clusterer
- the clusterer to use
data
- the training data
numFolds
- the number of folds of cross-validation to perform
random
- the random number generator used for cross-validation
- Returns:
- the cross-validated log-likelihood
- Throws:
java.lang.Exception
- if an error occurs
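The cross-validated log-likelihood can be sketched as follows: shuffle the data with the supplied java.util.Random, hold out each fold in turn, fit the model on the remaining folds, and average the held-out log densities. In this hypothetical sketch a single Gaussian fitted by maximum likelihood stands in for the DensityBasedClusterer, and averaging over instances (rather than summing, or averaging per fold) is an assumption.

```java
import java.util.Random;

// Hypothetical sketch of k-fold cross-validated log-likelihood.
// A single Gaussian fitted by maximum likelihood stands in for a density-based clusterer.
final class CrossValidationSketch {

    static double crossValidatedLogLikelihood(double[] data, int numFolds, Random random) {
        if (numFolds < 2 || numFolds > data.length) {
            throw new IllegalArgumentException("need 2 <= numFolds <= number of instances");
        }
        // Fisher-Yates shuffle of a copy, so the folds do not depend on input order.
        double[] shuffled = data.clone();
        for (int i = shuffled.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            double tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }
        int n = shuffled.length;
        double total = 0.0;
        for (int fold = 0; fold < numFolds; fold++) {
            int from = fold * n / numFolds;      // held-out fold is [from, to)
            int to = (fold + 1) * n / numFolds;
            // "Train": fit mean and variance on everything outside the held-out fold.
            double sum = 0.0;
            double sumSq = 0.0;
            int m = 0;
            for (int i = 0; i < n; i++) {
                if (i < from || i >= to) {
                    sum += shuffled[i];
                    sumSq += shuffled[i] * shuffled[i];
                    m++;
                }
            }
            double mean = sum / m;
            double var = Math.max(sumSq / m - mean * mean, 1e-6); // floor avoids zero variance
            // "Test": add the log density of each held-out instance.
            for (int i = from; i < to; i++) {
                double d = shuffled[i] - mean;
                total += -0.5 * d * d / var - 0.5 * Math.log(2 * Math.PI * var);
            }
        }
        return total / n; // mean held-out log density per instance
    }
}
```

Using the same seed for the Random object reproduces the same folds, which makes results comparable across runs.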
crossValidateModel
public static java.lang.String crossValidateModel(java.lang.String clustererString,
Instances data,
int numFolds,
java.lang.String[] options,
java.util.Random random)
throws java.lang.Exception
- Performs a cross-validation for a DensityBasedClusterer on a set of
instances.
- Parameters:
clustererString
- a string naming the class of the clusterer
data
- the data on which the cross-validation is to be performed
numFolds
- the number of folds for the cross-validation
options
- the options to the clusterer
random
- a random number generator
- Returns:
- a string containing the cross-validated log likelihood
- Throws:
java.lang.Exception
- if a clusterer could not be generated
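The string-based variant must first turn clustererString into a clusterer object. A common way to do that in Java is reflection, sketched below with a hypothetical helper (ByNameSketch). The factory that ClusterEvaluation actually uses is not documented on this page, and handing the options array to the new object (for example through Weka's OptionHandler.setOptions) is left out.

```java
// Hypothetical sketch: instantiate a class by name and check it has the expected type.
final class ByNameSketch {

    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            // Fails with ClassCastException if the class does not implement/extend expectedType.
            Class<? extends T> cls = Class.forName(className).asSubclass(expectedType);
            // Requires an accessible no-argument constructor.
            return cls.getDeclaredConstructor().newInstance();
        } catch (ClassCastException e) {
            throw new IllegalArgumentException(className + " is not a " + expectedType.getName(), e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }
}
```

Both failure modes surface as IllegalArgumentException, loosely matching the documented behaviour of throwing an exception "if a clusterer could not be generated".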
main
public static void main(java.lang.String[] args)
- Main method for testing this class.
- Parameters:
args
- the options