Evaluate Clustering Algorithms

Data Science With Chris

The Difference in Performance Measurement between Supervised and Unsupervised Learning

Measuring the performance of a supervised learning algorithm is straightforward because we can compare its predictions against the labels. In an unsupervised learning problem, however, there are no labels and therefore no ground truth, so we need other methods to determine how well a clustering algorithm performs. First, let's define what a good clustering algorithm is.

A good clustering algorithm has two characteristics:
1) A small within-cluster variance: the data points in a cluster are similar to each other.
2) A large between-cluster variance: each cluster is dissimilar to the other clusters.
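To make these two characteristics concrete, here is a minimal sketch that measures them as sums of squares (unnormalized variances) with NumPy. The helper `within_between_ss` is a hypothetical name used only for illustration, and the four 1-D points are made-up toy data. The within-cluster sum of squares measures how far points lie from their own cluster centroid; the between-cluster sum of squares measures how far the centroids lie from the overall mean, weighted by cluster size. Together they add up to the total sum of squares around the overall mean.

```python
import numpy as np

def within_between_ss(X, labels):
    """Hypothetical helper: return (within-cluster SS, between-cluster SS)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    wss = 0.0
    bss = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        # Spread of the points around their own centroid (a good clustering keeps this small)
        wss += ((members - centroid) ** 2).sum()
        # Spread of the centroid around the overall mean, weighted by cluster size
        # (a good clustering makes this large)
        bss += len(members) * ((centroid - overall_mean) ** 2).sum()
    return float(wss), float(bss)

# Toy data (assumed for illustration): two well-separated groups of 1-D points
X = np.array([[0.0], [1.0], [10.0], [11.0]])
good = np.array([0, 0, 1, 1])  # groups nearby points together
bad = np.array([0, 1, 0, 1])   # mixes the two groups

print(within_between_ss(X, good))  # (1.0, 100.0)
print(within_between_ss(X, bad))   # (100.0, 1.0)
```

Because the total sum of squares is fixed for a given dataset (here 101), shrinking the within-cluster part necessarily grows the between-cluster part, which is why the "good" assignment scores well on both characteristics at once.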

Most clustering performance measures are based on these two characteristics. Generally, there are two types of evaluation metrics for clustering: