K-medoids is a variant of the k-means clustering technique that addresses some of k-means’ drawbacks, most notably its sensitivity to noise and outliers. Instead of representing each cluster by the mean (centroid) of its data points, k-medoids uses an actual data point, the medoid, as the cluster representative. A cluster’s medoid is the data point that minimizes the sum of distances to all other points in the cluster.
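To make the difference between a centroid and a medoid concrete, here is a minimal sketch in Python with NumPy. The helper name `medoid`, the sample points, and the choice of Euclidean distance are assumptions made for illustration; k-medoids itself works with any dissimilarity measure.

```python
import numpy as np

def medoid(points):
    """Return the point with the smallest summed distance to all others.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Pairwise differences via broadcasting: shape (n, n, d)
    diffs = points[:, None, :] - points[None, :, :]
    # Pairwise Euclidean distances: shape (n, n)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Row i holds the distances from point i to every point
    return points[dists.sum(axis=1).argmin()]

# Four points in a tight square plus one far-away outlier
points = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [50.0, 50.0]])

centroid = points.mean(axis=0)  # pulled toward the outlier: (11.2, 11.2)
center = medoid(points)         # stays inside the square: (2, 2)

print("centroid:", centroid)
print("medoid:  ", center)
```

The centroid lands far from every real observation, while the medoid remains one of the points in the dense group, which is why medoids are less affected by outliers.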
Steps in k-medoids:
- Initialization:
  - Select K data points as the initial medoids.
- Assignment:
  - Assign each data point to the cluster represented by its closest medoid.
- Update medoids:
  - For each cluster, choose as the new medoid the member point that minimizes the sum of distances to all other points in that cluster.
- Repeat:
  - Repeat the assignment and medoid update steps until convergence, that is, until the medoids no longer change.
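The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `k_medoids`, the parameters `max_iter` and `seed`, random initialization, and Euclidean distance are all assumptions. It follows the alternating assign/update scheme described in the list; other variants such as PAM use a swap-based search instead.

```python
import numpy as np

def k_medoids(points, k, max_iter=100, seed=0):
    """Alternating k-medoids sketch (hypothetical helper, Euclidean distance).

    Returns the indices of the medoids and the cluster label of each point.
    """
    rng = np.random.default_rng(seed)
    n = len(points)

    # Precompute the full pairwise distance matrix, shape (n, n)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # Initialization: pick k distinct data points as medoids
    medoids = rng.choice(n, size=k, replace=False)

    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its closest medoid
        labels = dist[:, medoids].argmin(axis=1)

        # Update: within each cluster, pick the member with the
        # smallest summed distance to the other members
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[within.argmin()]

        # Convergence: stop once the medoids no longer change
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels

# Two well-separated groups of four points each
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
medoid_idx, labels = k_medoids(data, k=2)
print("medoid points:", data[medoid_idx])
print("labels:", labels)
```

Precomputing the distance matrix keeps the loop simple but needs memory quadratic in the number of points, so this sketch is only suitable for small datasets. Like k-means, the result can depend on the initial medoids.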