K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a dataset into K distinct, non-overlapping clusters. It assigns data points to clusters based on their similarity, with the goal of minimizing the intra-cluster variance (sum of squared distances between data points within the same cluster) and maximizing the inter-cluster variance.
Key Steps in K-means Clustering:
- Initialization:
- Randomly choose K value initial cluster centroids.
- Assignment:
- Assign each data point to the cluster whose centroid is closest (Euclidean distance).
- Update Centroids:
- Recalculate the centroids of each cluster based on the mean of the data points assigned to that cluster.
- Repeat:
- Iterate the assignment and centroid update steps until convergence (when centroids no longer change significantly or a set number of iterations is reached).