DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups data points according to how densely they are packed in the feature space. Unlike k-means, which tends to assume roughly spherical clusters, DBSCAN can find clusters of arbitrary shape. It treats outliers as noise and is especially good at detecting clusters that are separated by regions of lower density.
Steps in DBSCAN:
- Parameter Selection:
  - Choose two parameters: eps (epsilon) and min_samples.
    - eps: Radius around a data point that defines its neighborhood.
    - min_samples: Minimum number of points a neighborhood must contain to count as a dense region.
- Core Point Identification:
  - Count the points in the epsilon neighborhood of each data point; a point whose neighborhood contains at least min_samples points is a core point.
- Cluster Expansion:
  - Form clusters by connecting core points that lie within each other's epsilon neighborhood.
- Label Border Points:
  - Label as border points those that lie in the epsilon neighborhood of a core point but are not core points themselves; each border point joins the cluster of a neighboring core point.
- Noise Identification:
  - Label every point that is neither a core point nor a border point as noise.
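
The steps above can be sketched in plain Python. This is a minimal brute-force illustration under stated assumptions, not a reference implementation. The function names `region_query` and `dbscan`, the `NOISE = -1` label, the use of Euclidean distance, and the choice to count a point as a member of its own neighborhood are all assumptions made for this example rather than details given in the text.

```python
from math import dist

NOISE = -1  # assumed label for noise points


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return (labels, is_core): one cluster label and one core flag per point."""
    n = len(points)

    # Core point identification: count the points inside each eps-neighborhood.
    neighbors = [region_query(points, i, eps) for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = [NOISE] * n  # any point never reached from a core point stays noise
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Cluster expansion: grow outward from an unlabeled core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] != NOISE:
                    continue
                labels[q] = cluster  # core or border point joins the cluster
                if is_core[q]:
                    frontier.append(q)  # only core points keep expanding
        cluster += 1
    return labels, is_core


# Hypothetical toy data: two dense squares, one border point, one outlier.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),          # dense square
    (2.0, 1.0),                                               # border point
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),  # second square
    (5.0, 5.0),                                               # isolated outlier
]
labels, is_core = dbscan(points, eps=1.5, min_samples=4)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run, (2.0, 1.0) has only three points in its neighborhood, so it is not a core point, but it lies within eps of two core points and is absorbed into the first cluster as a border point. The point (5.0, 5.0) is reached by no core point and keeps the noise label. Computing every neighborhood up front costs a quadratic number of distance evaluations, so this sketch is only suitable for small inputs.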