Silhouette Score
1. Explain Silhouette Score.
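The Silhouette coefficient measures how well each sample fits its assigned cluster. It compares the sample's mean distance to the other points in its own cluster (cohesion) with its mean distance to the points in the nearest other cluster (separation). Values range from -1 to 1, and the mean over all samples gives the Silhouette Score of the clustering.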
2. Explain what Silhouette Scores of 1, 0 and -1 mean.
- A score near 1 suggests that the clusters are well apart from each other and clearly distinguished.
- A score near 0 indicates that the clusters overlap or that samples lie on the boundary between two clusters, so the distance between clusters is not significant.
- A score near -1 suggests that the data points may have been assigned to the wrong cluster.
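The three cases can be reproduced on toy data. The following is a minimal sketch using only the Python standard library; the function name `silhouette_samples` and the 1-D data sets are hypothetical and chosen for illustration. It assumes Euclidean distance and, as a convention, assigns a score of 0 to a sample that is alone in its cluster or when only one cluster exists.

```python
from math import dist

def silhouette_samples(points, labels):
    """Return the silhouette coefficient s_i for every point (illustrative sketch)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            # Assumed convention: singleton cluster (or a single cluster) scores 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        scores.append((b - a) / max(a, b))
    return scores

# Two tight clusters far apart: every score is close to 1.
separated = silhouette_samples([(0.0,), (1.0,), (10.0,), (11.0,)], [0, 0, 1, 1])

# The point at 4.0 is equally far (on average) from both clusters: its score is 0.
boundary = silhouette_samples([(0.0,), (4.0,), (7.0,), (9.0,)], [0, 0, 1, 1])

# The point at 9.5 is labelled 0 but sits next to cluster 1: its score is close to -1.
misassigned = silhouette_samples(
    [(0.0,), (1.0,), (9.5,), (10.0,), (11.0,)], [0, 0, 0, 1, 1]
)

print(separated, boundary[1], misassigned[2])
```

In practice a library implementation would normally be used instead; the sketch is only meant to make the three score regimes concrete.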
3. Given the Silhouette Score formula $S_i=\frac{b_i - a_i}{\max(a_i, b_i)}$, what does the variable $b_i$ represent? Describe the implications of having small or large values for $b_i$.
$b_i$ represents the mean distance between a sample and all points in the nearest cluster that the sample does not belong to.
- If $b_i$ is small, then the sample lies close to the points of a different cluster, which indicates poor separation.
- If $b_i$ is large, then the sample is far away from the points of every other cluster, which is a good sign of separation.
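The word "nearest" matters when there are more than two clusters: the mean distance is computed for each other cluster separately, and $b_i$ is the smallest of those means. A minimal sketch, assuming Euclidean distance; the helper name `nearest_cluster_distance` and the sample data are hypothetical.

```python
from math import dist

def nearest_cluster_distance(sample, own_label, points, labels):
    """Return (b, label) for the nearest cluster that `sample` does not belong to."""
    means = {}
    for lab in set(labels) - {own_label}:
        members = [q for q, l in zip(points, labels) if l == lab]
        # Mean distance from the sample to every member of this other cluster.
        means[lab] = sum(dist(sample, q) for q in members) / len(members)
    nearest = min(means, key=means.get)
    return means[nearest], nearest

points = [(0, 0), (1, 0), (4, 0), (5, 0), (0, 20), (0, 22)]
labels = ["A", "A", "B", "B", "C", "C"]

# For the sample (1, 0) in cluster A, cluster B is much closer than cluster C,
# so b is the mean distance to B's members.
b, nearest = nearest_cluster_distance((1, 0), "A", points, labels)
print(b, nearest)
```

Cluster C is ignored for this sample because only the closest competing cluster determines $b_i$.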
4. Given the Silhouette Score formula $S_i=\frac{b_i - a_i}{\max(a_i, b_i)}$, what does the variable $a_i$ represent? Describe the implications of having small or large values for $a_i$.
$a_i$ represents the mean distance between a sample and all other points in the same cluster.
- If $a_i$ is small, then the sample is tightly clustered with other points in the same cluster, indicating good cohesion.
- If $a_i$ is large, then the sample is far away from other points in the same cluster, indicating poor cohesion.
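A small worked example may help to see how $a_i$ and $b_i$ combine. The numbers are hypothetical: a sample sits at position 0 on a line, its cluster mates are at 1 and 2, and the nearest other cluster has points at 9 and 11.

$$a_i = \frac{1 + 2}{2} = 1.5, \qquad b_i = \frac{9 + 11}{2} = 10, \qquad S_i = \frac{10 - 1.5}{\max(1.5,\, 10)} = 0.85$$

If the cluster mates were instead at 6 and 12, cohesion would be much worse while separation stays the same:

$$a_i = \frac{6 + 12}{2} = 9, \qquad S_i = \frac{10 - 9}{\max(9,\, 10)} = 0.1$$

A larger $a_i$ with an unchanged $b_i$ pulls the score towards 0, and once $a_i$ exceeds $b_i$ the score becomes negative.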