<P> The correct choice of k is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and the desired clustering resolution of the user. In addition, increasing k without penalty will always reduce the amount of error in the resulting clustering, to the extreme case of zero error if each data point is considered its own cluster (i.e., when k equals the number of data points, n). Intuitively, then, the optimal choice of k will strike a balance between maximum compression of the data using a single cluster and maximum accuracy obtained by assigning each data point to its own cluster. If an appropriate value of k is not apparent from prior knowledge of the properties of the data set, it must be chosen by some other means. There are several categories of methods for making this decision. </P> <P> The elbow method looks at the percentage of variance explained as a function of the number of clusters: one should choose a number of clusters such that adding another cluster does not give much better modeling of the data. More precisely, if one plots the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, producing an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion". This "elbow" cannot always be unambiguously identified. Percentage of variance explained is the ratio of the between-group variance to the total variance, also known as an F-test. A slight variation of this method plots the curvature of the within-group variance. </P> <P> The method can be traced to speculation by Robert L. Thorndike in 1953.
</P> <P> In statistics and data mining, X-means clustering is a variation of k-means clustering that refines cluster assignments by repeatedly attempting subdivision, and keeping the best resulting splits, until a criterion such as the Akaike information criterion (AIC) or Bayesian information criterion (BIC) no longer improves. </P>
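<P> The elbow criterion described above can be sketched in a few lines. The following is a minimal illustration, not a production implementation: it uses a plain Lloyd's-algorithm k-means with k-means++ seeding on synthetic data of three well-separated blobs (the data, seeds, and helper names are all illustrative assumptions), and prints the percentage of variance explained for each k so the plateau after the true number of clusters is visible. </P>

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread initial centroids apart."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Sample the next seed with probability proportional to the
        # squared distance to the nearest existing seed.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def wss(X, centroids, labels):
    """Within-cluster sum of squares (the clustering 'error')."""
    return sum(((X[labels == j] - c) ** 2).sum()
               for j, c in enumerate(centroids))

# Synthetic data: three well-separated Gaussian blobs (illustrative).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2))
               for loc in ((0.0, 0.0), (5.0, 5.0), (0.0, 5.0))])

total_ss = ((X - X.mean(axis=0)) ** 2).sum()
ks = list(range(1, 7))
explained = []
for k in ks:
    c, lab = kmeans(X, k)
    explained.append(1.0 - wss(X, c, lab) / total_ss)

for k, v in zip(ks, explained):
    print(f"k={k}: variance explained = {v:.3f}")
```

<P> For this data the variance explained climbs steeply up to k = 3 and then nearly flattens, which is the "angle in the graph" the method looks for; plotting `explained` against `ks` would show the elbow directly. </P>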
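<P> The BIC-based selection that X-means relies on can also be sketched. This is a simplified score, not the exact X-means procedure: it fits k-means for a range of k and computes the BIC of a hard-assignment spherical-Gaussian model (k·d centroid coordinates, k − 1 mixing proportions, and one shared variance as free parameters), choosing the k that minimizes it. The data and all names are illustrative assumptions. </P>

```python
import numpy as np

def kmeans(X, k, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):  # k-means++ seeding
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    for _ in range(100):
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2) \
            .sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def bic(X, centroids, labels):
    """BIC of a hard-assignment spherical-Gaussian model (simplified)."""
    n, d = X.shape
    k = len(centroids)
    wss = sum(((X[labels == j] - c) ** 2).sum()
              for j, c in enumerate(centroids))
    var = wss / (n * d)  # maximum-likelihood shared variance
    sizes = np.array([(labels == j).sum() for j in range(k)])
    nz = sizes[sizes > 0]  # guard against empty clusters
    # Classification log-likelihood: mixing proportions + Gaussian terms.
    ll = (nz * np.log(nz / n)).sum() \
         - 0.5 * n * d * np.log(2 * np.pi * var) - 0.5 * n * d
    p = k * d + (k - 1) + 1  # centroids + proportions + shared variance
    return p * np.log(n) - 2 * ll

# Synthetic data: three well-separated Gaussian blobs (illustrative).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2))
               for loc in ((0.0, 0.0), (5.0, 5.0), (0.0, 5.0))])

scores = {k: bic(X, *kmeans(X, k)) for k in range(1, 7)}
best_k = min(scores, key=scores.get)
print(best_k)  # for these well-separated blobs the BIC minimum is at k=3
```

<P> X-means itself applies this kind of score locally, testing whether splitting an individual cluster in two improves the criterion, rather than refitting from scratch for every candidate k as this sketch does. </P>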

How to choose the number of clusters in k-means