K-Means Clustering Fast AF

3 min read · Dec 19, 2022

K-means clustering is a popular machine learning algorithm that groups data points into clusters based on their similarity. It is simple yet powerful, and it is widely used in applications such as image segmentation, document clustering, customer segmentation, and anomaly detection.

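In this tutorial, we will learn how to implement k-means clustering in Swift alongside Apple's Core ML framework. One caveat up front: Core ML itself does not provide a k-means trainer, so the KMeans, KMeansInput, and KMeansOutput types used in this tutorial should be read as a wrapper you supply yourself or get from a third-party package, not as built-in API. We will start by introducing the concepts of k-means clustering and how it works, then implement the algorithm in Swift and test it on a dataset.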

Understanding k-means clustering

K-means clustering is an iterative algorithm that divides a set of data points into k clusters, where k is the number of clusters you want to create. The algorithm starts by selecting k initial centroids, which serve as the center points of the clusters. It then assigns each data point to the cluster with the closest centroid, usually measured by Euclidean distance. After all the data points have been assigned, each centroid is moved to the mean of the data points in its cluster. The assignment and update steps repeat until the centroids converge, meaning they no longer change between iterations. In practice, implementations also stop after a maximum number of iterations.
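To make the assign-then-update loop concrete, here is a minimal sketch in plain Swift with no framework dependencies. The names squaredDistance, nearestCentroid, and kMeans are illustrative and not part of any library, and the initial centroids are simply passed in by the caller rather than chosen by a smarter seeding strategy.

```swift
import Foundation

/// Squared Euclidean distance between two points of equal dimension.
func squaredDistance(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for i in 0..<a.count {
        let diff = a[i] - b[i]
        sum += diff * diff
    }
    return sum
}

/// Index of the centroid closest to `point`.
func nearestCentroid(to point: [Double], in centroids: [[Double]]) -> Int {
    var best = 0
    var bestDistance = Double.infinity
    for (index, centroid) in centroids.enumerated() {
        let distance = squaredDistance(point, centroid)
        if distance < bestDistance {
            bestDistance = distance
            best = index
        }
    }
    return best
}

/// Runs k-means starting from `initialCentroids` (an assumption of this sketch:
/// the caller picks them). Returns the final centroids and each point's cluster.
func kMeans(points: [[Double]],
            initialCentroids: [[Double]],
            maxIterations: Int = 100) -> (centroids: [[Double]], assignments: [Int]) {
    var centroids = initialCentroids
    var assignments = [Int](repeating: 0, count: points.count)
    let dimension = points.first?.count ?? 0

    for _ in 0..<maxIterations {
        // Assignment step: attach every point to its closest centroid.
        assignments = points.map { nearestCentroid(to: $0, in: centroids) }

        // Update step: move each centroid to the mean of its members.
        var sums = [[Double]](repeating: [Double](repeating: 0, count: dimension),
                              count: centroids.count)
        var counts = [Int](repeating: 0, count: centroids.count)
        for (point, cluster) in zip(points, assignments) {
            counts[cluster] += 1
            for i in 0..<dimension { sums[cluster][i] += point[i] }
        }
        var updated = centroids
        for c in 0..<centroids.count where counts[c] > 0 {
            updated[c] = sums[c].map { $0 / Double(counts[c]) }
        }

        // Converged: the centroids did not move in this iteration.
        if updated == centroids { break }
        centroids = updated
    }
    return (centroids, assignments)
}

// Two well-separated groups of 2-D points.
let points: [[Double]] = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
let result = kMeans(points: points, initialCentroids: [points[0], points[3]])
print(result.assignments) // [0, 0, 0, 1, 1, 1]
```

A cluster that ends up empty keeps its previous centroid in this sketch; other implementations re-seed it instead.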

The main advantage of k-means clustering is that it is fast and easy to implement. However, it does have some limitations. You must specify the number of clusters beforehand, which may not always be known. The algorithm can also get stuck in a local minimum: the result depends on the initial centroids, so a run may converge to a clustering that is stable but not globally optimal.
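The local-minimum problem is easier to see with numbers. The sketch below uses the within-cluster sum of squares (each point's squared distance to its nearest centroid, summed) as the quality score; the function name and the toy data are made up for illustration. Both centroid sets are stable under the k-means update for these four points, yet one is much worse than the other.

```swift
import Foundation

/// Within-cluster sum of squares: for every point, the squared distance
/// to its nearest centroid, summed over all points. Lower is better.
func withinClusterSumOfSquares(points: [[Double]], centroids: [[Double]]) -> Double {
    var total = 0.0
    for point in points {
        var nearest = Double.infinity
        for centroid in centroids {
            var distance = 0.0
            for i in 0..<point.count {
                let diff = point[i] - centroid[i]
                distance += diff * diff
            }
            nearest = min(nearest, distance)
        }
        total += nearest
    }
    return total
}

// Four corners of a wide rectangle.
let corners: [[Double]] = [[0, 0], [0, 1], [4, 0], [4, 1]]

// Splitting left/right gives tight clusters.
let leftRight: [[Double]] = [[0, 0.5], [4, 0.5]]
// Splitting top/bottom is also a fixed point of the algorithm, but a poor one.
let topBottom: [[Double]] = [[2, 0], [2, 1]]

let goodScore = withinClusterSumOfSquares(points: corners, centroids: leftRight)
let poorScore = withinClusterSumOfSquares(points: corners, centroids: topBottom)
print(goodScore) // 1.0
print(poorScore) // 16.0
```

A common remedy is to run the algorithm from several different starting centroids and keep the run with the lowest score.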

Implementing k-means clustering in Swift

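To implement k-means clustering in Swift, we will use a KMeans class with an easy-to-use interface for training and evaluating k-means models. Note that Core ML does not ship a class with this name. The snippets in this section assume that a KMeans type, with companion KMeansInput and KMeansOutput types, is available in your project, and its property and method names should be treated as illustrative.

First, import the necessary frameworks: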

import CoreML
import Foundation

Next, we need to create a KMeans model and set the number of clusters we want to create. We can do this as follows:

let model = KMeans()
model.clusterCount = 3

Now, we need to prepare the data for training. We will use the MNIST dataset, a popular benchmark for machine learning algorithms. It consists of 28×28 grayscale images of handwritten digits from 0 to 9, split into a training set and a test set. We will use the first 1,000 training images for training and the first 1,000 test images for testing.

To prepare the data, we will create a KMeansInput object and fill it with the data points we want to cluster. The KMeansInput object takes an array of MLMultiArray objects as input, where each MLMultiArray represents a data point.

let data = MNIST().trainingData[0...999]
var input = KMeansInput()
input.points = data.map { $0.features }
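The snippet above does not show how each image becomes a feature vector. One plausible approach, sketched here under the assumption that an image arrives as rows of 8-bit pixel values, is to flatten it row by row and scale every pixel to the range 0 to 1. The helper flattenedFeatures is hypothetical and is not part of the MNIST loader used above.

```swift
import Foundation

/// Flattens a grayscale image (rows of 0...255 pixel values) into a single
/// feature vector with every value scaled to 0...1.
func flattenedFeatures(from image: [[UInt8]]) -> [Double] {
    var features: [Double] = []
    for row in image {
        for pixel in row {
            features.append(Double(pixel) / 255.0)
        }
    }
    return features
}

// A tiny 2×2 stand-in for a 28×28 digit image.
let tinyImage: [[UInt8]] = [[0, 255], [51, 102]]
let features = flattenedFeatures(from: tinyImage)
print(features.count) // 4
```

For a real 28×28 MNIST image, the same function would produce a vector of 784 values.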

Now that we have prepared the data, we can train the model by calling the fit(input:) method on the KMeans object and passing it the input data.

// Store the result under a new name; redeclaring `model` in the same scope would not compile.
let trainedModel = try model.fit(input: input)

The fit(input:) method returns a trained KMeans model that we can use to predict the cluster for new data points. To do this, we call the prediction(input:) method on the trained model and pass it a KMeansInput object with the data points we want to predict. It returns a KMeansOutput object with the predicted cluster for each data point.

// Build the input from the first 1,000 test images.
let testData = MNIST().testData[0...999]
var testInput = KMeansInput()
testInput.points = testData.map { $0.features }
// Assign each test point to one of the learned clusters.
let predictions = try trainedModel.prediction(input: testInput)
let predictedClusters = predictions.clusters

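Now that we have the predicted clusters for each data point, we can evaluate the model by comparing the predicted clusters to the true labels of the test data. Keep in mind that k-means is unsupervised: cluster indices are arbitrary and do not correspond to digit labels by themselves, so a direct comparison like the one below is only meaningful once each cluster has been mapped to a label. Metrics such as accuracy and F1 score can then be used to measure the model’s performance.

let trueLabels = testData.map { $0.label }
// Count the positions where the predicted cluster index equals the true label.
let matches = zip(predictedClusters, trueLabels).filter { $0.0 == $0.1 }.count
// Convert to Double before dividing to avoid integer division.
let accuracy = Double(matches) / Double(trueLabels.count)

print("Accuracy: \(accuracy)")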
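Because cluster indices are arbitrary, one common workaround is to label each cluster with the most frequent true label among its members and score against that mapping. The sketch below shows the idea on hand-made arrays; majorityLabels and mappedAccuracy are illustrative names, not part of any library. Since the mapping is derived from the same labels it is scored against, the number is optimistic and is better read as cluster purity than as classification accuracy.

```swift
import Foundation

/// Maps each cluster index to the label that occurs most often among its members.
func majorityLabels(clusters: [Int], labels: [Int]) -> [Int: Int] {
    var counts: [Int: [Int: Int]] = [:] // cluster -> label -> occurrences
    for (cluster, label) in zip(clusters, labels) {
        counts[cluster, default: [:]][label, default: 0] += 1
    }
    var mapping: [Int: Int] = [:]
    for (cluster, labelCounts) in counts {
        // Pick the most frequent label; ties go to the smaller label.
        let best = labelCounts.max { a, b in
            a.value != b.value ? a.value < b.value : a.key > b.key
        }
        if let best = best {
            mapping[cluster] = best.key
        }
    }
    return mapping
}

/// Fraction of points whose cluster's majority label equals their true label.
func mappedAccuracy(clusters: [Int], labels: [Int]) -> Double {
    guard !labels.isEmpty, clusters.count == labels.count else { return 0 }
    let mapping = majorityLabels(clusters: clusters, labels: labels)
    let correct = zip(clusters, labels).filter { mapping[$0.0] == $0.1 }.count
    return Double(correct) / Double(labels.count)
}

// Toy data: three clusters, with one point in cluster 0 carrying a different label.
let sampleClusters = [0, 0, 0, 1, 1, 2, 2, 2]
let sampleLabels   = [7, 7, 3, 3, 3, 1, 1, 1]
let purity = mappedAccuracy(clusters: sampleClusters, labels: sampleLabels)
print(purity) // 0.875
```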

Conclusion

In this tutorial, we learned how to implement k-means clustering in Swift. We covered the basic concepts of k-means clustering and how it works, and we walked through the workflow using a KMeans class (a wrapper type, since Core ML does not ship one). We also tested the model on the MNIST dataset and evaluated its performance using accuracy as a metric.

If you want to learn more about k-means clustering and the Core ML library, you can check out the following resources:

Written by Swift Fast AF
