API Reference

Utility functions
KMeans / KMeans++ Clustering Algorithm
Bisecting KMeans Clustering Algorithm
Distributional Clustering Method
Full list of available functions

Utility functions

Utility functions shared by all clustering algorithms.

Cluster.init_centroids — Method

init_centroids(X::Matrix{Float64}, K::Int64, mode::Symbol)

Initializes centroids for the clustering algorithm based on the specified mode.

Input

X::Matrix{Float64}: Data matrix where rows are data points and columns are features.
K::Int64: Number of clusters.
mode::Symbol: Initialization mode, either :random or :kmeanspp.

Output

Returns a K×d matrix of initial centroid coordinates, one centroid per row, where d is the number of features.

Algorithm

If mode is :random:
- Randomly select K data points from X as initial centroids.
If mode is :kmeanspp:
- Initialize the first centroid randomly.
- For each subsequent centroid:
  a. Compute the distance from each data point to its nearest already-chosen centroid.
  b. Select the next centroid from the data points with probability proportional to the squared distance.

Examples

julia> X = rand(100, 2)
julia> centroids = init_centroids(X, 3, :kmeanspp)
3×2 Matrix{Float64}:
 0.386814  0.619566
 0.170768  0.0176449
 0.38688   0.398064
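
The :kmeanspp steps above can be sketched in plain Julia as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `sketch_kmeanspp` and its `rng` keyword are hypothetical names, squared Euclidean distance is assumed, and the first centroid is assumed to be drawn uniformly from the rows of X.

```julia
using Random

# Hypothetical helper illustrating KMeans++ seeding; not Cluster.init_centroids.
function sketch_kmeanspp(X::Matrix{Float64}, K::Int; rng=Random.default_rng())
    n = size(X, 1)
    chosen = [rand(rng, 1:n)]          # first centroid: a uniformly random row
    d2 = fill(Inf, n)                  # squared distance to the nearest chosen centroid
    while length(chosen) < K
        c = @view X[chosen[end], :]
        for i in 1:n
            # Only the most recently added centroid can shrink the nearest distance.
            d2[i] = min(d2[i], sum(abs2, @view(X[i, :]) .- c))
        end
        # Draw the next row index with probability proportional to d2.
        r = rand(rng) * sum(d2)
        acc = 0.0
        next = argmax(d2)              # fallback if rounding prevents a hit below
        for i in 1:n
            acc += d2[i]
            if acc > r
                next = i
                break
            end
        end
        push!(chosen, next)
    end
    return X[chosen, :]                # K×d matrix, one centroid per row
end
```

Rows that are already centroids have zero squared distance, so they are effectively never drawn again unless all points coincide.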

KMeans / KMeans++ Clustering Algorithm

Initializes centroids using either random selection or KMeans++.
Iteratively assigns points to the nearest centroid.
Updates centroids based on the mean of assigned points.
Stops when centroids converge or after a maximum number of iterations.

References:

Scikit-Learn KMeans Documentation

Cluster.KMeans — Method

KMeans(; k::Int=3, mode::Symbol=:kmeanspp, max_try::Int=100, tol::Float64=1e-4)

Constructor for the KMeans struct.

Input

k::Int: Number of clusters (default: 3).
mode::Symbol: Initialization mode, either :random or :kmeanspp (default: :kmeanspp).
max_try::Int: Maximum number of iterations (default: 100).
tol::Float64: Tolerance for convergence (default: 1e-4).

Output

Returns an instance of KMeans.

Examples

julia> model = KMeans(k=3, mode=:kmeanspp, max_try=100, tol=1e-4)
KMeans(3, :kmeanspp, 100, 0.0001, Matrix{Float64}(undef, 0, 0), Int64[])

Cluster.KMeans — Type

mutable struct KMeans

A mutable struct for the KMeans clustering algorithm.

Fields

k::Int: Number of clusters.
mode::Symbol: Initialization mode, either :random or :kmeanspp.
max_try::Int: Maximum number of iterations.
tol::Float64: Tolerance for convergence.
centroids::Array{Float64,2}: Matrix of centroid coordinates.
labels::Array{Int,1}: Vector of labels for each data point.

Examples

julia> model = KMeans(k=3, mode=:kmeanspp, max_try=100, tol=1e-4)
KMeans(3, :kmeanspp, 100, 0.0001, Matrix{Float64}(undef, 0, 0), Int64[])
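
For readers unfamiliar with Julia keyword constructors, the following sketch shows how a struct with the fields listed above could pair with a keyword constructor that leaves `centroids` and `labels` empty until fitting. `SketchKMeans` is a hypothetical stand-in written for illustration; the package's actual definition may differ.

```julia
# Hypothetical stand-in mirroring the documented fields of KMeans.
mutable struct SketchKMeans
    k::Int
    mode::Symbol
    max_try::Int
    tol::Float64
    centroids::Matrix{Float64}
    labels::Vector{Int}
end

# Keyword constructor: hyperparameters get defaults, fitted state starts empty.
SketchKMeans(; k::Int=3, mode::Symbol=:kmeanspp, max_try::Int=100, tol::Float64=1e-4) =
    SketchKMeans(k, mode, max_try, tol, Matrix{Float64}(undef, 0, 0), Int[])

model = SketchKMeans(k=4, mode=:random)
```

Because the struct is mutable, a fitting routine can later overwrite `model.centroids` and `model.labels` in place.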

Cluster.assign_center — Method

assign_center(D::Matrix{Float64})

Assigns each data point to the nearest centroid based on the distance matrix D.

Input

D::Matrix{Float64}: Distance matrix where element (i, j) is the distance between the i-th data point and the j-th centroid.

Output

Returns a vector of labels where each element is the index of the nearest centroid for the corresponding data point.

Examples

julia> D = rand(100, 3)
julia> labels = assign_center(D)
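
The assignment rule described above amounts to a row-wise argmin over the distance matrix. A minimal sketch, where `sketch_assign` is a hypothetical name and ties are assumed to go to the lowest centroid index:

```julia
# Hypothetical helper: index of the smallest entry in each row of D.
sketch_assign(D::Matrix{Float64}) = [argmin(@view D[i, :]) for i in 1:size(D, 1)]

D = [0.2 0.9 0.5;
     0.7 0.1 0.3]
labels = sketch_assign(D)   # → [1, 2]
```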

Cluster.compute_distance — Method

compute_distance(X::Matrix{Float64}, centroids::Matrix{Float64})

Computes the distance between each data point in X and each centroid.

Input

X::Matrix{Float64}: Data matrix where rows are data points and columns are features.
centroids::Matrix{Float64}: Matrix of centroid coordinates.

Output

Returns a distance matrix where element (i, j) is the distance between the i-th data point and the j-th centroid.

Examples

julia> X = rand(100, 2)
julia> centroids = rand(3, 2)
julia> D = compute_distance(X, centroids)
100×3 Matrix{Float64}:
 0.181333  0.539578  0.306867
 0.754863  0.48797   0.562147
 0.205116  0.360735  0.127107
 0.154926  0.552747  0.323433
 ⋮
 0.434321  0.321914  0.261909
 0.773258  0.291669  0.513668
 0.607547  0.310411  0.38714
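
The reference does not state which metric is used. Assuming Euclidean distance, the distance matrix could be computed as in the sketch below; `sketch_distance` is a hypothetical name, not the package function.

```julia
# Hypothetical helper: Euclidean distance from every row of X to every row of C.
function sketch_distance(X::Matrix{Float64}, C::Matrix{Float64})
    n, k = size(X, 1), size(C, 1)
    D = Matrix{Float64}(undef, n, k)
    for j in 1:k, i in 1:n
        D[i, j] = sqrt(sum(abs2, @view(X[i, :]) .- @view(C[j, :])))
    end
    return D                     # n×k matrix, D[i, j] = ‖X[i, :] - C[j, :]‖
end

X = [0.0 0.0; 3.0 4.0]
C = [0.0 0.0]                    # a single centroid as a 1×2 matrix
D = sketch_distance(X, C)        # 2×1 matrix containing 0.0 and 5.0
```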

Cluster.fit! — Method

fit!(model::KMeans, X::Matrix{Float64})

Fits the KMeans model to the data matrix X.

Input

model::KMeans: An instance of KMeans.
X::Matrix{Float64}: Data matrix where rows are data points and columns are features.

Output

Modifies the model in-place to fit the data.

Algorithm

Initialize centroids.
Iterate up to max_try times:
  a. Compute distances between data points and centroids.
  b. Assign each data point to the nearest centroid.
  c. Update centroids based on the mean of assigned data points.
  d. Check for convergence based on tol.

Examples

julia> model = KMeans(k=3)
julia> X = rand(100, 2)
julia> fit!(model, X)
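
The iteration described in the Algorithm section can be sketched as a standalone function. This is an illustration only: `sketch_lloyd` is a hypothetical name, initialization here is :random-style, squared Euclidean distance is assumed, and the convergence test (largest centroid coordinate change below `tol`) is an assumption, since the reference does not spell out the criterion.

```julia
using Random, Statistics

# Hypothetical sketch of the assign/update loop; not the package's fit!.
function sketch_lloyd(X::Matrix{Float64}, k::Int; max_try::Int=100,
                      tol::Float64=1e-4, rng=Random.default_rng())
    n = size(X, 1)
    centroids = X[randperm(rng, n)[1:k], :]   # k distinct random rows
    labels = zeros(Int, n)
    for _ in 1:max_try
        # a./b. squared distance to every centroid, then nearest-centroid assignment
        for i in 1:n
            labels[i] = argmin([sum(abs2, X[i, :] .- centroids[j, :]) for j in 1:k])
        end
        # c. mean of the assigned points (an empty cluster keeps its old centroid)
        updated = copy(centroids)
        for j in 1:k
            members = findall(==(j), labels)
            isempty(members) || (updated[j, :] = vec(mean(X[members, :], dims=1)))
        end
        # d. assumed stopping rule: largest coordinate shift below tol
        shift = maximum(abs.(updated .- centroids))
        centroids = updated
        shift < tol && break
    end
    return centroids, labels
end

X = [0.0 0.0; 0.1 0.0; 10.0 10.0; 10.1 10.0]
centroids, labels = sketch_lloyd(X, 2)
```

On this toy input the two nearby pairs end up in separate clusters regardless of which rows are drawn as initial centroids.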

Cluster.predict — Method

predict(model::KMeans, X::Matrix{Float64})

Predicts the cluster labels for new data points based on the fitted model.

Input

model::KMeans: An instance of KMeans.
X::Matrix{Float64}: Data matrix where rows are data points and columns are features.

Output

Returns a vector of predicted labels for each data point.

Examples

julia> model = KMeans(k=3)
julia> X_train = rand(100, 2)
julia> fit!(model, X_train)
julia> X_test = rand(10, 2)
julia> labels = predict(model, X_test)
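
A plausible reading of prediction, assumed here rather than confirmed by the reference, is that each new point receives the label of its nearest fitted centroid. The sketch below illustrates that rule with a hypothetical `sketch_predict` that takes the centroid matrix directly and uses squared Euclidean distance.

```julia
# Hypothetical helper: label each row of X with the index of its nearest centroid.
function sketch_predict(centroids::Matrix{Float64}, X::Matrix{Float64})
    k = size(centroids, 1)
    return [argmin([sum(abs2, X[i, :] .- centroids[j, :]) for j in 1:k])
            for i in 1:size(X, 1)]
end

centroids = [0.0 0.0; 10.0 10.0]
X_new = [1.0 1.0; 9.0 8.0; -2.0 0.5]
labels = sketch_predict(centroids, X_new)   # → [1, 2, 1]
```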

Cluster.update_centroids — Method

update_centroids(X::Matrix{Float64}, label_vector::Vector{Int64}, model::KMeans)

Updates the centroids based on the current assignment of data points to centroids.

Input

X::Matrix{Float64}: Data matrix where rows are data points and columns are features.
label_vector::Vector{Int64}: Vector of labels for each data point.
model::KMeans: An instance of KMeans.

Output

Returns a matrix of updated centroid coordinates.

Examples

julia> X = rand(100, 2)
julia> labels = rand(1:3, 100)
julia> model = KMeans(k=3)
julia> centroids = update_centroids(X, labels, model)
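
For KMeans, the update step replaces each centroid with the mean of the points assigned to it. A minimal sketch follows, with a hypothetical `sketch_update` that takes the cluster count `k` directly instead of a model; leaving an empty cluster's centroid at zero is an assumption made only to keep the example short.

```julia
using Statistics

# Hypothetical helper: per-cluster mean of the rows of X.
function sketch_update(X::Matrix{Float64}, labels::Vector{Int}, k::Int)
    C = zeros(k, size(X, 2))
    for j in 1:k
        members = findall(==(j), labels)
        isempty(members) && continue          # assumption: empty cluster stays at zeros
        C[j, :] = vec(mean(X[members, :], dims=1))
    end
    return C
end

X = [1.0 1.0; 3.0 3.0; 10.0 0.0]
C = sketch_update(X, [1, 1, 2], 2)            # → [2.0 2.0; 10.0 0.0]
```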

Bisecting KMeans Clustering Algorithm

Starts with a single cluster containing all data points.
Repeatedly splits the cluster with the highest sum of squared errors (SSE) until k clusters are obtained.
Uses standard KMeans for cluster splitting.

References:

Bisecting KMeans: An Improved Version of KMeans

Cluster.BKMeans — Method

BKMeans(; k::Int=3, kmeans::KMeans=KMeans(k=2, mode=:kmeanspp))

Constructor for the BKMeans struct.

Input

k::Int: Number of clusters (default: 3).
kmeans::KMeans: An instance of the KMeans struct used for bisecting (default: KMeans(k=2, mode=:kmeanspp)).

Output

Returns an instance of BKMeans.

Examples

julia> kmeans_model = KMeans(k=2, mode=:kmeanspp)
julia> model = BKMeans(k=3, kmeans=kmeans_model)
BKMeans(3, KMeans(2, :kmeanspp, 100, 0.0001, Matrix{Float64}(undef, 0, 0), Int64[]), Int64[], Matrix{Float64}(undef, 0, 0))

Cluster.BKMeans — Type

mutable struct BKMeans

A mutable struct for the Bisecting KMeans clustering algorithm.

Fields

k::Int: Number of clusters.
kmeans::KMeans: An instance of the KMeans struct used for bisecting.
labels::Array{Int,1}: Vector of labels for each data point.
centroids::Matrix{Float64}: Matrix of centroid coordinates.

Examples

julia> kmeans_model = KMeans(k=2, mode=:kmeanspp)
julia> model = BKMeans(k=3, kmeans=kmeans_model)
BKMeans(3, KMeans(2, :kmeanspp, 100, 0.0001, Matrix{Float64}(undef, 0, 0), Int64[]), Int64[], Matrix{Float64}(undef, 0, 0))

Cluster.fit! — Method

fit!(model::BKMeans, X::Matrix{Float64})

Fits the BKMeans model to the data matrix X.

Input

model::BKMeans: An instance of BKMeans.
X::Matrix{Float64}: Data matrix where rows are data points and columns are features.

Output

Modifies the model in-place to fit the data.

Algorithm

Initialize clusters with the entire dataset.
While the number of clusters is less than k:
  a. Compute the sum of squared errors (SSE) for each cluster.
  b. Select the cluster with the highest SSE.
  c. Apply KMeans to bisect the selected cluster.
  d. Replace the selected cluster with the two resulting clusters.
Assign labels and centroids based on the final clusters.

Examples

julia> kmeans_model = KMeans(k=2, mode=:kmeanspp)
julia> model = BKMeans(k=3, kmeans=kmeans_model)
julia> X = rand(100, 2)
julia> fit!(model, X)
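
The bisecting loop can be sketched without the package as shown below. All names (`sse`, `split_in_two`, `sketch_bisecting`) are hypothetical; `split_in_two` is a simple stand-in for the inner two-cluster KMeans with a fixed iteration count, and the sketch assumes the data contain at least k distinct points.

```julia
using Random, Statistics

# Sum of squared distances of the rows of X to their mean.
sse(X::Matrix{Float64}) = sum(abs2, X .- mean(X, dims=1))

# Hypothetical stand-in for KMeans(k=2): a few assign/update iterations.
function split_in_two(X::Matrix{Float64}; iters::Int=20, rng=Random.default_rng())
    n = size(X, 1)
    c = X[randperm(rng, n)[1:2], :]
    labels = ones(Int, n)
    for _ in 1:iters
        for i in 1:n
            d1 = sum(abs2, X[i, :] .- c[1, :])
            d2 = sum(abs2, X[i, :] .- c[2, :])
            labels[i] = d1 <= d2 ? 1 : 2
        end
        for j in 1:2
            m = findall(==(j), labels)
            isempty(m) || (c[j, :] = vec(mean(X[m, :], dims=1)))
        end
    end
    return labels
end

function sketch_bisecting(X::Matrix{Float64}, k::Int; rng=Random.default_rng())
    clusters = [collect(1:size(X, 1))]        # start with one cluster of all row indices
    while length(clusters) < k
        # a./b. find the cluster with the highest SSE
        worst = argmax([sse(X[idx, :]) for idx in clusters])
        idx = clusters[worst]
        # c. bisect it
        halves = split_in_two(X[idx, :]; rng=rng)
        # d. replace it with the two resulting clusters
        deleteat!(clusters, worst)
        push!(clusters, idx[halves .== 1], idx[halves .== 2])
    end
    labels = zeros(Int, size(X, 1))
    for (j, idx) in enumerate(clusters)
        labels[idx] .= j
    end
    return labels
end

X = [0.0 0.0; 0.1 0.0; 10.0 0.0; 10.1 0.0; 30.0 0.0; 30.1 0.0]
labels = sketch_bisecting(X, 3)
```

Tracking clusters as vectors of row indices keeps the original data matrix untouched and makes the final label assignment a single pass.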

Cluster.predict — Method

predict(model::BKMeans, X::Matrix{Float64})

Predicts the cluster labels for new data points based on the fitted BKMeans model.

Input

model::BKMeans: An instance of BKMeans.
X::Matrix{Float64}: Data matrix where rows are data points and columns are features.

Output

Returns a vector of predicted labels for each data point.

Examples

julia> kmeans_model = KMeans(k=2, mode=:kmeanspp)
julia> model = BKMeans(k=3, kmeans=kmeans_model)
julia> X_train = rand(100, 2)
julia> fit!(model, X_train)
julia> X_test = rand(10, 2)
julia> labels = predict(model, X_test)

Distributional Clustering Method

References:

Krishna, A., Mak, S. and Joseph, R., 2019. Distributional clustering: A distribution-preserving clustering method. arXiv preprint arXiv:1911.05940

Cluster.DC — Method

DC(; k::Int=3, mode::Symbol=:kmeanspp, max_try::Int=100, tol::Float64=1e-4)

Constructor for the DC struct.

Input

k::Int: Number of clusters (default: 3).
mode::Symbol: Initialization mode, either :random or :kmeanspp (default: :kmeanspp).
max_try::Int: Maximum number of iterations (default: 100).
tol::Float64: Tolerance for convergence (default: 1e-4).

Output

Returns an instance of DC.

Examples

julia> model = DC(k=3, mode=:kmeanspp, max_try=100, tol=1e-4)
DC(3, :kmeanspp, 100, 0.0001, Matrix{Float64}(undef, 0, 0), Int64[])

Cluster.DC — Type

mutable struct DC

A mutable struct for the Distributional Clustering (DC) algorithm.

Fields

k::Int: Number of clusters.
mode::Symbol: Initialization mode, either :random or :kmeanspp.
max_try::Int: Maximum number of iterations.
tol::Float64: Tolerance for convergence.
centroids::Array{Float64,2}: Matrix of centroid coordinates.
labels::Array{Int,1}: Vector of labels for each data point.

Examples

julia> model = DC(k=3, mode=:random, max_try=100, tol=1e-4)
DC(3, :random, 100, 0.0001, Matrix{Float64}(undef, 0, 0), Int64[])

Cluster.compute_objective_function — Method

compute_objective_function(X::Matrix{Float64}, centroids::Matrix{Float64}; p=2, delta=0.0001)

Computes the objective function for the DC algorithm.

Input

X::Matrix{Float64}: Data matrix where rows are data points and columns are features.
centroids::Matrix{Float64}: Matrix of centroid coordinates.
p: Power parameter for the distance metric (default: 2).
delta: Small constant to avoid division by zero (default: 0.0001).

Output

Returns a distance matrix where element (i, j) is the distance between the i-th data point and the j-th centroid.

Examples

julia> X = rand(100, 2)
julia> centroids = rand(3, 2)
julia> D = compute_objective_function(X, centroids)

Cluster.fit! — Method

fit!(model::DC, X::Matrix{Float64})

Fits the DC model to the data matrix X.

Input

model::DC: An instance of DC.
X::Matrix{Float64}: Data matrix where rows are data points and columns are features.

Output

Modifies the model in-place to fit the data.

Algorithm

Initialize centroids.
Iterate up to max_try times:
  a. Compute the objective function between data points and centroids.
  b. Assign each data point to the nearest centroid.
  c. Update centroids based on the current assignment.
  d. Check for convergence based on tol.

Examples

julia> model = DC(k=3)
julia> X = rand(100, 2)
julia> fit!(model, X)

Cluster.predict — Method

predict(model::DC, X::Matrix{Float64})

Predicts the cluster labels for new data points based on the fitted DC model.

Input

model::DC: An instance of DC.
X::Matrix{Float64}: Data matrix where rows are data points and columns are features.

Output

Returns a vector of predicted labels for each data point.

Examples

julia> model = DC(k=3)
julia> X_train = rand(100, 2)
julia> fit!(model, X_train)
julia> X_test = rand(10, 2)
julia> labels = predict(model, X_test)

Cluster.update_centroids — Method

update_centroids(X::Matrix{Float64}, label_vector::Vector{Int64}, model::DC; delta=0.0001)

Updates the centroids based on the current assignment of data points to centroids.

Input

X::Matrix{Float64}: Data matrix where rows are data points and columns are features.
label_vector::Vector{Int64}: Vector of labels for each data point.
model::DC: An instance of DC.
delta: Small constant to avoid division by zero (default: 0.0001).

Output

Returns a matrix of updated centroid coordinates.

Examples

julia> X = rand(100, 2)
julia> labels = rand(1:3, 100)
julia> model = DC(k=3)
julia> centroids = update_centroids(X, labels, model)

Full list of available functions

Cluster.BKMeans
Cluster.BKMeans
Cluster.DC
Cluster.DC
Cluster.KMeans
Cluster.KMeans
Cluster.assign_center
Cluster.compute_distance
Cluster.compute_objective_function
Cluster.fit!
Cluster.fit!
Cluster.fit!
Cluster.init_centroids
Cluster.predict
Cluster.predict
Cluster.predict
Cluster.update_centroids
Cluster.update_centroids
