K-Means¶
K-means clustering with support for the k-means|| initialization proposed by Bahmani et al.
Input¶
Takes a DataFrame as input and performs K-Means clustering on it.
Output¶
The input DataFrame is passed along to the next Processors.
Type¶
ml-estimator
Class¶
fire.nodes.ml.NodeKMeans
Fields¶
| Name | Title | Description |
|---|---|---|
| modelIdentifier | Model Identifier | Starts with $loop, followed by column names separated by underscores. Example: $loop_columnName1_columnName2. |
| featuresCol | Features Column | Features column of type VectorUDT used for model fitting. |
| k | K | The number of clusters to create. |
| maxIter | Max Iterations | The maximum number of iterations. |
| predictionCol | Prediction Column | The prediction column created during model scoring. |
| seed | Seed | The random seed. |
| tol | Tolerance | The convergence tolerance for iterative algorithms. |
| initMode | initMode | The initialization algorithm mode. Spark ML accepts "random" and "k-means\|\|". |
| initSteps | initSteps | The number of steps for the k-means\|\| initialization mode. It is ignored when another initialization mode is chosen. |
| distanceMeasure | distanceMeasure | The distance measure used to compare points. Spark ML accepts "euclidean" and "cosine". |
| weightCol | Weight Column | Optional column holding the weight of each row. |
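The fields above correspond to parameters of Spark ML's KMeans estimator. The following is a minimal sketch of that correspondence, not the node's actual implementation: the column names x, y, w and features are hypothetical, the way the node passes its fields to Spark is an assumption, and setWeightCol assumes Spark 3.0 or later. The values shown are the Spark defaults as far as we know.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object KMeansFieldsSketch {
  // Builds an estimator with one setter per field in the table.
  def buildEstimator(): KMeans = new KMeans()
    .setFeaturesCol("features")      // featuresCol (hypothetical column name)
    .setK(2)                         // k
    .setMaxIter(20)                  // maxIter
    .setPredictionCol("prediction")  // predictionCol
    .setSeed(1L)                     // seed
    .setTol(1e-4)                    // tol
    .setInitMode("k-means||")        // initMode: "random" or "k-means||"
    .setInitSteps(2)                 // initSteps, used only by k-means||
    .setDistanceMeasure("euclidean") // distanceMeasure: "euclidean" or "cosine"
    .setWeightCol("w")               // weightCol (assumes Spark 3.0+)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansFieldsSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: two numeric columns and a per-row weight.
    val df = Seq(
      (0.0, 0.1, 1.0),
      (0.2, 0.0, 1.0),
      (9.0, 9.1, 2.0),
      (9.2, 8.9, 2.0)
    ).toDF("x", "y", "w")

    // featuresCol must be a vector column; VectorAssembler builds one
    // from plain numeric columns.
    val assembled = new VectorAssembler()
      .setInputCols(Array("x", "y"))
      .setOutputCol("features")
      .transform(df)

    val model = buildEstimator().fit(assembled)
    // Scoring adds the prediction column to the input rows.
    model.transform(assembled).show()

    spark.stop()
  }
}
```

If the upstream DataFrame already has a vector column, the VectorAssembler step can be skipped and that column named directly in featuresCol.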
Details¶
k-means is one of the most commonly used clustering algorithms that clusters the data points into a predefined number of clusters. The MLlib implementation includes a parallelized variant of the k-means++ method called kmeans||.
KMeans is implemented as an Estimator and generates a KMeansModel as the base model.
More details are available on the Apache Spark ML docs page:
https://spark.apache.org/docs/latest/ml-clustering.html#k-means
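To make the roles of k, maxIter and tol concrete, here is a single-machine sketch of the basic k-means iteration (Lloyd's algorithm) in plain Scala. It is an illustration only and not Spark's distributed implementation: the initial centers are supplied by the caller rather than chosen by k-means||, and the convergence rule (no center moves farther than tol) is an assumption made for this sketch. All names are hypothetical.

```scala
object LloydSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(points: Seq[Point]): Point = {
    val dim = points.head.length
    Array.tabulate(dim)(d => points.map(_(d)).sum / points.size)
  }

  // Alternates assignment and update steps until no center moves by
  // more than tol, or maxIter iterations have run.
  def fit(data: Seq[Point], init: Seq[Point], maxIter: Int, tol: Double): Seq[Point] = {
    var centers = init
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {
      // Assignment step: group points by their nearest center.
      val groups = data.groupBy(p => closest(p, centers))
      // Update step: move each center to the mean of its group;
      // a center with no assigned points keeps its position.
      val updated = centers.indices.map { i =>
        groups.get(i).map(ps => mean(ps)).getOrElse(centers(i))
      }
      converged = centers.zip(updated).forall { case (c, u) => sqDist(c, u) <= tol * tol }
      centers = updated
      iter += 1
    }
    centers
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Array(0.0, 0.0), Array(0.0, 2.0), Array(10.0, 10.0), Array(10.0, 12.0))
    val init = Seq(Array(0.0, 0.0), Array(10.0, 10.0)) // k = 2 starting centers
    val centers = fit(data, init, maxIter = 20, tol = 1e-4)
    centers.foreach(c => println(c.mkString("[", ", ", "]")))
  }
}
```

The number of initial centers plays the role of k. The quality of the result depends on where those centers start, which is the problem that initialization schemes such as k-means|| address.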
Examples¶
The example below is available at: https://spark.apache.org/docs/latest/ml-clustering.html#k-means
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
// Loads data.
val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
// Trains a k-means model.
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)
// Make predictions
val predictions = model.transform(dataset)
// Evaluate clustering by computing Silhouette score
val evaluator = new ClusteringEvaluator()
val silhouette = evaluator.evaluate(predictions)
println(s"Silhouette with squared euclidean distance = $silhouette")
// Shows the result.
println("Cluster Centers: ")
model.clusterCenters.foreach(println)
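Because k has to be chosen up front, one common follow-up is to fit the model for several values of k and compare the Silhouette scores. The sketch below is a hedged extension of the example above and is not part of the Spark documentation: the object and method names are hypothetical, and the candidate range 2 to 4 is an arbitrary choice for the small sample file.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.{DataFrame, SparkSession}

object SilhouetteByK {
  // Fits one model per candidate k and returns (k, silhouette) pairs.
  // Expects a vector column named "features", the KMeans default.
  def silhouettes(dataset: DataFrame, ks: Seq[Int]): Seq[(Int, Double)] = {
    val evaluator = new ClusteringEvaluator()
    ks.map { k =>
      val model = new KMeans().setK(k).setSeed(1L).fit(dataset)
      (k, evaluator.evaluate(model.transform(dataset)))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SilhouetteByK")
      .master("local[*]")
      .getOrCreate()

    // Same sample file as the example above.
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

    silhouettes(dataset, 2 to 4).foreach { case (k, s) =>
      println(s"k = $k, silhouette = $s")
    }

    spark.stop()
  }
}
```

A higher Silhouette score indicates tighter, better separated clusters, so the value of k with the highest score is a reasonable starting point.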