Gaussian Mixture
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions, each with an associated mixing weight that specifies its contribution to the composite.
Input
It takes a DataFrame as input and performs Gaussian Mixture clustering on it.
Output
The input DataFrame is passed along to the next Processors.
Type
ml-estimator
Class
fire.nodes.ml.NodeGaussianMixture
Fields
| Name | Title | Description |
|---|---|---|
| featuresCol | Features Column | Features column of type VectorUDT used for model fitting. |
| k | K | The number of clusters to create. |
| maxIter | Max Iterations | The maximum number of iterations. |
| predictionCol | Prediction Column | The prediction column created during model scoring. |
| seed | Seed | Random seed. |
| tol | Tolerance | The convergence tolerance for iterative algorithms. |
| weightCol | Weight Column | Name of the weight column. |
| aggregationDepth | Aggregation Depth | Suggested depth for treeAggregate (>= 2). |
| probabilityCol | Probability Column | Column name for predicted class conditional probabilities. |
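The field names above match parameter names on Spark ML's GaussianMixture estimator. As a minimal sketch, assuming Spark 3.0 or later (where weightCol and aggregationDepth are available on this estimator), the same settings could be applied directly in Scala. The column names and values below are illustrative placeholders, not the node's defaults.

```scala
import org.apache.spark.ml.clustering.GaussianMixture

// Illustrative values only; column names such as "features" are assumptions.
val gmm = new GaussianMixture()
  .setFeaturesCol("features")        // featuresCol: vector column used for fitting
  .setK(3)                           // k: number of Gaussian components (clusters)
  .setMaxIter(100)                   // maxIter: upper bound on EM iterations
  .setTol(0.01)                      // tol: convergence tolerance
  .setSeed(42L)                      // seed: makes initialization repeatable
  .setPredictionCol("prediction")    // predictionCol: most likely component per row
  .setProbabilityCol("probability")  // probabilityCol: per-component probabilities
  .setAggregationDepth(2)            // aggregationDepth: treeAggregate depth (>= 2)
// Call .setWeightCol("weight") only if the DataFrame has a numeric weight column.

// Prints every parameter with its documentation and current value.
println(gmm.explainParams())
```

Constructing the estimator does not touch any data, so this part can be checked before a DataFrame is available.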
Details
A Gaussian Mixture Model represents a composite distribution whereby points are drawn from one of k Gaussian sub-distributions, each with its own probability. The spark.ml implementation uses the expectation-maximization algorithm to induce the maximum-likelihood model given a set of samples.
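For reference, the model and the two EM steps can be written out in the standard textbook form. This is a sketch of the general algorithm, not a transcription of Spark's source; the symbols $w_j$, $\mu_j$, $\Sigma_j$ (weight, mean, and covariance of component $j$), $\gamma_{ij}$ (responsibility of component $j$ for point $x_i$), and $N_j$ are notation introduced here.

```latex
% Mixture density with k components
p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad w_j \ge 0, \quad \sum_{j=1}^{k} w_j = 1

% E-step: responsibility of component j for sample x_i
\gamma_{ij} = \frac{w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                   {\sum_{l=1}^{k} w_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}

% M-step: re-estimate parameters from the n samples
N_j = \sum_{i=1}^{n} \gamma_{ij}, \qquad
w_j = \frac{N_j}{n}, \qquad
\mu_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, x_i, \qquad
\Sigma_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, (x_i - \mu_j)(x_i - \mu_j)^{\top}
```

The two steps alternate until the log-likelihood changes by less than the tolerance or the iteration limit is reached, which is the role the tol and maxIter fields play.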
GaussianMixture is implemented as an Estimator and generates a GaussianMixtureModel as the base model.
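The Estimator and model relationship can be sketched as follows: fit produces a GaussianMixtureModel, and the model's transform scores rows. The dataset is made up for illustration, the local SparkSession is an assumption (in spark-shell, `spark` already exists), and the output column names are Spark's defaults.

```scala
import org.apache.spark.ml.clustering.GaussianMixture
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("gmm-sketch").getOrCreate()

// Made-up 2-D points: one group near (0, 0), another near (10, 10).
val points = (0 until 20).flatMap { i =>
  val a = (i % 5) * 0.1
  val b = (i % 4) * 0.1
  Seq(Vectors.dense(a, b), Vectors.dense(10.0 + b, 10.0 + a))
}
val dataset = spark.createDataFrame(points.map(Tuple1.apply)).toDF("features")

// Estimator: holds the configuration and learns nothing until fit() is called.
val gmm = new GaussianMixture().setK(2).setSeed(1L)

// fit() runs EM and returns the fitted GaussianMixtureModel (a Transformer).
val model = gmm.fit(dataset)

// transform() appends the prediction and probability columns to the input rows.
val scored = model.transform(dataset)
scored.select("features", "prediction", "probability").show(5, truncate = false)

// Training log-likelihood, when a summary is attached to the model.
if (model.hasSummary) println(s"logLikelihood = ${model.summary.logLikelihood}")
```

The prediction column holds the index of the most probable component for each row, and the probability column holds the per-component probabilities as a vector.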
More details are available on the Apache Spark ML docs page:
https://spark.apache.org/docs/latest/ml-clustering.html#gaussian-mixture-model-gmm
Examples
The example below is available at: https://spark.apache.org/docs/latest/ml-clustering.html#gaussian-mixture-model-gmm
import org.apache.spark.ml.clustering.GaussianMixture

// Loads data (assumes an existing SparkSession named `spark`, as in spark-shell)
val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

// Trains a Gaussian Mixture Model with two components
val gmm = new GaussianMixture()
  .setK(2)
val model = gmm.fit(dataset)

// Outputs the parameters of the mixture model: weight, mean and covariance per component
for (i <- 0 until model.getK) {
  println(s"Gaussian $i:\nweight=${model.weights(i)}\n" +
    s"mu=${model.gaussians(i).mean}\nsigma=\n${model.gaussians(i).cov}\n")
}