Gaussian Mixture

This class fits a multivariate Gaussian Mixture Model (GMM) using the expectation-maximization (EM) algorithm. A GMM represents a composite distribution made up of independent Gaussian distributions, each with an associated mixing weight that specifies its contribution to the composite.
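
Written out, the composite density described above takes the following form. The symbols are chosen here for illustration: k is the number of components, and w_j, \mu_j and \Sigma_j are the mixing weight, mean vector and covariance matrix of component j.

```latex
p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad w_j \ge 0, \qquad \sum_{j=1}^{k} w_j = 1
```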

Input

Takes a DataFrame as input and performs Gaussian mixture clustering on it.

Output

The input DataFrame is passed along to the next processors.

Type

ml-estimator

Class

fire.nodes.ml.NodeGaussianMixture

Fields

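================  ==================  ==================================================================
Name              Title               Description
================  ==================  ==================================================================
featuresCol       Features Column     Features column of type VectorUDT used for model fitting.
k                 K                   The number of clusters to create.
maxIter           Max Iterations      The maximum number of iterations.
predictionCol     Prediction Column   The prediction column created during model scoring.
seed              Seed                Random seed.
tol               Tolerance           The convergence tolerance for iterative algorithms.
weightCol         Weight Column       Name of the column holding instance weights.
aggregationDepth  Aggregation Depth   Suggested depth for treeAggregate (must be >= 2).
probabilityCol    Probability Column  Name of the column for predicted class conditional probabilities.
================  ==================  ==================================================================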
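
As a rough sketch of how these fields correspond to the underlying spark.ml estimator, the snippet below sets each one through the matching GaussianMixture setter. The values and column names are placeholders chosen for illustration, and setWeightCol and setAggregationDepth assume Spark 3.0 or later. How the node wires these fields internally is not shown here.

```scala
import org.apache.spark.ml.clustering.GaussianMixture

// Illustrative values only; the column names are hypothetical placeholders.
val gmm = new GaussianMixture()
  .setFeaturesCol("features")       // featuresCol
  .setK(3)                          // k
  .setMaxIter(100)                  // maxIter
  .setPredictionCol("prediction")   // predictionCol
  .setSeed(42L)                     // seed
  .setTol(0.01)                     // tol
  .setWeightCol("weight")           // weightCol (assumes Spark 3.0+)
  .setAggregationDepth(2)           // aggregationDepth (assumes Spark 3.0+)
  .setProbabilityCol("probability") // probabilityCol
```

Constructing the estimator does not require a running SparkSession; one is only needed once fit is called on a DataFrame.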

Details

A Gaussian Mixture Model represents a composite distribution whereby points are drawn from one of k Gaussian sub-distributions, each with its own probability. The spark.ml implementation uses the expectation-maximization algorithm to induce the maximum-likelihood model given a set of samples.
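
To make the expectation-maximization loop concrete, here is a minimal one-dimensional sketch in plain Scala. It is not the spark.ml implementation: Component, gaussianPdf, eStep, mStep and fit are hypothetical names introduced for illustration, the data and starting values are made up, and it runs a fixed number of iterations rather than checking a convergence tolerance.

```scala
// One-dimensional EM sketch for a Gaussian mixture; for illustration only.
final case class Component(weight: Double, mean: Double, variance: Double)

def gaussianPdf(x: Double, mean: Double, variance: Double): Double =
  math.exp(-0.5 * (x - mean) * (x - mean) / variance) / math.sqrt(2 * math.Pi * variance)

// E-step: for each point, the posterior probability (responsibility) of each component.
def eStep(data: Seq[Double], comps: Seq[Component]): Seq[Seq[Double]] =
  data.map { x =>
    val weighted = comps.map(c => c.weight * gaussianPdf(x, c.mean, c.variance))
    val total = weighted.sum
    weighted.map(_ / total)
  }

// M-step: re-estimate weight, mean and variance of each component from the responsibilities.
def mStep(data: Seq[Double], resp: Seq[Seq[Double]], k: Int): Seq[Component] =
  (0 until k).map { j =>
    val r = resp.map(_(j))
    val nj = r.sum
    val mean = data.zip(r).map { case (x, rj) => rj * x }.sum / nj
    val variance = data.zip(r).map { case (x, rj) => rj * (x - mean) * (x - mean) }.sum / nj
    // Floor the variance so a component cannot collapse onto a single point.
    Component(nj / data.size, mean, math.max(variance, 1e-6))
  }

// Alternate E and M steps for a fixed number of iterations.
def fit(data: Seq[Double], init: Seq[Component], maxIter: Int): Seq[Component] =
  (0 until maxIter).foldLeft(init) { (comps, _) =>
    mStep(data, eStep(data, comps), comps.size)
  }

// Made-up data with two well-separated groups, and a deliberately rough starting guess.
val data = Seq(-2.1, -1.9, -2.0, -1.8, 3.0, 3.2, 2.9, 3.1)
val init = Seq(Component(0.5, -1.0, 1.0), Component(0.5, 1.0, 1.0))
val fitted = fit(data, init, maxIter = 50)
fitted.foreach(println)
```

Each iteration cannot decrease the likelihood of the data under the model, which is why repeating the two steps moves the parameters toward a (local) maximum-likelihood fit.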

GaussianMixture is implemented as an Estimator and generates a GaussianMixtureModel as the base model.

More details are available on the Apache Spark ML documentation page:

https://spark.apache.org/docs/latest/ml-clustering.html#gaussian-mixture-model-gmm

Examples

The example below is taken from the Apache Spark ML documentation: https://spark.apache.org/docs/latest/ml-clustering.html#gaussian-mixture-model-gmm

    import org.apache.spark.ml.clustering.GaussianMixture

    // Loads data.
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

    // Trains a Gaussian Mixture Model with two components.
    val gmm = new GaussianMixture()
      .setK(2)
    val model = gmm.fit(dataset)

    // Prints the parameters of the fitted mixture model.
    for (i <- 0 until model.getK) {
      println(s"Gaussian $i:\nweight=${model.weights(i)}\n" +
        s"mu=${model.gaussians(i).mean}\nsigma=\n${model.gaussians(i).cov}\n")
    }