Spark Pipeline

This node represents Pipeline from Spark ML

Input

It takes in a DataFrame as input.

Output

The incoming DataFrame is passed to the output.

Type

ml-pipeline

Class

fire.nodes.ml.NodePipeline

Fields

Details

This node represents Pipeline from Spark ML.

In machine learning, it is common to run a sequence of algorithms to process and learn from data.

E.g., a simple text document processing workflow might include several stages:

  • Split each document’s text into words.

  • Convert each document’s words into a numerical feature vector.

  • Learn a prediction model using the feature vectors and labels.

MLlib represents such a workflow as a Pipeline, which consists of a sequence of PipelineStages (Transformers and Estimators) to be run in a specific order.

More at Spark MLlib/ML docs page : http://spark.apache.org/docs/latest/ml-pipeline.html#pipeline