String Indexer Advanced

StringIndexer encodes a string column of labels as a column of label indices.

Input

It takes in a DataFrame and transforms it into another DataFrame.

Output

It adds a new column to the incoming DataFrame. The new column holds the label index assigned to each value of the string column.

Type

ml-estimator

Class

fire.nodes.ml.NodeStringIndexerAdvanced

Fields

Name            Title            Description
handleInvalid   Handle Invalid   Whether to skip invalid entries or throw an error
inputCol        Input Column     Input column to encode
outputCol       Output Column    Output column that holds the label indices

Details

String Indexer Advanced Node Details

The String Indexer Advanced Node encodes a string column of labels as a column of label indices. It takes in a DataFrame and returns another DataFrame with one added column, which holds the index assigned to each label.

It takes three parameters: handleInvalid, which controls how invalid entries are handled; inputCol, the name of the input column; and outputCol, the name of the output column.

Input Parameters

HANDLE INVALID: Select whether to skip invalid entries or throw an error when one is found.

INPUT COLUMN: Select the string column to encode.

OUTPUT COLUMN: Enter the name of the output column that will hold the label indices.
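The way the input and output columns relate can be sketched in plain Python. This is only an illustration, not the node's implementation: the helper names fit_label_index and add_index_column are hypothetical, and the ordering rule (most frequent label gets index 0, ties broken alphabetically) is an assumption, since this page does not state how indices are assigned.

```python
from collections import Counter


def fit_label_index(labels):
    """Build a label -> index mapping from the values of the input column.

    Assumption: the most frequent label gets index 0, and ties are broken
    alphabetically. The page does not state the ordering rule.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def add_index_column(rows, input_col, output_col, mapping):
    """Return new rows with output_col set to the index of row[input_col].

    The incoming rows are left unchanged, just as the node returns a new
    DataFrame instead of modifying its input.
    """
    return [{**row, output_col: mapping[row[input_col]]} for row in rows]


# Hypothetical sample data: blue appears 3 times, red twice, green once.
rows = [
    {"id": 0, "color": "red"},
    {"id": 1, "color": "blue"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "green"},
    {"id": 4, "color": "red"},
    {"id": 5, "color": "blue"},
]

mapping = fit_label_index(row["color"] for row in rows)
encoded = add_index_column(rows, "color", "encoded_color", mapping)

for row in encoded:
    print(row["id"], row["color"], row["encoded_color"])
```

Here "color" plays the role of INPUT COLUMN and "encoded_color" the role of OUTPUT COLUMN. Every other column passes through untouched.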

Examples

String Indexer Advanced Node Example

Consider the following String Indexer Advanced output for the color column:

id   color    encoded_color
0    red      2
1    green    1
2    blue     0
3    purple   3

In this example, the input column is color and the output column is encoded_color. The node encodes each value in the color column as a label index. handleInvalid is set to skip, so any invalid entries are skipped instead of causing an error.
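The difference between skipping and throwing an error can be sketched in plain Python. This is an illustration under stated assumptions, not the node's implementation: the encode helper and the "error" mode name are hypothetical, the mapping is copied from the example table, and an invalid entry is assumed to mean a value that has no index in the mapping.

```python
# Label -> index mapping copied from the example table.
COLOR_INDEX = {"blue": 0, "green": 1, "red": 2, "purple": 3}


def encode(rows, input_col, output_col, mapping, handle_invalid="error"):
    """Add output_col to each row, handling values missing from mapping.

    Assumption: "skip" leaves out rows with an invalid entry, and any other
    mode raises an error on the first invalid entry.
    """
    encoded = []
    for row in rows:
        value = row[input_col]
        if value not in mapping:
            if handle_invalid == "skip":
                continue  # leave the invalid row out of the output
            raise ValueError(f"Invalid entry in {input_col}: {value!r}")
        encoded.append({**row, output_col: mapping[value]})
    return encoded


# Hypothetical new data: "yellow" has no index in the mapping.
new_rows = [
    {"id": 4, "color": "red"},
    {"id": 5, "color": "yellow"},
]

skipped = encode(new_rows, "color", "encoded_color", COLOR_INDEX,
                 handle_invalid="skip")
print(skipped)
```

With handle_invalid="skip" only the red row survives. With the hypothetical "error" mode, the same call raises ValueError on the yellow row.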