String Indexer

StringIndexer encodes a string column of labels to a column of label indices.

Input

It takes in a DataFrame and transforms it into another DataFrame.

Output

It adds a new column to the incoming DataFrame. The new column holds the label indices that encode the string column of labels.

Type

ml-transformer

Class

fire.nodes.etl.NodeStringIndexer

Fields

Name        Title           Description
inputCol    Input Columns   Input column
outputCol   Output Column   Output column

Details

String Indexer Node Details

The String Indexer Node encodes a string column of labels to a column of label indices. It takes in a DataFrame and returns another DataFrame with a new column that holds the label indices.

It takes the parameters handleInvalid, inputCol and outputCol, which set how invalid entries are handled, the input column name and the output column name, respectively.

Input Parameters

HANDLE INVALID: Select whether to skip invalid entries or throw an error on them.

INPUT COLUMN: Select the required column for encoding.

OUTPUT COLUMN: The name of the output column after encoding.
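
The way the three parameters interact can be sketched in plain Python. This is only an illustration under stated assumptions, not the node's implementation: the function names fit_label_index and index_rows are hypothetical, an "invalid entry" is taken to mean a label that was not seen when the mapping was built, and indices are assumed to be assigned by descending label frequency with ties broken alphabetically. The node's actual ordering may differ.

```python
from collections import Counter


def fit_label_index(labels):
    """Build a label -> index mapping.

    Assumption: the most frequent label gets index 0 and ties are
    broken alphabetically. The node's actual ordering may differ.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def index_rows(rows, mapping, input_col, output_col, handle_invalid="error"):
    """Return copies of the rows with output_col added.

    Assumption: an invalid entry is a label missing from the mapping.
    """
    if handle_invalid not in ("skip", "error"):
        raise ValueError("handle_invalid must be 'skip' or 'error'")
    result = []
    for row in rows:
        label = row[input_col]
        if label not in mapping:
            if handle_invalid == "skip":
                continue  # leave the row with the invalid entry out
            raise ValueError(f"Unseen label: {label!r}")
        result.append({**row, output_col: mapping[label]})
    return result


# Hypothetical data: "d" never appears among the labels used for fitting.
label_mapping = fit_label_index(["a", "b", "c", "a", "a", "c"])
sample_rows = [
    {"id": 0, "category": "a"},
    {"id": 1, "category": "d"},
    {"id": 2, "category": "c"},
]
indexed_rows = index_rows(
    sample_rows, label_mapping, "category", "category_index", handle_invalid="skip"
)
print(label_mapping)
print(indexed_rows)
```

With handle_invalid set to "skip" the row with id 1 is left out of the result. With "error" the same call raises a ValueError instead.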

Examples

String Indexer Node Example

Consider the String Indexer output below for the color column:

id   color    encoded_color
0    red      2
1    green    1
2    blue     0
3    purple   3

In this example, the input column is color and the output column is encoded_color. The String Indexer encodes the color column to a column of label indices. handleInvalid is set to skip, so any invalid entries are skipped.
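
The effect of skip on this example can be illustrated with a short plain-Python sketch. It reuses the color-to-index mapping from the example and applies it to hypothetical incoming rows. The "yellow" row is invented for illustration, and treating a label without an index as the invalid entry is an assumption.

```python
# Label-to-index mapping copied from the example.
color_index = {"blue": 0, "green": 1, "red": 2, "purple": 3}

# Hypothetical incoming rows; "yellow" has no index in the mapping.
incoming_rows = [
    {"id": 0, "color": "red"},
    {"id": 1, "color": "green"},
    {"id": 2, "color": "yellow"},
    {"id": 3, "color": "purple"},
]

# With skip, rows whose label has no index are left out of the result.
encoded_rows = [
    {**row, "encoded_color": color_index[row["color"]]}
    for row in incoming_rows
    if row["color"] in color_index
]

for row in encoded_rows:
    print(row["id"], row["color"], row["encoded_color"])
```

The row with id 2 does not appear in the printed result, while the remaining rows keep the indices given in the example.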