Pivot By

Pivot Node

Type

transform

Class

fire.nodes.etl.NodePivotBy

Fields

Name                  |    Title                         |    Description
----------------------------------------------------------------------------------------------
aggregate             |    Aggregate                     |
groupingCols          |    Grouping Columns              |    Grouping Columns
aggregateCols         |    Aggregate Columns             |    Aggregate Columns
aggregateOperations   |    Aggregate Operation to use    |    Aggregate Operation
pivot                 |    Pivot                         |
pivotCol              |    Pivot Column                  |    Pivoting Column
uniqueValues          |    UniqueValues                  |    Comma-separated unique values of the Pivot Column. Providing them improves the performance of the pivot operation, because Spark does not have to first compute the distinct values of the Pivot Column internally.
schema                |    InferSchema                   |
outputColNames        |    Column Names of the Table     |    Output Column Names of the Table
outputColTypes        |    Column Types of the Table     |    Output Column Types of the Table
outputColFormats      |    Column Formats                |    Output Column Formats
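The effect of the uniqueValues field can be illustrated with a minimal plain-Python sketch. It is not the node's implementation: the function name resolve_pivot_values, the list-of-dicts stand-in for a Dataframe, the trimming of whitespace around each value, and the sorting of scanned values are all assumptions made for illustration.

```python
def resolve_pivot_values(rows, pivot_col, unique_values=""):
    """Return the pivot values that would become output columns.

    rows          -- list of dicts, one per input row (stand-in for a Dataframe)
    pivot_col     -- name of the Pivot Column
    unique_values -- comma-separated string, as entered in uniqueValues
    """
    # Assumption: surrounding whitespace is ignored and empty entries are dropped.
    supplied = [v.strip() for v in unique_values.split(",") if v.strip()]
    if supplied:
        # Values were provided, so no pass over the data is needed.
        return supplied
    # Nothing provided: every row must be scanned to collect the distinct values.
    return sorted({row[pivot_col] for row in rows})


rows = [
    {"DEPT": "HR", "LOCATION": "NEW YORK"},
    {"DEPT": "SALES", "LOCATION": "NEW JERSEY"},
    {"DEPT": "MARKETING", "LOCATION": "NEW YORK"},
]

scanned = resolve_pivot_values(rows, "LOCATION")
supplied = resolve_pivot_values(rows, "LOCATION", "NEW JERSEY, NEW YORK")
```

Both calls yield the same two pivot values here, but only the first one has to read the data. On a large Dataframe that extra pass is the cost that supplying unique values avoids.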

Details

This node creates a Dataframe from a pivot table built on the incoming Dataframe.

The pivot table is created by grouping rows on the Grouping Columns, turning each distinct value of the Pivot Column into an output column, and applying the Aggregate Operation to the Aggregate Columns for each combination of grouping value and pivot value.

Examples

The incoming Dataframe has the following rows:

EMP_CD    |    EMP_NAME    |    LOCATION    |    DEPT         |    SALARY
-----------------------------------------------------------------------------
E01       |    DAVID       |    NEW YORK    |    HR           |    10000
E02       |    JOHN        |    NEW JERSEY  |    SALES        |    11000
E03       |    MARTIN      |    NEW YORK    |    MARKETING    |    12000
E04       |    TONY        |    NEW JERSEY  |    MARKETING    |    13000
E05       |    ROSS        |    NEW YORK    |    FRONT DESK   |    10000
E06       |    LISA        |    NEW JERSEY  |    FRONT DESK   |    11000
E07       |    PAUL        |    NEW YORK    |    MAINTENANCE  |    12000
E08       |    MARK        |    NEW JERSEY  |    MAINTENANCE  |    13000

If the PivotBy node is configured as below:

GROUPING COLUMNS : DEPT

PIVOT COLUMN : LOCATION

AGGREGATE COLUMNS    |    AGGREGATE OPERATION
-------------------------------------------------
EMP_CD               |    COUNT

then the outgoing Dataframe is created as below after performing the specified aggregation.

The count of employees for each combination of [DEPT] and [LOCATION] is listed below. A cell is empty where no rows exist for that combination:

DEPT         |    NEW JERSEY       |    NEW YORK
---------------------------------------------------
FRONT DESK   |    1                |    1
MARKETING    |    1                |    1
HR           |                     |    1
SALES        |    1                |
MAINTENANCE  |    1                |    1
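The same result can be reproduced with a short plain-Python sketch of the logic, using the rows from the example. It only illustrates the COUNT pivot and is not how the node is implemented. The helper name pivot_count and the tuple layout are assumptions. In Spark terms the configuration roughly corresponds to a groupBy on DEPT, a pivot on LOCATION, and a count of EMP_CD.

```python
from collections import defaultdict

# Rows from the example: (EMP_CD, EMP_NAME, LOCATION, DEPT, SALARY)
employees = [
    ("E01", "DAVID", "NEW YORK", "HR", 10000),
    ("E02", "JOHN", "NEW JERSEY", "SALES", 11000),
    ("E03", "MARTIN", "NEW YORK", "MARKETING", 12000),
    ("E04", "TONY", "NEW JERSEY", "MARKETING", 13000),
    ("E05", "ROSS", "NEW YORK", "FRONT DESK", 10000),
    ("E06", "LISA", "NEW JERSEY", "FRONT DESK", 11000),
    ("E07", "PAUL", "NEW YORK", "MAINTENANCE", 12000),
    ("E08", "MARK", "NEW JERSEY", "MAINTENANCE", 13000),
]


def pivot_count(rows, group_idx, pivot_idx, agg_idx):
    """Count non-missing aggregate values per (group value, pivot value) pair."""
    counts = defaultdict(int)
    for row in rows:
        if row[agg_idx] is not None:  # COUNT ignores missing values
            counts[(row[group_idx], row[pivot_idx])] += 1
    return dict(counts)


# GROUPING COLUMNS: DEPT (index 3), PIVOT COLUMN: LOCATION (index 2),
# AGGREGATE COLUMNS: EMP_CD (index 0) with COUNT.
counts = pivot_count(employees, group_idx=3, pivot_idx=2, agg_idx=0)

locations = sorted({row[2] for row in employees})
departments = sorted({row[3] for row in employees})

# One output row per DEPT, one output column per LOCATION value.
print("DEPT".ljust(13) + "".join("|    " + loc.ljust(17) for loc in locations))
for dept in departments:
    cells = [str(counts.get((dept, loc), "")) for loc in locations]
    print(dept.ljust(13) + "".join("|    " + c.ljust(17) for c in cells))
```

Combinations with no rows, such as HR in NEW JERSEY, never get an entry in counts, which is why the corresponding cell is left empty when the table is printed.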