Window Aggregation

This node calculates moving (sliding-window) values of the selected functions over the selected input column(s).

Input

It accepts a DataFrame as input from the previous node.

Output

A new column is added that contains the result of applying the selected function to the given column of the input DataFrame.

Type

transform

Class

fire.nodes.etl.NodeMovingWindowFunctions

Fields

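Name          |    Title                    |    Description
------------------------------------------------------------------------------------------------------------------------
partitionCol  |    Partition Column Name    |    Column used to partition the incoming DataFrame for the sliding-window operation
orderCol      |    Order Column Name        |    Column used to order the rows within each partition for the sliding-window operation
inputCols     |    Input Columns            |    Input column(s) on which the selected functions are computed
functions     |    Functions                |    Function(s) to apply over each window (e.g. AVG)
advanced      |    Advanced                 |
windowStart   |    Window Start             |    Start of the window, as a row offset relative to the current row (e.g. -1 = one row before)
windowEnd     |    Window End               |    End of the window, as a row offset relative to the current row (e.g. 1 = one row after)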
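Assuming windowStart and windowEnd are row offsets relative to the current row (as in the example below, where -1 and 1 mean one row before and one row after), the rows that fall into a given row's window can be sketched as follows. The function name window_bounds is hypothetical, and clamping the window at the partition edges is inferred from the example output rather than documented behavior.

```python
def window_bounds(i: int, n: int, window_start: int, window_end: int) -> range:
    """Return the indices of the rows in the window of row i (hypothetical helper).

    i            -- position of the current row within its sorted partition
    n            -- number of rows in the partition
    window_start -- offset of the first window row relative to i (e.g. -1)
    window_end   -- offset of the last window row relative to i (e.g. 1)
    """
    # Clamp both ends so the window never extends outside the partition.
    lo = max(0, i + window_start)
    hi = min(n - 1, i + window_end)
    return range(lo, hi + 1)  # empty if lo > hi


# First, middle and last row of a 4-row partition with start=-1, end=1.
print(list(window_bounds(0, 4, -1, 1)))
print(list(window_bounds(1, 4, -1, 1)))
print(list(window_bounds(3, 4, -1, 1)))
```

The first and last rows of a partition get a smaller window (two rows instead of three), which is why their averages in the example are taken over only two values.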

Details

This node generates a new DataFrame by appending a column, computed with the selected moving-window function, to the incoming DataFrame.

The new column is populated with the value of the selected moving-window function applied to the selected column.
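The computation can be sketched in pure Python as below. This is a minimal illustration, not the node's implementation: it assumes a row-based frame (windowStart/windowEnd as row offsets), windows clamped at partition edges, and rows represented as dicts; moving_window, avg and EMPLOYEES are hypothetical names. Run on the employee data from the Examples section, it reproduces the mean_SALARY column shown there.

```python
from itertools import groupby
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def avg(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty window (hypothetical AVG helper)."""
    return sum(values) / len(values)


def moving_window(rows: List[Row], partition_col: str, order_col: str,
                  input_col: str, func: Callable[[Sequence[float]], float],
                  window_start: int, window_end: int, out_col: str) -> List[Row]:
    """Return copies of rows with out_col = func(input_col values in each row's window).

    Assumes window_start <= window_end, so every window holds at least the current row.
    """
    # Sort by partition, then by the order column. sorted() is stable, so
    # rows with equal keys (e.g. two salaries of 50000) keep their input order.
    ordered = sorted(rows, key=lambda r: (r[partition_col], r[order_col]))
    result: List[Row] = []
    # groupby needs each partition to be contiguous, which the sort guarantees.
    for _, group in groupby(ordered, key=lambda r: r[partition_col]):
        part = list(group)
        n = len(part)
        for i, row in enumerate(part):
            # Row offsets relative to the current row, clamped to the partition.
            lo = max(0, i + window_start)
            hi = min(n - 1, i + window_end)
            values = [part[j][input_col] for j in range(lo, hi + 1)]
            result.append({**row, out_col: func(values)})
    return result


EMPLOYEES = [
    {"EMP_CD": "E01", "EMP_NAME": "ANTHONY", "DEPT": "HR", "SALARY": 50000, "AGE": 40},
    {"EMP_CD": "E02", "EMP_NAME": "LISA", "DEPT": "HR", "SALARY": 50000, "AGE": 35},
    {"EMP_CD": "E03", "EMP_NAME": "MARTIN", "DEPT": "HR", "SALARY": 20000, "AGE": 25},
    {"EMP_CD": "E04", "EMP_NAME": "DAVID", "DEPT": "SALES", "SALARY": 55000, "AGE": 40},
    {"EMP_CD": "E05", "EMP_NAME": "MARK", "DEPT": "SALES", "SALARY": 60000, "AGE": 45},
    {"EMP_CD": "E06", "EMP_NAME": "JOE", "DEPT": "SALES", "SALARY": 40000, "AGE": 25},
    {"EMP_CD": "E07", "EMP_NAME": "BELLA", "DEPT": "HR", "SALARY": 60000, "AGE": 24},
]

out = moving_window(EMPLOYEES, "DEPT", "SALARY", "SALARY", avg, -1, 1, "mean_SALARY")
for r in out:
    print(r["EMP_CD"], r["DEPT"], r["SALARY"], round(r["mean_SALARY"], 3))
```

Passing a different callable (for example Python's built-in min, max or sum) in place of avg illustrates other window functions under the same assumptions.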

Examples

The incoming DataFrame has the following rows:

EMP_CD    |    EMP_NAME    |    DEPT    |    SALARY    |    AGE
------------------------------------------------------------------------
E01       |    ANTHONY     |    HR      |    50000     |    40
E02       |    LISA        |    HR      |    50000     |    35
E03       |    MARTIN      |    HR      |    20000     |    25
E04       |    DAVID       |    SALES   |    55000     |    40
E05       |    MARK        |    SALES   |    60000     |    45
E06       |    JOE         |    SALES   |    40000     |    25
E07       |    BELLA       |    HR      |    60000     |    24

If the MovingWindowFunctions node is configured as follows:

WINDOW START : -1

WINDOW END : 1

PARTITION COLUMN NAME : DEPT

ORDER COLUMN NAME : SALARY

INPUT COLUMNS : SALARY

FUNCTIONS : AVG

Here, the window for each row spans from 1 row preceding it up to 1 row following it.

The incoming DataFrame is partitioned by [DEPT], and within each partition the rows are sorted by [SALARY].

The new column is populated with the [AVG] of [SALARY] over the rows in each row's window, within that row's partition.
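Written as a formula, under the same reading of Window Start and Window End as row offsets (an inference from this example, with windows clamped at partition edges), the value for the i-th row of a partition of n rows is:

```latex
% x_j: value of the input column in the j-th row of a partition, after sorting by the order column
% s = windowStart, e = windowEnd
\ell_i = \max(0,\; i + s), \qquad h_i = \min(n - 1,\; i + e)

\mathrm{AVG}_i = \frac{1}{h_i - \ell_i + 1} \sum_{j=\ell_i}^{h_i} x_j

% Worked example, HR partition (n = 4, s = -1, e = 1):
% MARTIN (i = 0): \ell_0 = 0, h_0 = 1
\mathrm{AVG}_0 = \tfrac{20000 + 50000}{2} = 35000
% LISA (i = 2): \ell_2 = 1, h_2 = 3
\mathrm{AVG}_2 = \tfrac{50000 + 50000 + 60000}{3} \approx 53333.333
```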

The outgoing DataFrame is:

EMP_CD    |    EMP_NAME    |    DEPT    |    SALARY    |    AGE    |    mean_SALARY
---------------------------------------------------------------------------------------
E03       |    MARTIN      |    HR      |    20000     |    25     |    35000.0
E01       |    ANTHONY     |    HR      |    50000     |    40     |    40000.0
E02       |    LISA        |    HR      |    50000     |    35     |    53333.333
E07       |    BELLA       |    HR      |    60000     |    24     |    55000.0
E06       |    JOE         |    SALES   |    40000     |    25     |    47500.0
E04       |    DAVID       |    SALES   |    55000     |    40     |    51666.667
E05       |    MARK        |    SALES   |    60000     |    45     |    57500.0