Group By

Group By Node

Type

transform

Class

fire.nodes.etl.NodeGroupBy

Fields

Name

Title

Description

Aggregation Setting

Aggregation Setting

groupingCols

Grouping Columns

Grouping Columns

aggregateCols

Aggregate Columns

Aggregate Columns

aggregateOperations

Aggregate Operation

Aggregate Operation

outputColNames

Output Column Names

Output Column Names, default value is aggregateOperation_aggregateCol.

Filter Setting

Filter Setting

whereClause

Where Clause

where condition before group by function

havingClause

Having Clause

having condition after group by function

Details

Group By Details

Aggregation Settings

This node groups row values based on categorical columns selected by the user and then calculates aggregate statistics of the grouped columns.

The Grouping Columns allows the user to select which columns to group rows by, and the Variables List allows the user to select which aggregate statistics will be generated.

Filter Settings

The Filter Settings allow the user to provide additional clauses before and after the data is aggregated.

The Where Clause allows the user to filter the data before it is aggregated, and the Having Clause allows the user to filter the data after it has been aggregated.

Both the Where and Having Clauses are similar in use to those that exist in many forms of SQL.

Examples

Incoming Dataframe has following rows:

EMP_CD    |    EMP_NAME    |    LOCATION    |    DEPT         |    SALARY
-----------------------------------------------------------------------------
E01       |    DAVID       |    NEW YORK    |    HR           |    10000
E02       |    JOHN        |    NEW JERSEY  |    SALES        |    11000
E03       |    MARTIN      |    NEW YORK    |    MARKETING    |    12000
E04       |    TONY        |    NEW JERSEY  |    MARKETING    |    13000
E05       |    ROSS        |    NEW YORK    |    FRONT DESK   |    10000
E06       |    LISA        |    NEW JERSEY  |    FRONT DESK   |    11000
E07       |    PAUL        |    NEW YORK    |    MAINTENANCE  |    12000
E08       |    MARK        |    NEW JERSEY  |    MAINTENANCE  |    13000

if GroupBy node is configured as below:

GROUPING COLUMNS : DEPT

AGGREGATE COLUMNS    |    AGGREGATE OPERATION
-------------------------------------------------
EMP_CD               |    COUNT
SALARY               |    SUM

then outgoing Dataframe would be created as below after performing specified aggregation

Count of Employees and Summation of Salary all Employees is computed for each [DEPT]:

DEPT           |    count_emp_cd    |    sum_salary
----------------------------------------------------------
FRONT DESK     |    2               |    21000
MARKETING      |    2               |    25000
HR             |    1               |    10000
SALES          |    1               |    11000
MAINTENANCE    |    2               |    25000

if [WHERE CLAUSE] is entered as [DEPT = ‘HR’] then outgoing Dataframe would consists of data only from HR department.

if [HAVING CLAUSE] is entered as [COUNT(*) > 1] then outgoing Dataframe would consists of data for Department where count of Employees is more than 1.