Group By¶

Group By Node

Type¶

transform

Class¶

fire.nodes.etl.NodeGroupBy

Fields¶

Name	Title	Description
Aggregation Setting	Aggregation Setting
groupingCols	Grouping Columns	Grouping Columns
aggregateCols	Aggregate Columns	Aggregate Columns
aggregateOperations	Aggregate Operation	Aggregate Operation
outputColNames	Output Column Names	Output Column Names, default value is aggregateOperation_aggregateCol.
Filter Setting	Filter Setting
whereClause	Where Clause	where condition before group by function
havingClause	Having Clause	having condition after group by function

Details¶

Group By Details¶

Aggregation Settings¶

This node groups row values based on categorical columns selected by the user and then calculates aggregate statistics of the grouped columns.

The Grouping Columns allows the user to select which columns to group rows by, and the Variables List allows the user to select which aggregate statistics will be generated.

Filter Settings¶

The Filter Settings allow the user to provide additional clauses before and after the data is aggregated.

The Where Clause allows the user to filter the data before it is aggregated, and the Having Clause allows the user to filter the data after it has been aggregated.

Both the Where and Having Clauses are similar in use to those that exist in many forms of SQL.

Examples¶

Incoming Dataframe has following rows:

EMP_CD    |    EMP_NAME    |    LOCATION    |    DEPT         |    SALARY
-----------------------------------------------------------------------------
E01       |    DAVID       |    NEW YORK    |    HR           |    10000
E02       |    JOHN        |    NEW JERSEY  |    SALES        |    11000
E03       |    MARTIN      |    NEW YORK    |    MARKETING    |    12000
E04       |    TONY        |    NEW JERSEY  |    MARKETING    |    13000
E05       |    ROSS        |    NEW YORK    |    FRONT DESK   |    10000
E06       |    LISA        |    NEW JERSEY  |    FRONT DESK   |    11000
E07       |    PAUL        |    NEW YORK    |    MAINTENANCE  |    12000
E08       |    MARK        |    NEW JERSEY  |    MAINTENANCE  |    13000

if GroupBy node is configured as below:

GROUPING COLUMNS : DEPT

AGGREGATE COLUMNS    |    AGGREGATE OPERATION
-------------------------------------------------
EMP_CD               |    COUNT
SALARY               |    SUM

then outgoing Dataframe would be created as below after performing specified aggregation

Count of Employees and Summation of Salary all Employees is computed for each [DEPT]:

DEPT           |    count_emp_cd    |    sum_salary
----------------------------------------------------------
FRONT DESK     |    2               |    21000
MARKETING      |    2               |    25000
HR             |    1               |    10000
SALES          |    1               |    11000
MAINTENANCE    |    2               |    25000

if [WHERE CLAUSE] is entered as [DEPT = ‘HR’] then outgoing Dataframe would consists of data only from HR department.

if [HAVING CLAUSE] is entered as [COUNT(*) > 1] then outgoing Dataframe would consists of data for Department where count of Employees is more than 1.