Group By¶
Group By Node
Type¶
transform
Class¶
fire.nodes.etl.NodeGroupBy
Fields¶
Name |
Title |
Description |
|---|---|---|
Aggregation Setting |
Aggregation Setting |
|
groupingCols |
Grouping Columns |
Grouping Columns |
aggregateCols |
Aggregate Columns |
Aggregate Columns |
aggregateOperations |
Aggregate Operation |
Aggregate Operation |
outputColNames |
Output Column Names |
Output Column Names, default value is aggregateOperation_aggregateCol. |
Filter Setting |
Filter Setting |
|
whereClause |
Where Clause |
where condition before group by function |
havingClause |
Having Clause |
having condition after group by function |
Details¶
Group By Details¶
Aggregation Settings¶
This node groups row values based on categorical columns selected by the user and then calculates aggregate statistics of the grouped columns.
The Grouping Columns allows the user to select which columns to group rows by, and the Variables List allows the user to select which aggregate statistics will be generated.
Filter Settings¶
The Filter Settings allow the user to provide additional clauses before and after the data is aggregated.
The Where Clause allows the user to filter the data before it is aggregated, and the Having Clause allows the user to filter the data after it has been aggregated.
Both the Where and Having Clauses are similar in use to those that exist in many forms of SQL.
Examples¶
Incoming Dataframe has following rows:
EMP_CD | EMP_NAME | LOCATION | DEPT | SALARY
-----------------------------------------------------------------------------
E01 | DAVID | NEW YORK | HR | 10000
E02 | JOHN | NEW JERSEY | SALES | 11000
E03 | MARTIN | NEW YORK | MARKETING | 12000
E04 | TONY | NEW JERSEY | MARKETING | 13000
E05 | ROSS | NEW YORK | FRONT DESK | 10000
E06 | LISA | NEW JERSEY | FRONT DESK | 11000
E07 | PAUL | NEW YORK | MAINTENANCE | 12000
E08 | MARK | NEW JERSEY | MAINTENANCE | 13000
if GroupBy node is configured as below:
GROUPING COLUMNS : DEPT
AGGREGATE COLUMNS | AGGREGATE OPERATION
-------------------------------------------------
EMP_CD | COUNT
SALARY | SUM
then outgoing Dataframe would be created as below after performing specified aggregation
Count of Employees and Summation of Salary all Employees is computed for each [DEPT]:
DEPT | count_emp_cd | sum_salary
----------------------------------------------------------
FRONT DESK | 2 | 21000
MARKETING | 2 | 25000
HR | 1 | 10000
SALES | 1 | 11000
MAINTENANCE | 2 | 25000
if [WHERE CLAUSE] is entered as [DEPT = ‘HR’] then outgoing Dataframe would consists of data only from HR department.
if [HAVING CLAUSE] is entered as [COUNT(*) > 1] then outgoing Dataframe would consists of data for Department where count of Employees is more than 1.