Group By =========== Group By Node Type --------- transform Class --------- fire.nodes.etl.NodeGroupBy Fields --------- .. list-table:: :widths: 10 5 10 :header-rows: 1 * - Name - Title - Description * - Aggregation Setting - Aggregation Setting - * - groupingCols - Grouping Columns - Grouping Columns * - aggregateCols - Aggregate Columns - Aggregate Columns * - aggregateOperations - Aggregate Operation - Aggregate Operation * - outputColNames - Output Column Names - Output Column Names, default value is aggregateOperation_aggregateCol. * - Filter Setting - Filter Setting - * - whereClause - Where Clause - where condition before group by function * - havingClause - Having Clause - having condition after group by function Details ------- Group By Details +++++++++++++++ Aggregation Settings +++++++++++++++ This node groups row values based on categorical columns selected by the user and then calculates aggregate statistics of the grouped columns. The Grouping Columns allows the user to select which columns to group rows by, and the Variables List allows the user to select which aggregate statistics will be generated. Filter Settings +++++++++++++++ The Filter Settings allow the user to provide additional clauses before and after the data is aggregated. The Where Clause allows the user to filter the data before it is aggregated, and the Having Clause allows the user to filter the data after it has been aggregated. Both the Where and Having Clauses are similar in use to those that exist in many forms of SQL. Examples ------- Incoming Dataframe has following rows: :: EMP_CD | EMP_NAME | LOCATION | DEPT | SALARY ----------------------------------------------------------------------------- E01 | DAVID | NEW YORK | HR | 10000 E02 | JOHN | NEW JERSEY | SALES | 11000 E03 | MARTIN | NEW YORK | MARKETING | 12000 E04 | TONY | NEW JERSEY | MARKETING | 13000 E05 | ROSS | NEW YORK | FRONT DESK | 10000 E06 | LISA | NEW JERSEY | FRONT DESK | 11000 E07 | PAUL | NEW YORK | MAINTENANCE | 12000 E08 | MARK | NEW JERSEY | MAINTENANCE | 13000 if GroupBy node is configured as below: GROUPING COLUMNS : DEPT :: AGGREGATE COLUMNS | AGGREGATE OPERATION ------------------------------------------------- EMP_CD | COUNT SALARY | SUM then outgoing Dataframe would be created as below after performing specified aggregation Count of Employees and Summation of Salary all Employees is computed for each [DEPT]: :: DEPT | count_emp_cd | sum_salary ---------------------------------------------------------- FRONT DESK | 2 | 21000 MARKETING | 2 | 25000 HR | 1 | 10000 SALES | 1 | 11000 MAINTENANCE | 2 | 25000 if [WHERE CLAUSE] is entered as [DEPT = 'HR'] then outgoing Dataframe would consists of data only from HR department. if [HAVING CLAUSE] is entered as [COUNT(*) > 1] then outgoing Dataframe would consists of data for Department where count of Employees is more than 1.