Compare Specific Columns¶
Compares 2 incoming DataFrames on specific columns. Outputs 3 DataFrames (A-B), (B-A), (A intersection B)
Type¶
join
Class¶
fire.nodes.etl.NodeCompareSpecificColumns
Fields¶
Name |
Title |
Description |
|---|---|---|
columnsToCompare |
Columns to Compare |
Columns to be used in the comparison |
Details¶
This node takes two Dataframes as input, compares them on specific columns and creates three Dataframes as output.
First Dataframe (A-B) is created with rows of 1st Dataframe in which values in the specified columns don’t have matching entries in same set of columns in 2nd Dataframe.
Second Dataframe (B-A) is created with rows of 2nd Dataframe in which values in the specified columns don’t have matching entries in same set of columns in 1st Dataframe.
Third Dataframe (A intersection B) is created with rows common in both incoming Dataframes in which values in the specified columns have matching entries in same set of columns.
Examples¶
1st Incoming Dataframe has following rows:
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E01 | DAVID | HR | 25 | 2021-01-01 | 12 000.00 | GOOD
E02 | JOHN | SALES | 35 | 2019-05-04 | 11 000.00 | VERY GOOD
E03 | MARTIN | MARKETING | 40 | 2018-06-07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017-02-01 | 12 500.00 | VERY VERY GOOD
E05 | MARK | SALES | 25 | 2020-12-21 | 78 999.00 | BAD
2nd Incoming Dataframe has following rows:
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E03 | MARTIN | MARKETING | 40 | 2018-06-07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017-02-01 | 12 500.00 | VERY VERY GOOD
E05 | MARK | HR | 25 | 2020-12-21 | 78 999.00 | BAD
E06 | ROSS | FRONT DESK | 35 | 2010-01-01 | 20 000.00 | GOOD
E07 | GAVIN | MAINTENANCE| 45 | 2020-05-04 | 10 000.00 | VERY VERY GOOD
E08 | LISA | FRONT DESK | 40 | 2015-05-04 | 12 000.00 | VERY GOOD
if CompareSpecificColumns node is configured to compare incoming Dataframes on [DEPT] column then outgoing Dataframes would be created as below:
(A-B) Outgoing Dataframe with rows of 1st Dataframe in which values in the specified columns don’t have matching entries in same set of columns in 2nd Dataframe.¶
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E02 | JOHN | SALES | 35 | 2019-05-04 | 11 000.00 | VERY GOOD
E05 | MARK | SALES | 25 | 2020-12-21 | 78 999.00 | BAD
(B-A) Outgoing Dataframe with rows of 2nd Dataframe in which values in the specified columns don’t have matching entries in same set of columns in 1st Dataframe.¶
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E06 | ROSS | FRONT DESK | 35 | 2010-01-01 | 20 000.00 | GOOD
E07 | GAVIN | MAINTENANCE| 45 | 2020-05-04 | 10 000.00 | VERY VERY GOOD
E08 | LISA | FRONT DESK | 40 | 2015-05-04 | 12 000.00 | VERY GOOD
(A insection B) Outgoing Dataframe with rows common in both incoming Dataframes in which values in the specified columns have matching entries in same set of columns.¶
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E01 | DAVID | HR | 25 | 2021-01-01 | 12 000.00 | GOOD
E03 | MARTIN | MARKETING | 40 | 2018-06-07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017-02-01 | 12 500.00 | VERY VERY GOOD
E03 | MARTIN | MARKETING | 40 | 2018-06-07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017-02-01 | 12 500.00 | VERY VERY GOOD
E05 | MARK | HR | 25 | 2020-12-21 | 78 999.00 | BAD