Compare All Columns

Compares two incoming DataFrames and outputs three DataFrames: (A-B), (B-A), and (A intersection B).

Type

join

Class

fire.nodes.etl.NodeCompareAllColumns

Fields

Details

This node takes two DataFrames as input and creates three DataFrames as output.

The first DataFrame (A-B) contains the rows that are in the 1st incoming DataFrame but not in the 2nd incoming DataFrame.

The second DataFrame (B-A) contains the rows that are in the 2nd incoming DataFrame but not in the 1st incoming DataFrame.

The third DataFrame (A intersection B) contains the rows that are in both incoming DataFrames.
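
The three outputs described above can be sketched in plain Python. This is only an illustration of the row-level semantics, not the node's actual implementation. It assumes that two rows match only when every column value is equal, and that duplicate rows are passed through unchanged (the node's handling of duplicates is not stated here). The function name compare_all_columns and the sample rows are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

# A row is modelled as a tuple of column values (assumption for this sketch).
Row = Tuple[Hashable, ...]


def compare_all_columns(
    a: Sequence[Row], b: Sequence[Row]
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (A-B, B-A, A intersection B), comparing whole rows."""
    rows_in_a = set(a)
    rows_in_b = set(b)
    a_minus_b = [row for row in a if row not in rows_in_b]
    b_minus_a = [row for row in b if row not in rows_in_a]
    a_intersect_b = [row for row in a if row in rows_in_b]
    return a_minus_b, b_minus_a, a_intersect_b


# Hypothetical rows with columns (EMP_CD, EMP_NAME, DEPT).
# E05 differs only in DEPT between the two inputs, on purpose.
df_a = [
    ("E01", "DAVID", "HR"),
    ("E03", "MARTIN", "MARKETING"),
    ("E05", "MARK", "HR"),
]
df_b = [
    ("E03", "MARTIN", "MARKETING"),
    ("E05", "MARK", "SALES"),
    ("E06", "ROSS", "SALES"),
]

only_a, only_b, both = compare_all_columns(df_a, df_b)
print(only_a)  # rows found only in the 1st input
print(only_b)  # rows found only in the 2nd input
print(both)    # rows found in both inputs
```

Under these assumptions, a row that differs in even one column (E05 in the sketch) does not land in the intersection. It appears in both (A-B) and (B-A), once with each set of values.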

Examples

The 1st incoming DataFrame has the following rows:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE    |    DATE_OF_JOINING   |    SALARY     |    PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E01       |    DAVID       |    HR         |    25     |    2021-01-01        |    12 000.00  |    GOOD
E02       |    JOHN        |    SALES      |    35     |    2019-05-04        |    11 000.00  |    VERY GOOD
E03       |    MARTIN      |    MARKETING  |    40     |    2018-06-07        |    34 000     |    AVERAGE
E04       |    TONY        |    MARKETING  |    45     |    2017-02-01        |    12 500.00  |    VERY VERY GOOD
E05       |    MARK        |    HR         |    25     |    2020-12-21        |    78 999.00  |    BAD

The 2nd incoming DataFrame has the following rows:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE    |    DATE_OF_JOINING   |    SALARY     |    PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E03       |    MARTIN      |    MARKETING  |    40     |    2018-06-07        |    34 000     |    AVERAGE
E04       |    TONY        |    MARKETING  |    45     |    2017-02-01        |    12 500.00  |    VERY VERY GOOD
E05       |    MARK        |    HR         |    25     |    2020-12-21        |    78 999.00  |    BAD
E06       |    ROSS        |    SALES      |    35     |    2010-01-01        |    20 000.00  |    GOOD
E07       |    GAVIN       |    SALES      |    45     |    2020-05-04        |    10 000.00  |    VERY VERY GOOD

After the CompareAllColumns node executes, the following three outgoing DataFrames are created:

(A-B) Outgoing DataFrame with the rows that are in the 1st incoming DataFrame but not in the 2nd incoming DataFrame:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE    |    DATE_OF_JOINING   |    SALARY     |    PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E01       |    DAVID       |    HR         |    25     |    2021-01-01        |    12 000.00  |    GOOD
E02       |    JOHN        |    SALES      |    35     |    2019-05-04        |    11 000.00  |    VERY GOOD

(B-A) Outgoing DataFrame with the rows that are in the 2nd incoming DataFrame but not in the 1st incoming DataFrame:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE    |    DATE_OF_JOINING   |    SALARY     |    PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E06       |    ROSS        |    SALES      |    35     |    2010-01-01        |    20 000.00  |    GOOD
E07       |    GAVIN       |    SALES      |    45     |    2020-05-04        |    10 000.00  |    VERY VERY GOOD

(A intersection B) Outgoing DataFrame with the rows that are in both incoming DataFrames:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE    |    DATE_OF_JOINING   |    SALARY     |    PERFORMANCE
--------------------------------------------------------------------------------------------------------------------
E03       |    MARTIN      |    MARKETING  |    40     |    2018-06-07        |    34 000     |    AVERAGE
E04       |    TONY        |    MARKETING  |    45     |    2017-02-01        |    12 500.00  |    VERY VERY GOOD
E05       |    MARK        |    HR         |    25     |    2020-12-21        |    78 999.00  |    BAD