Join On Common Columns

This node joins the incoming dataframes on 1 or more columns

Input

It takes in 2 or more DataFrames as input and produces one DataFrame as output by joining on the specified columns

Output

The output DataFrame produced as a result of joining the incoming DataFrames on the specified columns

Type

join

Class

fire.nodes.etl.NodeJoinUsingColumns

Fields

Name

Title

Description

joinCols

Common Join Columns

Space separated list of columns on which to join

joinType

Join Type

Type of Join

whereClause

Where Clause

where condition after join function

schema

InferSchema

outputColNames

Output Column Names

Name of the Output Columns

outputColTypes

Output Column Types

Data Type of the Output Columns

outputColFormats

Output Column Formats

Format of the Output Columns

Details

Join On Common Columns Node Details

This node joins the incoming dataframes using one or more than one common column between the two dataframes.

Join Types supported by this node are as follows

  • inner : The joined table will have records that have matching values in both tables.

  • outer : The joined table contains either all the records from both the tables or fills in NULL values for missing matches on either side.

  • left_outer : Even if there are no matches in the right table it returns all the rows from the left table.

  • right_outer : Even if there are no matches in the left table it returns all the rows from the right table.

  • leftsemi : This gives only those rows in the left table that have a matching row in the right table.

The WHERE CLAUSE section is used to filter any records once the two or more tables have been joined.

Examples

Join On Common Columns Node Example

Incoming Dataframe

1st Incoming dataframe table1 has the following rows:

EMP_CD    |    EMP_NAME    |    DEPT_NO
-------------------------------------------
E01       |    DAVID       |    10
E02       |    JOHN        |    20
E03       |    MARTIN      |    30
E04       |    TONY        |    40

2nd Incoming dataframe table2 has the following rows:

DEPT_NO    |      DEPT_NAME   |    LOC     |   EMP_CD
----------------------------------------------------
10         |      HR          |    IND     |   E01
20         |      SALES       |    AUS     |   E02
40         |      RESEARCH    |    NZ      |   E04
50         |      ADMIN       |    UK      |   E05

Selected common columns for both the dataframes are following

  • From table1 is DEPT_NO and EMP_CD

  • From table2 is DEPT_NO and EMP_CD

When the JOIN condition is inner with a WHERE clause of DEPT_NO = 10 we have

Final Output

EMP_CD    |    EMP_NAME    |    DEPT_NO   |      DEPT_NAME  |    LOC
-------------------------------------------------------------------------------------
E01       |    DAVID       |    10        |      HR         |    IND

When the JOIN condition is inner we have

Final Output

EMP_CD    |    EMP_NAME    |    DEPT_NO   |      DEPT_NAME  |    LOC
-------------------------------------------------------------------------------------
E01       |    DAVID       |    10        |      HR         |    IND
E02       |    JOHN        |    20        |      SALES      |    AUS
E04       |    TONY        |    40        |      RESEARCH   |    NZ

When the Joining condition outer we have

Final Output

EMP_CD    |    EMP_NAME    |    DEPT_NO   |   DEPT_NAME  |    LOC
--------------------------------------------------------------------------------------
E01       |    DAVID       |    10        |   HR         |    IND
E02       |    JOHN        |    20        |   SALES      |    AUS
E03       |    MARTIN      |    30        |              |
E04       |    TONY        |    40        |   RESEARCH   |    NZ
E05       |                |    50        |   ADMIN      |    UK

When the Joining condition is left_outer we have

Final Output

EMP_CD    |    EMP_NAME    |    DEPT_NO   |   DEPT_NAME  |    LOC
--------------------------------------------------------------------------------------
E01       |    DAVID       |    10        |   HR         |    IND
E02       |    JOHN        |    20        |   SALES      |    AUS
E03       |    MARTIN      |    30        |   MARKETING  |    UK
E04       |    TONY        |    40        |              |

When the Joining condition right_outer we have

Final Output

EMP_CD    |    EMP_NAME    |    DEPT_NO   |   DEPT_NAME  |    LOC
--------------------------------------------------------------------------------------
E01       |    DAVID       |    10        |   HR         |    IND
E02       |    JOHN        |    20        |   SALES      |    AUS
E04       |    MARTIN      |    30        |   RESEARCH   |    UK
E05       |                |    50        |   ADMIN      |    NZ