Join Advanced

Append fields from a Source DataFrame to every row of a Target DataFrame (Cartesian-style append).

Type

join2inputs

Class

fire.nodes.etl.NodeJoin

Fields

Name

Title

Description

Join

Join

joinBy

Join By

whether to append dataframe as a table to email content

leftTableJoinColumn

LeftTableJoinColumn

rightTableJoinColumn

RightTableJoinColumn

Select

Select

selectLeftFields

Select Left Fields

Columns

dataTypeLeftFields

Data Type Left Fields

New data type (INTEGER, DOUBLE, STRING, LONG, SHORT)

renameLeftFields

Rename Left Fields

rename column

selectRightFields

Select Right Fields

Columns

dataTypeRightFields

Data Type Right Fields

New data type (INTEGER, DOUBLE, STRING, LONG, SHORT)

renameRightFields

Rename Right Fields

rename column

Drop

Drop

inputPropagateSchema

Propagate Input Schema

propagate input schema

dropLeftFields

Drop Left Columns

Drop Columns

dropRightFields

Drop Right Columns

Drop Columns

Details

Join Node Details

This node performs comprehensive join operations between two DataFrames, supporting both position-based and field-based joins with extensive data transformation capabilities. It enables inner joins with anti-join outputs to identify unmatched records for complete data analysis.

The configuration includes:

  • Join Tab: Selection of join type (Position or Specific Fields) and specification of join keys when using field-based joins

  • Select Tab: Independent column selection, renaming, and data type casting for both left and right input DataFrames

  • Drop Tab: Column dropping configuration with schema propagation options

Key features:

  • Dual Output Strategy: Produces inner join results plus left/right anti-joins for unmatched record analysis

  • Conflict Resolution: Automatically handles column name conflicts by prefixing right-side columns

  • Data Transformation: Supports column selection, renaming, type casting, and dropping in a single operation

  • Null Handling: Robust normalization of null and empty values for reliable join matching

  • Schema Management: Maintains original schemas for anti-join outputs while applying transformations to inner join results

Examples

Join Node Examples

---

:Example 1 — Basic Customer-Order Join with Unmatched Analysis

  • Left DataFrame: Customers*

customer_id | name   | city
------------|--------|---------
1          | John   | NYC
2          | Alice  | LA
3          | Bob    | Chicago
4          | Eva    | Miami
  • Right DataFrame: Orders*

order_id | customer_id | amount  | order_date
---------|-------------|---------|------------
101      | 1           | 100.50  | 2023-01-15
102      | 1           | 200.00  | 2023-02-20
103      | 2           | 150.75  | 2023-01-10
104      | 4           | 300.25  | 2023-03-05
  • Configuration*

  • Join By: Specific Fields

  • Left Table Join Column: customer_id

  • Right Table Join Column: customer_id

  • Select Left Fields: [customer_id, name, city]

  • Select Right Fields: [order_id, amount, order_date]

  • Inner Join Output*

customer_id | name   | city    | order_id | amount  | order_date
------------|--------|---------|----------|---------|------------
1          | John   | NYC     | 101      | 100.50  | 2023-01-15
1          | John   | NYC     | 102      | 200.00  | 2023-02-20
2          | Alice  | LA      | 103      | 150.75  | 2023-01-10
4          | Eva    | Miami   | 104      | 300.25  | 2023-03-05
  • Left Anti-Join Output (Customers without orders)*

customer_id | name   | city
------------|--------|---------
3          | Bob    | Chicago
---

:Example 2 — Department-Employee Join with Column Renaming

  • Left DataFrame: Departments*

dept_id | dept_name    | location | budget
--------|--------------|----------|--------
D1      | Sales        | North    | 500000
D2      | Engineering  | South    | 800000
D3      | Marketing    | East     | 400000
  • Right DataFrame: Employees*

emp_id | name  | department_id | salary | hire_date
-------|-------|---------------|--------|------------
E1     | Sarah | D1            | 75000  | 2020-05-15
E2     | Mike  | D1            | 80000  | 2019-08-20
E3     | Lisa  | D2            | 95000  | 2021-01-10
E4     | Tom   | D4            | 70000  | 2022-03-01
  • Configuration*

  • Join By: Specific Fields

  • Left Table Join Column: dept_id

  • Right Table Join Column: department_id

  • Select Left Fields: [dept_id, dept_name, location]

  • Rename Left Fields: [dept_id, dept_name, dept_location]

  • Select Right Fields: [emp_id, name, salary]

  • Rename Right Fields: [employee_id, employee_name, annual_salary]

  • Drop Left Fields: [budget]

  • Drop Right Fields: [hire_date]

  • Inner Join Output*

dept_id | dept_name   | dept_location | employee_id | employee_name | annual_salary
--------|-------------|---------------|-------------|---------------|--------------
D1      | Sales       | North         | E1          | Sarah         | 75000
D1      | Sales       | North         | E2          | Mike          | 80000
D2      | Engineering | South         | E3          | Lisa          | 95000
  • Left Anti-Join Output (Departments without employees)*

dept_id | dept_name | location | budget
--------|-----------|----------|--------
D3      | Marketing | East     | 400000
  • Right Anti-Join Output (Employees without departments)*

emp_id | name | department_id | salary | hire_date
-------|------|---------------|--------|------------
E4     | Tom  | D4            | 70000  | 2022-03-01
---

:Example 3 — Product Inventory Join with Data Type Casting

  • Left DataFrame: Products*

product_code | product_name | category     | price_string
-------------|--------------|-------------|-------------
P100         | Laptop       | Electronics | "999.99"
P200         | Chair        | Furniture   | "149.50"
P300         | Monitor      | Electronics | "299.00"
P400         | Desk         | Furniture   | "450.00"
  • Right DataFrame: Inventory*

item_id | product_code | quantity | last_updated
--------|-------------|----------|-------------
I1      | P100        | 25       | 2023-04-01
I2      | P200        | 40       | 2023-04-02
I3      | P300        | 15       | 2023-04-01
I5      | P500        | 10       | 2023-04-03
  • Configuration*

  • Join By: Specific Fields

  • Left Table Join Column: product_code

  • Right Table Join Column: product_code

  • Select Left Fields: [product_code, product_name, category, price_string]

  • DataType Left Fields: [STRING, STRING, STRING, DOUBLE]

  • Select Right Fields: [item_id, quantity, last_updated]

  • DataType Right Fields: [STRING, INTEGER, DATE]

  • Rename Left Fields: [prod_code, prod_name, prod_category, price]

  • Inner Join Output*

prod_code | prod_name | prod_category | price  | item_id | quantity | last_updated
----------|-----------|---------------|--------|---------|----------|-------------
P100      | Laptop    | Electronics   | 999.99 | I1      | 25       | 2023-04-01
P200      | Chair     | Furniture     | 149.50 | I2      | 40       | 2023-04-02
P300      | Monitor   | Electronics   | 299.00 | I3      | 15       | 2023-04-01
  • Left Anti-Join Output (Products without inventory)*

product_code | product_name | category   | price_string
-------------|--------------|-----------|-------------
P400         | Desk         | Furniture | "450.00"
---

:Example 4 — Multi-Column Join with Complex Relationships

  • Left DataFrame: Student Enrollments*

student_id | course_code | semester | grade | instructor
-----------|-------------|----------|-------|-----------
S100       | MATH101     | Fall     | A     | Dr. Smith
S100       | CS201       | Fall     | B     | Dr. Johnson
S200       | MATH101     | Spring   | B+    | Dr. Smith
S300       | PHY301      | Fall     | A-    | Dr. Brown
  • Right DataFrame: Course Catalog*

course_id | course_code | semester | credits | department
----------|-------------|----------|---------|------------
C1        | MATH101     | Fall     | 4       | Mathematics
C2        | CS201       | Fall     | 3       | Computer Science
C3        | MATH101     | Spring   | 4       | Mathematics
C4        | BIO202      | Fall     | 4       | Biology
  • Configuration*

  • Join By: Specific Fields

  • Left Table Join Column: [course_code, semester]

  • Right Table Join Column: [course_code, semester]

  • Select Left Fields: [student_id, course_code, semester, grade]

  • Select Right Fields: [course_id, credits, department]

  • Inner Join Output*

student_id | course_code | semester | grade | course_id | credits | department
-----------|-------------|----------|-------|-----------|---------|------------
S100       | MATH101     | Fall     | A     | C1        | 4       | Mathematics
S100       | CS201       | Fall     | B     | C2        | 3       | Computer Science
S200       | MATH101     | Spring   | B+    | C3        | 4       | Mathematics
  • Left Anti-Join Output (Enrollments without catalog entries)*

student_id | course_code | semester | grade | instructor
-----------|-------------|----------|-------|-----------
S300       | PHY301      | Fall     | A-    | Dr. Brown
  • Right Anti-Join Output (Catalog entries without enrollments)*

course_id | course_code | semester | credits | department
----------|-------------|----------|---------|------------
C4        | BIO202      | Fall     | 4       | Biology
---

:Example 5 — Retail Store Analysis with Sales Data

  • Left DataFrame: Store Locations*

store_id | store_name | state | square_footage
---------|------------|-------|---------------
ST1      | Downtown   | CA    | 10000
ST2      | Mall       | NY    | 15000
ST3      | Plaza      | TX    | 8000
ST4      | Corner     | FL    | 6000
  • Right DataFrame: Monthly Performance*

store_code | month | total_sales | customer_count
-----------|-------|-------------|---------------
ST1        | Jan   | 150000      | 1200
ST1        | Feb   | 165000      | 1350
ST2        | Jan   | 200000      | 1500
ST3        | Jan   | 90000       | 800
ST5        | Jan   | 75000       | 600
  • Configuration*

  • Join By: Specific Fields

  • Left Table Join Column: store_id

  • Right Table Join Column: store_code

  • Select Left Fields: [store_id, store_name, state]

  • Select Right Fields: [month, total_sales, customer_count]

  • Inner Join Output*

store_id | store_name | state | month | total_sales | customer_count
---------|------------|-------|-------|-------------|---------------
ST1      | Downtown   | CA    | Jan   | 150000      | 1200
ST1      | Downtown   | CA    | Feb   | 165000      | 1350
ST2      | Mall       | NY    | Jan   | 200000      | 1500
ST3      | Plaza      | TX    | Jan   | 90000       | 800
  • Left Anti-Join Output (Stores without sales data)*

store_id | store_name | state | square_footage
---------|------------|-------|---------------
ST4      | Corner     | FL    | 6000
  • Right Anti-Join Output (Sales data without stores)*

store_code | month | total_sales | customer_count
-----------|-------|-------------|---------------
ST5        | Jan   | 75000       | 600
---

:Example 6 — Position-Based Join for Time Series Alignment

  • Left DataFrame: Forecast Data*

period | forecast_revenue | growth_rate
-------|------------------|------------
Q1     | 500000           | 0.15
Q2     | 550000           | 0.10
Q3     | 600000           | 0.09
Q4     | 650000           | 0.08
  • Right DataFrame: Actual Results*

quarter | actual_revenue | variance
--------|----------------|---------
Q1      | 480000         | -20000
Q2      | 560000         | 10000
Q3      | 590000         | -10000
  • Configuration*

  • Join By: Position

  • Select Left Fields: [period, forecast_revenue]

  • Select Right Fields: [actual_revenue, variance]

  • Rename Left Fields: [quarter, forecast]

  • Rename Right Fields: [actual, diff]

  • Inner Join Output*

quarter | forecast | actual  | diff
--------|----------|---------|--------
Q1      | 500000   | 480000  | -20000
Q2      | 550000   | 560000  | 10000
Q3      | 600000   | 590000  | -10000
  • Left Anti-Join Output (Forecasts without actuals)*

period | forecast_revenue | growth_rate
-------|------------------|------------
Q4     | 650000           | 0.08
---