Join Advanced¶
Append fields from a Source DataFrame to every row of a Target DataFrame (Cartesian-style append).
Type¶
join2inputs
Class¶
fire.nodes.etl.NodeJoin
Fields¶
Name |
Title |
Description |
|---|---|---|
Join |
Join |
|
joinBy |
Join By |
whether to append dataframe as a table to email content |
leftTableJoinColumn |
LeftTableJoinColumn |
|
rightTableJoinColumn |
RightTableJoinColumn |
|
Select |
Select |
|
selectLeftFields |
Select Left Fields |
Columns |
dataTypeLeftFields |
Data Type Left Fields |
New data type (INTEGER, DOUBLE, STRING, LONG, SHORT) |
renameLeftFields |
Rename Left Fields |
rename column |
selectRightFields |
Select Right Fields |
Columns |
dataTypeRightFields |
Data Type Right Fields |
New data type (INTEGER, DOUBLE, STRING, LONG, SHORT) |
renameRightFields |
Rename Right Fields |
rename column |
Drop |
Drop |
|
inputPropagateSchema |
Propagate Input Schema |
propagate input schema |
dropLeftFields |
Drop Left Columns |
Drop Columns |
dropRightFields |
Drop Right Columns |
Drop Columns |
Details¶
Join Node Details¶
This node performs comprehensive join operations between two DataFrames, supporting both position-based and field-based joins with extensive data transformation capabilities. It enables inner joins with anti-join outputs to identify unmatched records for complete data analysis.
The configuration includes:
Join Tab: Selection of join type (Position or Specific Fields) and specification of join keys when using field-based joins
Select Tab: Independent column selection, renaming, and data type casting for both left and right input DataFrames
Drop Tab: Column dropping configuration with schema propagation options
Key features:
Dual Output Strategy: Produces inner join results plus left/right anti-joins for unmatched record analysis
Conflict Resolution: Automatically handles column name conflicts by prefixing right-side columns
Data Transformation: Supports column selection, renaming, type casting, and dropping in a single operation
Null Handling: Robust normalization of null and empty values for reliable join matching
Schema Management: Maintains original schemas for anti-join outputs while applying transformations to inner join results
Examples¶
Join Node Examples¶
---
:Example 1 — Basic Customer-Order Join with Unmatched Analysis¶
Left DataFrame: Customers*
customer_id | name | city
------------|--------|---------
1 | John | NYC
2 | Alice | LA
3 | Bob | Chicago
4 | Eva | Miami
Right DataFrame: Orders*
order_id | customer_id | amount | order_date
---------|-------------|---------|------------
101 | 1 | 100.50 | 2023-01-15
102 | 1 | 200.00 | 2023-02-20
103 | 2 | 150.75 | 2023-01-10
104 | 4 | 300.25 | 2023-03-05
Configuration*
Join By: Specific Fields
Left Table Join Column: customer_id
Right Table Join Column: customer_id
Select Left Fields: [customer_id, name, city]
Select Right Fields: [order_id, amount, order_date]
Inner Join Output*
customer_id | name | city | order_id | amount | order_date
------------|--------|---------|----------|---------|------------
1 | John | NYC | 101 | 100.50 | 2023-01-15
1 | John | NYC | 102 | 200.00 | 2023-02-20
2 | Alice | LA | 103 | 150.75 | 2023-01-10
4 | Eva | Miami | 104 | 300.25 | 2023-03-05
Left Anti-Join Output (Customers without orders)*
customer_id | name | city
------------|--------|---------
3 | Bob | Chicago
---
:Example 2 — Department-Employee Join with Column Renaming¶
Left DataFrame: Departments*
dept_id | dept_name | location | budget
--------|--------------|----------|--------
D1 | Sales | North | 500000
D2 | Engineering | South | 800000
D3 | Marketing | East | 400000
Right DataFrame: Employees*
emp_id | name | department_id | salary | hire_date
-------|-------|---------------|--------|------------
E1 | Sarah | D1 | 75000 | 2020-05-15
E2 | Mike | D1 | 80000 | 2019-08-20
E3 | Lisa | D2 | 95000 | 2021-01-10
E4 | Tom | D4 | 70000 | 2022-03-01
Configuration*
Join By: Specific Fields
Left Table Join Column: dept_id
Right Table Join Column: department_id
Select Left Fields: [dept_id, dept_name, location]
Rename Left Fields: [dept_id, dept_name, dept_location]
Select Right Fields: [emp_id, name, salary]
Rename Right Fields: [employee_id, employee_name, annual_salary]
Drop Left Fields: [budget]
Drop Right Fields: [hire_date]
Inner Join Output*
dept_id | dept_name | dept_location | employee_id | employee_name | annual_salary
--------|-------------|---------------|-------------|---------------|--------------
D1 | Sales | North | E1 | Sarah | 75000
D1 | Sales | North | E2 | Mike | 80000
D2 | Engineering | South | E3 | Lisa | 95000
Left Anti-Join Output (Departments without employees)*
dept_id | dept_name | location | budget
--------|-----------|----------|--------
D3 | Marketing | East | 400000
Right Anti-Join Output (Employees without departments)*
emp_id | name | department_id | salary | hire_date
-------|------|---------------|--------|------------
E4 | Tom | D4 | 70000 | 2022-03-01
---
:Example 3 — Product Inventory Join with Data Type Casting¶
Left DataFrame: Products*
product_code | product_name | category | price_string
-------------|--------------|-------------|-------------
P100 | Laptop | Electronics | "999.99"
P200 | Chair | Furniture | "149.50"
P300 | Monitor | Electronics | "299.00"
P400 | Desk | Furniture | "450.00"
Right DataFrame: Inventory*
item_id | product_code | quantity | last_updated
--------|-------------|----------|-------------
I1 | P100 | 25 | 2023-04-01
I2 | P200 | 40 | 2023-04-02
I3 | P300 | 15 | 2023-04-01
I5 | P500 | 10 | 2023-04-03
Configuration*
Join By: Specific Fields
Left Table Join Column: product_code
Right Table Join Column: product_code
Select Left Fields: [product_code, product_name, category, price_string]
DataType Left Fields: [STRING, STRING, STRING, DOUBLE]
Select Right Fields: [item_id, quantity, last_updated]
DataType Right Fields: [STRING, INTEGER, DATE]
Rename Left Fields: [prod_code, prod_name, prod_category, price]
Inner Join Output*
prod_code | prod_name | prod_category | price | item_id | quantity | last_updated
----------|-----------|---------------|--------|---------|----------|-------------
P100 | Laptop | Electronics | 999.99 | I1 | 25 | 2023-04-01
P200 | Chair | Furniture | 149.50 | I2 | 40 | 2023-04-02
P300 | Monitor | Electronics | 299.00 | I3 | 15 | 2023-04-01
Left Anti-Join Output (Products without inventory)*
product_code | product_name | category | price_string
-------------|--------------|-----------|-------------
P400 | Desk | Furniture | "450.00"
---
:Example 4 — Multi-Column Join with Complex Relationships¶
Left DataFrame: Student Enrollments*
student_id | course_code | semester | grade | instructor
-----------|-------------|----------|-------|-----------
S100 | MATH101 | Fall | A | Dr. Smith
S100 | CS201 | Fall | B | Dr. Johnson
S200 | MATH101 | Spring | B+ | Dr. Smith
S300 | PHY301 | Fall | A- | Dr. Brown
Right DataFrame: Course Catalog*
course_id | course_code | semester | credits | department
----------|-------------|----------|---------|------------
C1 | MATH101 | Fall | 4 | Mathematics
C2 | CS201 | Fall | 3 | Computer Science
C3 | MATH101 | Spring | 4 | Mathematics
C4 | BIO202 | Fall | 4 | Biology
Configuration*
Join By: Specific Fields
Left Table Join Column: [course_code, semester]
Right Table Join Column: [course_code, semester]
Select Left Fields: [student_id, course_code, semester, grade]
Select Right Fields: [course_id, credits, department]
Inner Join Output*
student_id | course_code | semester | grade | course_id | credits | department
-----------|-------------|----------|-------|-----------|---------|------------
S100 | MATH101 | Fall | A | C1 | 4 | Mathematics
S100 | CS201 | Fall | B | C2 | 3 | Computer Science
S200 | MATH101 | Spring | B+ | C3 | 4 | Mathematics
Left Anti-Join Output (Enrollments without catalog entries)*
student_id | course_code | semester | grade | instructor
-----------|-------------|----------|-------|-----------
S300 | PHY301 | Fall | A- | Dr. Brown
Right Anti-Join Output (Catalog entries without enrollments)*
course_id | course_code | semester | credits | department
----------|-------------|----------|---------|------------
C4 | BIO202 | Fall | 4 | Biology
---
:Example 5 — Retail Store Analysis with Sales Data¶
Left DataFrame: Store Locations*
store_id | store_name | state | square_footage
---------|------------|-------|---------------
ST1 | Downtown | CA | 10000
ST2 | Mall | NY | 15000
ST3 | Plaza | TX | 8000
ST4 | Corner | FL | 6000
Right DataFrame: Monthly Performance*
store_code | month | total_sales | customer_count
-----------|-------|-------------|---------------
ST1 | Jan | 150000 | 1200
ST1 | Feb | 165000 | 1350
ST2 | Jan | 200000 | 1500
ST3 | Jan | 90000 | 800
ST5 | Jan | 75000 | 600
Configuration*
Join By: Specific Fields
Left Table Join Column: store_id
Right Table Join Column: store_code
Select Left Fields: [store_id, store_name, state]
Select Right Fields: [month, total_sales, customer_count]
Inner Join Output*
store_id | store_name | state | month | total_sales | customer_count
---------|------------|-------|-------|-------------|---------------
ST1 | Downtown | CA | Jan | 150000 | 1200
ST1 | Downtown | CA | Feb | 165000 | 1350
ST2 | Mall | NY | Jan | 200000 | 1500
ST3 | Plaza | TX | Jan | 90000 | 800
Left Anti-Join Output (Stores without sales data)*
store_id | store_name | state | square_footage
---------|------------|-------|---------------
ST4 | Corner | FL | 6000
Right Anti-Join Output (Sales data without stores)*
store_code | month | total_sales | customer_count
-----------|-------|-------------|---------------
ST5 | Jan | 75000 | 600
---
:Example 6 — Position-Based Join for Time Series Alignment¶
Left DataFrame: Forecast Data*
period | forecast_revenue | growth_rate
-------|------------------|------------
Q1 | 500000 | 0.15
Q2 | 550000 | 0.10
Q3 | 600000 | 0.09
Q4 | 650000 | 0.08
Right DataFrame: Actual Results*
quarter | actual_revenue | variance
--------|----------------|---------
Q1 | 480000 | -20000
Q2 | 560000 | 10000
Q3 | 590000 | -10000
Configuration*
Join By: Position
Select Left Fields: [period, forecast_revenue]
Select Right Fields: [actual_revenue, variance]
Rename Left Fields: [quarter, forecast]
Rename Right Fields: [actual, diff]
Inner Join Output*
quarter | forecast | actual | diff
--------|----------|---------|--------
Q1 | 500000 | 480000 | -20000
Q2 | 550000 | 560000 | 10000
Q3 | 600000 | 590000 | -10000
Left Anti-Join Output (Forecasts without actuals)*
period | forecast_revenue | growth_rate
-------|------------------|------------
Q4 | 650000 | 0.08
---