Union Advanced¶

Union Advanced combines multiple DataFrames into a single DataFrame. You can match columns by name or by position, keep every column or only the columns common to all inputs, and pad mismatched schemas with null values. Typical uses include merging monthly files, combining sources whose schemas evolve, and building incremental pipelines.

Input¶

It accepts two or more DataFrames as input from the previous nodes.

Output¶

This node outputs a single DataFrame resulting from the union operation based on the selected configuration.

Type¶

join

Class¶

fire.nodes.etl.NodeUnionAdvanced

Fields¶

Name	Title	Description
allowMissingColumns	Allow Missing Columns	When true → DataFrames with different schemas are automatically aligned by filling missing columns with null values. When false → all input DataFrames must have exactly the same schema (fails otherwise). Recommended: true for real-world incremental loads.
configBy	Configure By	How columns are matched across DataFrames: • NAME → matches by column name (case-sensitive). Safest and most common choice. • POSITION → matches by column order (1st column of each DF lines up, 2nd column, etc.). Uses the column names from the first incoming DataFrame.
outputFields	Output Fields	Which columns appear in the final result: • ALL → includes every column that exists in any input (missing values become null). Best for full history and audit. • COMMON → keeps only columns that exist in ALL inputs. Great when you want a clean, consistent schema.
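
The three fields interact, so a small model may help. The sketch below is plain Python working on lists of column names only; it is not the node's implementation, and `resolve_output_columns` is a hypothetical helper. It assumes NAME matching, treats "same schema" as "same set of column names" (data types are ignored), and assumes output columns appear in first-seen order.

```python
from typing import List


def resolve_output_columns(
    schemas: List[List[str]],
    allow_missing_columns: bool = True,
    output_fields: str = "ALL",
) -> List[str]:
    """Return the output column names for a NAME-matched union (illustrative only)."""
    if not schemas:
        raise ValueError("need at least one input schema")
    if output_fields not in ("ALL", "COMMON"):
        raise ValueError(f"unknown outputFields value: {output_fields}")

    if not allow_missing_columns:
        # Strict mode: every input must expose the same set of column names.
        expected = set(schemas[0])
        for i, cols in enumerate(schemas[1:], start=2):
            if set(cols) != expected:
                raise ValueError(f"input {i} schema differs from input 1")
        return list(schemas[0])

    if output_fields == "COMMON":
        # Keep only names present in every input, in the first input's order.
        common = set(schemas[0]).intersection(*map(set, schemas[1:]))
        return [c for c in schemas[0] if c in common]

    # ALL: every column seen in any input, in first-seen order (an assumption).
    seen: List[str] = []
    for cols in schemas:
        for c in cols:
            if c not in seen:
                seen.append(c)
    return seen
```

Under these assumptions, three inputs with columns `a, b`, `b, c` and `b` resolve to `a, b, c` with ALL and to `b` with COMMON, while setting `allow_missing_columns=False` raises an error.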

Details¶

Union Advanced Node – The Smart Way to Combine Data¶

The Union Advanced node stacks DataFrames vertically while tolerating differences between their schemas. Whether you are appending daily files, merging regional exports, or combining sources that evolve over time, the node aligns the inputs for you, so a new or missing column does not break the pipeline.

Real-World Use Cases¶

Monthly/weekly/daily incremental loads (new columns appear over time)
Combining sales data from different regions/countries with slightly different schemas
Merging legacy and new systems during migration
Building historical tables where schema evolves naturally
Preparing data for slowly changing dimensions (SCD)

Key Advantages Over Basic Union¶

Automatically handles missing or extra columns
Two matching strategies (by name or position)
Option to keep full history (ALL) or enforce strict schema (COMMON)
No manual column alignment needed
Works seamlessly with 2 to 100+ input branches

Best Practices¶

Prefer configBy = NAME; use POSITION only when you have a specific positional reason, such as inputs with no headers but a guaranteed column order
Use outputFields = ALL + allowMissingColumns = true for audit-ready, future-proof pipelines
Use outputFields = COMMON when downstream models or reports require fixed schema
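
To illustrate why NAME is the safer default, the sketch below contrasts the two matching strategies on inputs whose columns are in a different order. It is a plain-Python model, not the node's code: `union_by_position`, `union_by_name` and the sample rows are hypothetical, and each frame is represented as a pair of column names and row tuples.

```python
from typing import List, Tuple

Frame = Tuple[List[str], List[tuple]]  # (column names, rows)


def union_by_position(frames: List[Frame]) -> Frame:
    """Stack rows positionally; output names come from the first frame."""
    columns = frames[0][0]
    rows: List[tuple] = []
    for _, frame_rows in frames:
        rows.extend(frame_rows)
    return columns, rows


def union_by_name(frames: List[Frame]) -> Frame:
    """Stack rows after reordering each frame to the first frame's column order."""
    columns = frames[0][0]
    rows: List[tuple] = []
    for names, frame_rows in frames:
        # names.index raises ValueError if a column is absent from this frame.
        index = [names.index(c) for c in columns]
        rows.extend(tuple(r[i] for i in index) for r in frame_rows)
    return columns, rows


us = (["customer_id", "revenue"], [(101, 150.0)])
eu = (["revenue", "customer_id"], [(320.0, 205)])

print(union_by_position([us, eu])[1])  # [(101, 150.0), (320.0, 205)]
print(union_by_name([us, eu])[1])      # [(101, 150.0), (205, 320.0)]
```

In the positional result the second row carries a revenue value in the `customer_id` slot without any error being raised, which is the kind of silent misalignment that matching by name avoids.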

Examples¶

Union Advanced – Practical Business Examples¶

Example 1 – Monthly Sales Files (Schema Evolves Over Time)¶

Scenario¶

Jan file has columns: date, customer_id, amount

Feb file adds new column: promo_code

Mar file adds: channel

Configuration¶

Allow Missing Columns: true
Configure By: NAME
Output Fields: ALL

Result¶

| date       | customer_id | amount | promo_code | channel |
|------------|-------------|--------|------------|---------|
| 2025-01-01 | 101         | 150    | null       | null    |
| ...        | ...         | ...    | ...        | ...     |
| 2025-02-01 | 205         | 320    | SUMMER10   | null    |
| 2025-03-01 | 310         | 450    | SPRING25   | Online  |

Perfect for building a full historical sales table.
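
The null padding in this example can be modelled in a few lines of plain Python. This is only a sketch of the behaviour described above, not the node's implementation: rows are dictionaries, `None` stands in for null, `union_all_by_name` is a hypothetical helper, and the first-seen column order is an assumption.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def union_all_by_name(frames: List[List[Row]]) -> List[Row]:
    """NAME + ALL + allowMissingColumns=true, modelled on lists of dict rows."""
    columns: List[str] = []
    for frame in frames:
        for row in frame:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Pad every row to the full column list; None plays the role of null.
    return [{c: row.get(c) for c in columns} for frame in frames for row in frame]


jan = [{"date": "2025-01-01", "customer_id": 101, "amount": 150}]
feb = [{"date": "2025-02-01", "customer_id": 205, "amount": 320, "promo_code": "SUMMER10"}]
mar = [{"date": "2025-03-01", "customer_id": 310, "amount": 450,
        "promo_code": "SPRING25", "channel": "Online"}]

for row in union_all_by_name([jan, feb, mar]):
    print(row)
```

Each printed row has all five columns; the January row carries `None` for both `promo_code` and `channel`, and the February row carries `None` for `channel`.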

Example 2 – Regional Reports (Different Column Order & Extra Columns)¶

US Report: customer_id, name, state, revenue¶

EU Report: revenue, customer_name, country, customer_id¶

APAC Report: customer_id, revenue, region¶

Configuration¶

Allow Missing Columns: true
Configure By: NAME
Output Fields: ALL

Result¶

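All regions are stacked into one DataFrame with columns aligned by name, regardless of their order in each report, and missing fields are filled with null. Because matching is by exact name, name (US) and customer_name (EU) remain two separate columns.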
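
As a rough check of which columns end up null for each region, the three report schemas can be compared in plain Python. This is an illustrative sketch, not the node's logic; the first-seen column order of the output is an assumption.

```python
from typing import Dict, List

reports: Dict[str, List[str]] = {
    "US": ["customer_id", "name", "state", "revenue"],
    "EU": ["revenue", "customer_name", "country", "customer_id"],
    "APAC": ["customer_id", "revenue", "region"],
}

# ALL: union of column names, here in first-seen order (an assumption).
output_columns: List[str] = []
for cols in reports.values():
    for c in cols:
        if c not in output_columns:
            output_columns.append(c)

# For each region, the output columns that would be null-padded.
null_padded: Dict[str, List[str]] = {
    region: [c for c in output_columns if c not in cols]
    for region, cols in reports.items()
}

print(output_columns)
for region, missing in null_padded.items():
    print(region, "->", missing)
```

The output has seven columns. Since `name` and `customer_name` are distinct names, US rows are null in `customer_name` and EU rows are null in `name`.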

Example 3 – Enforce Strict Schema (Data Quality Gate)¶

You only want columns that exist in every single daily file¶

Configuration¶

Allow Missing Columns: true
Configure By: NAME
Output Fields: COMMON

Result¶

Only columns present in every input are kept. Columns that appear in only some files are dropped instead of being null-padded, so downstream steps always receive the same fixed schema.
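
The COMMON behaviour can be sketched on row data as follows. The daily files, their column names and the helper `union_common_by_name` are hypothetical and invented for illustration; the code models the described behaviour and is not the node's implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def union_common_by_name(frames: List[List[Row]]) -> List[Row]:
    """NAME + COMMON: keep only columns present in every input frame."""
    # Assumes each frame is non-empty and its rows all share the same keys.
    schemas = [list(frame[0].keys()) for frame in frames]
    common = [c for c in schemas[0] if all(c in s for s in schemas[1:])]
    return [{c: row[c] for c in common} for frame in frames for row in frame]


# Hypothetical daily files, invented for illustration.
day1 = [{"order_id": 1, "amount": 10.0}]
day2 = [{"order_id": 2, "amount": 12.5, "coupon": "X1"}]
day3 = [{"order_id": 3, "amount": 7.0, "store": "north"}]

print(union_common_by_name([day1, day2, day3]))
```

In this model `coupon` and `store` are dropped without an error, so the setting narrows the schema rather than rejecting nonconforming files.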

Example 4 – Legacy Positional Files (Old Systems)¶

Old COBOL extracts have fixed positions, no headers, columns always in same order¶

Configuration¶

Result¶

Example 5 – Combining 12 Monthly Branches into Yearly Table¶

12 separate pipelines (one per month) → all connect to one Union Advanced¶

Configuration¶

Result¶