Expressions

The most powerful and intuitive column calculator – add unlimited new columns using full Spark SQL expressions. Perfect for feature engineering, business logic, cleaning, formatting, date manipulation, and creating report-ready fields without writing code.

Type

transform

Class

fire.nodes.etl.NodeExpressions

Fields

Name

Title

Description

Expressions

Expressions

outputCols

Column Name

Name of the new column you want to create (e.g., Revenue_Net, FullName, IsHighValue, OrderMonth).

expressions

Expression

Full Spark SQL expression to compute the column value. Supports 400+ functions: math, string, date, conditional, regex, JSON, arrays, etc. Quick Examples: • amount * quantity • concat(first_name, ‘ ‘, last_name) • date_format(order_date, ‘yyyy-MM’) • when(amount > 10000, ‘VIP’).otherwise(‘Regular’) • coalesce(email, phone, ‘unknown’)

propagation

ColumnPropagation

inputColumnPropagation

Input Column Propagation

When true (default) → all original columns are automatically kept + your new ones. Set to false if you want only the newly created columns.

outputSchema

InferSchema

outputColNames

Output Column Name

Force exact output column names (useful with propagation = false)

outputColTypes

Output Column Type

Force data types for new columns

outputColFormats

Output Column Format

Display formats (dates, numbers)

Details

Expressions Node – Your No-Code Business Logic Powerhouse

The Expressions node is the #1 most-used transform in production pipelines. It replaces dozens of individual nodes (String Ops, Date Ops, Math, If-Then-Else) with one clean, fast, and readable interface using standard Spark SQL syntax.

Why Teams Love It

  • Full Spark SQL → 400+ functions, no black boxes

  • Instant preview of results

  • Auto-complete & syntax highlighting

  • Perfect for KPIs, cleansing, segmentation, formatting

  • Same syntax works in dbt, Snowflake, BigQuery, Databricks

Pro Tips

  • Use when(…).otherwise(…) instead of nested ifs

  • Wrap divisions in coalesce(…, 0) or nullif(denominator, 0)

  • Always test with limit 10 first

  • Combine with Schema tab to force exact types

Examples

Expressions Node – Real Business Examples

Example 1 – Revenue & Margin Calculations

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| gross_revenue     | amount * quantity                               |
| discount_amt      | gross_revenue * coalesce(discount_rate, 0)      |
| net_revenue       | gross_revenue - discount_amt                    |
| margin_pct        | round((net_revenue - cost) / net_revenue * 100, 2) |

Example 2 – Customer Name & Segmentation

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| full_name         | concat_ws(' ', initcap(first_name), initcap(last_name)) |
| name_clean        | regexp_replace(trim(full_name), '\\s+', ' ')    |
| customer_tier     | when(net_revenue >= 100000, 'Platinum').when(net_revenue >= 50000, 'Gold').when(net_revenue >= 10000, 'Silver').otherwise('Bronze') |

Example 3 – Date Intelligence

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| order_month       | date_format(order_date, 'yyyy-MM')              |
| order_quarter     | concat('Q', quarter(order_date))                |
| days_since_order  | datediff(current_date(), order_date)            |
| is_recent         | days_since_order <= 30                          |
| fiscal_year       | when(month(order_date) >= 4, year(order_date) + 1).otherwise(year(order_date)) |

Example 4 – Safe Math & Cleaning

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| avg_basket_size   | round(amount / nullif(items_count, 0), 2)       |
| email_domain      | substring_index(email, '@', -1)                 |
| phone_clean       | regexp_replace(phone, '[^0-9]', '')             |
| revenue_safe      | coalesce(revenue, 0)                            |

Example 5 – Flags & Indicators

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| has_email         | email is not null                               |
| is_new_customer   | first_order_date = order_date                   |
| high_value_flag   | amount > 5000                                   |
| status            | when(is_cancelled, 'Cancelled').when(is_shipped, 'Shipped').otherwise('Processing') |

Example 6 – Current Timestamp & Audit Columns

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| load_date         | current_date()                                  |
| load_timestamp    | current_timestamp()                             |
| run_id            | 'RUN_' || date_format(current_timestamp(), 'yyyyMMdd_HHmmss') |

Example 7 – Complex JSON & Array Handling

| Output Column     | Expression                                      |
|-------------------|-------------------------------------------------|
| event_type        | json_tuple(events_json, 'type')[0]              |
| tags_array        | split(tags_string, ',')                         |
| tag_count         | size(tags_array)                                |
| has_tag_promo     | array_contains(tags_array, 'PROMO')             |