Lookup

Find values in a target column using a lookup table and either append fields or replace matched text.

Type

pyspark2inputs

Class

fire.nodes.etl.NodeLookup

Fields

Name

Title

Description

Find

Find

matchLocation

Match

How to match the lookup find value against the target text.

targetCol

Find Within Field

Column in the input dataframe to search/modify.

lookupFindCol

Find Value

Column in the lookup table that contains the find values.

caseInsensitive

Case Insensitive

Perform case-insensitive matching.

wholeWord

Match Whole Word Only

Match Whole Word Only

Replace

Replace

actionType

Action

Choose Append to add lookup columns, or Replace to change matched text in the target column.

lookupReplaceCol

Lookup Replace Column

When Action is Replace: column in lookup table that contains the replacement text.

appendCols

Append Columns

When Action is Append: which columns from the lookup table to append to the input rows.

replaceMultipleItems

Replace Multiple Items

When replacing, replace all matches rather than only the first.

Details

Lookup Node Details

The Lookup node performs text-based lookups from a small reference (lookup) DataFrame against a target column in the main input DataFrame. It supports flexible matching options and can either append additional columns from the lookup row or replace matched text in the target column. The lookup DataFrame is broadcasted for efficient distributed processing, with a safety limit of 100,000 rows to prevent excessive memory usage.

General:

Match:

Specifies how the find value matches the target text: ‘Beginning of Field’ (starts with), ‘Any Part of Field’ (contains), or ‘Entire Field’ (exact match).

Find Within Field:

The column in the main input DataFrame to search or modify. This is a required field.

Find Value:

The column in the lookup DataFrame containing the values to search for. This is a required field.

Case Insensitive:

When enabled, matching ignores case differences (e.g., ‘Apple’ matches ‘apple’).

Match Whole Word Only:

When enabled, matches are bounded by non-word characters (e.g., ‘cat’ matches ‘The cat sat’ but not ‘category’).

Replace Tab:

Action:

Select ‘Replace’ to modify the target column by substituting matched text with replacement values from the lookup.

Lookup Replace Column:

The column in the lookup DataFrame providing the replacement text for matches. Required when Action is Replace.

Replace Multiple Items:

When enabled, all occurrences of the find value in the target are replaced; otherwise, only the first match is replaced.

Action:

Select ‘Append’ to add columns from the matching lookup row to the input row without modifying the target.

Append Columns:

The columns from the lookup DataFrame to add as new columns in the output. Required when Action is Append.

Output:

The node outputs a single DataFrame:

For Append: Original schema plus the selected append columns (with nulls for non-matches).

For Replace: Original schema with the target column updated (unmatched rows unchanged).

Examples

Lookup Node Examples

Replace Action Example

Input:

Main DataFrame (schema 0):

| id | description |
|----|----------------------|
| 1 | I love red APPLE |
| 2 | The quick brown fox |
| 3 | Eating green apple pie |

Lookup DataFrame (schema 1):

| find_term | replacement |
|-----------|-------------|
| APPLE     | fruit       |
| fox       | animal      |

The Lookup node is configured as follows:

Match: Any Part of Field

Find Within Field: description

Find Value: find_term

Case Insensitive: true

Match Whole Word Only: true

Action: Replace

Lookup Replace Column: replacement

Replace Multiple Items: false

Output:

| id | description          |
|----|----------------------|
| 1  | I love red fruit     |
| 2  | The quick brown animal |
| 3  | Eating green apple pie |

(Row 1: ‘APPLE’ replaced with ‘fruit’; Row 2: ‘fox’ replaced with ‘animal’; Row 3: no match, unchanged.)

Append Action Example

Input:

Main DataFrame (schema 0):

| id | category |
|----|----------|
| 1  | fruit    |
| 2  | veggie   |
| 3  | fruit    |

Lookup DataFrame (schema 1):

| category | price | color  |
|----------|-------|--------|
| fruit    | 1.50  | red    |
| veggie   | 2.00  | green  |

The Lookup node is configured as follows:

Match: Entire Field

Find Within Field: category

Find Value: category

Case Insensitive: false

Match Whole Word Only: false

Action: Append

Append Columns: [price, color]

Output:

| id | category | price | color |
|----|----------|-------|-------|
| 1  | fruit    | 1.50  | red   |
| 2  | veggie   | 2.00  | green |
| 3  | fruit    | 1.50  | red   |

(Rows 1 and 3 match ‘fruit’ and append price/color; Row 2 matches ‘veggie’ and appends its values.)