Drop Duplicate Rows

Drops duplicate rows from the incoming DataFrame. Specific columns can be selected to be used when comparing two rows.

Type

transform

Class

fire.nodes.etl.NodeDropDuplicateRows

Fields

Name        |    Title      |    Description
-------------------------------------------------------
colNames    |    Columns    |    Columns used to determine whether two rows are duplicates. If no columns are selected, all available columns are used.

Details

This node drops duplicate rows from the incoming DataFrame.

Specific columns can be selected to be used when comparing two rows.

For each set of matching rows, one of the rows is included in the outgoing DataFrame.

Examples

The incoming DataFrame has the following rows:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE
-------------------------------------------------------
E01       |    DAVID       |    HR         |    25
E05       |    DAVID       |    HR         |    25
E02       |    JOHN        |    SALES      |    35
E03       |    JOHN        |    MARKETING  |    40
E04       |    JOHN        |    MARKETING  |    45

If the DropDuplicateRows node is configured to compare rows on [EMP_NAME] and [DEPT], rows E01 and E05 match each other, as do rows E03 and E04. One row from each matching pair is kept, so the outgoing DataFrame would be as below:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE
-------------------------------------------------------
E02       |    JOHN        |    SALES      |    35
E01       |    DAVID       |    HR         |    25
E03       |    JOHN        |    MARKETING  |    40

If the DropDuplicateRows node is configured to compare rows on [EMP_NAME], [DEPT] and [AGE], only rows E01 and E05 match each other, because E03 and E04 differ in [AGE]. The outgoing DataFrame would be as below:

EMP_CD    |    EMP_NAME    |    DEPT       |    AGE
-------------------------------------------------------
E01       |    DAVID       |    HR         |    25
E02       |    JOHN        |    SALES      |    35
E03       |    JOHN        |    MARKETING  |    40
E04       |    JOHN        |    MARKETING  |    45
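
The two configurations above can be reproduced with a small sketch in plain Python. This is only an illustration of the comparison logic, not the node's actual implementation: the function name `drop_duplicate_rows` and the list-of-dicts row representation are assumptions made for this example. The sketch keeps the first matching row it encounters, whereas the node only states that one of the matching rows is kept, so the surviving row and the row order may differ in practice.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def drop_duplicate_rows(rows: Iterable[Row], col_names: Sequence[str] = ()) -> List[Row]:
    """Keep one row per distinct combination of values in col_names.

    Hypothetical helper for illustration. An empty col_names compares rows
    on every column, mirroring the colNames field described above.
    """
    seen = set()
    kept = []
    for row in rows:
        # Fall back to all columns when no comparison columns are given.
        keys = col_names if col_names else sorted(row)
        signature = tuple(row[name] for name in keys)
        if signature not in seen:
            seen.add(signature)
            kept.append(row)  # first row seen for this signature is kept
    return kept


COLUMNS = ("EMP_CD", "EMP_NAME", "DEPT", "AGE")
incoming = [
    dict(zip(COLUMNS, values))
    for values in [
        ("E01", "DAVID", "HR", 25),
        ("E05", "DAVID", "HR", 25),
        ("E02", "JOHN", "SALES", 35),
        ("E03", "JOHN", "MARKETING", 40),
        ("E04", "JOHN", "MARKETING", 45),
    ]
]

by_name_dept = drop_duplicate_rows(incoming, ["EMP_NAME", "DEPT"])
by_name_dept_age = drop_duplicate_rows(incoming, ["EMP_NAME", "DEPT", "AGE"])
all_columns = drop_duplicate_rows(incoming)  # no columns selected

print([row["EMP_CD"] for row in by_name_dept])      # ['E01', 'E02', 'E03']
print([row["EMP_CD"] for row in by_name_dept_age])  # ['E01', 'E02', 'E03', 'E04']
print(len(all_columns))                             # 5
```

With no columns selected, every row survives in this data set because EMP_CD is different in each row.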