FindDuplicate

This node splits the incoming DataFrame into two output DataFrames one having unique values and other having rest of duplicates

Input

It accepts a DataFrame as input from the previous Node

Type

transform

Class

fire.nodes.etl.NodeUnique

Fields

Name

Title

Description

inputCols

Select Columns

Select the columns you want to check for unique values.

sortByColNames

Columns

Select one or more columns to sort by. Drag to reorder — the top column is the primary sort key, followed by secondary, tertiary, etc.

ascDesc

Sorting Order

Individually set ASC (ascending) or DESC (descending) for each selected column. Example: Salary → DESC, then Department → ASC, then Name → ASC

Details

Filter Unique Details

This Node Separates data into two streams, unique and duplicate rows, based on the columns you choose.

Filter unique filters out unique data from the dataset based on input columns and outputs it in lower edge.

It outputs rest of the values (duplicates which were dropped) in the higher edge.

Examples

Incoming Dataframe has following rows:

CUST_CD   |   CUST_NAME    |   AGE
------------------------------------
C01       |   MATT         |   50
C02       |   DAVID        |   45
C03       |   DAVID        |   35
C04       |   DAVID        |   30

If Filter Unique node is configured as:

Filter By Columns: CUST_NAME

then two outgoing Dataframes would be created as below:

First Dataframe with only Unique Values

CUST_CD   |   CUST_NAME    |   AGE
------------------------------------
C01       |   MATT         |   50
C02       |   DAVID        |   45

Second Dataframe with all duplicate values except for the one that got output in above dataframe

CUST_CD   |   CUST_NAME   |   AGE
------------------------------------
C03       |   DAVID       |   35
C04       |   DAVID       |   30