Impute Advanced

It imputes missing or given value with constant value,mean, median or mode

Type

transform

Class

fire.nodes.etl.NodeImputeAdvanced

Fields

Name

Title

Description

inputCols

Columns

Columns to be processed for missing values

strategy

Impute Strategy

Imputing Strategy

replaceValue

Replace Value

Value to replace, when empty it replaces the missing values

constants

Constant

Missing value will be replaced with constant.Applicable only when imputation strategy is constant

Details

This node imputes missing values or replaces specified values in the selected columns by mean, meadian, mode or constant.

Examples

Incoming Dataframe has following rows and missing value / [NULL] for some rows:

+---+----+----+----+----+
| ID|  X1|  X2|  X3|  X4|
+---+----+----+----+----+
|  1|4.45| 5.6|null| 7.0|
|  2|5.83|5.72|2.55|10.0|
|  3|1.54|6.97|3.54| 3.0|
|  4|null|3.98|4.95| 2.0|
|  5| 3.1|null|8.42|null|
|  6|8.74| 6.1|1.91| 4.0|
|  7|null|0.01|8.07| 5.0|
|  8|7.51|6.31|5.94| 4.0|
|  9|1.21|4.74|1.91| 5.0|
| 10|1.85|7.02|null| 6.0|
+---+----+----+----+----+

If Impute Advanced node is configured to:

selected column: X1 -> Imputation strategy ->Mean

selected column: X2 -> Imputation strategy ->Median

selected column: X3 -> Imputation strategy ->Mode

selected column: X4 -> Imputation strategy ->Constant -> 123

selcted column : X1 -> Imputation Strategy ->Median -> Replace value -> 3.1

selected column: X2 -> Imputation Strategy ->Constant ->Replace value -> 0.01

Outgoing Dataframe would result as below:

+---+----+----+----+-------+
| ID|  X1|  X2|  X3|     X4|
+---+----+----+----+-------+
|  1|4.45| 5.6|1.91|    7.0|
|  2|5.83|5.72|2.55|   10.0|
|  3|1.54|6.97|3.54|    3.0|
|  4|4.28|3.98|4.95|    2.0|
|  5|4.28|5.72|8.42|  123.0|
|  6|8.74| 6.1|1.91|    4.0|
|  7|4.28| 1.0|8.07|    5.0|
|  8|7.51|6.31|5.94|    4.0|
|  9|1.21|4.74|1.91|    5.0|
| 10|1.85|7.02|1.91|    6.0|
+---+----+----+----+-------+

Similarly if one wants to replace a particular value by mean/median/mode/constant, This can be achieved by specifying replace value in the node configration.