WhatIf Summary Statistics

WhatIf Summary Statistics provides useful information about sample data, for example measures of spread.

Type

transform

Class

fire.nodes.ml.NodeWhatIfSummary

Fields

Name                | Title                    | Description              |
---------------------------------------------------------------------------
categoricalColNames | Categorical Column Names | Column Names for Summary |
numericColNames     | Numeric Column Names     | Column Names for Summary |

Details

WhatIf Summary Statistics Node Details

The WhatIf Summary Statistics node makes it easy to explore the contents of a DataFrame at a high level.

This node computes the specified statistics, which include:

  • count

  • mean

  • stddev

  • variance

  • min

  • max

  • approximate percentiles, each specified as a percentage
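
How the node computes its approximate percentiles is not described here. As a rough illustration only, the sketch below uses the nearest-rank rule in plain Python; the function name `nearest_rank_percentile` is hypothetical and the rule itself is an assumption, chosen because it is simple to follow. The sample values are the ID values from the Examples section.

```python
import math


def nearest_rank_percentile(values, percent):
    """Return the value at the given percentage using the nearest-rank rule.

    This is an assumed, illustrative rule, not the node's documented method.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    ordered = sorted(values)
    # Ranks are 1-based; clamp to 1 so that percent=0 maps to the minimum.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


ids = [1, 2, 9, 5]
quartiles = {f"{p}%": nearest_rank_percentile(ids, p) for p in (25, 50, 75)}
print(quartiles)  # {'25%': 1, '50%': 2, '75%': 5}
```

A percentile computed this way is always one of the observed values, never an interpolated one.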

Input Parameters

  • OUTPUT STORAGE LEVEL : Keep this as DEFAULT.

  • TITLE : A short description to summarize what the data depicts.

  • COLUMN NAMES :

      • Available : A list of numeric columns derived from the input DataFrame schema.

      • Selected : A list of numeric columns for which the node will compute statistical values.

Examples

WhatIf Summary Statistics Node Example

Consider the following DataFrame:

ID | CODE |
-----------
1  | aa   |
2  | aa   |
9  | bb   |
5  | cc   |

If we calculate the summary statistics for all columns in the DataFrame, we get:

summary  | ID                | CODE |
-------------------------------------
count    | 4                 | 4    |
mean     | 4.25              | null |
min      | 1                 | aa   |
25%      | 1                 | null |
50%      | 2                 | null |
75%      | 5                 | null |
max      | 9                 | cc   |
stddev   | 3.593976442141304 | null |
variance | 12.916667         | null |
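
The figures in this table can be checked by hand. The sketch below is a plain-Python approximation, not the node's actual implementation. It assumes the sample (n - 1) forms of standard deviation and variance and a nearest-rank rule for percentiles. The helper names `nearest_rank` and `summarize` are hypothetical.

```python
import math
import statistics

# Example rows from the DataFrame above: (ID, CODE).
rows = [(1, "aa"), (2, "aa"), (9, "bb"), (5, "cc")]
column_names = ["ID", "CODE"]


def nearest_rank(ordered, percent):
    """Nearest-rank percentile of an already sorted list (an assumed rule)."""
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(values):
    """Summarize one column; statistics undefined for text are None (null)."""
    ordered = sorted(values)
    is_numeric = all(isinstance(v, (int, float)) for v in values)
    summary = {"count": len(values), "min": ordered[0], "max": ordered[-1]}
    for percent in (25, 50, 75):
        summary[f"{percent}%"] = nearest_rank(ordered, percent) if is_numeric else None
    # Sample (n - 1) standard deviation and variance, an assumption that
    # agrees with the values shown in the table.
    summary["mean"] = statistics.mean(values) if is_numeric else None
    summary["stddev"] = statistics.stdev(values) if is_numeric else None
    summary["variance"] = statistics.variance(values) if is_numeric else None
    return summary


# Transpose the rows into columns and summarize each column.
result = {name: summarize(list(col)) for name, col in zip(column_names, zip(*rows))}

for name, summary in result.items():
    print(name, summary)
```

Under these assumptions the ID column yields a mean of 4.25, quartiles of 1, 2 and 5, and a variance of 155/12, which rounds to 12.916667. The CODE column only has a count, a minimum and a maximum, because the other statistics are undefined for text.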