Summary Statistics

Summary statistics provide useful information about sample data, for example measures of spread.

Type

transform

Class

fire.nodes.ml.NodeSummary

Fields

Name     | Title        | Description                     |
-----------------------------------------------------------
title    | Title        |                                 |
colNames | Column Names | Column Names for Summary        |
path     | Path         | Save Summary Statistics to Path |

Details

Summary Statistics Node Details

The Summary Statistics node makes it easy to explore the contents of a DataFrame at a high level.

This node computes the specified statistics, which include:

  • count

  • mean

  • stddev

  • variance

  • min

  • max

  • approximate percentiles, each specified as a percentage

Input Parameters

  • OUTPUT STORAGE LEVEL : Keep this as DEFAULT.

  • TITLE : A short description to summarize what the data depicts.

  • COLUMN NAMES :

      • Available : A list of numeric columns derived from the input DataFrame schema.

      • Selected : A list of numeric columns for which the node computes statistics.

Examples

Summary Statistics Node Example

Consider the following DataFrame:

ID | CODE |
-----------
1  | aa   |
2  | aa   |
9  | bb   |
5  | cc   |

If we calculate the summary statistics for all columns in the DataFrame, we get:

| summary  | ID                | CODE |
---------------------------------------
| count    | 4                 | 4    |
| mean     | 4.25              | null |
| min      | 1                 | aa   |
| 25%      | 1                 | null |
| 50%      | 2                 | null |
| 75%      | 5                 | null |
| max      | 9                 | cc   |
| stddev   | 3.593976442141304 | null |
| variance | 12.916667         | null |
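
The ID column figures can be checked by hand. The following is a minimal sketch in plain Python, not the node's implementation. It assumes that stddev and variance use the sample (n - 1) formulas, which matches the values shown. It also assumes a nearest-rank rule for the percentiles. The node's percentiles are approximate and its exact algorithm is not described here, so the helper nearest_rank_percentile is a hypothetical stand-in that happens to agree on this small input.

```python
import math
import statistics

# The ID column from the example DataFrame.
ids = [1, 2, 9, 5]


def nearest_rank_percentile(values, pct):
    """Hypothetical helper: smallest value whose rank covers pct percent of the data."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


summary = {
    "count": len(ids),
    "mean": statistics.mean(ids),
    "min": min(ids),
    "25%": nearest_rank_percentile(ids, 25),
    "50%": nearest_rank_percentile(ids, 50),
    "75%": nearest_rank_percentile(ids, 75),
    "max": max(ids),
    # Sample statistics: the divisor is n - 1, not n.
    "stddev": statistics.stdev(ids),
    "variance": statistics.variance(ids),
}

for name, value in summary.items():
    print(f"{name:>8} | {value}")
```

The CODE column is a string column, so only count, min and max are meaningful for it, which is why the remaining rows show null.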