Split With Stratified Sampling¶
This node splits the incoming DataFrame into two DataFrames using stratified sampling. It takes the sampling fraction to apply to each stratum.
Input¶
The node takes a single DataFrame as input.
Output¶
The node splits the input DataFrame and outputs the two resulting DataFrames.
Type¶
transform
Class¶
fire.nodes.util.SplitWithStratifiedSampling
Fields¶
| Name | Title | Description |
|---|---|---|
| keyInputCol | Column Name | column that defines strata |
| fraction | Fraction | sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero |
| seed | Seed | random seed |
Details¶
Stratified sampling is the preferred way to sample from populations with varying subpopulation sizes.
The node returns a stratified sample, drawn without replacement, based on the fraction given for each stratum.
More details are available at: https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/dataframe.html#DataFrame.sampleBy
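The behaviour described above can be illustrated with a minimal sketch in plain Python. This is not the node's implementation: `split_stratified` and its parameters are hypothetical names, and the sketch assumes that the sampled rows form one output and all remaining rows form the other, which the description does not state explicitly. Each row is kept independently with the probability given for its stratum, so the sample sizes are approximate rather than exact.

```python
import random


def split_stratified(rows, key, fractions, seed=None):
    """Split rows into (sampled, remainder) using per-stratum sampling fractions.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sampled, remainder = [], []
    for row in rows:
        # A stratum missing from `fractions` is treated as fraction 0,
        # so none of its rows are sampled.
        fraction = fractions.get(row[key], 0.0)
        # Each row is considered once, so sampling is without replacement.
        if rng.random() < fraction:
            sampled.append(row)
        else:
            remainder.append(row)
    return sampled, remainder


# Example data: 250 rows in stratum "b", 750 rows in stratum "a".
rows = [{"id": i, "label": "a" if i % 4 else "b"} for i in range(1000)]

# Keep roughly half of stratum "a" and all of stratum "b".
sampled, remainder = split_stratified(rows, "label", {"a": 0.5, "b": 1.0}, seed=42)
```

Here `key` plays the role of `keyInputCol`, `fractions` the role of `fraction`, and `seed` the role of `seed`. Passing the same seed reproduces the same split.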