AFT Survival Regression

Fits an accelerated failure time (AFT) model, a parametric survival regression model for censored data.

Output

It generates an AFTSurvivalRegressionModel and passes it to the downstream Predict and ModelSave nodes. The input DataFrame is also passed along to the next nodes.

Type

ml-estimator

Class

fire.nodes.ml.NodeAFTSurvivalRegression

Fields

Name

Title

Description

featuresCol

Features Column

Features column of type VectorUDT used for model fitting

labelCol

Label Column

The label column for model fitting. For AFT survival regression, the label is the survival time and must be positive

splitRatio

Split Ratio

Split Ratio

censorCol

Censor Column

Indicator of whether the event has occurred. A value of 1.0 means the event has occurred (the observation is uncensored); otherwise the observation is censored

fitIntercept

Fit Intercept

Whether to fit an intercept term

maxIter

Maximum Iterations

Maximum number of iterations (>= 0)

tol

Tolerance

The convergence tolerance for iterative algorithms

quantileProbabilities

QuantileProbabilities

Array of quantile probabilities used to compute the quantiles column; each value must be in the range (0, 1)

quantilesCol

Quantiles Column

The quantiles column created during model scoring

predictionCol

Prediction Column

The prediction column created during model scoring
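To show how the Prediction Column and Quantiles Column relate to the fitted model, here is a minimal plain-Scala sketch. It assumes the Weibull parameterization that, to our understanding, Spark's AFTSurvivalRegressionModel uses: the prediction is lambda = exp(x·β + b), and the quantile at probability p is lambda * (-log(1 - p))^scale. The object AftScoringSketch and its methods are hypothetical names for illustration; they are not part of Spark or of this node.

```scala
// Hypothetical helper, not a Spark API: mirrors (as we understand it) how an AFT
// model turns coefficients, intercept and scale into predictions and quantiles.
object AftScoringSketch {
  // Linear predictor x·beta + b
  def linearPredictor(coefficients: Array[Double], intercept: Double, features: Array[Double]): Double = {
    require(coefficients.length == features.length, "dimension mismatch")
    coefficients.zip(features).map { case (c, x) => c * x }.sum + intercept
  }

  // Prediction column value: lambda = exp(x·beta + b)
  def predict(coefficients: Array[Double], intercept: Double, features: Array[Double]): Double =
    math.exp(linearPredictor(coefficients, intercept, features))

  // Quantiles column values: lambda * (-log(1 - p))^scale for each probability p in (0, 1)
  def quantiles(coefficients: Array[Double], intercept: Double, scale: Double,
                features: Array[Double], probs: Array[Double]): Array[Double] = {
    val lambda = predict(coefficients, intercept, features)
    probs.map { p =>
      require(p > 0.0 && p < 1.0, "quantile probabilities must be in (0, 1)")
      lambda * math.pow(-math.log1p(-p), scale)
    }
  }
}
```

One consequence of this form: at p = 1 - 1/e the factor (-log(1 - p))^scale equals 1, so that quantile coincides with the prediction regardless of the scale.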

Details

Apache Spark ML implements the accelerated failure time (AFT) model, a parametric survival regression model for censored data. It models the log of the survival time, so it is often called a log-linear model for survival analysis. Unlike the proportional hazards model, which is designed for the same purpose, the AFT model is easier to parallelize because each instance contributes to the objective function independently.
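The claim that each instance contributes independently can be made concrete with the censored Weibull AFT log-likelihood described in the Spark documentation. With epsilon_i = (log t_i - x_i·β - b) / σ and censor indicator δ_i, instance i contributes δ_i (epsilon_i - log σ) - exp(epsilon_i), dropping a -δ_i log t_i term that does not depend on the parameters. The plain-Scala sketch below is an illustration under that assumption, not Spark's implementation; the names AftLikelihoodSketch, Instance, total and partitioned are hypothetical. It shows that summing per-partition totals gives the same result as one global sum, which is what makes distributed evaluation straightforward.

```scala
// Hypothetical sketch, not a Spark API: censored Weibull AFT log-likelihood.
object AftLikelihoodSketch {
  final case class Instance(label: Double, censor: Double, features: Array[Double])

  // One instance's contribution, omitting -censor * log(label),
  // which is constant with respect to coefficients, intercept and scale.
  def logLikelihood(inst: Instance, coefficients: Array[Double], intercept: Double, scale: Double): Double = {
    require(inst.label > 0.0, "survival time must be positive")
    require(scale > 0.0, "scale must be positive")
    val eta = coefficients.zip(inst.features).map { case (c, x) => c * x }.sum + intercept
    val eps = (math.log(inst.label) - eta) / scale
    inst.censor * (eps - math.log(scale)) - math.exp(eps)
  }

  // The total is a plain sum of independent per-instance terms.
  def total(data: Seq[Instance], coefficients: Array[Double], intercept: Double, scale: Double): Double =
    data.map(logLikelihood(_, coefficients, intercept, scale)).sum

  // Same total computed as a sum of per-chunk sums, mimicking per-partition aggregation.
  def partitioned(data: Seq[Instance], coefficients: Array[Double], intercept: Double,
                  scale: Double, parts: Int): Double = {
    val chunkSize = math.max(1, (data.size + parts - 1) / parts)
    data.grouped(chunkSize).map(total(_, coefficients, intercept, scale)).sum
  }
}
```

An uncensored instance (δ = 1) is scored by the density at its observed time, while a censored one (δ = 0) only contributes the survival term -exp(epsilon), i.e. the log-probability of surviving past its recorded time.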

More details can be found on the Spark ML documentation page: https://spark.apache.org/docs/latest/ml-classification-regression.html#survival-regression

Examples

The example below is taken from: https://spark.apache.org/docs/latest/ml-classification-regression.html#survival-regression

// Assumes `spark` is an active SparkSession (it is predefined in spark-shell)
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.AFTSurvivalRegression

// Training data: label = survival time, censor = 1.0 if the event occurred, 0.0 if censored
val training = spark.createDataFrame(Seq(
  (1.218, 1.0, Vectors.dense(1.560, -0.605)),
  (2.949, 0.0, Vectors.dense(0.346, 2.158)),
  (3.627, 0.0, Vectors.dense(1.380, 0.231)),
  (0.273, 1.0, Vectors.dense(0.520, 1.151)),
  (4.199, 0.0, Vectors.dense(0.795, -0.226))
)).toDF("label", "censor", "features")

// Also predict the 30% and 60% quantiles of the survival time
val quantileProbabilities = Array(0.3, 0.6)
val aft = new AFTSurvivalRegression()
  .setQuantileProbabilities(quantileProbabilities)
  .setQuantilesCol("quantiles")

val model = aft.fit(training)

// Print the coefficients, intercept and scale parameter for AFT survival regression
println(s"Coefficients: ${model.coefficients}")
println(s"Intercept: ${model.intercept}")
println(s"Scale: ${model.scale}")

// Score the training data: adds the prediction and quantiles columns
model.transform(training).show(false)
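To make the censoring convention concrete, the plain-Scala sketch below reads the (label, censor) pairs from the training data above in the way the Censor Column field describes: 1.0 means the event was observed at the label time, anything else means the observation is censored, so the subject is only known to have survived up to that time. The object CensorReadingSketch and its methods are hypothetical names used for illustration only.

```scala
// Hypothetical illustration: interpreting the (label, censor) pairs of the example data.
object CensorReadingSketch {
  // (label = observed time, censor flag) copied from the training DataFrame above
  val rows: Seq[(Double, Double)] =
    Seq((1.218, 1.0), (2.949, 0.0), (3.627, 0.0), (0.273, 1.0), (4.199, 0.0))

  def describe(label: Double, censor: Double): String =
    if (censor == 1.0) s"event observed at time $label"
    else s"censored: no event observed up to time $label"

  def describeAll: Seq[String] = rows.map { case (t, c) => describe(t, c) }
}
```

In this data set two rows are uncensored events and three are censored, which is exactly the mix the AFT model is designed to fit.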