Linear Regression

The interface for working with linear regression models and model summaries is similar to the logistic regression case.

Input

This node takes in a DataFrame and performs Linear Regression.

Output

It generates a LinearRegressionModel and passes it to the downstream Predict and ModelSave nodes. The input DataFrame is also passed along to the next nodes.
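The flow described above can be sketched in plain Spark ML as follows. This is only an illustration under stated assumptions: the toy data, the application name, and the save path are made up, and the code that the Predict and ModelSave nodes actually run is not shown in this document.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .appName("NodeFlowSketch")
  .master("local[*]")
  .getOrCreate()

// Made-up input DataFrame with a label column and a vector-typed features column.
val input = spark.createDataFrame(Seq(
  (3.0, Vectors.dense(1.0)),
  (5.0, Vectors.dense(2.0)),
  (7.0, Vectors.dense(3.0)),
  (9.0, Vectors.dense(4.0))
)).toDF("label", "features")

// Estimator step: fit on the input DataFrame and produce a LinearRegressionModel.
val model: LinearRegressionModel = new LinearRegression().fit(input)

// What a downstream Predict step might do: score the DataFrame that was passed along.
val scored = model.transform(input)
scored.show()

// What a downstream ModelSave step might do (hypothetical path).
model.write.overwrite().save("/tmp/linear-regression-model-sketch")
```

The scored DataFrame keeps the input columns and adds a prediction column, which by default is named prediction.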

Type

ml-estimator

Class

fire.nodes.ml.NodeLinearRegression

Fields

Name

Title

Description

modelIdentifier

Model Identifier

modelIdentifier starts with $loop, followed by column names separated by underscores. Example: $loop_columnName1_columnName2.

splitRatio

Split Ratio

Split Ratio

featuresCol

Features Column

Features column of type VectorUDT, used for model fitting

labelCol

Label Column

The label column for model fitting

predictionCol

Prediction Column

The prediction column created during model scoring

fitIntercept

Fit Intercept

Whether to fit an intercept term

maxIter

Maximum Iterations

Maximum number of iterations (>= 0)

regParam

Regularization Param

The regularization parameter

elasticNetParam

ElasticNet Param

The ElasticNet mixing parameter (alpha), in the range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.

solver

Solver

The solver algorithm for optimization

standardization

Standardization

Whether to standardize the training features before fitting the model

tol

Tolerance

The convergence tolerance for iterative algorithms

weightCol

Weight Column

If the weight column is not specified, all instances are treated equally with a weight of 1.0

aggregationDepth

Aggregation Depth

Suggested depth for treeAggregate (>= 2)

epsilon

Epsilon

The shape parameter to control the amount of robustness. Must be > 1.0, and only applies when the loss is huber

loss

Loss

The loss function to be optimized

saveCoefficientsPath

Path to Save Coefficients

Path to Save Coefficients and Intercept as CSV

gridSearch

Grid Search

regParamGrid

Regularization Param Grid Search

Regularization Parameters for Grid Search

elasticNetGrid

ElasticNet Param Grid Search

ElasticNet Parameters for Grid Search

maxIterGrid

MaxIter Param Grid Search

Maximum iteration Parameters for Grid Search
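As a rough sketch of how the fields above might map onto Spark ML, the snippet below configures a LinearRegression estimator and builds a parameter grid from the three grid-search fields. All concrete values are illustrative. Using TrainValidationSplit with a RegressionEvaluator, and treating splitRatio as its training fraction, are assumptions made for this example, since the node's actual implementation is not described here.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

// Estimator configured with illustrative values; comments name the matching node field.
val lr = new LinearRegression()
  .setFeaturesCol("features")      // featuresCol
  .setLabelCol("label")            // labelCol
  .setPredictionCol("prediction")  // predictionCol
  .setFitIntercept(true)           // fitIntercept
  .setMaxIter(100)                 // maxIter
  .setRegParam(0.0)                // regParam
  .setElasticNetParam(0.0)         // elasticNetParam
  .setSolver("auto")               // solver
  .setStandardization(true)        // standardization
  .setTol(1e-6)                    // tol
  .setAggregationDepth(2)          // aggregationDepth
  .setLoss("squaredError")         // loss

// Candidate values as they might be entered in regParamGrid, elasticNetGrid and maxIterGrid.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 0.3))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .addGrid(lr.maxIter, Array(10, 50))
  .build()

// Evaluator that scores each candidate by RMSE on the validation split.
val evaluator = new RegressionEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("rmse")

// Assumption: splitRatio is used as the training fraction of a train/validation split.
val tvs = new TrainValidationSplit()
  .setEstimator(lr)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.8)

println(s"Parameter combinations to evaluate: ${paramGrid.length}")
```

The grid is the Cartesian product of the candidate lists, so three regParam values, three elasticNetParam values, and two maxIter values give 3 x 3 x 2 = 18 combinations. Calling tvs.fit on a training DataFrame would then fit and evaluate each one.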

Details

When a LinearRegressionModel is fit without an intercept, using the “l-bfgs” solver, on a dataset that contains a constant nonzero column, Spark MLlib outputs zero coefficients for the constant nonzero columns. This behavior matches R glmnet but differs from LIBSVM.
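The constant-column behavior can be observed with a small experiment such as the one below. It is a minimal sketch: the toy data and the application name are made up, and the first feature is deliberately held at the constant value 1.0.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .appName("ConstantColumnSketch")
  .master("local[*]")
  .getOrCreate()

// Made-up data: feature 0 is a constant nonzero column, feature 1 varies.
val data = spark.createDataFrame(Seq(
  (5.0, Vectors.dense(1.0, 1.0)),
  (8.0, Vectors.dense(1.0, 2.0)),
  (11.0, Vectors.dense(1.0, 3.0)),
  (14.0, Vectors.dense(1.0, 4.0))
)).toDF("label", "features")

// Fit without an intercept, forcing the l-bfgs solver.
val model = new LinearRegression()
  .setFitIntercept(false)
  .setSolver("l-bfgs")
  .fit(data)

// Index 0 of the coefficient vector belongs to the constant column.
println(s"Coefficients: ${model.coefficients}")
println(s"Intercept: ${model.intercept}")
```

Because no intercept is fit and the constant column receives a zero coefficient, the model has to explain the labels with the remaining feature alone.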

More details are available at: http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression

Examples

The example below is available at: https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression

import org.apache.spark.ml.regression.LinearRegression

// Load training data
val training = spark.read.format("libsvm")
  .load("data/mllib/sample_linear_regression_data.txt")

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// Summarize the model over the training set and print out some metrics
val trainingSummary = lrModel.summary
println(s"numIterations: ${trainingSummary.totalIterations}")
println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
trainingSummary.residuals.show()
println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
println(s"r2: ${trainingSummary.r2}")
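To make the last two metrics concrete, here is a minimal plain-Scala sketch of the usual textbook definitions of RMSE and R2. The object and function names are hypothetical and are not part of Spark. Spark's own summary may compute these values differently in some cases, for example for models fit without an intercept.

```scala
object RegressionMetricsSketch {

  // Root mean squared error: square root of the mean squared residual.
  def rmse(labels: Seq[Double], predictions: Seq[Double]): Double = {
    require(labels.nonEmpty && labels.length == predictions.length,
      "labels and predictions must be non-empty and of equal length")
    val sumSquaredError = labels.zip(predictions).map { case (y, p) => (y - p) * (y - p) }.sum
    math.sqrt(sumSquaredError / labels.length)
  }

  // Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
  // Assumes the labels are not all identical, otherwise the denominator is zero.
  def r2(labels: Seq[Double], predictions: Seq[Double]): Double = {
    require(labels.nonEmpty && labels.length == predictions.length,
      "labels and predictions must be non-empty and of equal length")
    val mean = labels.sum / labels.length
    val residualSumSquares = labels.zip(predictions).map { case (y, p) => (y - p) * (y - p) }.sum
    val totalSumSquares = labels.map(y => (y - mean) * (y - mean)).sum
    1.0 - residualSumSquares / totalSumSquares
  }
}

// Usage with made-up values: one of three predictions is off by 1.0.
val labels = Seq(1.0, 2.0, 3.0)
val predictions = Seq(1.0, 2.0, 4.0)
println(s"RMSE: ${RegressionMetricsSketch.rmse(labels, predictions)}")
println(s"r2: ${RegressionMetricsSketch.r2(labels, predictions)}")
```

A lower RMSE means smaller residuals on average, while an R2 closer to 1 means the model explains more of the variance in the label.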