Linear Regression¶
The interface for working with linear regression models and model summaries is similar to the logistic regression case.
Input¶
This takes in a DataFrame and performs Logistic Regression
Output¶
It generates the LinearRegressionModel and passes it to the next Predict and ModelSave Nodes. The input DataFrame is also passed along to the next nodes.
Type¶
ml-estimator
Class¶
fire.nodes.ml.NodeLinearRegression
Fields¶
Name |
Title |
Description |
|---|---|---|
modelIdentifier |
Model Identifier |
modelIdentifier starts with $loop & columns names separated with underscore. Example: $loop_columnName1_columnName2. |
splitRatio |
Split Ratio |
Split Ratio |
featuresCol |
Features Column |
Features column of type vectorUDT for model fitting |
labelCol |
Label Column |
The label column for model fitting |
predictionCol |
Prediction Column |
The prediction column created during model scoring |
fitIntercept |
Fit Intercept |
Whether to fit an intercept term |
maxIter |
Maximum Iterations |
Maximum number of iterations (>= 0) |
regParam |
Regularization Param |
The regularization parameter |
elasticNetParam |
ElasticNet Param |
The ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty |
solver |
Solver |
The solver algorithm for optimization |
standardization |
Standardization |
Whether to standardize the training features before fitting the model |
tol |
Tolerance |
The convergence tolerance for iterative algorithms |
weightCol |
Weight Column |
If the ‘weight column’ is not specified, all instances are treated equally with a weight 1.0 |
aggregationDepth |
Aggregation Depth |
depth for treeAggregate |
epsilon |
Epsilon |
The shape parameter to control the amount of robustness |
loss |
Loss |
The loss function to be optimized |
saveCoefficientsPath |
Path to Save Coefficients |
Path to Save Coefficients and Intercept as CSV |
gridSearch |
Grid Search |
|
regParamGrid |
Regularization Param Grid Search |
Regularization Parameters for Grid Search |
elasticNetGrid |
ElasticNet Param Grid Search |
ElasticNet Parameters for Grid Search |
maxIterGrid |
MaxIter Param Grid Search |
Maximum iteration Parameters for Grid Search |
Details¶
The interface for working with linear regression models and model summaries is similar to the logistic regression case.
When fitting LinearRegressionModel without intercept on dataset with constant nonzero column by “l-bfgs” solver, Spark MLlib outputs zero coefficients for constant nonzero columns. This behavior is the same as R glmnet but different from LIBSVM.
More details are available at : http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
Examples¶
Below example is available at : https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
import org.apache.spark.ml.regression.LinearRegression
// Load training data
val training = spark.read.format(“libsvm”)
.load(“data/mllib/sample_linear_regression_data.txt”)
val lr = new LinearRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
// Print the coefficients and intercept for linear regression
println(s”Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}”)
// Summarize the model over the training set and print out some metrics
val trainingSummary = lrModel.summary
println(s”numIterations: ${trainingSummary.totalIterations}”)
println(s”objectiveHistory: [${trainingSummary.objectiveHistory.mkString(“,”)}]”)
trainingSummary.residuals.show()
println(s”RMSE: ${trainingSummary.rootMeanSquaredError}”)
println(s”r2: ${trainingSummary.r2}”)