ALS

Alternating Least Squares (ALS) matrix factorization.

Input

It takes a DataFrame containing the user, item and rating columns (see Fields) as input and runs ALS on it.

Output

It generates an ALSModel and passes it to the downstream Predict and ModelSave nodes. It also passes the incoming DataFrame through to the downstream nodes.
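Scoring with the generated model comes down to a dot product between a user's learned factor vector and an item's learned factor vector. The plain-Scala sketch below illustrates that step. `PredictSketch` and its in-memory maps are hypothetical. They stand in for the factor tables that spark.ml's ALSModel keeps in its `userFactors` and `itemFactors` DataFrames. The NaN branch mirrors spark.ml's default `coldStartStrategy` of "nan" for ids that were not seen during training.

```scala
// Hypothetical sketch: how a fitted factor model scores one (user, item) pair.
object PredictSketch {
  // Learned factors keyed by id; every vector has length `rank`.
  def predict(userFactors: Map[Int, Array[Double]],
              itemFactors: Map[Int, Array[Double]],
              user: Int, item: Int): Double =
    (userFactors.get(user), itemFactors.get(item)) match {
      // Both ids were seen in training: predicted rating = x_u . y_i
      case (Some(x), Some(y)) => x.zip(y).map { case (a, b) => a * b }.sum
      // Unknown user or item: no factors exist, so return NaN
      // (analogous to spark.ml's default coldStartStrategy "nan").
      case _ => Double.NaN
    }
}
```

For example, a user vector (1.0, 2.0) and an item vector (3.0, 0.5) score 1.0 * 3.0 + 2.0 * 0.5 = 4.0.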

Type

ml-estimator

Class

fire.nodes.ml.NodeALS

Fields

Name	Title	Description
userCol	User Column	The column name for user ids.
itemCol	Item Column	The column name for item ids.
ratingCol	Rating Column	The column name for ratings.
predictionCol	Prediction Column	The name of the prediction column created during model scoring.
maxIter	Max iterations	The maximum number of iterations.
regParam	Regularization Param	The regularization parameter (>= 0).
alpha	Alpha	The alpha parameter in the implicit preference formulation (>= 0).
checkpointInterval	Checkpoint Interval	The checkpoint interval in iterations (>= 1), or -1 to disable checkpointing. Checkpointing only takes effect when a checkpoint directory is set on the SparkContext.
nonnegative	Non negative	Whether to apply nonnegativity constraints.
numItemBlocks	Num Item Blocks	The number of item blocks.
numUserBlocks	Num User Blocks	The number of user blocks.
rank	Rank	The rank of the matrix factorization (the number of latent factors).
seed	Seed	The random seed.
implicitPrefs	Implicit Prefs	Whether to use the implicit preference (implicit feedback) formulation.
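Assume the node forwards each field to the spark.ml ALS parameter of the same name. This is an assumption, because the NodeALS source is not shown here. Under that assumption, the fields and their defaults can be summarized as a plain Scala case class. `AlsParams` is hypothetical. Its defaults mirror spark.ml's ALS defaults, and the seed default is only a placeholder. The `require` checks encode the ranges given in the table, along with the ranges spark.ml enforces for rank, maxIter, the block counts and checkpointInterval.

```scala
// Hypothetical: the node's fields as a plain Scala case class.
// Defaults mirror spark.ml's ALS defaults; `seed` is a placeholder only.
final case class AlsParams(
    userCol: String = "user",
    itemCol: String = "item",
    ratingCol: String = "rating",
    predictionCol: String = "prediction",
    maxIter: Int = 10,
    regParam: Double = 0.1,
    alpha: Double = 1.0,
    checkpointInterval: Int = 10,
    nonnegative: Boolean = false,
    numItemBlocks: Int = 10,
    numUserBlocks: Int = 10,
    rank: Int = 10,
    seed: Long = 0L, // placeholder; spark.ml derives its own default seed
    implicitPrefs: Boolean = false
) {
  // Range checks matching the constraints described in the Fields table.
  require(maxIter >= 0, "maxIter must be >= 0")
  require(regParam >= 0, "regParam must be >= 0")
  require(alpha >= 0, "alpha must be >= 0")
  require(checkpointInterval == -1 || checkpointInterval >= 1,
    "checkpointInterval must be -1 (disabled) or >= 1")
  require(numItemBlocks >= 1 && numUserBlocks >= 1, "block counts must be >= 1")
  require(rank >= 1, "rank must be >= 1")
}
```

For example, `AlsParams(implicitPrefs = true, alpha = 40.0)` describes an implicit-feedback configuration. `AlsParams(regParam = -0.5)` fails with an IllegalArgumentException.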

Details

Collaborative filtering is commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. spark.ml currently supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries. spark.ml uses the alternating least squares (ALS) algorithm to learn these latent factors. The implementation in spark.ml has the following parameters:

numUserBlocks and numItemBlocks are the numbers of blocks the users and items are partitioned into in order to parallelize computation (each defaults to 10).
rank is the number of latent factors in the model (defaults to 10).
maxIter is the maximum number of iterations to run (defaults to 10).
regParam specifies the regularization parameter in ALS (defaults to 0.1 in the spark.ml estimator).
implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false, which means explicit feedback is used).
alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0).
nonnegative specifies whether or not to use nonnegative constraints for least squares (defaults to false).
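In the explicit-feedback case, the roles of these parameters can be read off the objective that ALS minimizes. The formulation below is a sketch of the regularized objective, with the per-user and per-item scaling of the regularization term that the Spark documentation describes. The symbols are introduced here for illustration:

- k = rank
- λ = regParam
- Ω = the set of observed (user, item) pairs
- I_u = the items rated by user u
- n_u and n_i = the rating counts of user u and item i

```latex
\min_{X,Y}\; \sum_{(u,i)\in\Omega} \bigl(r_{ui} - x_u^\top y_i\bigr)^2
  + \lambda \Bigl( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \Bigr),
  \qquad x_u,\, y_i \in \mathbb{R}^k

% With Y fixed, each user factor is a ridge-regression solution
% (Y_{I_u} stacks the rows y_i for i in I_u, so n_u = |I_u|):
x_u = \bigl( Y_{I_u}^\top Y_{I_u} + \lambda\, n_u\, I_k \bigr)^{-1} Y_{I_u}^\top r_{u, I_u}

% Implicit feedback, as formulated by Hu, Koren and Volinsky:
p_{ui} = \mathbb{1}[\, r_{ui} > 0 \,], \qquad c_{ui} = 1 + \alpha\, r_{ui}
\min_{X,Y}\; \sum_{u,i} c_{ui} \bigl(p_{ui} - x_u^\top y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr)
```

Each iteration solves the user equations with Y fixed and then the symmetric item equations with X fixed. maxIter bounds how many such alternations run, and nonnegative adds the constraint X, Y ≥ 0 to each least-squares subproblem.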

More details are available on the Apache Spark ML documentation page:

http://spark.apache.org/docs/latest/ml-collaborative-filtering.html

Examples

The example below uses the RDD-based spark.mllib API and is available at: https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples

import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating

// `sc` is the SparkContext (predefined in spark-shell).
// Load and parse the data: each line is "user,item,rating".
val data = sc.textFile("data/mllib/als/test.data")
val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
  Rating(user.toInt, item.toInt, rate.toDouble)
})

// Build the recommendation model using ALS
val rank = 10
val numIterations = 10
val model = ALS.train(ratings, rank, numIterations, 0.01) // 0.01 is the regularization parameter (lambda)

// Evaluate the model on rating data (here, the training data itself)
val usersProducts = ratings.map { case Rating(user, product, rate) =>
  (user, product)
}
val predictions =
  model.predict(usersProducts).map { case Rating(user, product, rate) =>
    ((user, product), rate)
  }
val ratesAndPreds = ratings.map { case Rating(user, product, rate) =>
  ((user, product), rate)
}.join(predictions)
val MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =>
  val err = (r1 - r2)
  err * err
}.mean()
println(s"Mean Squared Error = $MSE")

// Save and load model
model.save(sc, "target/tmp/myCollaborativeFilter")
val sameModel = MatrixFactorizationModel.load(sc, "target/tmp/myCollaborativeFilter")
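The Spark example needs a cluster or spark-shell and the test.data file. The sketch below is a dependency-free illustration of the same train-then-predict loop. It covers explicit-feedback ALS only, with regParam scaled by each user's or item's rating count as described in the Details section. `TinyAls` and all its members are hypothetical. This is not the spark.ml or spark.mllib implementation: there is no blocking, no implicit variant, no nonnegativity and no checkpointing. Item factors are seeded with `scala.util.Random`.

```scala
import scala.util.Random

// Hypothetical, dependency-free sketch of explicit-feedback ALS (not Spark's code).
object TinyAls {
  final case class Rating(user: Int, item: Int, rating: Double)

  def dot(x: Array[Double], y: Array[Double]): Double =
    x.indices.map(k => x(k) * y(k)).sum

  // Solves the small dense system a * x = b by Gaussian elimination with partial pivoting.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = a.map(_.clone)
    val v = b.clone
    for (col <- 0 until n) {
      val p = (col until n).maxBy(r => math.abs(m(r)(col)))
      val rowTmp = m(col); m(col) = m(p); m(p) = rowTmp
      val valTmp = v(col); v(col) = v(p); v(p) = valTmp
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col until n) m(r)(c) -= f * m(col)(c)
        v(r) -= f * v(col)
      }
    }
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1)
      x(r) = (v(r) - (r + 1 until n).map(c => m(r)(c) * x(c)).sum) / m(r)(r)
    x
  }

  // Half-step: with `fixed` held constant, re-solve the regularized least-squares
  // problem for every entity in `obs` (entity -> list of (index into fixed, rating)).
  // regParam is scaled by the entity's number of ratings.
  def update(target: Array[Array[Double]], fixed: Array[Array[Double]],
             obs: Map[Int, Seq[(Int, Double)]], rank: Int, regParam: Double): Unit =
    for ((e, rs) <- obs) {
      val a = Array.tabulate(rank, rank) { (i, j) =>
        rs.map { case (o, _) => fixed(o)(i) * fixed(o)(j) }.sum +
          (if (i == j) regParam * rs.size else 0.0)
      }
      val b = Array.tabulate(rank)(i => rs.map { case (o, r) => r * fixed(o)(i) }.sum)
      target(e) = solve(a, b)
    }

  // Squared error on observed ratings plus the count-weighted L2 penalty.
  def objective(data: Seq[Rating], u: Array[Array[Double]], v: Array[Array[Double]],
                regParam: Double): Double = {
    val sse = data.map { r => val e = r.rating - dot(u(r.user), v(r.item)); e * e }.sum
    val penU = data.groupBy(_.user).toSeq.map { case (k, rs) => rs.size * dot(u(k), u(k)) }.sum
    val penI = data.groupBy(_.item).toSeq.map { case (k, rs) => rs.size * dot(v(k), v(k)) }.sum
    sse + regParam * (penU + penI)
  }

  // Alternates user and item solves maxIter times; returns factors and objective history.
  def train(data: Seq[Rating], numUsers: Int, numItems: Int, rank: Int, maxIter: Int,
            regParam: Double, seed: Long): (Array[Array[Double]], Array[Array[Double]], Seq[Double]) = {
    val rnd = new Random(seed)
    val userF = Array.fill(numUsers, rank)(0.0)
    val itemF = Array.fill(numItems, rank)(rnd.nextDouble())
    val byUser = data.groupBy(_.user).map { case (k, rs) => k -> rs.map(r => (r.item, r.rating)) }
    val byItem = data.groupBy(_.item).map { case (k, rs) => k -> rs.map(r => (r.user, r.rating)) }
    val history = (1 to maxIter).map { _ =>
      update(userF, itemF, byUser, rank, regParam) // items fixed, solve users
      update(itemF, userF, byItem, rank, regParam) // users fixed, solve items
      objective(data, userF, itemF, regParam)
    }
    (userF, itemF, history)
  }
}
```

Each half-step exactly minimizes the objective over one factor matrix, so the recorded objective history should not increase from one iteration to the next. For example, take ratings that follow a rank-1 pattern (users (1, 2, 3) times items (1, 2)) and omit the (user 2, item 1) rating of 6. With rank = 1 and a small regParam, `dot(userF(2), itemF(1))` should come out close to 6.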