ALS

Alternating Least Squares (ALS) matrix factorization.

Input

It takes a DataFrame as input and fits an ALS model on it.

Output

It generates an ALSModel and passes it to the downstream Predict and ModelSave nodes. It also passes the incoming DataFrame through to the next nodes.

Type

ml-estimator

Class

fire.nodes.ml.NodeALS

Fields

| Name | Title | Description |
|---|---|---|
| userCol | User Column | The column name for user ids. |
| itemCol | Item Column | The column name for item ids. |
| ratingCol | Rating Column | The column name for ratings. |
| predictionCol | Prediction Column | The prediction column created during model scoring. |
| maxIter | Max iterations | The maximum number of iterations. |
| regParam | Regularization Param | The regularization parameter (>= 0). |
| alpha | Alpha | The alpha parameter in the implicit preference formulation (>= 0). |
| checkpointInterval | Checkpoint Interval | The checkpoint interval. |
| nonnegative | Non negative | Whether to apply nonnegativity constraints. |
| numItemBlocks | Num Item Blocks | The number of item blocks. |
| numUserBlocks | Num User Blocks | The number of user blocks. |
| rank | Rank | The rank of the matrix factorization. |
| seed | Seed | The random seed. |
| implicitPrefs | Implicit Prefs | Whether to use implicit preference. |
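The field names above match the setters on spark.ml's ALS estimator. As a rough sketch, assuming the node simply forwards each field to the setter of the same name (the node's internals are not shown here), an equivalent configuration in Scala might look like the following. The column names and all values are illustrative placeholders, not defaults of the node.

```scala
import org.apache.spark.ml.recommendation.ALS

object AlsNodeFieldsSketch {
  // Hypothetical configuration: every value below is an example, not a recommendation.
  val als: ALS = new ALS()
    .setUserCol("userId")          // userCol (placeholder column name)
    .setItemCol("itemId")          // itemCol (placeholder column name)
    .setRatingCol("rating")        // ratingCol
    .setPredictionCol("prediction") // predictionCol
    .setMaxIter(10)                // maxIter
    .setRegParam(0.1)              // regParam, must be >= 0
    .setAlpha(1.0)                 // alpha, only used when implicitPrefs is true
    .setCheckpointInterval(10)     // checkpointInterval
    .setNonnegative(false)         // nonnegative
    .setNumItemBlocks(10)          // numItemBlocks
    .setNumUserBlocks(10)          // numUserBlocks
    .setRank(10)                   // rank
    .setSeed(42L)                  // seed
    .setImplicitPrefs(false)       // implicitPrefs
}
```

Building the estimator this way does not start any computation; training only happens when `fit` is called on a DataFrame.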

Details

Collaborative filtering is commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. spark.ml currently supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries. spark.ml uses the alternating least squares (ALS) algorithm to learn these latent factors. The implementation in spark.ml has the following parameters:

  • numBlocks is the number of blocks the users and items will be partitioned into in order to parallelize computation (defaults to 10). It sets both numUserBlocks and numItemBlocks to the same value.

  • rank is the number of latent factors in the model (defaults to 10).

  • maxIter is the maximum number of iterations to run (defaults to 10).

  • regParam specifies the regularization parameter in ALS (defaults to 0.1 in the spark.ml API).

  • implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false which means using explicit feedback).

  • alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0).

  • nonnegative specifies whether or not to use nonnegative constraints for least squares (defaults to false).
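For readers who want to see how these parameters enter the optimization, the objectives are usually written as below. This is the textbook formulation rather than a transcription of Spark's source: x_u and y_i are the latent factor vectors of length rank, Ω is the set of observed (user, item) pairs, λ is regParam, α is alpha, and n_u and n_i are the numbers of ratings of user u and item i (the count-weighted regularization that the Spark documentation calls ALS-WR). The exact regularization scaling in the implicit case is an assumption here.

```latex
% Explicit feedback (implicitPrefs = false): fit only the observed ratings.
\min_{X,Y} \sum_{(u,i) \in \Omega} \left( r_{ui} - x_u^\top y_i \right)^2
  + \lambda \left( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \right)

% Implicit feedback (implicitPrefs = true): ratings become a binary preference
% p_{ui} with a confidence weight c_{ui} controlled by alpha.
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases},
\qquad c_{ui} = 1 + \alpha \, r_{ui}

\min_{X,Y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

With Y held fixed, each objective is an ordinary regularized least-squares problem in every x_u, and likewise for each y_i with X held fixed. Solving those two sides in turn for maxIter rounds is what gives the algorithm its name.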

More details are available on the Apache Spark ML documentation page:

http://spark.apache.org/docs/latest/ml-collaborative-filtering.html
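To make the alternating step concrete, here is a minimal sketch in plain Scala, with no Spark dependency, for the special case rank = 1. It is not how Spark implements ALS: it assumes a single latent factor per user and item, uses an unscaled regularization term lambda, and runs on one machine. All names in it (`Rank1AlsSketch`, `solveSide`, `train`, `mse`) are hypothetical. With rank 1, each least-squares update has the closed form sum(r * other) / (lambda + sum(other^2)).

```scala
object Rank1AlsSketch {
  // Observed ratings as (user, item, rating); missing entries are simply absent.
  type Rating = (Int, Int, Double)

  // Closed-form regularized least-squares update for one side (users or items),
  // holding the factors of the other side fixed. Assumes lambda > 0.
  def solveSide(
      ratings: Seq[Rating],
      keyOf: Rating => Int,
      otherOf: Rating => Int,
      other: Map[Int, Double],
      lambda: Double): Map[Int, Double] =
    ratings.groupBy(keyOf).map { case (k, rs) =>
      val num = rs.map(r => r._3 * other(otherOf(r))).sum
      val den = lambda + rs.map(r => math.pow(other(otherOf(r)), 2)).sum
      k -> num / den
    }

  // Alternate between solving for user factors and item factors.
  def train(
      ratings: Seq[Rating],
      maxIter: Int,
      lambda: Double): (Map[Int, Double], Map[Int, Double]) = {
    var items: Map[Int, Double] = ratings.map(_._2).distinct.map(_ -> 1.0).toMap
    var users: Map[Int, Double] = Map.empty
    for (_ <- 1 to maxIter) {
      users = solveSide(ratings, _._1, _._2, items, lambda)
      items = solveSide(ratings, _._2, _._1, users, lambda)
    }
    (users, items)
  }

  // Mean squared error over the observed ratings.
  def mse(ratings: Seq[Rating], users: Map[Int, Double], items: Map[Int, Double]): Double =
    ratings.map { case (u, i, r) => math.pow(r - users(u) * items(i), 2) }.sum / ratings.size

  def main(args: Array[String]): Unit = {
    // Toy data that is exactly rank 1; the rating for (user 2, item 3) is missing.
    val data = Seq((1, 1, 1.0), (1, 2, 2.0), (1, 3, 3.0), (2, 1, 2.0), (2, 2, 4.0))
    val (users, items) = train(data, maxIter = 50, lambda = 0.01)
    println(s"Training MSE = ${mse(data, users, items)}")
    println(s"Predicted rating for user 2, item 3 = ${users(2) * items(3)}")
  }
}
```

The missing entry is filled in by multiplying the learned user and item factors, which is the same idea the full algorithm applies with vectors of length rank instead of single numbers.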

Examples

The example below is taken from the Spark MLlib documentation and uses the older RDD-based API (org.apache.spark.mllib): https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples

import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating

// sc is an existing SparkContext (for example, the one provided by spark-shell)

// Load and parse the data
val data = sc.textFile("data/mllib/als/test.data")
val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
  Rating(user.toInt, item.toInt, rate.toDouble)
})

// Build the recommendation model using ALS
val rank = 10
val numIterations = 10
val model = ALS.train(ratings, rank, numIterations, 0.01)

// Evaluate the model on rating data
val usersProducts = ratings.map { case Rating(user, product, rate) =>
  (user, product)
}
val predictions =
  model.predict(usersProducts).map { case Rating(user, product, rate) =>
    ((user, product), rate)
  }
val ratesAndPreds = ratings.map { case Rating(user, product, rate) =>
  ((user, product), rate)
}.join(predictions)
val MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =>
  val err = (r1 - r2)
  err * err
}.mean()
println(s"Mean Squared Error = $MSE")

// Save and load model
model.save(sc, "target/tmp/myCollaborativeFilter")
val sameModel = MatrixFactorizationModel.load(sc, "target/tmp/myCollaborativeFilter")
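The example above uses the RDD-based API, while this node takes a DataFrame and produces an ALSModel. A roughly equivalent sketch with the DataFrame-based spark.ml API is shown below. It rests on several assumptions: a local SparkSession, a small in-memory data set instead of the test.data file, placeholder column names (`userId`, `itemId`, `rating`), and a hypothetical output path passed in by the caller. As in the example above, the error is measured on the training data itself, so it is not an estimate of generalization error.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.SparkSession

object AlsDataFrameSketch {
  // Trains on a toy DataFrame, saves and reloads the model, and returns the training MSE.
  def run(modelPath: String): Double = {
    // Assumption: local mode, suitable only for trying the sketch out.
    val spark = SparkSession.builder()
      .appName("AlsDataFrameSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative (user, item, rating) rows standing in for a real ratings table.
    val ratings = Seq(
      (1, 1, 5.0), (1, 2, 1.0), (1, 3, 5.0),
      (2, 1, 1.0), (2, 2, 5.0), (2, 3, 1.0),
      (3, 1, 5.0), (3, 2, 1.0), (3, 3, 5.0)
    ).toDF("userId", "itemId", "rating")

    // Build the recommendation model using ALS
    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")
      .setRank(10)
      .setMaxIter(10)
      .setRegParam(0.01)
      .setSeed(42L)
    val model: ALSModel = als.fit(ratings)

    // Score the same rows and compute the mean squared error
    val predictions = model.transform(ratings)
    val mse = new RegressionEvaluator()
      .setMetricName("mse")
      .setLabelCol("rating")
      .setPredictionCol("prediction")
      .evaluate(predictions)
    println(s"Mean Squared Error = $mse")

    // Save and load the model (overwrites anything already at modelPath)
    model.write.overwrite().save(modelPath)
    val sameModel = ALSModel.load(modelPath)
    println(s"Reloaded model rank = ${sameModel.rank}")

    spark.stop()
    mse
  }
}
```

When scoring users or items that were not seen during training, `transform` can produce NaN predictions; setting the cold start strategy to "drop" on the estimator or model is the usual way to exclude those rows before evaluation.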