H2O Word to Vec

The Word2vec algorithm takes a text corpus as input and produces word vectors as output.

Input

It takes a DataFrame as input.

Type

ml-estimator

Class

fire.nodes.h2o.NodeH2OWord2vec

Fields

Name

Title

Description

inputCol

Input Column

Input column name.

vecSize

Vec Size

Set the size (number of dimensions) of the word vectors.

windowSize

Window Size

Set the maximum skip length between words, that is, how many words on either side of a target word are treated as its context.

sentSampleRate

Sent Sample Rate

Set the threshold for the occurrence of words. Words that appear with higher frequency in the training data are randomly down-sampled. The ideal range for this option is 0 to 1e-5.

normModel

Normalization Model

The normalization model to use (Hierarchical Softmax).

epochs

Epochs

Number of training iterations to run.

minWordFreq

Min Word Frequency

Discard words that appear fewer than this many times.

initLearningRate

Init Learning Rate

Set the starting learning rate.

wordModel

Word Model

The word model to use (SkipGram or CBOW).

maxRuntimeSecs

Max Runtime Secs

This argument specifies the maximum time, in seconds, that model training will run for.

columnsToCategorical

Columns to Categorical

Columns to be encoded as categorical.
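The windowSize and wordModel fields together decide which training examples Word2vec learns from. The plain-Python sketch below (not the H2O API; the function name `training_examples` is hypothetical) uses a fixed-size window and builds SkipGram examples, where the center word predicts each context word, and CBOW examples, where the context words together predict the center word. The original word2vec implementation also shrinks the window randomly at each position; this sketch omits that detail, and H2O's internals may differ.

```python
def training_examples(tokens, window_size, word_model):
    """Build (input, target) training examples for one tokenized sentence.

    SkipGram: the center word predicts each word in its context.
    CBOW: the context words together predict the center word.
    (Hypothetical helper; fixed-size window for illustration.)
    """
    examples = []
    for i, center in enumerate(tokens):
        # Context = up to window_size words on either side, excluding the center.
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        context = [tokens[j] for j in range(start, end) if j != i]
        if word_model == "SkipGram":
            examples.extend((center, word) for word in context)
        elif word_model == "CBOW":
            examples.append((tuple(context), center))
        else:
            raise ValueError(f"unknown word model: {word_model}")
    return examples


tokens = ["the", "cat", "sat"]
print(training_examples(tokens, 1, "SkipGram"))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(training_examples(tokens, 1, "CBOW"))
# [(('cat',), 'the'), (('the', 'sat'), 'cat'), (('cat',), 'sat')]
```

A larger windowSize gives each word more context pairs (slower training, broader notion of similarity); SkipGram produces one example per context word, while CBOW produces one example per center word.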

Details

The Word2vec algorithm takes a text corpus as an input and produces the word vectors as output. The algorithm first creates a vocabulary from the training text data and then learns vector representations of the words.
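The vocabulary step can be sketched in plain Python as follows, assuming a pre-tokenized list of words. `build_vocab` applies the minWordFreq cutoff, and `discard_probability` shows the kind of down-sampling rule that sentSampleRate controls, using the 1 - sqrt(t / f) heuristic from the original word2vec paper. Both function names are hypothetical, and H2O's internal implementation may compute these differently.

```python
import math
from collections import Counter


def build_vocab(tokens, min_word_freq):
    """Count every word and keep only those that appear at least
    min_word_freq times; rarer words are discarded before training."""
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if n >= min_word_freq}


def discard_probability(count, total, sample_rate):
    """Chance of dropping one occurrence of a word during training, using
    1 - sqrt(t / f) with t = sample_rate and f = count / total.
    A sample_rate of 0 disables down-sampling."""
    if sample_rate <= 0:
        return 0.0
    return max(0.0, 1.0 - math.sqrt(sample_rate / (count / total)))


tokens = "a rose is a rose is a rose".split()
print(build_vocab(tokens, min_word_freq=3))  # {'a': 3, 'rose': 3}

# A word making up 5% of a 100,000-word corpus is usually dropped,
print(discard_probability(5_000, 100_000, 1e-5))  # ≈ 0.986
# while a word seen only once is always kept.
print(discard_probability(1, 100_000, 1e-5))  # 0.0
```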

More details are available at: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/word2vec.html#