H2O Word to Vec

The Word2vec algorithm takes a text corpus as input and produces word vectors as output.

Input

This node takes a DataFrame as input.

Type

ml-estimator

Class

fire.nodes.h2o.NodeH2OWord2vec

Fields

Name	Title	Description
inputCol	Input Column	Input column name.
vecSize	Vec Size	Set size of word vectors.
windowSize	Window Size	Set max skip length between words.
sentSampleRate	Sent Sample Rate	Set the threshold for the occurrence of words. Words that appear with higher frequency in the training data are randomly down-sampled. An ideal range for this option is (0, 1e-5).
normModel	Normalization Model	Normalization model to use. Specify HSM to use Hierarchical Softmax.
epochs	Epochs	Number of training iterations to run.
minWordFreq	Min Word Frequency	Discard words that appear fewer than the specified number of times.
initLearningRate	Init Learning Rate	Set the starting learning rate.
wordModel	Word Model	The word model to use (SkipGram or CBOW).
maxRuntimeSecs	Max Runtime Secs	Maximum time, in seconds, that model training is allowed to run.
columnsToCategorical	Columns to Categorical	Columns to be encoded as categorical.
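
The numeric fields in the table above appear to correspond to snake_case parameters of H2O's Word2vec estimator (for example, vecSize to vec_size). The sketch below illustrates that naming correspondence only. The `node_fields` dictionary and its values are hypothetical, and how the node actually passes its settings to H2O is an assumption, not something this page states. Enum-valued fields (normModel, wordModel) are left out because the spelling accepted for their values may differ between H2O clients.

```python
import re

# Hypothetical node settings, keyed by the field names from the table.
# The values are illustrative only.
node_fields = {
    "vecSize": 100,
    "windowSize": 5,
    "sentSampleRate": 1e-3,
    "epochs": 5,
    "minWordFreq": 5,
    "initLearningRate": 0.025,
    "maxRuntimeSecs": 0,
}


def to_snake_case(name: str) -> str:
    """Convert a camelCase field name to snake_case."""
    # Insert "_" before every uppercase letter except at the start, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# Assumed mapping: each node field becomes the snake_case H2O parameter name.
h2o_params = {to_snake_case(key): value for key, value in node_fields.items()}

for key, value in h2o_params.items():
    print(f"{key} = {value}")

# With the h2o package installed and a cluster running, a dictionary like this
# could be passed to the estimator, e.g.:
#   from h2o.estimators import H2OWord2vecEstimator
#   model = H2OWord2vecEstimator(**h2o_params)
```

The estimator call is left as a comment so that the sketch runs without an H2O cluster.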

Details

The Word2vec algorithm takes a text corpus as an input and produces the word vectors as output. The algorithm first creates a vocabulary from the training text data and then learns vector representations of the words.
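
The two stages described above can be illustrated with a small, self-contained sketch. It is not H2O's implementation: it only builds a vocabulary with a minimum word frequency cutoff and then lists the (center, context) word pairs that a SkipGram model would train on for a fixed window size. The function names `build_vocab` and `skipgram_pairs` and the toy corpus are made up for this illustration, and the vector-learning step itself is omitted.

```python
from collections import Counter


def build_vocab(sentences, min_word_freq=1):
    """Map each word that occurs at least min_word_freq times to an index."""
    counts = Counter(word for sentence in sentences for word in sentence)
    kept = sorted(word for word, count in counts.items() if count >= min_word_freq)
    return {word: index for index, word in enumerate(kept)}


def skipgram_pairs(sentences, vocab, window_size=2):
    """List (center, context) pairs within window_size positions of each other."""
    pairs = []
    for sentence in sentences:
        # Words outside the vocabulary are dropped before windows are formed.
        words = [word for word in sentence if word in vocab]
        for i, center in enumerate(words):
            start = max(0, i - window_size)
            stop = min(len(words), i + window_size + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((center, words[j]))
    return pairs


# Toy corpus: each sentence is already split into tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

vocab = build_vocab(corpus, min_word_freq=2)
pairs = skipgram_pairs(corpus, vocab, window_size=1)

print(vocab)
print(len(pairs), "training pairs")
```

In this sketch, `min_word_freq` and `window_size` play the roles of the minWordFreq and windowSize fields. A real trainer would then fit vectors of length vecSize so that words sharing similar contexts end up with similar vectors.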

More details are available at: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/word2vec.html#