Word2 Vec
===========

Transforms vectors of words into vectors of numeric codes for the purpose of further processing by NLP or machine learning algorithms.

Input
--------------
It takes in a DataFrame as input and transforms it to another DataFrame

Output
--------------
A new column containing feature vector is added to the incoming DataFrame

Type
--------- 

ml-transformer

Class
--------- 

fire.nodes.ml.NodeWord2Vec

Fields
--------- 

.. list-table::
      :widths: 10 5 10
      :header-rows: 1

      * - Name
        - Title
        - Description
      * - inputCol
        - Input Column
        - Contains sequences of words
      * - inputColStringArrCol
        - Text Array Column
        - The text array column which is produced
      * - outputCol
        - Output Column
        - Output column name
      * - vectorSize
        - Vector Size
        - Vector Size
      * - minCount
        - Min Count
        - Min Count
      * - numPartitions
        - Num Partitions
        - Number of partitions for sentences of words
      * - windowSize
        - Window Size
        - The window size (context words from [-window, window])
      * - maxSentenceLength
        - Max Sentence Length
        - Sets the maximum length (in words) of each sentence in the input data. Any sentence longer than this threshold will be divided into chunks of up to maxSentenceLength


Details
-------
Word2Vec Node Details
+++++++++++++++


The Word2Vec Node is an Estimator which takes sequences of words representing documents and trains a Word2VecModel. The model maps each word to a unique fixed-size vector. The Word2VecModel transforms each document into a vector using the average of all words in the document; this vector can then be used for as features for prediction, document similarity calculations, etc.


The word2vec algorithm uses a model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence.


Input Parameters
+++++++++++++++


* OUTPUT STORAGE LEVEL : Keep this as DEFAULT.
* INPUT COLUMN : The name of the field in the input schema that contains the features to build the model from.
* TEXT ARRAY COLUMN : Set a name for the output transformed array column.
* OUTPUT COLUMN : Set a name for the output vector column.
* VECTOR SIZE : The dimension of the code that you want to transform from words. Defaults to 3.
* MIN COUNT : The minimum number of times a token must appear to be included in the word2vec model's vocabulary.
* NUM PARTITIONS : Number of partitions for sentences of words.
* WINDOW SIZE  : Hyperparameter used in training process. Defaults to 5.
* MAX SENTENCE LENGTH : Sets the maximum length (in words) of each sentence in the input data.


Examples
-------
Word2Vec Node Example
+++++++++++++++


Consider the below **Word2Vec** output's for the input string **Title** column.


* Title : 'Learning Sparkflows'
* textarr : (Learning, Sparkflows)
* Word2Vec : [-0.12955643981695175,0.0201990008354187,0.13060205057263374]