Sklearn TF-IDF Vectorizer
=========================

Applies scikit-learn's TfidfVectorizer to a text column. Converts text documents into TF-IDF feature vectors and stores them as an ARRAY column in the Spark DataFrame.

Input
--------------

Takes a DataFrame with at least one text column.

Output
--------------

Adds a new column containing TF-IDF vectors as ARRAY and passes the DataFrame to downstream nodes.

Type
---------

transform

Class
---------

fire.nodes.sklearn.preprocessing.NodeTFIDFVectorizerFitTransform

Fields
---------

.. list-table::
      :widths: 10 5 10
      :header-rows: 1

      * - Name
        - Title
        - Description
      * - columnToVectorize
        - Text Column to Vectorize
        - Name of the text column on which TF-IDF should be computed.
      * - outputCol
        - Output Column Name
        - Name of the output column that will store TF-IDF vectors as ARRAY. If left empty, defaults to 'tfidf_'.
      * - max_df
        - Max Document Frequency (max_df)
        - Ignore terms that appear in more than this proportion of documents. For example, 0.9 drops terms appearing in more than 90% of documents.
      * - min_df
        - Min Document Frequency (min_df)
        - Ignore terms that appear in fewer than this number of documents. For example, 2 keeps only terms that appear in at least 2 documents.
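
The effect of ``max_df`` and ``min_df`` can be illustrated with plain scikit-learn. This is a minimal sketch, assuming the node passes both values unchanged to ``TfidfVectorizer``; the node's internals are not shown in this page, and the sample documents and variable names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative documents standing in for the values of the text column.
docs = [
    "the spark engine makes big data simple",
    "the spark job runs on big clusters",
    "the tfidf score weights rare terms higher",
    "the rare terms carry more signal",
]

# max_df=0.9 (a proportion) drops terms found in more than 90% of documents.
# min_df=2 (a count) drops terms found in fewer than 2 documents.
vectorizer = TfidfVectorizer(max_df=0.9, min_df=2)
matrix = vectorizer.fit_transform(docs)

# Surviving terms, listed in the column order of the TF-IDF matrix.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# One dense list of floats per document, comparable to an array column value.
vectors = matrix.toarray().tolist()

print(vocabulary)
for doc, vec in zip(docs, vectors):
    print(doc, "->", [round(v, 3) for v in vec])
```

In this sample, "the" occurs in every document and is removed by ``max_df``, while terms that occur in only one document are removed by ``min_df``. Each remaining row is one vector per input document, with one entry per surviving term.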