Text Analysis
=============

Input
--------------

Takes a directory or file path as input.

Output
--------------

Outputs a DataFrame with two columns: speaker and dialogue.

Type
---------

pyspark

Class
---------

fire.nodes.gai.NodeTextAnalysis

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - isWordCloud
     - Word Cloud
     - Render a word cloud chart
   * - selectedAnalysis
     - Analysis Type
     - Type of analysis to perform
   * - openai
     - OpenAI
     -
   * - llmConnection
     - Select Connection
     - Connection used to access the OpenAI API
   * - openaiModel
     - OpenAI Model
     - OpenAI model to use

Details
-------

Text Analysis Node Details
++++++++++++++++++++++++++

The Text Analysis node processes text data to perform tone, emotion, sentiment, or slang analysis using an OpenAI model. It takes a directory or file path as input and generates a structured DataFrame output. Optionally, it can render a word cloud chart to visualize the text data. This node is designed for PySpark-based workflows, making it suitable for advanced text analysis in data pipelines.

General:
++++++++

Word Cloud: Controls whether a word cloud chart is rendered to visualize the text data. Options are:

* true: Generates a word cloud chart based on the input text (default).
* false: Does not generate a word cloud chart.

Analysis Type: Specifies the type of text analysis to perform. This field is required. Options include:

* TONE ANALYSIS: Analyzes the tone of the text (e.g., formal, informal, positive, negative).
* EMOTION ANALYSIS: Identifies emotions expressed in the text (e.g., joy, anger, sadness).
* SENTIMENT ANALYSIS: Determines the sentiment of the text (e.g., positive, negative, neutral).
* SLANG ANALYSIS: Detects and analyzes slang or informal language in the text.

OpenAI Configuration:
+++++++++++++++++++++

Select Connection: Specifies the connection details for the OpenAI API (e.g., API key). This is required to authenticate and access the OpenAI model.


OpenAI Model: Specifies the OpenAI model to use for text analysis. Default is 'gpt-4o'. This field is required, and other compatible models can be specified if supported by the OpenAI API.

Output:
+++++++

The node outputs a DataFrame with two columns:

* speaker: The identifier for the text source (e.g., speaker label or file name, depending on input).
* dialogue: The result of the specified analysis (e.g., tone, emotion, sentiment, or slang details).

If Word Cloud is set to true, a word cloud chart is also generated to visualize the frequency or significance of words in the input text.

Examples
--------

Example: Text Analysis Node
+++++++++++++++++++++++++++

Input:
++++++

A text file is located at:

* /data/text/conversation.txt (containing a dialogue: "I'm so excited about the new project! It's going to be awesome!")

The Text Analysis node is configured as follows:

* Word Cloud: true
* Analysis Type: SENTIMENT ANALYSIS
* Select Connection: Configured with a valid OpenAI API key
* OpenAI Model: gpt-4o

Output:
+++++++

The node processes the text file and produces a DataFrame with the following structure:

::

   speaker          | dialogue
   -----------------|----------------------------
   conversation.txt | Positive sentiment detected

Additionally, a word cloud chart is generated, highlighting words like "excited," "new," "project," and "awesome" based on their frequency and significance.

Explanation:
++++++++++++

* The conversation.txt file is processed using the OpenAI gpt-4o model for SENTIMENT ANALYSIS.
* The Analysis Type is set to SENTIMENT ANALYSIS, so the node evaluates the text and determines it has a positive sentiment, which is output in the dialogue column.
* The speaker column contains the file name (conversation.txt) as the identifier for the text source.
* With Word Cloud set to true, a word cloud chart is generated, visually representing the most prominent words in the input text.
* If Analysis Type were set to TONE ANALYSIS, the output might describe the tone instead (e.g., "Enthusiastic tone detected").
* If Word Cloud were set to false, no word cloud chart would be generated.
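
The word cloud in this example is described as reflecting word frequency. The node's actual word cloud implementation is not documented here, so the following is only a minimal sketch of the underlying idea, written in plain Python. The function name ``word_frequencies`` and the ``STOPWORDS`` set are hypothetical and chosen for illustration; they are not part of the node's API.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
# The node's real filtering rules (if any) are not documented.
STOPWORDS = {
    "i", "i'm", "so", "about", "the", "it's", "going",
    "to", "be", "a", "an", "is", "and", "of",
}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each non-stopword occurs, ignoring case."""
    # Keep letters and apostrophes so contractions such as "it's" stay whole.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop stray leading/trailing apostrophes and any empty leftovers.
    cleaned = (token.strip("'") for token in tokens)
    return Counter(t for t in cleaned if t and t not in stopwords)


# Sample text taken from the example input file above.
sample = "I'm so excited about the new project! It's going to be awesome!"
freqs = word_frequencies(sample)
print(sorted(freqs.items()))
```

With this assumed stopword list, the words that remain for the sample text are "awesome", "excited", "new", and "project", each counted once, which matches the words the example says the chart highlights. A real word cloud would scale each word by its count, so longer inputs with repeated words would show a clearer size difference.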