Text Analysis¶
Input¶
Takes a directory or file path as input.
Output¶
Outputs a DataFrame with two columns: speaker and dialogue.
Type¶
pyspark
Class¶
fire.nodes.gai.NodeTextAnalysis
Fields¶
| Name | Title | Description |
|---|---|---|
| isWordCloud | Word Cloud | Render Word cloud chart |
| selectedAnalysis | Analysis Type | Analysis Type |
| openai | OpenAI | |
| llmConnection | Select Connection | Select Connection |
| openaiModel | OpenAI Model | OpenAI Model to be Used |
Details¶
Text Analysis Node Details¶
The Text Analysis node processes text data to perform various types of analysis, such as tone, emotion, sentiment, or slang analysis, using an OpenAI model. It takes a directory or file path as input and generates a structured DataFrame output. Optionally, it can render a word cloud chart to visualize the text data. This node is designed for PySpark-based workflows, making it suitable for advanced text analysis in data pipelines.
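As a rough illustration of what "a directory or file path" can mean in practice, the sketch below resolves a path to a list of files using only the Python standard library. The helper name `resolve_inputs` is hypothetical and not part of the node. Whether the node itself recurses into subdirectories or filters by file extension is not stated here, so this sketch assumes a flat, unfiltered listing.

```python
from pathlib import Path
import tempfile


def resolve_inputs(path: str) -> list[Path]:
    """Return the file itself, or the files directly inside a directory (sorted by name)."""
    p = Path(path)
    if p.is_file():
        return [p]
    if p.is_dir():
        # Assumption: no recursion and no extension filter.
        return sorted(child for child in p.iterdir() if child.is_file())
    raise FileNotFoundError(f"No such file or directory: {path}")


# Demonstration on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "conversation.txt").write_text("I'm so excited about the new project!")
    (Path(tmp) / "notes.txt").write_text("Second file.")
    names = [f.name for f in resolve_inputs(tmp)]
    print(names)
```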
General:¶
Word Cloud: Controls whether a word cloud chart is rendered to visualize the text data. Options are:
true: Generates a word cloud chart based on the input text (default).
false: Does not generate a word cloud chart.
Analysis Type: Specifies the type of text analysis to perform. This field is required. Options include:
TONE ANALYSIS: Analyzes the tone of the text (e.g., formal, informal, positive, negative).
EMOTION ANALYSIS: Identifies emotions expressed in the text (e.g., joy, anger, sadness).
SENTIMENT ANALYSIS: Determines the sentiment of the text (e.g., positive, negative, neutral).
SLANG ANALYSIS: Detects and analyzes slang or informal language in the text.
OpenAI Configuration:¶
Select Connection: Specifies the connection details for the OpenAI API (e.g., API key). This is required to authenticate and access the OpenAI model.
OpenAI Model: Specifies the OpenAI model to use for text analysis. Default is ‘gpt-4o’. This field is required, and other compatible models can be specified if supported by the OpenAI API.
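The required fields and defaults described above can be summarized in a small validation sketch. The field names (`isWordCloud`, `selectedAnalysis`, `llmConnection`, `openaiModel`) come from the Fields table. The `TextAnalysisSettings` class, its `validate` method, and the connection name are hypothetical and only illustrate the documented constraints; they are not the node's actual implementation.

```python
from dataclasses import dataclass

# Analysis types listed on this page.
ANALYSIS_TYPES = {
    "TONE ANALYSIS",
    "EMOTION ANALYSIS",
    "SENTIMENT ANALYSIS",
    "SLANG ANALYSIS",
}


@dataclass
class TextAnalysisSettings:
    """Hypothetical container mirroring the node's field names."""

    selectedAnalysis: str        # required
    llmConnection: str           # required; name of a configured connection
    openaiModel: str = "gpt-4o"  # documented default
    isWordCloud: bool = True     # documented default

    def validate(self) -> None:
        """Raise ValueError if a documented constraint is violated."""
        if self.selectedAnalysis not in ANALYSIS_TYPES:
            raise ValueError(f"Unknown analysis type: {self.selectedAnalysis!r}")
        if not self.llmConnection:
            raise ValueError("A connection must be selected")
        if not self.openaiModel:
            raise ValueError("An OpenAI model must be specified")


settings = TextAnalysisSettings(
    selectedAnalysis="SENTIMENT ANALYSIS",
    llmConnection="my-openai-connection",  # hypothetical connection name
)
settings.validate()
print(settings)
```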
Output:¶
The node outputs a DataFrame with two columns:
speaker: The identifier for the text source (e.g., speaker label or file name, depending on input).
dialogue: The result of the specified analysis (e.g., tone, emotion, sentiment, or slang details).
If Word Cloud is set to true, a word cloud chart is also generated to visualize the frequency or significance of words in the input text.
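A word cloud is typically driven by word frequencies. The sketch below shows one simple way such counts could be computed for the sample sentence used in the example on this page. The stop-word list and the `word_frequencies` helper are assumptions made for illustration; how the node actually weights words is not documented here.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real word cloud would likely use a larger one.
STOP_WORDS = {
    "i", "m", "so", "about", "the", "it", "s",
    "to", "be", "going", "a", "an", "and", "of",
}


def word_frequencies(text: str) -> Counter:
    """Count lowercase alphabetic words, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


text = "I'm so excited about the new project! It's going to be awesome!"
freqs = word_frequencies(text)
print(freqs.most_common())
```

In this sample every remaining word occurs once, so a frequency-based cloud would draw them all at the same size.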
Examples¶
Example: Text Analysis Node¶
Input:¶
A text file is located at:
/data/text/conversation.txt (containing a dialogue: “I’m so excited about the new project! It’s going to be awesome!”)
The Text Analysis node is configured as follows:
Word Cloud: true
Analysis Type: SENTIMENT ANALYSIS
Select Connection: Configured with a valid OpenAI API key
OpenAI Model: gpt-4o
Output:¶
The node processes the text file and produces a DataFrame with the following structure:
speaker          | dialogue
-----------------|----------------------------
conversation.txt | Positive sentiment detected
Additionally, a word cloud chart is generated, highlighting words like “excited,” “new,” “project,” and “awesome” based on their frequency and significance.
Explanation:¶
The conversation.txt file is processed using the OpenAI gpt-4o model.
Because Analysis Type is set to SENTIMENT ANALYSIS, the node evaluates the text, determines that it has a positive sentiment, and writes that result to the dialogue column.
The speaker column contains the file name (conversation.txt) as the identifier for the text source.
With Word Cloud set to true, a word cloud chart is generated, visually representing the most prominent words in the input text.
If Analysis Type were set to TONE ANALYSIS, the output might instead describe the tone (e.g., “Enthusiastic tone detected”).
If Word Cloud were set to false, no word cloud chart would be generated.