Multi LLM Query¶

The Multi LLM Query node is designed to query multiple large language models (LLMs) from providers such as OpenAI, Bedrock, and Gemini, using a DataFrame as input. It processes user queries, text content, and/or base64-encoded images to generate responses based on the selected model and task, producing a structured DataFrame output.

Input¶

Accepts a DataFrame as input.

Output¶

Returns the responses as a DataFrame.

Type¶

pyspark

Class¶

fire.nodes.gai.NodeMultiLLMQuery

Fields¶

Name	Title	Description
llmConnection	Select Connection	Connection to the LLM provider (OpenAI, Bedrock, or Gemini).
temperature	Temperature	Temperature setting for the model (default: 0).
contentCol	Content Column	Column name for the text content.
imageCol	Image Column	Column name for the base64-encoded image.
inputMode	Mode Selection	Select the input mode to use (text, image, text+image).
Prompt	Prompt
task	Select Prompt	Specify the task to perform: summary, translation, topic extraction, or other.
customPrompt	Prompt	Custom prompt to override the default instructions.
userQueryCol	User Query Column	Column name for the user query (if the query is stored in a column).
Advanced	Advanced
aggregateMode	Aggregate Response	How to aggregate input rows before processing (none, all, perfile).
numPartitions	Number of Partitions	Number of Spark partitions for distributed processing.
fileNameCol	File Name Column	Column containing file names. Required for perfile aggregation.
pageNumberCol	Page Number Column	Column containing page numbers.
timeout	Timeout (seconds)	Maximum time to wait for the OpenAI or Gemini API response.
thinkingBudget	Thinking Budget	Number of tokens Gemini may allocate for thinking. For Flash and Flash Lite models, values range from 0 to 24,576, or -1 for dynamic thinking. For the 2.5 Pro model, values must be between 1 and 24,576; 0 is not allowed.

Details¶

Multi LLM Query Node Details¶

The Multi LLM Query node is designed to query multiple large language models (LLMs) from providers such as OpenAI, Bedrock, and Gemini, using a DataFrame as input. It processes user queries, text content, and/or base64-encoded images to generate responses based on the selected connection and task, producing a structured DataFrame output.

General:¶

Select Task:¶

Specifies the task to perform. Options include:

summary: Generates a summary of the content in bullet points.
translation: Translates the content to English.
topic_extraction: Extracts key topics from the content.
other: Allows for a custom task defined by the user.

Prompt:¶

Lets users supply custom instructions that override the default prompt for the selected task.
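The relationship between Select Task and Prompt can be pictured with a minimal sketch. It assumes a custom prompt, when given, replaces the task's default instructions, as the Fields table states. The function name `resolve_prompt`, the default instruction strings, and the rule that the other task needs a custom prompt are all hypothetical, chosen for illustration; the node's actual built-in prompts may differ.

```python
# Hypothetical default instructions; the node's real built-in wording may differ.
DEFAULT_INSTRUCTIONS = {
    "summary": "Summarize the content in bullet points.",
    "translation": "Translate the content to English.",
    "topic_extraction": "Extract the key topics from the content.",
}


def resolve_prompt(task, custom_prompt=None):
    """Return the instructions to send: the custom prompt if set, else the task default."""
    if custom_prompt and custom_prompt.strip():
        return custom_prompt.strip()
    if task == "other":
        # Assumption: the "other" task has no default, so a custom prompt is needed.
        raise ValueError("task 'other' needs a custom prompt")
    if task not in DEFAULT_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    return DEFAULT_INSTRUCTIONS[task]


print(resolve_prompt("summary"))
print(resolve_prompt("other", "List every date mentioned in the content."))
```

Keeping the defaults in one dictionary makes it easy to see which tasks have built-in instructions and which rely entirely on the Prompt field.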

Content Column:¶

Specifies the DataFrame column containing the text content to be processed. Required for text or text+image modes.

Select Connection:¶

Specifies the connection details for the selected LLM provider (e.g., API keys for OpenAI/Gemini, AWS credentials for Bedrock). Required to authenticate and access the respective model.

Temperature:¶

Controls the randomness of the LLM’s output. Default is 0.7. Higher values make responses more varied, while lower values make them more deterministic.

Image Column:¶

Specifies the DataFrame column containing base64-encoded images. Required for image or text+image modes.
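As a rough illustration of what a value in this column looks like, the sketch below base64-encodes raw image bytes with Python's standard library. The helper names `encode_image` and `decode_image` are hypothetical, and whether the node expects a bare base64 string (as assumed here) or a data-URI prefix is not stated in this document.

```python
import base64

# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def encode_image(image_bytes):
    """Return the base64 text for raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image(encoded):
    """Decode base64 text back into raw bytes, rejecting malformed input."""
    return base64.b64decode(encoded, validate=True)


encoded = encode_image(PNG_SIGNATURE)
print(encoded)  # iVBORw0KGgo=
```

Because every PNG starts with the same signature bytes, base64-encoded PNG data always begins with `iVBORw0KGgo`, which is a quick sanity check on the column's contents.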

Mode Selection:¶

Determines the input mode for the LLM. Options are:

text: Processes text-only input from the content column or custom prompt.
image: Processes base64-encoded images from the image column.
text+image: Processes both text and base64-encoded images.
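The column requirements for each mode can be checked before a run. The sketch below is an illustration only: the function `missing_fields` is hypothetical, the configuration is modeled as a plain dictionary keyed by the field names from the Fields table, and the required-column rules are taken from the Content Column and Image Column descriptions.

```python
# Columns each input mode needs, per the Content Column and Image Column descriptions.
REQUIRED_BY_MODE = {
    "text": ("contentCol",),
    "image": ("imageCol",),
    "text+image": ("contentCol", "imageCol"),
}


def missing_fields(input_mode, config):
    """Return the required field names that are unset or empty for the chosen mode."""
    if input_mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown input mode: {input_mode!r}")
    return [name for name in REQUIRED_BY_MODE[input_mode] if not config.get(name)]


config = {"contentCol": "content", "imageCol": ""}
print(missing_fields("text", config))        # []
print(missing_fields("text+image", config))  # ['imageCol']
```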

Timeout (seconds):¶

Specifies the maximum time (in seconds) to wait for the model response. Visible when OpenAI or Gemini is selected.

Thinking Budget:¶

Sets the number of tokens a Gemini model may allocate for thinking. Only visible when Gemini is selected.
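The allowed ranges listed for thinkingBudget in the Fields table can be expressed as a small check. This is a sketch under assumptions: the function name and the model-family labels ("flash", "flash-lite", "pro") are hypothetical, and rejecting -1 for the 2.5 Pro model is an inference from the stated 1 to 24,576 range rather than something the table says outright.

```python
MAX_BUDGET = 24_576  # upper bound stated in the Fields table


def is_valid_thinking_budget(model_family, budget):
    """Check a thinking budget against the ranges in the Fields table.

    model_family is a hypothetical label: "flash", "flash-lite", or "pro".
    """
    if model_family in ("flash", "flash-lite"):
        # -1 requests dynamic thinking; 0 through 24,576 are explicit budgets.
        return budget == -1 or 0 <= budget <= MAX_BUDGET
    if model_family == "pro":
        # The table gives 1 through 24,576 for 2.5 Pro and disallows 0.
        # Assumption: -1 is rejected too, since the table does not list it for Pro.
        return 1 <= budget <= MAX_BUDGET
    raise ValueError(f"unknown model family: {model_family!r}")


print(is_valid_thinking_budget("flash", -1))  # True
print(is_valid_thinking_budget("pro", 0))     # False
```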

Advanced:¶

Aggregate Response:¶

Specifies how to aggregate input data before processing. Options are:

none: Processes each row individually, retaining fileName and pageNumber (if provided).
all: Aggregates all rows into a single response.
perfile: Aggregates rows by fileName, producing one response per file.
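To make the three modes concrete, the following sketch groups rows in plain Python rather than Spark. The function `aggregate_rows` is hypothetical, rows are modeled as dictionaries, and joining the grouped text with newlines is an assumption; how the node actually concatenates content is not documented here.

```python
from collections import defaultdict


def aggregate_rows(rows, mode, sep="\n"):
    """Group row content the way each aggregation mode is described."""
    if mode == "none":
        # One item per input row, keeping fileName and pageNumber as they are.
        return [dict(row) for row in rows]
    if mode == "all":
        # Everything becomes a single item.
        return [{"content": sep.join(row["content"] for row in rows)}]
    if mode == "perfile":
        # One item per distinct fileName, in order of first appearance.
        grouped = defaultdict(list)
        for row in rows:
            grouped[row["fileName"]].append(row["content"])
        return [
            {"fileName": name, "content": sep.join(parts)}
            for name, parts in grouped.items()
        ]
    raise ValueError(f"unknown aggregation mode: {mode!r}")


# Abbreviated sample rows for illustration.
rows = [
    {"fileName": "doc1.pdf", "pageNumber": "1", "content": "Article about climate change"},
    {"fileName": "doc1.pdf", "pageNumber": "2", "content": "Climate change impacts"},
    {"fileName": "doc2.pdf", "pageNumber": None, "content": "Renewable energy report"},
]

for group in aggregate_rows(rows, "perfile"):
    print(group["fileName"], "->", len(group["content"].splitlines()), "row(s)")
```

Each returned item corresponds to one LLM request, so none yields three requests for these rows, perfile yields two, and all yields one.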

Number of Partitions:¶

Specifies the number of Spark partitions for distributed processing. Default is 3.

File Name Column:¶

Specifies the DataFrame column containing file names. Required for perfile aggregation mode.

Page Number Column:¶

Specifies the DataFrame column containing page numbers (e.g., for PDFs). Optional; used for row-wise processing when the aggregation mode is none.

Output:¶

The node outputs a DataFrame with columns based on the aggregation mode:

none: Includes fileName (if provided), pageNumber (if provided), and response.
perfile: Includes fileName and response.
all: Includes only the response column.

The response column contains the LLM-generated text or error messages if the API call fails.
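One way to picture this behavior is a wrapper that turns a failed call into text instead of failing the whole job. The sketch below is not the node's implementation: `safe_query`, the stand-in `flaky_model`, and the `ERROR:` message format are all hypothetical.

```python
def safe_query(call_model, prompt):
    """Return the model's text, or an error message if the call raises."""
    try:
        return call_model(prompt)
    except Exception as exc:
        # Hypothetical message format; the node's actual error text may differ.
        return f"ERROR: {type(exc).__name__}: {exc}"


def flaky_model(prompt):
    """Stand-in for a provider call that times out."""
    raise TimeoutError("no response within 90 seconds")


print(safe_query(lambda prompt: "- Rising temperatures", "Summarize the content."))
print(safe_query(flaky_model, "Summarize the content."))
```

Since errors land in the same column as successful responses, downstream steps may need to filter on the response text to separate failures from results.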

Examples¶

Multi LLM Query Node Examples¶

Input:¶

The input DataFrame contains the following columns and values:

fileName: [“doc1.pdf”, “doc1.pdf”, “doc2.pdf”]
pageNumber: [“1”, “2”, null]
content: [“Article about climate change…”, “Climate change impacts…”, “Renewable energy report…”]
imageBase64: [null, “iVBORw0KGgoAAAANSUhEUg…”, null]

The Multi LLM Query node is configured as follows:

Select Task: summary
Prompt: “Summarize the content in bullet points.”
Content Column: content
Select Connection: Configured with a valid OpenAI API key
Temperature: 0.7
Timeout (seconds): 90
Image Column: imageBase64
Mode Selection: text+image
Aggregate Response: perfile
Number of Partitions: 3
File Name Column: fileName
Page Number Column: pageNumber

Output:¶

Because the aggregation mode is perfile, the node produces a DataFrame with one row per file:

fileName: doc1.pdf

response:

Climate change effects on ecosystems
Rising temperatures

fileName: doc2.pdf

response:

Renewable energy advancements
Solar and wind adoption