Multi LLM Query

The Multi LLM Query node is designed to query multiple large language models (LLMs) from providers such as OpenAI, Bedrock, and Gemini, using a DataFrame as input. It processes user queries, text content, and/or base64-encoded images to generate responses based on the selected model and task, producing a structured DataFrame output.

Input

Accepts a DataFrame as input.

Output

Returns the response as a DataFrame.

Type

pyspark

Class

fire.nodes.gai.NodeMultiLLMQuery

Fields

Name

Title

Description

llmConnection

Select Connection

Select Connection

temperature

Temperature

Temperature setting for the model (default: 0).

contentCol

Content Column

Column name for the text content.

imageCol

Image Column

Column name for the base64-encoded image.

inputMode

Mode Selection

Select the input mode to use (text, image, text+image).

Prompt

Prompt

task

Select Prompt

Specify the task to perform: summary, translation, topic extraction, or other.

customPrompt

Prompt

Custom prompt to override the default instructions.

userQueryCol

User Query Column

Column name for the user query (if the query is stored in a column).

Advanced

Advanced

aggregateMode

Aggregate Response

numPartitions

Number of Partitions

Number of Partitions

fileNameCol

File Name Column

Select File Name Column

pageNumberCol

Page Number Column

Select Page Number column.

timeout

Timeout (seconds)

Maximum time to wait for an OpenAI or Gemini API response.

thinkingBudget

Thinking Budget

Configure the Gemini thinking budget by specifying the number of tokens to allocate for thinking. For Flash and Flash Lite models, values can range from 0 to 24,576, or -1 for dynamic thinking. For the 2.5 Pro model, values must be between 1 and 24,576; setting 0 is not allowed.

Details

Multi LLM Query Node Details

The Multi LLM Query node is designed to query multiple large language models (LLMs) from providers such as OpenAI, Bedrock, and Gemini, using a DataFrame as input. It processes user queries, text content, and/or base64-encoded images to generate responses based on the selected connection and task, producing a structured DataFrame output.

General:

Select Task:

Specifies the task to perform. Options include:

  • summary: Generates a summary of the content in bullet points.

  • translation: Translates the content to English.

  • topic_extraction: Extracts key topics from the content.

  • other: Allows for a custom task defined by the user.

Prompt:

Allows users to provide a custom prompt / instructions for the selected task.
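The relationship between the task and the custom prompt can be sketched in plain Python. This is only an illustration, not the node's implementation: the `resolve_prompt` function and the `DEFAULT_INSTRUCTIONS` wording are hypothetical, paraphrased from the task descriptions above, and it assumes that the `other` task requires a custom prompt.

```python
from typing import Optional

# Hypothetical default instructions, paraphrased from the task list above.
DEFAULT_INSTRUCTIONS = {
    "summary": "Summarize the content in bullet points.",
    "translation": "Translate the content to English.",
    "topic_extraction": "Extract the key topics from the content.",
}


def resolve_prompt(task: str, custom_prompt: Optional[str] = None) -> str:
    """Return the instructions to send, preferring a non-empty custom prompt."""
    if custom_prompt and custom_prompt.strip():
        return custom_prompt.strip()
    if task == "other":
        # Assumption: "other" has no default instructions to fall back on.
        raise ValueError("task 'other' needs a custom prompt")
    if task not in DEFAULT_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    return DEFAULT_INSTRUCTIONS[task]
```

For example, `resolve_prompt("summary", "Use three bullets.")` returns the custom text, while `resolve_prompt("summary")` falls back to the default instruction.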

Content Column:

Specifies the DataFrame column containing the text content to be processed. Required for text or text+image modes.

Select Connection:

Specifies the connection details for the selected LLM provider (e.g., API keys for OpenAI/Gemini, AWS credentials for Bedrock). Required to authenticate and access the respective model.

Temperature:

Controls the randomness of the LLM’s output. Default is 0.7. Higher values increase creativity, while lower values ensure more deterministic responses.

Image Column:

Specifies the DataFrame column containing base64-encoded images. Required for image or text+image modes.
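If the upstream data holds raw image bytes, they need to be base64-encoded before they reach this column. A minimal sketch using Python's standard `base64` module follows; the `encode_image` helper is a hypothetical name, and the 8-byte PNG signature stands in for a real image file.

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


# The 8-byte PNG file signature is used here as a stand-in for a real image.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_image(png_signature)
print(encoded)
```

Base64 strings for PNG files begin with `iVBORw0KGgo`, which is a quick way to check that a column holds encoded PNG data rather than file paths or raw bytes.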

Mode Selection:

Determines the input mode for the LLM. Options are:

  • text: Processes text-only input from the content column or custom prompt.

  • image: Processes base64-encoded images from the image column.

  • text+image: Processes both text and base64-encoded images.
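The column requirements per mode, as stated in the Content Column and Image Column descriptions, can be expressed as a small lookup. This is a sketch for checking a configuration before running a workflow; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the node may validate its inputs differently.

```python
from typing import Dict, List

# Which column fields each input mode needs, per the descriptions above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "text": ["contentCol"],
    "image": ["imageCol"],
    "text+image": ["contentCol", "imageCol"],
}


def missing_fields(input_mode: str, config: Dict[str, str]) -> List[str]:
    """Return the required column fields that are absent or empty in config."""
    if input_mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown input mode: {input_mode!r}")
    return [name for name in REQUIRED_FIELDS[input_mode] if not config.get(name)]
```

Calling `missing_fields("text+image", {"contentCol": "content"})` reports that `imageCol` still needs to be set.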

Timeout (seconds):

Specifies the maximum time (in seconds) to wait for the model response. Visible when OpenAI or Gemini is selected.

Thinking Budget:

Sets the number of tokens that Gemini models may allocate for thinking. Only visible when Gemini is selected.
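The ranges given in the thinkingBudget field description can be checked with a short function. This is a sketch under stated assumptions: the model family labels (`flash`, `flash_lite`, `pro`) are hypothetical identifiers, and it assumes -1 is accepted only for Flash and Flash Lite, since the field description lists it only for those models.

```python
MAX_BUDGET = 24_576  # upper bound stated in the field description


def is_valid_thinking_budget(model_family: str, budget: int) -> bool:
    """Check a thinking budget against the documented range for a model family."""
    if model_family in ("flash", "flash_lite"):
        # 0 to 24,576, or -1 for dynamic thinking.
        return budget == -1 or 0 <= budget <= MAX_BUDGET
    if model_family == "pro":
        # 1 to 24,576; 0 is not allowed.
        return 1 <= budget <= MAX_BUDGET
    raise ValueError(f"unknown model family: {model_family!r}")
```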

Advanced:

Aggregate Response:

Specifies how to aggregate input data before processing. Options are:

  • none: Processes each row individually, retaining fileName and pageNumber (if provided).

  • all: Aggregates all rows into a single response.

  • perfile: Aggregates rows by fileName, producing one response per file.
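The three modes differ in how rows are grouped before a request is made. The plain-Python sketch below illustrates that grouping on a list of dictionaries; it is not the node's Spark implementation. The `group_content` name, the `fileName` and `content` keys, and joining text with newlines are all assumptions made for the illustration.

```python
from typing import Dict, List, Optional, Tuple


def group_content(rows: List[Dict[str, str]], mode: str) -> List[Tuple[Optional[str], str]]:
    """Return (fileName or None, text) pairs; each pair would get one response."""
    if mode == "none":
        # One request per row.
        return [(row.get("fileName"), row["content"]) for row in rows]
    if mode == "all":
        # One request for the whole input.
        return [(None, "\n".join(row["content"] for row in rows))]
    if mode == "perfile":
        # One request per distinct file name, in first-seen order.
        grouped: Dict[str, List[str]] = {}
        for row in rows:
            grouped.setdefault(row["fileName"], []).append(row["content"])
        return [(name, "\n".join(parts)) for name, parts in grouped.items()]
    raise ValueError(f"unknown aggregate mode: {mode!r}")
```

With three rows spread over two files, `none` yields three pairs, `perfile` yields two, and `all` yields one.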

Number of Partitions:

Specifies the number of Spark partitions for distributed processing. Default is 3.

File Name Column:

Specifies the DataFrame column containing file names. Required for perfile aggregation mode.

Page Number Column:

Specifies the DataFrame column containing page numbers (e.g., for PDFs). Optional; used for row-wise processing in none aggregation mode.

Output:

The node outputs a DataFrame with columns based on the aggregation mode:

  • none: Includes fileName (if provided), pageNumber (if provided), and response.

  • perfile: Includes fileName and response.

  • all: Includes only the response column.

The response column contains the LLM-generated text or error messages if the API call fails.
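One way to picture this behavior, together with the Timeout setting, is a wrapper that returns either the model's text or an error string. The sketch below is an assumption about the general pattern, not the node's code: `call_with_timeout`, the `ERROR:` prefix, and the stand-in `slow_model` function are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def call_with_timeout(fn: Callable[[str], str], prompt: str, timeout_seconds: float) -> str:
    """Run fn(prompt); return its text, or an error string on failure or timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, prompt)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return f"ERROR: no response within {timeout_seconds} seconds"
    except Exception as exc:
        return f"ERROR: {exc}"
    finally:
        # Do not block on a call that is still running after the timeout.
        executor.shutdown(wait=False)


def slow_model(prompt: str) -> str:
    """Stand-in for an API call that takes too long to answer."""
    time.sleep(0.3)
    return "late answer"
```

Because failures become ordinary strings, a downstream step can filter on the prefix to separate failed rows from successful ones.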

Examples

Multi LLM Query Node Examples

Input:

A DataFrame contains the following data:

  • fileName: [“doc1.pdf”, “doc1.pdf”, “doc2.pdf”]

  • pageNumber: [“1”, “2”, null]

  • content: [“Article about climate change…”, “Climate change impacts…”, “Renewable energy report…”]

  • imageBase64: [null, “iVBORw0KGgoAAAANSUhEUg…”, null]

The Multi LLM Query node is configured as follows:

  • Select Task: summary

  • Prompt: “Summarize the content in bullet points.”

  • Content Column: content

  • Select Connection: Configured with a valid OpenAI API key

  • Temperature: 0.7

  • Timeout (seconds): 90

  • Image Column: imageBase64

  • Mode Selection: text+image

  • Aggregate Response: perfile

  • Number of Partitions: 3

  • File Name Column: fileName

  • Page Number Column: pageNumber
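For reference, the same settings can be written against the field names from the Fields section. The dictionary below is only an illustration of how the titles map to field names; the format in which a workflow actually stores node settings is not shown here, and the connection name is a hypothetical placeholder.

```python
# Example settings keyed by the field names listed in the Fields section.
node_config = {
    "llmConnection": "openai-connection",  # hypothetical connection name
    "task": "summary",
    "customPrompt": "Summarize the content in bullet points.",
    "contentCol": "content",
    "temperature": 0.7,
    "timeout": 90,
    "imageCol": "imageBase64",
    "inputMode": "text+image",
    "aggregateMode": "perfile",
    "numPartitions": 3,
    "fileNameCol": "fileName",
    "pageNumberCol": "pageNumber",
}
```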

Output:

The node processes the DataFrame and produces a DataFrame with the following structure:

  • fileName: doc1.pdf

    response:

      • Climate change effects on ecosystems

      • Rising temperatures

  • fileName: doc2.pdf

    response:

      • Renewable energy advancements

      • Solar and wind adoption