Read Pinecone DB¶

Reads vector embeddings from a Pinecone database.

Input¶

Takes a DataFrame as input.

Type¶

pyspark

Class¶

fire.nodes.gai.NodeReadFromPineconeDB

Fields¶

Name	Title	Description
topK	Top K	Number of top results to retrieve (default: 3).
pineconeConnection	Select Pinecone Connection	Pinecone connection used to authenticate and access the Pinecone service.
pineconeIndexName	Pinecone Index Name	Name of the Pinecone index.
indexNameSpace	Index Namespace	Namespace for organizing the Pinecone index.
userQueryCol	User Query Column	Column name for user query (default: ‘userQuery’).
queryEmbeddingCol	Query Embedding Column	Column name for query embeddings (default: ‘embeddings’).

Details¶

Read Pinecone DB Node Details¶

The Read Pinecone DB node retrieves vector embeddings from a Pinecone vector database based on a user query or query embeddings provided in a DataFrame. It performs a similarity search to find the most relevant documents and returns the results as a DataFrame with columns for the user query and corresponding content. This node is designed for PySpark-based workflows, enabling efficient retrieval of vector-based data for similarity search applications.
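
To make the idea of a top-K similarity search concrete, here is a minimal in-memory sketch. It is not how Pinecone or this node is implemented: Pinecone runs the search server-side, and the similarity metric depends on how the index was configured. The helper names (`cosine_similarity`, `top_k_matches`) and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: List[float], documents: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three-dimensional vectors instead of real embeddings.
docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}
results = top_k_matches([1.0, 0.0, 0.0], docs, k=2)
```

With these toy vectors, `results` holds `doc-a` first and `doc-b` second, because they point in nearly the same direction as the query.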

General:¶

Top K: Specifies the number of top results to retrieve from the Pinecone index based on similarity. Default is 3. Must be a positive integer.

Select Pinecone Connection: Specifies the connection details for the Pinecone API (e.g., API key, environment). This is required to authenticate and access the Pinecone service.

Pinecone Index Name: Specifies the name of the Pinecone index to query. This is required and must match an existing index in the Pinecone database.

Index Namespace: Specifies the namespace within the Pinecone index to query. This is optional; if provided, it narrows the search to the specified namespace.

User Query Column: Specifies the DataFrame column containing the user query (text input for similarity search). Default is ‘userQuery’. This is required if querying with text input.

Query Embedding Column: Specifies the DataFrame column containing pre-computed query embeddings (vector representations). Default is ‘embeddings’. This is required if querying with embeddings instead of text.
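
As a rough illustration of how these settings could translate into a Pinecone query, the sketch below validates Top K and assembles keyword arguments. The function `build_query_kwargs` is hypothetical and is not part of the node. The keys `vector`, `top_k`, `namespace`, and `include_metadata` follow the parameter names of the Pinecone Python client's `Index.query` method. How the node itself builds its request is an assumption here.

```python
from typing import Any, Dict, List, Optional


def build_query_kwargs(
    embedding: List[float], top_k: int = 3, namespace: Optional[str] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Pinecone similarity query (hypothetical helper)."""
    # Top K must be a positive integer; bool is rejected because it subclasses int.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k <= 0:
        raise ValueError("topK must be a positive integer")
    kwargs: Dict[str, Any] = {
        "vector": list(embedding),
        "top_k": top_k,
        "include_metadata": True,  # assumption: document text is stored as metadata
    }
    # The namespace is optional, so it is only sent when one was configured.
    if namespace:
        kwargs["namespace"] = namespace
    return kwargs


query_kwargs = build_query_kwargs([0.12, 0.45], top_k=3, namespace="document-namespace")
# With a real client, these could be passed along as: index.query(**query_kwargs)
```

Validating Top K before the request is sent surfaces a configuration mistake as a clear error instead of a failed API call.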

Output:¶

The node outputs a DataFrame with the following columns:

userquery: The original query from the User Query Column or derived from the query embeddings.
content: The content of the top matching documents retrieved from the Pinecone index, based on similarity to the query.
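
The sketch below shows one way query matches could be shaped into rows with these two columns. Several things are assumptions: that each match carries its text in a metadata field (the key name `content` is hypothetical), that matches are represented as plain dictionaries, and that one row is emitted per match. This page does not state whether the node emits one row per match or combines the matches for a query.

```python
from typing import Any, Dict, List


def matches_to_rows(
    user_query: str, matches: List[Dict[str, Any]], content_key: str = "content"
) -> List[Dict[str, str]]:
    """Turn a list of match dictionaries into output rows (hypothetical helper)."""
    rows = []
    for match in matches:
        metadata = match.get("metadata") or {}
        # Skip matches that carry no text under the assumed metadata key.
        if content_key in metadata:
            rows.append({"userquery": user_query, "content": metadata[content_key]})
    return rows


# Hypothetical matches, shaped like the id/score/metadata entries of a query response.
sample_matches = [
    {"id": "doc-1", "score": 0.91, "metadata": {"content": "First matching passage."}},
    {"id": "doc-2", "score": 0.87, "metadata": {}},
    {"id": "doc-3", "score": 0.80, "metadata": {"content": "Second matching passage."}},
]
rows = matches_to_rows("What is climate change?", sample_matches)
```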

Examples¶

Example: Read Pinecone DB Node¶

Input:¶

The input DataFrame contains the following data:

userQuery: [“What is climate change?”, “AI advancements in 2025”]
embeddings: [[0.12, 0.45, …], [0.23, 0.67, …]] (1024-dimensional vectors)

The Read Pinecone DB node is configured as follows:

Top K: 3
Select Pinecone Connection: Configured with a valid Pinecone API key and environment
Pinecone Index Name: document-index
Index Namespace: document-namespace
User Query Column: userQuery
Query Embedding Column: embeddings

Output:¶

The node queries the Pinecone database and produces a DataFrame with the following structure:

userquery                    | content
-----------------------------|--------------------------------------
What is climate change?      | Climate change refers to long-term shifts in weather patterns...
AI advancements in 2025      | Recent AI advancements include improved neural networks...

Explanation:¶

The node processes the DataFrame, using the userQuery and embeddings columns to perform a similarity search in the Pinecone index ‘document-index’ under the ‘document-namespace’ namespace.
The Top K setting of 3 ensures that up to three matching documents are retrieved for each query based on similarity to the provided embeddings.
The User Query Column (‘userQuery’) provides the text query, which is included in the output DataFrame for reference.
The Query Embedding Column (‘embeddings’) supplies the vector representations used for the similarity search in Pinecone.
The content column contains the text of the most relevant documents retrieved from the Pinecone index.
If Index Namespace were left empty, the query would not target ‘document-namespace’; Pinecone would instead search the index's default namespace.
If only text queries were provided without embeddings, the node would rely on the Pinecone service to generate embeddings internally (if supported by the connection).
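
The choice between embeddings and text described in the explanation can be sketched as a small per-row dispatch. This is illustrative only: `select_query_input` is a hypothetical helper, rows are modeled as plain dictionaries rather than Spark rows, and preferring embeddings over text when both are present is an assumption based on the explanation above.

```python
from typing import Any, Dict, Tuple


def select_query_input(
    row: Dict[str, Any], query_col: str = "userQuery", embedding_col: str = "embeddings"
) -> Tuple[str, Any]:
    """Decide whether a row should be queried by embedding or by text (hypothetical helper)."""
    embedding = row.get(embedding_col)
    if embedding:
        # Pre-computed vectors are used directly for the similarity search.
        return ("embedding", list(embedding))
    text = row.get(query_col)
    if text:
        # Text-only rows would need embeddings generated elsewhere before searching.
        return ("text", text)
    raise ValueError("row has neither a query embedding nor query text")


with_vector = select_query_input({"userQuery": "What is climate change?", "embeddings": [0.12, 0.45]})
text_only = select_query_input({"userQuery": "AI advancements in 2025"})
```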