Read Pinecone DB
================

Reads vector embeddings from a Pinecone database.

Input
--------------

It takes in a DataFrame as input.

Type
---------

pyspark

Class
---------

fire.nodes.gai.NodeReadFromPineconeDB

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - topK
     - Top K
     - Number of top results to retrieve (default: 3).
   * - pineconeConnection
     - Select Pinecone Connection
     - Pinecone connection used to authenticate with the Pinecone service.
   * - pineconeIndexName
     - Pinecone Index Name
     - Name of the Pinecone index.
   * - indexNameSpace
     - Index Namespace
     - Namespace for organizing the Pinecone index.
   * - userQueryCol
     - User Query Column
     - Column name for user query (default: 'userQuery').
   * - queryEmbeddingCol
     - Query Embedding Column
     - Column name for query embeddings (default: 'embeddings').

Details
-------

Read Pinecone DB Node Details
+++++++++++++++++++++++++++++

The Read Pinecone DB node retrieves vector embeddings from a Pinecone vector database based on a user query or query embeddings provided in a DataFrame. It performs a similarity search to find the most relevant documents and returns the results as a DataFrame with columns for the user query and the corresponding content.

This node is designed for PySpark-based workflows, enabling efficient retrieval of vector-based data for similarity search applications.

General:
+++++++++++++++

* Top K: Specifies the number of top results to retrieve from the Pinecone index based on similarity. Default is 3. Must be a positive integer.
* Select Pinecone Connection: Specifies the connection details for the Pinecone API (e.g., API key, environment). This is required to authenticate and access the Pinecone service.
* Pinecone Index Name: Specifies the name of the Pinecone index to query. This is required and must match an existing index in the Pinecone database.
* Index Namespace: Specifies the namespace within the Pinecone index to query. This is optional; if provided, it narrows the search to the specified namespace.
* User Query Column: Specifies the DataFrame column containing the user query (text input for similarity search). Default is 'userQuery'. This is required if querying with text input.
* Query Embedding Column: Specifies the DataFrame column containing pre-computed query embeddings (vector representations). Default is 'embeddings'. This is required if querying with embeddings instead of text.

Output:
+++++++++++++++

The node outputs a DataFrame with the following columns:

* userquery: The original query from the User Query Column or derived from the query embeddings.
* content: The content of the top matching documents retrieved from the Pinecone index, based on similarity to the query.

Examples
--------

Example: Read Pinecone DB Node
++++++++++++++++++++++++++++++

Input:
+++++++++++++++

A DataFrame contains the following data:

* userQuery: ["What is climate change?", "AI advancements in 2025"]
* embeddings: [[0.12, 0.45, ...], [0.23, 0.67, ...]] (1024-dimensional vectors)

The Read Pinecone DB node is configured as follows:

* Top K: 3
* Select Pinecone Connection: Configured with a valid Pinecone API key and environment
* Pinecone Index Name: document-index
* Index Namespace: document-namespace
* User Query Column: userQuery
* Query Embedding Column: embeddings

Output:
+++++++++++++++

The node queries the Pinecone database and produces a DataFrame with the following structure:

::

    userquery                    | content
    -----------------------------|--------------------------------------
    What is climate change?      | Climate change refers to long-term shifts in weather patterns...
    AI advancements in 2025      | Recent AI advancements include improved neural networks...

Explanation:
+++++++++++++++

* The node processes the DataFrame, using the userQuery and embeddings columns to perform a similarity search in the Pinecone index 'document-index' under the 'document-namespace' namespace.
* The Top K setting of 3 ensures that up to three matching documents are retrieved for each query based on similarity to the provided embeddings.
* The User Query Column ('userQuery') provides the text query, which is included in the output DataFrame for reference.
* The Query Embedding Column ('embeddings') supplies the vector representations used for the similarity search in Pinecone.
* The content column contains the text of the most relevant documents retrieved from the Pinecone index.
* If Index Namespace were left empty, the search would not be limited to 'document-namespace'. Pinecone queries that omit a namespace typically run against the index's default namespace.
* If only text queries were provided without embeddings, the node would rely on the Pinecone service to generate embeddings internally (if supported by the connection).
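
The similarity search described in the explanation above can be sketched in plain Python. This is a minimal illustration, not the node's implementation: the in-memory ``documents`` list stands in for a Pinecone index, cosine similarity is assumed as the metric, and the function names ``cosine_similarity`` and ``retrieve_top_k`` are hypothetical. Returning ``content`` as a list of matches per query is also an assumption about how the results are grouped, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    rows: List[Dict[str, object]],
    documents: List[Tuple[str, List[float]]],
    top_k: int = 3,
    user_query_col: str = "userQuery",
    query_embedding_col: str = "embeddings",
) -> List[Dict[str, object]]:
    """For each input row, return the top_k most similar document texts.

    `documents` is a hypothetical in-memory stand-in for the Pinecone index:
    a list of (content, embedding) pairs.
    """
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer")

    results = []
    for row in rows:
        query_vector = row[query_embedding_col]
        # Score every stored document against the query embedding.
        scored = [
            (cosine_similarity(query_vector, embedding), content)
            for content, embedding in documents
        ]
        # Highest similarity first, then keep at most top_k matches.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results.append(
            {
                "userquery": row[user_query_col],
                "content": [content for _, content in scored[:top_k]],
            }
        )
    return results


# Toy data: 3-dimensional vectors instead of real 1024-dimensional embeddings.
documents = [
    ("Climate change refers to long-term shifts in weather patterns...", [1.0, 0.0, 0.0]),
    ("Recent AI advancements include improved neural networks...", [0.0, 1.0, 0.0]),
    ("An unrelated document.", [0.0, 0.0, 1.0]),
]
rows = [
    {"userQuery": "What is climate change?", "embeddings": [0.9, 0.1, 0.0]},
    {"userQuery": "AI advancements in 2025", "embeddings": [0.1, 0.9, 0.0]},
]
matches = retrieve_top_k(rows, documents, top_k=2)
```

Each entry in ``matches`` pairs the original query text with its ranked document texts, mirroring the ``userquery`` and ``content`` columns of the node's output. In the actual node the ranking is done by the Pinecone service rather than locally.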