Read Faiss DB
=============

Reads vector embeddings from a FAISS database.

Input
-----

Takes a DataFrame as input.

Type
----

pyspark

Class
-----

fire.nodes.gai.NodeReadFromFaissDB

Fields
------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - faissIndexDir
     - Path Of FAISS Index Directory
     - Enter FAISS Index Directory path.
   * - topK
     - Top K
     - Number of top results to retrieve from the FAISS index by similarity. Default is 3.
   * - faissIndexName
     - Name Of FAISS Index
     - Enter FAISS Index Name.

Details
-------

Read Faiss DB Node Details
++++++++++++++++++++++++++

The Read Faiss DB node retrieves vector embeddings from a FAISS vector database based on a user query or query embeddings provided in a DataFrame. It runs a similarity search to find the most relevant documents and returns the results as a DataFrame with one column for the user query and one for the matching content. The node is designed for PySpark-based workflows that need vector retrieval for similarity search applications.

General:
++++++++

Path Of FAISS Index Directory: Specifies the directory path (local or distributed filesystem) where the FAISS index is stored. Required. It must point to a valid directory containing the FAISS index.

Top K: Specifies the number of top results to retrieve from the FAISS index based on similarity. Default is 3. Must be a positive integer.

Name Of FAISS Index: Specifies the name of the FAISS index to query. Required. It must match an existing index in the specified directory.

Output:
+++++++

The node outputs a DataFrame with the following columns:

* userquery: The original query from the input DataFrame or derived from the query embeddings.
* content: The content of the top matching documents retrieved from the FAISS index, based on similarity to the query.

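The parameter constraints listed under General can be checked before a workflow runs. The following is a minimal sketch in plain Python, not the node's own validation logic. ``FaissReadConfig`` and ``validate_config`` are hypothetical names introduced only for illustration, and the sketch does not check that the directory or index actually exists.

```python
from dataclasses import dataclass

DEFAULT_TOP_K = 3  # default value stated in the Top K description


@dataclass
class FaissReadConfig:
    """Hypothetical container for the three node fields (not part of the node's API)."""
    faiss_index_dir: str
    faiss_index_name: str
    top_k: int = DEFAULT_TOP_K


def validate_config(cfg):
    """Return a list of problems found; an empty list means the settings look valid."""
    problems = []
    if not cfg.faiss_index_dir:
        problems.append("faissIndexDir is required")
    if not cfg.faiss_index_name:
        problems.append("faissIndexName is required")
    # bool is a subclass of int in Python, so reject it explicitly.
    is_int = isinstance(cfg.top_k, int) and not isinstance(cfg.top_k, bool)
    if not is_int or cfg.top_k <= 0:
        problems.append("topK must be a positive integer")
    return problems


ok = validate_config(FaissReadConfig("/data/faiss_indices/", "faiss_index"))
bad = validate_config(FaissReadConfig("", "faiss_index", top_k=0))
```

Here ``ok`` is an empty list, while ``bad`` reports the missing directory and the non-positive Top K.
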
Examples
--------

Example: Read Faiss DB Node
+++++++++++++++++++++++++++

Input:
++++++

The input DataFrame contains the following columns:

* userQuery: ["What is climate change?", "AI advancements in 2025"]
* embeddings: [[0.12, 0.45, ...], [0.23, 0.67, ...]] (1024-dimensional vectors)

The Read Faiss DB node is configured as follows:

* Path Of FAISS Index Directory: /data/faiss_indices/
* Top K: 3
* Name Of FAISS Index: faiss_index

Output:
+++++++

The node queries the FAISS database and produces a DataFrame with the following structure:

::

    userquery                    | content
    -----------------------------|--------------------------------------
    What is climate change?      | Climate change refers to long-term shifts in weather patterns...
    AI advancements in 2025      | Recent AI advancements include improved neural networks...

Explanation:
++++++++++++

* The node processes the DataFrame, using the userQuery and embeddings columns to perform a similarity search in the FAISS index named 'faiss_index' located in the '/data/faiss_indices/' directory.
* The Top K setting of 3 ensures that up to three matching documents are retrieved for each query based on similarity to the provided embeddings.
* The userquery column in the output DataFrame contains the original text queries from the input DataFrame for reference.
* The content column contains the text of the most relevant documents retrieved from the FAISS index.
* The Path Of FAISS Index Directory and Name Of FAISS Index settings ensure the node queries the correct index in the specified location.
* If the input DataFrame only provided text queries without embeddings, the node would rely on the FAISS service to generate embeddings internally (if supported by the configuration).
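To make the Top K behaviour concrete, the sketch below runs a brute-force nearest-neighbour search in plain Python over toy 3-dimensional vectors. It illustrates the idea only: it does not use the FAISS library or the node's code, squared Euclidean distance is an assumed metric, and ``top_k_search``, the toy documents, and the toy queries are hypothetical. How the node arranges several matches per query in its output rows is not specified here, so the sketch simply maps each query to its list of matches.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_search(query_vec, indexed_docs, k=3):
    """Return the content of up to k documents closest to query_vec, nearest first.

    indexed_docs is a list of (vector, content) pairs standing in for an index.
    """
    nearest = heapq.nsmallest(
        k, indexed_docs, key=lambda doc: squared_l2(query_vec, doc[0])
    )
    return [content for _, content in nearest]


# Toy stand-in for an index: four documents with 3-dimensional embeddings.
docs = [
    ([1.0, 0.0, 0.0], "doc A"),
    ([0.9, 0.1, 0.0], "doc B"),
    ([0.0, 1.0, 0.0], "doc C"),
    ([0.0, 0.0, 2.0], "doc D"),
]

# Toy stand-in for the input DataFrame: (query text, query embedding) pairs.
queries = [
    ("query 1", [1.0, 0.0, 0.0]),
    ("query 2", [0.0, 0.9, 0.1]),
]

# Map each query to its top-3 matches.
results = {text: top_k_search(vec, docs, k=3) for text, vec in queries}
```

With this toy data, ``results["query 1"]`` lists doc A, doc B, doc C in that order, and ``results["query 2"]`` lists doc C, doc B, doc A. Asking for more results than the index holds returns every document, which matches the "up to three" wording above.
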