Read Faiss DB

Reads vector embeddings from a FAISS database.

Input

It takes a DataFrame as input.

Type

pyspark

Class

fire.nodes.gai.NodeReadFromFaissDB

Fields

Name           | Title                         | Description
---------------|-------------------------------|--------------------------------------
faissIndexDir  | Path Of FAISS Index Directory | Enter FAISS Index Directory path.
topK           | Top K                         | Number of top results to retrieve from the FAISS index by similarity. Default is 3.
faissIndexName | Name Of FAISS Index           | Enter FAISS Index Name.

Details

Read Faiss DB Node Details

The Read Faiss DB node retrieves vector embeddings from a FAISS vector database based on a user query or query embeddings provided in a DataFrame. It performs a similarity search to find the most relevant documents and returns the results as a DataFrame with columns for the user query and corresponding content. This node is designed for PySpark-based workflows, enabling efficient retrieval of vector-based data for similarity search applications.
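To make the similarity search concrete, here is a minimal sketch of a top-K nearest-neighbour lookup in plain Python. It is not the node's implementation: the function name `top_k_search` and the 3-dimensional vectors are made up for illustration, and the brute-force Euclidean comparison simply yields the same ranking that an exact L2 search over the stored vectors would.

```python
import heapq
import math


def top_k_search(query, vectors, k=3):
    """Return (index, distance) pairs for the k stored vectors nearest to query.

    Hypothetical helper: compares the query against every stored vector
    using exact Euclidean (L2) distance.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Pair each distance with the position of the vector it belongs to.
    scored = ((math.dist(query, vec), idx) for idx, vec in enumerate(vectors))
    # nsmallest keeps only the k closest entries, nearest first.
    nearest = heapq.nsmallest(k, scored)
    return [(idx, dist) for dist, idx in nearest]


# Made-up 3-dimensional vectors, for illustration only.
stored = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 3.0, 3.0],
]
hits = top_k_search([0.9, 0.1, 0.0], stored, k=3)
print([idx for idx, _ in hits])
```

If K is larger than the number of stored vectors, the sketch returns every vector it has, which matches the "up to K results" behaviour described for the node.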

General:

Path Of FAISS Index Directory: Specifies the directory path (local or distributed filesystem) where the FAISS index is stored. This is required and must point to a valid directory containing the FAISS index.

Top K: Specifies the number of top results to retrieve from the FAISS index based on similarity. Default is 3. Must be a positive integer.

Name Of FAISS Index: Specifies the name of the FAISS index to query. This is required and must match an existing index in the specified directory.
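The constraints on these three settings can be sketched as a small validation routine. The `ReadFaissConfig` class and `validate` function below are hypothetical names used only for illustration; they mirror the documented field names and the default of 3, and they deliberately do not check that the directory exists, since the path may live on a distributed filesystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadFaissConfig:
    """Hypothetical container mirroring the node's three fields."""
    faissIndexDir: str
    faissIndexName: str
    topK: int = 3  # documented default


def validate(config):
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    if not config.faissIndexDir.strip():
        problems.append("faissIndexDir is required")
    if not config.faissIndexName.strip():
        problems.append("faissIndexName is required")
    # bool is a subclass of int in Python, so reject it explicitly.
    is_int = isinstance(config.topK, int) and not isinstance(config.topK, bool)
    if not is_int or config.topK <= 0:
        problems.append("topK must be a positive integer")
    return problems


ok = ReadFaissConfig("/data/faiss_indices/", "faiss_index")
bad = ReadFaissConfig("", "faiss_index", topK=0)
print(validate(ok))
print(validate(bad))
```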

Output:

The node outputs a DataFrame with the following columns:

  • userquery: The original query from the input DataFrame or derived from the query embeddings.

  • content: The content of the top matching documents retrieved from the FAISS index, based on similarity to the query.
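As a rough sketch of how search results could be shaped into these two columns, the hypothetical helper below builds one row per query. This page does not state how multiple matches for a query are combined, so joining them with a newline is an assumption, as are the function name `build_output_rows` and the placeholder data.

```python
def build_output_rows(queries, hits_per_query, documents, separator="\n"):
    """Build (userquery, content) rows from per-query document indices.

    queries: list of query strings
    hits_per_query: for each query, a list of document indices, best match first
    documents: list of document texts, addressed by index
    Assumption: all matches for a query are joined into a single content value.
    """
    rows = []
    for query, hit_ids in zip(queries, hits_per_query):
        content = separator.join(documents[i] for i in hit_ids)
        rows.append({"userquery": query, "content": content})
    return rows


# Placeholder data, for illustration only.
documents = ["doc A", "doc B", "doc C"]
rows = build_output_rows(["q1", "q2"], [[2, 0], [1]], documents)
print(rows)
```

In a PySpark workflow, a list of dictionaries like this could be handed to `spark.createDataFrame` to obtain a DataFrame with the `userquery` and `content` columns.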

Examples

Example: Read Faiss DB Node

Input:

A DataFrame contains the following data:

  • userQuery: [“What is climate change?”, “AI advancements in 2025”]

  • embeddings: [[0.12, 0.45, …], [0.23, 0.67, …]] (1024-dimensional vectors)

The Read Faiss DB node is configured as follows:

  • Path Of FAISS Index Directory: /data/faiss_indices/

  • Top K: 3

  • Name Of FAISS Index: faiss_index

Output:

The node queries the FAISS database and produces a DataFrame with the following structure:

userquery                    | content
-----------------------------|--------------------------------------
What is climate change?      | Climate change refers to long-term shifts in weather patterns...
AI advancements in 2025      | Recent AI advancements include improved neural networks...

Explanation:

  • The node processes the DataFrame, using the userQuery and embeddings columns to perform a similarity search in the FAISS index named ‘faiss_index’ located in the ‘/data/faiss_indices/’ directory.

  • The Top K setting of 3 ensures that up to three matching documents are retrieved for each query based on similarity to the provided embeddings.

  • The userquery column in the output DataFrame contains the original text queries from the input DataFrame for reference.

  • The content column contains the text of the most relevant documents retrieved from the FAISS index.

  • The Path Of FAISS Index Directory and Name Of FAISS Index settings ensure the node queries the correct index in the specified location.

  • If the input DataFrame provided only text queries without embeddings, the queries would first have to be converted to embeddings by an embedding model (if the configuration supports it), because FAISS itself only indexes and searches vectors and does not generate embeddings.