Audio Diarization
=================

This node processes audio files to transcribe and diarize speech, identifying the different speakers in the audio. It produces a DataFrame with two columns: speaker and dialogue.

Input
--------------

Takes a path to an audio file, or to a directory of audio files, as input.

Output
--------------

Outputs a DataFrame with two columns: speaker and dialogue.

Type
---------

pyspark

Class
---------

fire.nodes.gai.NodeSpeechToText

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - audioFilePath
     - Directory Or File Path
     - Select an audio file or a directory of audio files.
   * - numSpeakers
     - Number of Speakers
     - Provide the number of speakers expected in the conversation.
   * - diarization
     - Diarization
     - Diarize the transcription, labeling each segment with its speaker.
   * - saveOutputPath
     - Output Save Path
     - Specify the file path to save the transcription output as a .txt file.
   * - context
     - Additional Context
     - Add any relevant context or details about the conversation to help improve the diarization.
   * - openai
     - OpenAI
     -
   * - llmConnection
     - Select Connection
     - Select the OpenAI connection to use.
   * - openaiModel
     - OpenAI Model
     - OpenAI model to be used.

Details
-------

Audio Diarization Node Details
++++++++++++++++++++++++++++++

The Audio Diarization node processes audio files to transcribe speech and, optionally, diarize it by identifying the different speakers. It uses OpenAI's Whisper model (or another specified model) to produce a structured DataFrame with two columns: speaker and dialogue. This node is well suited to extracting and organizing spoken content from audio files in PySpark-based data pipelines.

General:
++++++++

Directory Or File Path: Specifies the path to a single audio file or to a directory containing multiple audio files. This field is required, and the path must be accessible to the PySpark engine.

Number of Speakers: Specifies the expected number of speakers in the audio. Default is 1. If set to 1, diarization is not applied, and all dialogue is attributed to a single speaker.
Must be an integer.

Diarization: Controls whether speaker diarization is performed. Options are:

* true: Enables diarization to identify and label the different speakers in the audio.
* false: Disables diarization, treating all dialogue as coming from a single speaker (default).

Output Save Path: Specifies the file path to save the transcription output as a .txt file. This is optional; if provided, the transcribed text is saved to the specified location.

Additional Context: Allows users to provide additional context or details about the conversation (e.g., speaker names, accents, or topics) to improve transcription and diarization accuracy. This is optional.

OpenAI Configuration:
+++++++++++++++++++++

Select Connection: Specifies the connection details for the OpenAI API (e.g., API key). This is required to authenticate and access the OpenAI model.

OpenAI Model: Specifies the OpenAI model to use for transcription. Default is 'whisper-1'. Other compatible models can be specified if the OpenAI API supports them.

Output:
+++++++

The node outputs a DataFrame with the following columns:

* speaker: The identified speaker label (e.g., Speaker_1, Speaker_2, or 'Default' if diarization is disabled).
* dialogue: The transcribed text corresponding to the speaker's speech.

If the Output Save Path is specified, the transcription is also saved as a .txt file at the provided location.

Examples
--------

Example: Audio Diarization Node
+++++++++++++++++++++++++++++++

Input:
++++++

A directory /data/audio/ contains the following file:

* meeting_recording.wav (a 5-minute audio file with two speakers discussing a project)

The Audio Diarization node is configured as follows:

* Directory Or File Path: /data/audio/meeting_recording.wav
* Number of Speakers: 2
* Diarization: true
* Output Save Path: /data/output/transcription.txt
* Additional Context: "Conversation between a project manager and a developer discussing project milestones."
* Select Connection: Configured with a valid OpenAI API key
* OpenAI Model: whisper-1

Output:
+++++++

The node processes the audio file and produces a DataFrame with the following structure:

::

    speaker   | dialogue
    ----------|--------------------------------------------------
    Speaker_1 | Let's discuss the project timeline...
    Speaker_2 | Sure, we need to finalize the milestones...
    Speaker_1 | I think we should prioritize the testing phase...
    Speaker_2 | Agreed, but we need more resources for that...

The transcription is also saved as /data/output/transcription.txt with the following content:

* Speaker_1: Let's discuss the project timeline...
* Speaker_2: Sure, we need to finalize the milestones...
* Speaker_1: I think we should prioritize the testing phase...
* Speaker_2: Agreed, but we need more resources for that...

Explanation:
++++++++++++

* The meeting_recording.wav file is processed using the OpenAI whisper-1 model.
* With Diarization set to true and Number of Speakers set to 2, the node identifies two distinct speakers and labels them Speaker_1 and Speaker_2.
* The dialogue column contains the transcribed text of each speaker's segment.
* The Additional Context ("Conversation between a project manager and a developer...") helps improve transcription and diarization accuracy by describing the conversation.
* The transcription is saved to /data/output/transcription.txt, as specified in the Output Save Path.
* If Diarization were set to false, or Number of Speakers were set to 1, all dialogue would be attributed to a single speaker labeled 'Default'.
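
The labeling rules above can be checked downstream with a short script. The sketch below is a hypothetical helper, not part of the node or its API. It assumes the saved .txt file holds one utterance per line in the form ``Speaker_1: text``, as the example transcription suggests, and the function names ``expected_labels`` and ``parse_transcript`` are illustrative only.

```python
from typing import List, Tuple


def expected_labels(num_speakers: int, diarization: bool) -> List[str]:
    """Return the speaker labels this page documents for a configuration.

    Hypothetical helper: diarization only takes effect when it is enabled
    and more than one speaker is expected; otherwise every segment is
    labeled 'Default'.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_speakers, bool) or not isinstance(num_speakers, int):
        raise ValueError("numSpeakers must be an integer")
    if num_speakers < 1:
        raise ValueError("numSpeakers must be at least 1")
    if diarization and num_speakers > 1:
        return [f"Speaker_{i}" for i in range(1, num_speakers + 1)]
    return ["Default"]


def parse_transcript(text: str) -> List[Tuple[str, str]]:
    """Split 'label: dialogue' lines into (speaker, dialogue) pairs.

    Assumes one utterance per line (an assumption about the .txt format);
    blank lines are skipped, and only the first ': ' separates the label
    from the dialogue.
    """
    rows: List[Tuple[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        speaker, sep, dialogue = line.partition(": ")
        if not sep:
            raise ValueError(f"line has no speaker label: {line!r}")
        rows.append((speaker, dialogue))
    return rows


if __name__ == "__main__":
    # Sample lines copied from the example transcription above.
    sample = (
        "Speaker_1: Let's discuss the project timeline...\n"
        "Speaker_2: Sure, we need to finalize the milestones...\n"
    )
    rows = parse_transcript(sample)
    allowed = set(expected_labels(num_speakers=2, diarization=True))
    unexpected = {speaker for speaker, _ in rows} - allowed
    print(f"{len(rows)} segments parsed; unexpected labels: {sorted(unexpected)}")
```

If a SparkSession named ``spark`` is available, the parsed pairs can be loaded with ``spark.createDataFrame(rows, ["speaker", "dialogue"])`` for comparison with the DataFrame the node produces.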