Transcribing Telegram voice messages using Whisper and Gemini with a fallback mechanism
Important notice
This workflow is provided as-is. Please review and test before using in production.
Overview
n8n Workflow: Voice Message Transcription with Access Control
This n8n workflow enables automated transcription of voice messages in Telegram groups with built-in access control and intelligent fallback mechanisms. It's designed for teams that need to convert audio messages to text while maintaining security and handling various audio formats.
Section 1: Trigger & Access Control
Receive Message (Telegram Trigger)
Purpose: Captures incoming messages from users in your Telegram group.
How it works: When a user sends a message (voice, audio, or text), the workflow is triggered and the sender's information is captured.
Benefit: Serves as the entry point for the entire transcription pipeline.
Sender Verification
Purpose: Validates whether the sender has permission to use the transcription service.
Logic:
- Check the sender against the list of authorized users
- If authorized → Proceed to the next step
- If not authorized → Send an "Access denied" message and stop the workflow
Benefit: Prevents unauthorized users from consuming AI credits and accessing the service.
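The check above can be sketched outside n8n as a small allow-list lookup. This is a minimal Python sketch, not the workflow's actual node configuration: the `AUTHORIZED_USER_IDS` values and the function names are hypothetical, and it assumes the sender is identified by the `message.from.id` field of the Telegram update.

```python
from typing import Optional

# Hypothetical allow-list of Telegram user IDs; replace with your own values.
AUTHORIZED_USER_IDS = frozenset({111111111, 222222222})


def sender_id(update: dict) -> Optional[int]:
    """Return the sender's numeric user ID, or None if the update has no sender."""
    sender = update.get("message", {}).get("from", {})
    return sender.get("id")


def is_authorized(update: dict, allowed=AUTHORIZED_USER_IDS) -> bool:
    """True only when the sender's ID is on the allow-list."""
    uid = sender_id(update)
    return uid is not None and uid in allowed
```

Matching on the numeric user ID rather than the username is a design choice in this sketch: usernames can be changed or left unset, while the ID stays fixed.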
Section 2: Message Type Detection
Audio/Voice Recognition
Purpose: Identifies the type of incoming message and audio format.
Why it's needed: Telegram delivers each kind of content in a different message field, so the workflow has to check which one is present:
- Voice notes (voice messages)
- Audio files (standard audio attachments)
- Text messages (no audio content)
Process:
- Check if message contains audio/voice content
- If no audio file detected → Send "No audio file found" message
- If audio detected → Assign file ID and proceed to format detection
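As a rough illustration of the detection step, the Python sketch below looks for the `voice` and `audio` fields of a Telegram message and returns the file ID. The function name is hypothetical, and defaulting voice notes to `audio/ogg` when no MIME type is given is an assumption based on the OGG entry in the format list, not something taken from the workflow itself.

```python
from typing import Optional, Tuple


def find_audio(message: dict) -> Optional[Tuple[str, str]]:
    """Return (file_id, mime_type) for a voice note or audio file, else None."""
    for field in ("voice", "audio"):
        media = message.get(field)
        if not media:
            continue
        mime = media.get("mime_type", "")
        # Assumption: voice notes without a MIME type are treated as OGG.
        if field == "voice" and not mime:
            mime = "audio/ogg"
        return media["file_id"], mime
    # Text-only messages (or anything without audio) fall through to here.
    return None
```

A `None` result corresponds to the "No audio file found" branch.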
File Type Determination (IF Node)
Purpose: Identifies the specific audio format for proper processing.
Supported formats:
- OGG (Telegram voice messages)
- MPEG/MP3
- MP4/M4A
- Other audio formats
Logic:
- If format recognized → Proceed to transcription
- If format not recognized → Send "File format not recognized" message
Benefit: Ensures compatibility with transcription services by validating file types upfront.
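One way to express this check is a lookup on the file's MIME type. The mapping below is an assumption made for illustration (the workflow's actual IF conditions are not shown), and `detect_format` is a hypothetical helper name.

```python
from typing import Optional

# Assumed MIME-to-format mapping for the formats listed above.
KNOWN_MIME_TYPES = {
    "audio/ogg": "ogg",
    "audio/mpeg": "mp3",
    "audio/mp4": "m4a",
    "audio/x-m4a": "m4a",
}


def detect_format(mime_type: str) -> Optional[str]:
    """Map a MIME type to a short format label, or None if it is not audio."""
    # Drop parameters such as "; codecs=opus" and normalize case.
    base = mime_type.split(";")[0].strip().lower()
    if base in KNOWN_MIME_TYPES:
        return KNOWN_MIME_TYPES[base]
    # "Other audio formats": accept any audio/* subtype as-is.
    if base.startswith("audio/"):
        return base[len("audio/"):] or None
    return None
```

A `None` result corresponds to the "File format not recognized" branch.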
Section 3: Primary Transcription (OpenAI)
File Download
Purpose: Downloads the audio file from Telegram for processing.
OpenAI Transcription
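Inside n8n the Telegram node handles the download, but the underlying Bot API flow has two steps: call `getFile` to resolve the file ID to a path, then fetch that path from the file endpoint. The Python sketch below shows that flow with the standard library only. The function names are hypothetical and the token is a placeholder you would supply.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def get_file_url(token: str, file_id: str) -> str:
    """URL of the Bot API getFile call that resolves a file_id to a file_path."""
    query = urllib.parse.urlencode({"file_id": file_id})
    return f"{API_ROOT}/bot{token}/getFile?{query}"


def download_url(token: str, file_path: str) -> str:
    """URL from which the file contents can be fetched."""
    return f"{API_ROOT}/file/bot{token}/{file_path}"


def download_audio(token: str, file_id: str, timeout: float = 30.0) -> bytes:
    """Resolve the file path, then download and return the raw audio bytes."""
    with urllib.request.urlopen(get_file_url(token, file_id), timeout=timeout) as resp:
        info = json.load(resp)
    if not info.get("ok"):
        raise RuntimeError(f"getFile failed: {info}")
    file_path = info["result"]["file_path"]
    with urllib.request.urlopen(download_url(token, file_path), timeout=timeout) as resp:
        return resp.read()
```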
Purpose: Transcribes audio to text using OpenAI's Whisper API.
Why OpenAI: High-quality transcription with cost-effective pricing.
Process:
- Send downloaded file to OpenAI transcription API
- Simultaneously send notification: "Transcription started"
- If successful → Assign transcribed text to variable and proceed
- If error occurs → Trigger fallback mechanism
Benefit: Fast, accurate transcription with multi-language support.
Section 4: Fallback Transcription (Gemini)
Gemini Backup Transcription
Purpose: Provides a safety net if OpenAI transcription fails.
Process:
- Receives file only if OpenAI node returns an error
- Downloads and processes the same audio file
- Sends to Google Gemini for transcription
- Assigns transcribed text to the same text variable
Benefit: Improves reliability. If the OpenAI transcription fails, Gemini takes over automatically.
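The primary-then-fallback pattern described in Sections 3 and 4 can be sketched generically. In this Python sketch, `primary` and `fallback` are hypothetical callables standing in for the OpenAI and Gemini nodes; no real API client is shown or implied.

```python
from typing import Callable, Tuple

# A transcriber takes raw audio bytes and returns the transcribed text.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, primary: Transcriber, fallback: Transcriber
) -> Tuple[str, str]:
    """Try the primary service; on any error, use the fallback.

    Returns (text, source), where source is "primary" or "fallback".
    """
    try:
        return primary(audio), "primary"
    except Exception:
        # Mirrors the workflow: the fallback runs only when the primary errors.
        return fallback(audio), "fallback"
```

Both paths return the text in the same position, which matches the workflow's approach of writing either result to the same text variable. If the fallback also fails, its exception propagates to the caller.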
Section 5: Message Length Handling
Text Length Check (IF Node)
Purpose: Determines if the transcribed text exceeds the length that can be sent in one Telegram message.
Logic:
- If text ≤ 4000 characters → Send directly to Telegram
- If text > 4000 characters → Split into chunks
Why: Telegram's Bot API caps a text message at 4,096 characters; the 4,000-character threshold used here stays safely under that cap.
Text Splitting (Code Node)
Purpose: Breaks long transcriptions into segments of at most 4,000 characters.
Process:
- Receives text longer than 4,000 characters
- Splits text into chunks of ≤ 4,000 characters
- Maintains readability by avoiding mid-word breaks
- Outputs array of text chunks
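The workflow's Code node is not reproduced here. As a standalone illustration of the same idea, this Python sketch splits at the last space or newline at or before the limit, and falls back to a hard cut only when a single word is longer than the limit. The function name and the exact whitespace rule are assumptions.

```python
from typing import List


def split_text(text: str, limit: int = 4000) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring
    to break at whitespace so words stay intact."""
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space or newline at or before the limit.
        cut = max(
            remaining.rfind(" ", 0, limit + 1),
            remaining.rfind("\n", 0, limit + 1),
        )
        if cut <= 0:
            cut = limit  # No whitespace found: hard split mid-word.
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

For example, `split_text("aaa bbb ccc", limit=7)` returns `["aaa bbb", "ccc"]`. Text at or under the limit comes back as a single chunk, so the same function also covers the short-message path.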
Section 6: Response Delivery
Send Transcription (Telegram Node)
Purpose: Delivers the transcribed text back to the Telegram group.
Behavior:
- Short messages: Sent as a single message
- Long messages: Sent as multiple sequential messages
Benefit: Users receive complete transcriptions regardless of length, ensuring no content is lost.
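In n8n the Telegram node sends the messages. For reference, the equivalent Bot API call is `sendMessage`, issued once per chunk. The Python sketch below assumes the text has already been split into a list of strings; the function names are hypothetical and the token is a placeholder.

```python
import json
import urllib.request
from typing import List


def build_payloads(chat_id: int, chunks: List[str]) -> List[dict]:
    """One sendMessage payload per chunk, in the original order."""
    return [{"chat_id": chat_id, "text": chunk} for chunk in chunks]


def send_all(token: str, chat_id: int, chunks: List[str], timeout: float = 30.0) -> None:
    """Send the chunks one after another so they arrive in order."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    for payload in build_payloads(chat_id, chunks):
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        # Waiting for each response before sending the next keeps the order.
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            resp.read()
```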
Workflow Overview Table
| Step | Node Name | Purpose |
|---|---|---|
| 1. Trigger | Receive Message | Captures incoming Telegram messages |
| 2. Access Control | Sender Verification | Validates user permissions |
| 3. Detection | Audio/Voice Recognition | Identifies message type and audio format |
| 4. Validation | File Type Check | Verifies supported audio formats |
| 5. Download | File Download | Retrieves audio file from Telegram |
| 6. Primary AI | OpenAI Transcription | Main transcription service |
| 7. Fallback AI | Gemini Transcription | Backup transcription service |
| 8. Processing | Text Length Check | Determines if splitting is needed |
| 9. Splitting | Code Node | Breaks long text into chunks |
| 10. Response | Send to Telegram | Delivers transcribed text |
Key Benefits
- Secure access control: Only authorized users can trigger transcriptions
- Cost management: Prevents unauthorized credit consumption
- Multi-format support: Handles Telegram voice notes and audio files in OGG, MP3, and MP4/M4A formats
- High reliability: Gemini takes over automatically if the OpenAI transcription fails
- Telegram-optimized: Automatically handles message length limits
- Multi-language: Both AI services support numerous languages
- Real-time notifications: Users receive a "Transcription started" message while the audio is processed
- Automatic chunking: Long transcriptions are split into chunks of at most 4,000 characters without breaking words
- Smart routing: Each message is routed by sender, message type, and audio format before transcription
- Complete delivery: No content is lost regardless of transcription length
Use Cases
- Team meetings: Transcribe voice notes from team discussions
- Client communications: Convert client voice messages to searchable text
- Documentation: Create text records of verbal communications
- Accessibility: Make audio content accessible to all team members
- Multi-language teams: Leverage AI transcription for various languages