Transcribing Telegram voice messages using Whisper and Gemini with a fallback mechanism

Author: Yehor EGMS

🎙️ n8n Workflow: Voice Message Transcription with Access Control

This n8n workflow enables automated transcription of voice messages in Telegram groups with built-in access control and intelligent fallback mechanisms. It's designed for teams that need to convert audio messages to text while maintaining security and handling various audio formats.

📌 Section 1: Trigger & Access Control

⚡ Receive Message (Telegram Trigger)

Purpose: Captures incoming messages from users in your Telegram group.

How it works: When a user sends a message (voice, audio, or text), the workflow is triggered and the sender's information is captured.

Benefit: Serves as the entry point for the entire transcription pipeline.

🔐 Sender Verification

Purpose: Validates whether the sender has permission to use the transcription service.

Logic:

Check sender against authorized users list
If authorized → Proceed to next step
If not authorized → Send "Access denied" message and stop workflow

Benefit: Prevents unauthorized users from consuming AI credits and accessing the service.
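The check can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual node configuration: the user IDs and the function name check_sender are hypothetical, and the sender ID is read from the "from" field of a Telegram message object.

```python
from typing import Optional, Set

# Hypothetical allowlist of Telegram user IDs (placeholders, not real accounts).
AUTHORIZED_USER_IDS: Set[int] = {111111111, 222222222}


def check_sender(message: dict, allowed: Optional[Set[int]] = None) -> dict:
    """Return whether the sender may use the service, plus an optional reply."""
    allowed = AUTHORIZED_USER_IDS if allowed is None else allowed
    # Telegram puts the sender in the message's "from" object.
    sender_id = message.get("from", {}).get("id")
    if sender_id in allowed:
        return {"authorized": True, "reply": None}
    # Unknown or missing sender: answer and stop the pipeline.
    return {"authorized": False, "reply": "Access denied"}
```

Keeping the allowlist in one place means a new team member can be added without touching the rest of the pipeline.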

📌 Section 2: Message Type Detection

🎵 Audio/Voice Recognition

Purpose: Identifies the type of incoming message and audio format.

Why it's needed: Telegram delivers different audio types in different message fields, so each has to be checked separately:

Voice notes (voice messages)
Audio files (standard audio attachments)
Text messages (no audio content)

Process:

Check if message contains audio/voice content
If no audio file detected → Send "No audio file found" message
If audio detected → Assign file ID and proceed to format detection
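A minimal sketch of this detection step, assuming the message arrives as a Telegram Bot API message object in which voice notes sit under "voice" and audio attachments under "audio". The function name extract_audio and the shape of the returned dictionary are hypothetical.

```python
from typing import Optional


def extract_audio(message: dict) -> Optional[dict]:
    """Return the file ID and MIME type of the first audio payload, if any."""
    # Voice notes and audio attachments live in different message fields.
    for kind in ("voice", "audio"):
        media = message.get(kind)
        if media:
            return {
                "kind": kind,
                "file_id": media["file_id"],
                "mime_type": media.get("mime_type"),
            }
    # Plain text or other content: the caller replies "No audio file found".
    return None
```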

🧩 File Type Determination (IF Node)

Purpose: Identifies the specific audio format for proper processing.

Supported formats:

OGG (Telegram voice messages)
MPEG/MP3
MP4/M4A
Other audio formats

Logic:

If format recognized → Proceed to transcription
If format not recognized → Send "File format not recognized" message

Benefit: Ensures compatibility with transcription services by validating file types upfront.
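One way to express this check is a lookup on the file's MIME type. The sketch below is an assumption about how the IF node's conditions could be written, not a copy of them: the mapping covers only the formats named above, and detect_format is a hypothetical helper.

```python
from typing import Optional

# Assumed mapping from MIME type to a short format label.
SUPPORTED_MIME_TYPES = {
    "audio/ogg": "ogg",    # Telegram voice messages
    "audio/mpeg": "mp3",
    "audio/mp4": "m4a",
    "audio/x-m4a": "m4a",
}


def detect_format(mime_type: Optional[str]) -> Optional[str]:
    """Return a format label, or None when the format is not recognized."""
    if not mime_type:
        return None
    # Drop parameters such as "; codecs=opus" before the lookup.
    base = mime_type.split(";")[0].strip().lower()
    return SUPPORTED_MIME_TYPES.get(base)
```

A None result corresponds to the "File format not recognized" branch.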

📌 Section 3: Primary Transcription (OpenAI)

📥 File Download

Purpose: Downloads the audio file from Telegram for processing.
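Outside n8n, the same download is a two-step call against the Telegram Bot API: getFile turns a file ID into a file path, and the file is then fetched from the /file/ endpoint. The sketch below only builds the two URLs; the helper names are hypothetical and the token is a placeholder.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.telegram.org"


def get_file_url(token: str, file_id: str) -> str:
    """URL of the getFile call that resolves a file ID to a file path."""
    query = urlencode({"file_id": file_id})
    return f"{API_ROOT}/bot{token}/getFile?{query}"


def download_url(token: str, file_path: str) -> str:
    """URL of the file itself, using the file_path returned by getFile."""
    # quote() keeps "/" intact, so nested paths stay valid.
    return f"{API_ROOT}/file/bot{token}/{quote(file_path)}"
```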

🤖 OpenAI Transcription

Purpose: Transcribes audio to text using OpenAI's Whisper API.

Why OpenAI: High-quality transcription with cost-effective pricing.

Process:

Send downloaded file to OpenAI transcription API
Simultaneously send notification: "Transcription started"
If successful → Assign transcribed text to variable and proceed
If error occurs → Trigger fallback mechanism

Benefit: Fast, accurate transcription with multi-language support.

📌 Section 4: Fallback Transcription (Gemini)

🛟 Gemini Backup Transcription

Purpose: Provides a safety net if OpenAI transcription fails.

Process:

Receives file only if OpenAI node returns an error
Downloads and processes the same audio file
Sends to Google Gemini for transcription
Assigns transcribed text to the same text variable

Benefit: Ensures high reliability—if one service fails, the other takes over automatically.
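The primary-then-fallback pattern can be sketched independently of either provider. In this illustration the transcription services are passed in as plain callables, so no OpenAI or Gemini API calls are shown. The function name transcribe_with_fallback and the service labels are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A transcriber takes raw audio bytes and returns text (assumed interface).
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, services: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each service in order; return (service name, text) from the first success."""
    errors: List[str] = []
    for name, transcribe in services:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # any failure moves on to the next service
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription services failed: " + "; ".join(errors))
```

Called with [("openai", primary), ("gemini", backup)], the backup runs only when the primary raises, which mirrors the error branch described above. Both paths return the text in the same position, so later steps do not need to know which service produced it.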

📌 Section 5: Message Length Handling

📏 Text Length Check (IF Node)

Purpose: Determines if the transcribed text exceeds Telegram's character limit.

Logic:

If text ≤ 4,000 characters → Send directly to Telegram
If text > 4,000 characters → Split into chunks

Why: Telegram caps a text message at 4,096 characters; the 4,000-character threshold used here leaves a small safety margin.

✂️ Text Splitting (Code Node)

Purpose: Breaks long transcriptions into 4,000-character segments.

Process:

Receives text longer than 4,000 characters
Splits text into chunks of ≤4,000 characters
Maintains readability by avoiding mid-word breaks
Outputs array of text chunks
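The splitting logic might look like the following. It is a sketch of the approach described above, not the workflow's actual Code node: it cuts at the last space or newline within the limit and falls back to a hard cut only when a single run of characters is longer than the limit. The function name split_text is hypothetical.

```python
from typing import List


def split_text(text: str, limit: int = 4000) -> List[str]:
    """Split text into chunks of at most `limit` characters, cutting at whitespace."""
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space or newline at or before the limit.
        cut = max(
            remaining.rfind(" ", 0, limit + 1),
            remaining.rfind("\n", 0, limit + 1),
        )
        if cut <= 0:
            cut = limit  # no whitespace available: hard cut mid-word
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Text at or under the limit comes back as a single chunk, so the same function can serve both branches of the length check.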

📌 Section 6: Response Delivery

💬 Send Transcription (Telegram Node)

Purpose: Delivers the transcribed text back to the Telegram group.

Behavior:

Short messages: Sent as a single message
Long messages: Sent as multiple sequential messages

Benefit: Users receive complete transcriptions regardless of length, ensuring no content is lost.

📊 Workflow Overview Table

Step	Node Name	Purpose
1. Trigger	Receive Message	Captures incoming Telegram messages
2. Access Control	Sender Verification	Validates user permissions
3. Detection	Audio/Voice Recognition	Identifies message type and audio content
4. Validation	File Type Determination	Verifies supported audio formats
5. Download	File Download	Retrieves audio file from Telegram
6. Primary AI	OpenAI Transcription	Main transcription service
7. Fallback AI	Gemini Backup Transcription	Backup transcription service
8. Processing	Text Length Check	Determines if splitting is needed
9. Splitting	Text Splitting	Breaks long text into chunks
10. Response	Send Transcription	Delivers transcribed text

🎯 Key Benefits

🔐 Secure access control: Only authorized users can trigger transcriptions
💰 Cost management: Prevents unauthorized credit consumption
🎵 Multi-format support: Handles various Telegram audio types
🛡️ High reliability: Dual-AI fallback ensures transcription success
📱 Telegram-optimized: Automatically handles message length limits
🌍 Multi-language: Both AI services support numerous languages
⚡ Real-time notifications: Users receive status updates during processing
🔄 Automatic chunking: Long transcriptions are intelligently split
🧠 Smart routing: Messages are routed by type and audio format before transcription
📊 Complete delivery: No content loss regardless of transcription length

🚀 Use Cases

Team meetings: Transcribe voice notes from team discussions
Client communications: Convert client voice messages to searchable text
Documentation: Create text records of verbal communications
Accessibility: Make audio content accessible to all team members
Multi-language teams: Leverage AI transcription for various languages

Nodes

set gmail telegram agent google-gemini

Complexity

advanced

Published 14 Oct 2025
