Transcribing Telegram voice messages using Whisper and Gemini with a fallback mechanism

Author: Yehor EGMS

Important notice

This workflow is provided as-is. Please review and test before using in production.

Overview

🎙️ n8n Workflow: Voice Message Transcription with Access Control

This n8n workflow enables automated transcription of voice messages in Telegram groups with built-in access control and intelligent fallback mechanisms. It's designed for teams that need to convert audio messages to text while maintaining security and handling various audio formats.

📌 Section 1: Trigger & Access Control

⚡ Receive Message (Telegram Trigger)

Purpose: Captures incoming messages from users in your Telegram group.

How it works: When a user sends a message (voice, audio, or text), the workflow is triggered and the sender's information is captured.

Benefit: Serves as the entry point for the entire transcription pipeline.

🔐 Sender Verification

Purpose: Validates whether the sender has permission to use the transcription service.

Logic:

Check the sender against the list of authorized users
If authorized → Proceed to the next step
If not authorized → Send "Access denied" message and stop the workflow
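The verification logic can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual node configuration: the function name `check_sender` and the constant are hypothetical, and the message is assumed to be a Telegram Bot API message object parsed into a dict, where the sender's numeric ID sits under `from.id`.

```python
from typing import Iterable, Optional, Tuple

ACCESS_DENIED_TEXT = "Access denied"  # reply text taken from the description above


def check_sender(message: dict, authorized_ids: Iterable[int]) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reply_text); reply_text is set only when access is denied."""
    # Telegram puts the sender under the "from" key of the message object.
    sender_id = message.get("from", {}).get("id")
    if sender_id in set(authorized_ids):
        return True, None
    # Unknown or missing senders are rejected and the workflow should stop here.
    return False, ACCESS_DENIED_TEXT
```

Matching on the numeric user ID rather than the username is a deliberate choice in this sketch, since usernames can be changed or left unset.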

Benefit: Prevents unauthorized users from consuming AI credits and accessing the service.

📌 Section 2: Message Type Detection

🎵 Audio/Voice Recognition

Purpose: Identifies the type of incoming message and audio format.

Why it's needed: Telegram delivers each kind of content in a different field of the message, so the workflow has to check which one is present:

Voice notes (voice messages)
Audio files (standard audio attachments)
Text messages (no audio content)

Process:

Check if message contains audio/voice content
If no audio file detected → Send "No audio file found" message
If audio detected → Assign file ID and proceed to format detection
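As a rough sketch of the detection step, the function below looks for the `voice` and `audio` fields that the Telegram Bot API uses for voice notes and audio attachments. The function name `extract_audio` and the shape of the returned dict are assumptions made for illustration; the workflow itself does this with IF/Switch and Set nodes.

```python
from typing import Optional

NO_AUDIO_TEXT = "No audio file found"  # reply text taken from the description above


def extract_audio(message: dict) -> Optional[dict]:
    """Return the kind, file ID and MIME type of the audio payload, or None."""
    # Voice notes arrive under "voice", regular audio attachments under "audio".
    for kind in ("voice", "audio"):
        payload = message.get(kind)
        if payload and "file_id" in payload:
            return {
                "kind": kind,
                "file_id": payload["file_id"],
                "mime_type": payload.get("mime_type"),
            }
    # Plain text (or anything without audio) yields None; reply with NO_AUDIO_TEXT.
    return None
```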

🧩 File Type Determination (IF Node)

Purpose: Identifies the specific audio format for proper processing.

Supported formats:

OGG (Telegram voice messages)
MPEG/MP3
MP4/M4A
Other audio formats

Logic:

If format recognized → Proceed to transcription
If format not recognized → Send "File format not recognized" message
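One way to picture the format check is a lookup from MIME type to a short format label. The table below is an assumption for illustration: it covers only the formats named above, and the exact MIME strings the workflow matches on are not stated in the description.

```python
from typing import Optional

# Hypothetical mapping; extend it for any other formats the workflow should accept.
SUPPORTED_MIME_TYPES = {
    "audio/ogg": "ogg",    # Telegram voice messages
    "audio/mpeg": "mp3",
    "audio/mp4": "m4a",
    "audio/x-m4a": "m4a",
}

UNKNOWN_FORMAT_TEXT = "File format not recognized"  # reply text from the description


def classify_format(mime_type: Optional[str]) -> Optional[str]:
    """Return a short format label, or None when the MIME type is not recognized."""
    if not mime_type:
        return None
    # Drop parameters such as "; codecs=opus" and normalize case before the lookup.
    base = mime_type.split(";")[0].strip().lower()
    return SUPPORTED_MIME_TYPES.get(base)
```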

Benefit: Ensures compatibility with transcription services by validating file types upfront.

📌 Section 3: Primary Transcription (OpenAI)

📥 File Download

Purpose: Downloads the audio file from Telegram for processing.
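In n8n the Telegram node handles the download, but it may help to see the two Bot API requests involved: `getFile` returns a `file_path` for a given file ID, and that path is then fetched from the file endpoint. The helper names below are hypothetical, and the sketch only builds the URLs without making network calls.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.telegram.org"


def get_file_url(token: str, file_id: str) -> str:
    """URL of the getFile call that resolves a file ID to a file_path."""
    return f"{API_ROOT}/bot{token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(token: str, file_path: str) -> str:
    """URL from which the file content itself can be downloaded."""
    # quote() keeps "/" intact, so nested paths returned by getFile still work.
    return f"{API_ROOT}/file/bot{token}/{quote(file_path)}"
```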

🤖 OpenAI Transcription

Purpose: Transcribes audio to text using OpenAI's Whisper API.

Why OpenAI: High-quality transcription with cost-effective pricing.

Process:

Send downloaded file to OpenAI transcription API
Simultaneously send notification: "Transcription started"
If successful → Assign transcribed text to variable and proceed
If error occurs → Trigger fallback mechanism

Benefit: Fast, accurate transcription with multi-language support.

📌 Section 4: Fallback Transcription (Gemini)

🛟 Gemini Backup Transcription

Purpose: Provides a safety net if OpenAI transcription fails.

Process:

Receives file only if OpenAI node returns an error
Downloads and processes the same audio file
Sends to Google Gemini for transcription
Assigns transcribed text to the same text variable

Benefit: Ensures high reliability—if one service fails, the other takes over automatically.
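The primary/fallback pattern described in Sections 3 and 4 can be sketched independently of any provider. In this illustration the engines are plain callables supplied by the caller; the function name `transcribe_with_fallback` is hypothetical, and the real workflow wires the OpenAI node's error output to the Gemini node instead of using code.

```python
from typing import Callable, List, Sequence, Tuple

Engine = Callable[[bytes], str]


def transcribe_with_fallback(audio: bytes, engines: Sequence[Tuple[str, Engine]]) -> Tuple[str, str]:
    """Try each engine in order and return (engine_name, text) from the first success."""
    errors: List[str] = []
    for name, engine in engines:
        try:
            return name, engine(audio)
        except Exception as exc:
            # Record the failure and move on to the next engine in the list.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All transcription engines failed: " + "; ".join(errors))
```

Passing the engines as an ordered list keeps the result in a single place regardless of which service produced it, which mirrors how both branches of the workflow assign to the same text variable.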

📌 Section 5: Message Length Handling

📏 Text Length Check (IF Node)

Purpose: Determines if the transcribed text exceeds Telegram's character limit.

Logic:

If text ≤ 4,000 characters → Send directly to Telegram
If text > 4,000 characters → Split into chunks

Why: Telegram caps a text message at 4,096 characters; the workflow's 4,000-character threshold stays under that cap.

✂️ Text Splitting (Code Node)

Purpose: Breaks long transcriptions into 4,000-character segments.

Process:

Receives text longer than 4,000 characters
Splits text into chunks of ≤4,000 characters
Maintains readability by avoiding mid-word breaks
Outputs array of text chunks
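The splitting step could look roughly like the sketch below. It is not the code from the workflow's Code node (which is not shown in this description); the function name `split_text` is hypothetical, and the choice to break on the last space or newline before the limit is one reasonable way to avoid mid-word breaks.

```python
from typing import List


def split_text(text: str, limit: int = 4000) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring whitespace breaks."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space or newline at or before the limit, so the chunk never exceeds it.
        cut = max(remaining.rfind(sep, 0, limit + 1) for sep in (" ", "\n"))
        if cut <= 0:
            cut = limit  # no whitespace available: fall back to a hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Text at or under the limit comes back as a single-element list, so the same function can feed both the short and the long branch if desired.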

📌 Section 6: Response Delivery

💬 Send Transcription (Telegram Node)

Purpose: Delivers the transcribed text back to the Telegram group.

Behavior:

Short messages: Sent as a single message
Long messages: Sent as multiple sequential messages

Benefit: Users receive complete transcriptions regardless of length, ensuring no content is lost.

📊 Workflow Overview Table

Section	Node Name	Purpose
1. Trigger	Receive Message	Captures incoming Telegram messages
2. Access Control	Sender Verification	Validates user permissions
3. Detection	Audio/Voice Recognition	Identifies message type and audio format
4. Validation	File Type Check	Verifies supported audio formats
5. Download	File Download	Retrieves audio file from Telegram
6. Primary AI	OpenAI Transcription	Main transcription service
7. Fallback AI	Gemini Transcription	Backup transcription service
8. Processing	Text Length Check	Determines if splitting is needed
9. Splitting	Code Node	Breaks long text into chunks
10. Response	Send to Telegram	Delivers transcribed text

🎯 Key Benefits

🔐 Secure access control: Only authorized users can trigger transcriptions
💰 Cost management: Prevents unauthorized credit consumption
🎵 Multi-format support: Handles various Telegram audio types
🛡️ High reliability: Dual-AI fallback ensures transcription success
📱 Telegram-optimized: Automatically handles message length limits
🌍 Multi-language: Both AI services support numerous languages
⚡ Real-time notifications: Users receive status updates during processing
🔄 Automatic chunking: Long transcriptions are intelligently split
🧠 Smart routing: Files are processed through the optimal path
📊 Complete delivery: No content loss regardless of transcription length

🚀 Use Cases

Team meetings: Transcribe voice notes from team discussions
Client communications: Convert client voice messages to searchable text
Documentation: Create text records of verbal communications
Accessibility: Make audio content accessible to all team members
Multi-language teams: Leverage AI transcription for various languages

Nodes

n8n-nodes-base.telegramtrigger n8n-nodes-base.telegram n8n-nodes-base.stickynote n8n-nodes-base.if n8n-nodes-base.switch n8n-nodes-base.set @n8n/n8n-nodes-langchain.openai @n8n/n8n-nodes-langchain.googlegemini

Complexity

advanced

Published 14 Oct 2025

Install path: /data/workflows/9625/9625.json
