Multimodal telegram bot with voice, image & video analysis using Claude & Gemini
Workflow preview
DISCOUNT 20%
Important notice
This workflow is provided as-is. Please review and test before using in production.
Overview
What it's for:
This is a base template for anyone trying to develop a telegram AI Agent. This base allows for multiple inputs (Voice, Picture, Video, and Text inputs) to be processed by an AI model of their choosing to a get a User started. From here, the User may connect any tools that they see fit to the AI Agent for their n8n workflows.
How it works:
Input: Telegram message to a bot chat
n8n Processing: Switch node determines the type:
- Voice Message
- Picture Message
- Video Message
- Text Message
(Currently uses OpenAI and Gemini to analyze Voice/Photo/Video content but feel free to change these nodes with other models)
AI Agent Proccessing: LLM of your choosing examines message and based on system prompt, generates an output
Output: AI Output is sent back in telegram Message
How to use:
Create your chat bot and generate access token -> Search Bot father in telegram -> Type "/newbot" -> follow instructions and create access token -> Copy access token
Create Credentials in n8n -> Open telegram trigger node -> Click create credential -> Paste access token -> Save
Create LLM access token (Different per LLM but search your LLM + API in google) -> (will have to create an account with the LLM platform) -> buy credits to use LLM API -> Generate Access token -> Paste token in LLM node
Requirements:
- Telegram Bot Access Token
- Google Gemini Access Token (For Picture and Video messages)
- OpenAI Access Token (For Voice messages)
- LLM Access Token (Your preference for the AI Agent)
Customizing this workflow:
- To personalize the AI Output, adjust the system prompt (give context or directions on the AI's role)
- Add tools to the AI agent to give it more utility besides a personalied LLM (Example: Calendars, Databases, etc).