Analyze images & extract text with GPT-4o Vision and Telegram

Name: Analyze images & extract text with GPT-4o Vision and Telegram
Availability: InStock
Rating: 4.5 (92 reviews)
Author: AI/ML API | D1m7asis

$20/month : Unlimited workflows

2500 executions/month

Try free

THE #1 IN WEB SCRAPING

Scrape any website without limits

Try free

HOSTINGER 🎉 Early Black Friday Deal
DISCOUNT 20%

Self-hosted n8n

Unlimited workflows - from $4.99/mo

Try free

#1 hub for scraping, AI & automation

6000+ actors - $5 credits/mo

Try free

Who’s it for

Teams and makers who want a plug-and-play vision bot: users send a photo in Telegram, the bot returns a concise description plus OCR text. No custom servers required—just n8n, a Telegram bot, and an AIMLAPI key.

What it does / How it works

The workflow listens for new Telegram messages, fetches the highest-resolution photo, converts it to base64, normalizes the MIME type, and calls AIMLAPI (GPT-4o Vision) via the HTTP Request node using the OpenAI-compatible messages format with an image_url data URI. The model returns a short caption and extracted text. The answer is sent back to the same Telegram chat.

Requirements

n8n instance (self-hosted or cloud)
Telegram bot token (from @BotFather)
AIMLAPI account and API key (OpenAI-compatible endpoint)

How to set up

Create a Telegram bot with @BotFather and copy the token.
In n8n, add Telegram credentials (no hardcoded tokens in nodes).
Add AIMLAPI credentials with your API key (base URL: https://api.aimlapi.com/v1).
Import the workflow JSON and connect credentials in the nodes.
Execute the trigger and send a photo to your bot to test.

How to customize the workflow

Modify the vision prompt (e.g., add brand, language, or formatting rules).
Switch models within AIMLAPI (any vision-capable model using the same messages schema).
Add an IF branch for text-only messages (reply with guidance).
Log usage to Google Sheets or a database (user id, file id, response).
Add rate limits, user allowlists, or Markdown formatting in Telegram responses.
Increase timeouts/retries in the HTTP Request node for long-running images.

AI/ML API | D1m7asis

0 workflows

Nodes

set gmail telegram agent google-gemini

Complexity

advanced

Published 03 Sept 2025

Likes 0

View on n8n.io Download Workflow

✨

Share Your Workflow

Have a great workflow to share? Join the n8n Creator Hub and help the community!

Submit Your Template How to Submit

Related Workflows

Build a website-powered customer support chatbot with Decodo, Pinecone and Gemini

**Categories:** Business Automation, Customer Support, AI, Knowledge Management This comprehensive workflow enables businesses to build and deploy a custom-trained AI Chatbot in minutes. By combining a sophisticated data scraping engine with a RAG-based (Retrieval-Augmented Generation) chat interface, it allows you to transform website content into a high-performance support agent. Powered by **Google Gemini** and **Pinecone**, this system ensures your chatbot provides accurate, real-time answers based exclusively on your business data. ### **Benefits** * **Instant Knowledge Sync** - Automatically crawls sitemaps and URLs to keep your AI up-to-date with your latest website content. * **Embeddable Anywhere** - Features a ready-to-use chat trigger that can be integrated into the bottom-right of any website via a simple script. * **High-Fidelity Retrieval** - Uses vector embeddings to ensure the AI "searches" your documentation before answering, reducing hallucinations. * **Smart Conversational Memory** - Equipped with a 10-message window buffer, allowing the bot to handle complex follow-up questions naturally. * **Cost-Efficient Scaling** - Leverages Gemini’s efficient API and Pinecone’s high-speed indexing to manage thousands of customer queries at a low cost. ### **How It Works** 1. **Dual-Path Ingestion:** The process begins with an n8n Form where you provide a sitemap or individual URLs. The workflow automatically handles XML parsing and URL cleaning to prepare a list of pages for processing. 2. **Clean Content Extraction:** Using **Decodo**, the workflow fetches the HTML of each page and uses a specialized extraction node to strip away code, ads, and navigation, leaving only the high-value text content. **SignUp using:** [dashboard.decodo.com/register?referral_code=55543bbdb96ffd8cf45c2605147641ee017e7900](dashboard.decodo.com/register?referral_code=55543bbdb96ffd8cf45c2605147641ee017e7900). 3. **Vectorization & Storage:** The cleaned text is passed to the **Gemini Embedding** model, which converts the information into 3076-dimensional vectors. These are stored in a **Pinecone** "supportbot" index for instant retrieval. 4. **RAG-Powered Chat Agent:** When a user sends a message through the chat widget, an **AI Agent** takes over. It uses the user's query to search the Pinecone database for relevant business facts. 5. **Intelligent Response Generation:** The AI Agent passes the retrieved facts and the current chat history to **Google Gemini**, which generates a polite, accurate, and contextually relevant response for the user. ### **Requirements** * **n8n Instance:** A self-hosted or cloud instance of n8n. * **Google Gemini API Key:** For text embeddings and chat generation. * **Pinecone Account:** An API key and a "supportbot" index to store your knowledge base. * **Decodo Access:** For high-quality website content extraction. ### **How to Use** 1. **Initialize the Knowledge Base:** Use the Form Trigger to input your website URL or Sitemap. Run the ingestion flow to populate your Pinecone index. 2. **Configure Credentials:** Authenticate your Google Gemini and Pinecone accounts within n8n. 3. **Deploy the Chatbot:** Enable the Chat Trigger node. Use the provided webhook URL to connect the backend to your website's frontend chat widget. 4. **Test & Refine:** Interact with the bot to ensure it retrieves the correct data, and update your knowledge base by re-running the ingestion flow whenever your website content changes. ### **Business Use Cases** * **Customer Support Teams** - Automate answers to 80% of common FAQs using your existing documentation. * **E-commerce Sites** - Help customers find product details, shipping policies, and return information instantly. * **SaaS Providers** - Build an interactive technical documentation assistant to help users navigate your software. * **Marketing Agencies** - Offer "AI-powered site search" as an add-on service for client websites. ### **Efficiency Gains** * **Reduce Ticket Volume** by providing instant self-service options. * **Eliminate Manual Data Entry** by scraping content directly from the live website. * **Improve UX** with 24/7 availability and zero wait times for customers. **Difficulty Level:** Intermediate **Estimated Setup Time:** 30 min **Monthly Operating Cost:** Low (variable based on AI usage and Pinecone tier)

View

Create an all-in-one Discord assistant with Gemini, Llama Vision & Flux images

This n8n template demonstrates how to build **O'Carla**, an advanced all-in-one Discord AI assistant. It intelligently handles natural conversations, professional image generation, and visual file analysis within a single server integration. Use cases are many: Deploy a smart community manager that remembers past interactions, an on-demand artistic tool for your members, or an AI that can "read" and explain uploaded documents and images! ## Good to know * **API Costs:** Each interaction costs vary depending on the model used (Gemini vs. OpenRouter). Check your provider's dashboard for updated pricing. * **Infrastructure:** This workflow requires a separate Discord bot script (e.g., Node.js) to forward events to the n8n Webhook. It is recommended to host the bot using **PM2** for 24/7 uptime. ## How it works 1. **Webhook Trigger:** Receives incoming data (text and attachments) from your Discord bot. 2. **Intent Routing:** The workflow uses conditional logic to detect if the user wants an image (via keyword `gambar:`), a vision analysis (via attachments), or a standard chat. 3. **Multi-Model Intelligence:** * **Gemini 2.5:** Powers rapid and high-quality general chat reasoning. * **Llama 3.2 Vision (via OpenRouter):** Specifically used to describe and analyze images or text-based files. * **Flux (via Pollinations):** Uses a specialized AI Agent to refine prompts and generate professional-grade images. 4. **Contextual Memory:** A 50-message buffer window ensures O'Carla maintains the context of your conversation based on your Discord User ID. 5. **Clean UI Output:** Generated image links are automatically shortened via **TinyURL** to keep the Discord chat interface tidy. ## How to use 1. Connect your **Google Gemini** and **OpenRouter** API keys in the respective nodes. 2. Replace the Webhook URL in your bot script with this workflow's **Production Webhook URL**. 3. Type `gambar: [your prompt]` in Discord to generate images. 4. Upload an image or file to Discord to trigger the AI Vision analysis. ## Requirements * n8n instance (Self-hosted or Cloud). * Google Gemini API Key. * OpenRouter API Key. * Discord Bot Token and hosting environment. ## Customising this workflow O'Carla is highly flexible. You can change her personality by modifying the **System Message** in the Agent nodes, adjust the memory window length, or swap the LLM models to specialized ones like Claude 3.5 or GPT-4o.

View

AI chatbot for Max Messenger with voice recognition (GigaChat +SaluteSpeech)

**Name:** AI Chatbot for Max Messenger with Voice Recognition (GigaChat + Sber) **Description:** ### How it works This workflow powers an intelligent, conversational AI bot for Max messenger that can understand and respond to both **text and voice messages**. The bot uses GigaChat AI with built-in memory, allowing it to remember the conversation history for each unique user and answer follow-up questions. Voice messages are transcribed using Sber SmartSpeech. It's a complete solution for creating an engaging, automated assistant within your Max bot, using Russian AI services. ### Step-by-step * **Max Trigger:** The workflow starts when the **Max Trigger** node receives a new message sent to your Max bot. * **Access Control:** The **Check User** node verifies the sender's user ID against an allowed list. This prevents unauthorized users from accessing your bot. * **Access Denied Response:** If the user is not authorized, the **Access Denied** node sends a polite rejection message. * **Message Type Routing:** The **Text/Attachment** (Switch) node checks if the message contains plain text or has attachments (voice, photo, file). * **Attachment Processing:** If an attachment is detected, the **Download Attachment** (HTTP Request) node retrieves it, and the **Attachment Router** (Switch) node determines its type (voice, photo, or file). * **Voice Transcription:** For voice messages, the workflow gets a Sber access token via **Get Access Token** (HTTP Request), merges it with the audio file, and sends it to **Get Response** (HTTP Request) which uses Sber SmartSpeech API to transcribe the audio to text. * **Input Unification:** The **Voice to Prompt** node converts transcribed text into a prompt, while **Text to Prompt** does the same for plain text messages. Both paths merge at the **Combine** node. * **AI Agent Processing:** The unified prompt is passed to the **AI Agent**, powered by **GigaChat Model** and using **Simple Memory** to retain the last 10 messages per user (using Max `user_id` as the session key). * **Response Delivery:** The AI-generated response is sent back to the user via the **Send Message** node. ### Set up steps Estimated set up time: 15 minutes 1. **Get Max bot credentials:** Visit https://business.max.ru/ to create a bot and obtain API credentials. Add these credentials to **Max Trigger**, **Send Message**, and **Access Denied** nodes. 2. **Add GigaChat credentials:** Register for GigaChat API access and add your credentials to the **GigaChat Model** node. 3. **Add Sber credentials:** Obtain Sber SmartSpeech API credentials and add them to **Get Access Token** and **Get Response** nodes (HTTP Header Auth). 4. **Configure access control:** Open the **Check User** node and change the `user_id` value (currently 50488534) to your own Max user ID. This ensures only you can use the bot during testing. 5. **Customize bot personality:** Open the **AI Agent** node and edit the system message to change the bot's name, behavior, and add your own contact information or links. 6. **Test the bot:** Activate the workflow and send a text or voice message to your Max bot to verify it responds correctly. ### Notes This workflow is specifically designed for Russian-speaking users and uses Russian AI services (GigaChat and Sber SmartSpeech) as alternatives to OpenAI. Make sure you have valid API access to both services before setting up this workflow.

View

👨‍💻

Need Custom Automation?

N8N Automation Expert

Specialized in N8N automation, I design custom workflows that connect your tools and automate your processes.