Analyze images, videos, documents & audio with Gemini Tools and Qwen LLM Agent

Author: Mauricio Perera

Important notice

This workflow is provided as-is. Please review and test before using in production.

Overview

📁 Analyze uploaded images, videos, audio, and documents with specialized tools — powered by a lightweight language-only agent.

🧭 What It Does

This workflow enables multimodal file analysis using Google Gemini tools connected to a text-only LLM agent. Users can upload images, videos, audio files, or documents via a chat interface. The workflow will:

- Upload each file to Google Gemini and obtain an accessible URL.
- Dynamically generate contextual prompts based on the file(s) and user message.
- Allow the agent to invoke Gemini tools for specific media types as needed.
- Return a concise, helpful response based on the analysis.

🚀 Use Cases

- Customer support: Let users upload screenshots, documents, or recordings and get helpful insights or summaries.
- Multimedia QA: Review visual, audio, or video content for correctness or compliance.
- Educational agents: Interpret content from PDFs, diagrams, or audio recordings on the fly.
- Low-cost multimodal assistants: Achieve multimodal functionality without relying on large vision-language models.

🎯 Why This Architecture Matters

Unlike end-to-end multimodal LLMs (like Gemini 1.5 or GPT-4o), this template:

- Uses a text-only LLM (Qwen 32B via Groq) for reasoning.
- Delegates media analysis to specialized Gemini tools.

✅ Advantages

| Feature | Benefit |
| --- | --- |
| 🧩 Modular | The LLM and the tools are decoupled, so each can be updated independently |
| 💸 Cost-Efficient | No need to pay for a full multimodal model; tools are called only when needed |
| 🔧 Tool-based Reasoning | The agent invokes tools on demand, in the spirit of Toolformer-style tool use |
| ⚡ Fast | Groq-hosted LLMs respond with low latency |
| 📚 Memory | Includes a context buffer for multi-turn chats (15 messages) |

🧪 How It Works

🔹 Input via Chat

Users submit a message and (optionally) files via the chatTrigger.

🔹 File Handling

- If no files are attached, the prompt is passed directly to the agent.
- If files are attached:
  - The files are split and uploaded to Gemini one at a time, and each upload returns a file URL.
  - Metadata (name, type, URL) is collected and embedded into the prompt.
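The branching and metadata collection described above can be sketched in plain Python. This is only an illustration of the control flow, not the workflow's actual node code: the `upload` callable is a hypothetical stand-in for the Gemini upload step, and the input keys `fileName` and `mimeType` are assumed names for the chat attachment fields.

```python
from typing import Callable


def collect_file_metadata(files: list[dict], upload: Callable[[dict], str]) -> list[dict]:
    """Upload each file in turn and gather its name, type, and URL."""
    metadata = []
    for f in files:  # one file per iteration, like splitOut + splitInBatches
        url = upload(f)  # hypothetical stand-in for the Gemini upload step
        metadata.append({"name": f["fileName"], "type": f["mimeType"], "url": url})
    return metadata


def handle_message(message: str, files: list[dict], upload: Callable[[dict], str]) -> dict:
    """Pass the message through untouched when no files are attached;
    otherwise attach the collected file metadata."""
    if not files:
        return {"chatInput": message, "media": []}
    return {"chatInput": message, "media": collect_file_metadata(files, upload)}
```

Injecting `upload` as a parameter keeps the sketch testable without network access; in the workflow itself that role is played by the Gemini upload node.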

🔹 Prompt Construction

A new chatInput is dynamically generated:

User message
Media: [array of file data]
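A minimal sketch of how such an enriched `chatInput` could be assembled. The source only states that the message is followed by a `Media:` array, so serializing the array as JSON is an assumption made here for illustration, as is the function name `build_chat_input`.

```python
import json


def build_chat_input(user_message: str, media: list[dict]) -> str:
    """Return the user's message, followed by a 'Media:' line that holds
    the file metadata as JSON (assumed format). Without files, the
    message is returned unchanged."""
    if not media:
        return user_message
    return f"{user_message}\nMedia: {json.dumps(media, ensure_ascii=False)}"
```

Keeping the metadata machine-readable means the agent can copy a file URL verbatim into a tool call instead of paraphrasing it.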

🔹 Agent Reasoning

The Langchain Agent receives:
- The enriched prompt
- File URLs
- Memory context (15 turns)
- Access to 4 Gemini tools:
  - IMG: analyze image
  - VIDEO: analyze video
  - AUDIO: analyze audio
  - DOCUMENT: analyze document

The agent autonomously decides whether and how to use tools, then responds with concise output.
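The tool-dispatch side of this step can be illustrated without a real LLM. In the sketch below the four tools are stubs that only echo their arguments (the real ones call Gemini), and the shape of a tool call (`tool`, `url`, `instruction`) is a hypothetical format chosen for the example, not the agent's actual wire format.

```python
from typing import Callable

ToolFn = Callable[[str, str], str]  # (file_url, instruction) -> analysis text


def make_stub_tool(label: str) -> ToolFn:
    """Return a placeholder tool; a real one would send the file URL to Gemini."""
    def tool(file_url: str, instruction: str) -> str:
        return f"[{label}] {instruction} -> {file_url}"
    return tool


# The four tool names come from the workflow; the implementations are stubs.
TOOLS: dict[str, ToolFn] = {
    name: make_stub_tool(name) for name in ("IMG", "VIDEO", "AUDIO", "DOCUMENT")
}


def run_tool_calls(tool_calls: list[dict], tools: dict[str, ToolFn] = TOOLS) -> list[str]:
    """Execute the tool calls the agent decided on. Unknown tool names are
    reported in the result list rather than raised."""
    results = []
    for call in tool_calls:
        tool = tools.get(call["tool"])
        if tool is None:
            results.append(f"unknown tool: {call['tool']}")
            continue
        results.append(tool(call["url"], call["instruction"]))
    return results
```

An empty `tool_calls` list models the case where the agent answers from the text alone and decides that no tool is needed.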

🧱 Nodes & Services

| Category | Node / Tool | Purpose |
| --- | --- | --- |
| Chat Input | `chatTrigger` | User interface with file support |
| File Processing | `splitOut`, `splitInBatches` | Process each uploaded file |
| Upload | `googleGemini` | Uploads each file to Gemini, gets URL |
| Metadata | `set`, `aggregate` | Builds structured file info |
| AI Agent | `Langchain Agent` | Receives context + file data |
| Tools | `googleGeminiTool` | Analyze media with Gemini |
| LLM | `lmChatGroq` (Qwen 32B) | Text reasoning, high-speed |
| Memory | `memoryBufferWindow` | Maintains session context |
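To make the role of the memory node concrete, here is a rough sketch of a sliding-window buffer. It is not n8n's implementation: the class name `WindowMemory` is hypothetical, and counting individual messages (rather than user/assistant exchanges) toward the window of 15 is an assumption.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent `window` messages of a chat session."""

    def __init__(self, window: int = 15):
        # A deque with maxlen drops the oldest entry automatically.
        self._messages = deque(maxlen=window)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

A bounded window keeps the prompt size predictable for the text-only LLM, at the cost of forgetting anything older than the window.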

⚙️ Setup Instructions

1. 🔑 Required Credentials

- Groq API key (for the Qwen 32B model)
- Google Gemini API key (PaLM / Gemini 1.5 tools)

2. 🧩 Nodes That Need Setup

Replace existing credentials on:
- Upload a file
- Each GeminiTool (IMG, VIDEO, AUDIO, DOCUMENT)
- lmChatGroq

3. ⚠️ File Size & Format Considerations

Some Gemini tools have file size or format restrictions.
You may add validation nodes before uploading if needed.
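One possible shape for such a validation step is sketched below. The size limit and the list of accepted types are placeholder values invented for this example; they are not Gemini's documented limits and should be replaced with the values from the current Gemini documentation.

```python
# Placeholder limits for illustration only (assumptions, not Gemini's real limits).
MAX_BYTES = 20 * 1024 * 1024
ALLOWED_PREFIXES = ("image/", "video/", "audio/")
ALLOWED_EXACT = {"application/pdf", "text/plain"}


def validate_file(name: str, mime_type: str, size_bytes: int,
                  max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file passes."""
    problems = []
    if size_bytes <= 0:
        problems.append(f"{name}: file is empty")
    if size_bytes > max_bytes:
        problems.append(f"{name}: exceeds {max_bytes} bytes")
    if not (mime_type.startswith(ALLOWED_PREFIXES) or mime_type in ALLOWED_EXACT):
        problems.append(f"{name}: unsupported type {mime_type}")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets the chat reply tell the user everything that needs fixing in a single message.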

🛠️ Optional Improvements

- Add logging and error handling (e.g., for upload failures).
- Add MIME-type filtering to choose the right tool explicitly.
- Extend to include OCR or transcription services pre-analysis.
- Integrate with Slack, Telegram, or WhatsApp for chat delivery.
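As an illustration of the MIME-type filtering idea, the sketch below maps a file's MIME type to one of the four tool names used in this workflow. The mapping rules (for example, treating every `text/*` type as a document) are assumptions chosen for the example, and `pick_tool` is a hypothetical helper name.

```python
from typing import Optional

# Top-level MIME type -> tool name. The tool names come from the workflow;
# the mapping itself is an assumption.
BY_MAJOR_TYPE = {"image": "IMG", "video": "VIDEO", "audio": "AUDIO", "text": "DOCUMENT"}
DOCUMENT_TYPES = {"application/pdf"}


def pick_tool(mime_type: str) -> Optional[str]:
    """Choose a tool from the MIME type, or return None when no tool fits."""
    mime_type = mime_type.strip().lower()
    if mime_type in DOCUMENT_TYPES:
        return "DOCUMENT"
    major = mime_type.split("/", 1)[0]
    return BY_MAJOR_TYPE.get(major)
```

With explicit routing like this, the agent no longer has to infer the right tool from the file metadata, and a `None` result can be answered with a clear "unsupported file type" message.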

🧪 Example Use Case

> "Hola, ¿qué dice este PDF?" ("Hi, what does this PDF say?")

The user uploads a document → the agent routes it to the Gemini DOCUMENT tool → the tool returns the extracted content → the LLM summarizes it in Spanish, the language of the question.

🧰 Tags

multimodal, agent, langchain, groq, gemini, image analysis, audio analysis, document parsing, video analysis, file uploader, chat assistant, LLM tools, memory, AI tools

📂 Files

This template can be imported into n8n as-is; only the credentials listed under Setup Instructions need to be replaced.
No external webhooks or additional integrations are required.

Nodes

@n8n/n8n-nodes-langchain.agent n8n-nodes-base.splitinbatches n8n-nodes-base.splitout n8n-nodes-base.aggregate n8n-nodes-base.merge @n8n/n8n-nodes-langchain.googlegeminitool @n8n/n8n-nodes-langchain.chattrigger n8n-nodes-base.if

Complexity

advanced

Published 06 Aug 2025

Install path: /data/workflows/7026/7026.json
