Analyze images, videos, documents & audio with Gemini Tools and Qwen LLM Agent

Author: Mauricio Perera

📁 Analyze uploaded images, videos, audio, and documents with specialized Gemini tools, orchestrated by a lightweight text-only agent.

🧭 What It Does

This workflow enables multimodal file analysis using Google Gemini tools connected to a text-only LLM agent. Users can upload images, videos, audio files, or documents via a chat interface. The workflow will:

- Upload each file to Google Gemini and obtain a file URI that the Gemini tools can reference.
- Dynamically generate contextual prompts based on the file(s) and the user's message.
- Let the agent invoke the Gemini tool for each media type as needed.
- Return a concise, helpful response based on the analysis.

🚀 Use Cases

- Customer support: Let users upload screenshots, documents, or recordings and get helpful insights or summaries.
- Multimedia QA: Review visual, audio, or video content for correctness or compliance.
- Educational agents: Interpret content from PDFs, diagrams, or audio recordings on the fly.
- Low-cost multimodal assistants: Add multimodal capabilities while keeping the core reasoning model text-only and inexpensive.

🎯 Why This Architecture Matters

Rather than routing every request through an end-to-end multimodal LLM (such as Gemini 1.5 or GPT-4o), this template:

- Uses a text-only LLM (Qwen 32B via Groq) for reasoning.
- Delegates media analysis to specialized Gemini tools.

✅ Advantages

| Feature | Benefit |
|---|---|
| 🧩 Modular | The LLM and the tools are decoupled, so each can be updated independently |
| 💸 Cost-efficient | No full multimodal model on every turn; Gemini tools are called only when a file needs analysis |
| 🔧 Tool-based reasoning | The agent invokes tools on demand through tool calling |
| ⚡ Fast | Groq serves the LLM with low latency |
| 📚 Memory | A context buffer keeps the last 15 interactions for multi-turn chats |

🧪 How It Works

🔹 Input via Chat

Users submit a message and, optionally, files through the `chatTrigger` node.

🔹 File Handling

- If there are no files, the prompt is passed directly to the agent.
- If files are included:
  - Files are split and uploaded to Gemini one at a time to obtain file URIs.
  - Metadata (name, type, URL) is collected and embedded into the prompt.
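The split/upload/aggregate steps can be pictured as a simple loop. This is a minimal Python sketch, not the workflow's actual node logic: `IncomingFile`, `collect_media`, and `fake_upload` are hypothetical names, the uploader is a stub standing in for the Gemini "Upload a file" node, and the record keys (`name`, `type`, `url`) are assumed from the metadata fields listed above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class IncomingFile:
    """A file received from the chat trigger (hypothetical shape)."""
    name: str
    mime_type: str
    data: bytes


def collect_media(files: list[IncomingFile], upload: Callable[[IncomingFile], str]) -> list[dict]:
    """Upload files one at a time and aggregate name/type/URL records."""
    media: list[dict] = []
    for f in files:  # one file per iteration, like splitOut + splitInBatches
        url = upload(f)  # stand-in for the Gemini "Upload a file" node
        media.append({"name": f.name, "type": f.mime_type, "url": url})  # like a set node
    return media  # a single list, like the aggregate node's output


def fake_upload(f: IncomingFile) -> str:
    """Hypothetical uploader that returns a placeholder URI instead of calling Gemini."""
    return f"https://files.example/{f.name}"


files = [
    IncomingFile("shot.png", "image/png", b"..."),
    IncomingFile("memo.pdf", "application/pdf", b"..."),
]
media = collect_media(files, fake_upload)
```

Passing the uploader in as a function keeps the loop testable without network access; in the real workflow that role is played by the Gemini node and its credential.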

🔹 Prompt Construction

A new `chatInput` is generated dynamically in this form:

User message
Media: [array of file data]
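A rough Python equivalent of this prompt construction might look like the following. It assumes the media array is serialized as JSON after a `Media:` label; the template's exact serialization is not documented here, and `build_chat_input` is a hypothetical name.

```python
from __future__ import annotations

import json


def build_chat_input(user_message: str, media: list[dict]) -> str:
    """Combine the user's message with the file metadata the agent should see."""
    if not media:
        return user_message  # no files: the message goes to the agent unchanged
    # ensure_ascii=False keeps non-English file names readable in the prompt
    return f"{user_message}\nMedia: {json.dumps(media, ensure_ascii=False)}"


prompt = build_chat_input(
    "What does this PDF say?",
    [{"name": "memo.pdf", "type": "application/pdf", "url": "https://files.example/memo.pdf"}],
)
print(prompt)
```

Embedding the URLs as structured JSON makes it easy for a text-only model to copy the right URL into a tool call.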

🔹 Agent Reasoning

The LangChain Agent receives:
- The enriched prompt
- File URLs
- Memory context (15 turns)
- Access to 4 Gemini tools:
  - IMG: analyze image
  - VIDEO: analyze video
  - AUDIO: analyze audio
  - DOCUMENT: analyze document

The agent autonomously decides whether and how to use tools, then responds with concise output.
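Conceptually, "deciding whether to use a tool" means the model either answers in plain text or emits a structured tool call that the runtime executes. The Python sketch below shows that generic dispatch pattern under stated assumptions: the JSON shape `{"tool": ..., "args": {...}}` is hypothetical (n8n and LangChain handle tool calls internally), and the analyzers are stubs keyed by this template's tool names rather than real Gemini calls.

```python
import json
from typing import Callable, Dict


def make_stub(kind: str) -> Callable[[str, str], str]:
    """Return a placeholder analyzer; a real tool would call the Gemini API."""
    def analyze(url: str, prompt: str) -> str:
        return f"{kind} analysis of {url}: {prompt}"
    return analyze


# Tool registry keyed by the tool names used in this template.
TOOLS: Dict[str, Callable[[str, str], str]] = {
    name: make_stub(name) for name in ("IMG", "VIDEO", "AUDIO", "DOCUMENT")
}


def run_step(llm_output: str) -> str:
    """Execute one agent step: run a requested tool, or pass a final answer through."""
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return llm_output  # plain text means the model answered without a tool
    if not isinstance(call, dict) or "tool" not in call:
        return llm_output
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"Unknown tool: {call['tool']}"  # fed back so the model can retry
    args = call.get("args", {})
    url = args.get("url")
    if not url:
        return "Tool call is missing a file URL"
    return tool(url, args.get("prompt", "Describe this file."))
```

In a full agent loop, the tool result is appended to the conversation and the model is called again until it returns plain text.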

🧱 Nodes & Services

| Category | Node / Tool | Purpose |
|---|---|---|
| Chat input | `chatTrigger` | User interface with file upload support |
| File processing | `splitOut`, `splitInBatches` | Process each uploaded file individually |
| Upload | `googleGemini` | Uploads each file to Gemini and returns its file URI |
| Metadata | `set`, `aggregate` | Builds structured file info |
| AI agent | LangChain Agent | Receives the prompt, context, and file data |
| Tools | `googleGeminiTool` | Analyze media with Gemini |
| LLM | `lmChatGroq` (Qwen 32B) | Fast text-only reasoning |
| Memory | `memoryBufferWindow` | Maintains session context |
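The `memoryBufferWindow` node keeps a sliding window of recent conversation. This Python sketch illustrates the idea only and is not n8n's implementation; it assumes the window counts user/assistant exchanges (the template uses 15), and `WindowMemory` is a hypothetical class.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowMemory:
    """Keep only the most recent `k` user/assistant exchanges."""

    def __init__(self, k: int = 15) -> None:
        # deque with maxlen drops the oldest exchange automatically
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def add(self, user: str, assistant: str) -> None:
        self.exchanges.append((user, assistant))

    def as_messages(self) -> List[str]:
        """Flatten the window into chat lines for the next prompt."""
        lines: List[str] = []
        for user, assistant in self.exchanges:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        return lines


memory = WindowMemory(k=2)
for i in range(3):
    memory.add(f"q{i}", f"a{i}")
print(memory.as_messages())  # -> ['User: q1', 'Assistant: a1', 'User: q2', 'Assistant: a2']
```

A fixed window bounds prompt size (and cost) at the price of forgetting older turns.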

⚙️ Setup Instructions

1. 🔑 Required Credentials

- Groq API key (for the Qwen 32B model)
- Google Gemini API key (n8n's Gemini/PaLM credential, used by the upload node and the Gemini tools)

2. 🧩 Nodes That Need Setup

Replace the existing credentials with your own on:
- Upload a file
- Each GeminiTool (IMG, VIDEO, AUDIO, DOCUMENT)
- lmChatGroq

3. ⚠️ File Size & Format Considerations

Some Gemini tools restrict file size or format.
If users may upload large or unusual files, add validation nodes before the upload step.
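A validation step before upload could apply checks like the ones below. This is a minimal Python sketch: `MAX_BYTES` and `ALLOWED_PREFIXES` are hypothetical values chosen for illustration, not Gemini's documented limits, so check the current Gemini documentation before relying on them.

```python
from typing import List

# Hypothetical limits: check the current Gemini documentation for the real values.
MAX_BYTES = 20 * 1024 * 1024
ALLOWED_PREFIXES = ("image/", "video/", "audio/", "application/pdf", "text/")


def validate_upload(name: str, mime_type: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file may be uploaded."""
    problems: List[str] = []
    if size_bytes > MAX_BYTES:
        problems.append(f"{name}: {size_bytes} bytes exceeds the {MAX_BYTES}-byte limit")
    if not mime_type.lower().startswith(ALLOWED_PREFIXES):
        problems.append(f"{name}: unsupported type {mime_type}")
    return problems
```

Returning all problems at once, rather than failing on the first, lets the chat reply tell the user everything that needs fixing.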

🛠️ Optional Improvements

- Add logging and error handling (e.g., for upload failures).
- Add MIME-type filtering to choose the right tool explicitly.
- Add OCR or transcription services as a pre-analysis step.
- Integrate with Slack, Telegram, or WhatsApp for chat delivery.
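MIME-type filtering could be a deterministic lookup that runs before the agent. This Python sketch assumes a hypothetical prefix-to-tool table using this template's tool names; which types should map to DOCUMENT is a design choice, not something the template specifies.

```python
from typing import Optional

# Hypothetical routing table: MIME type prefix -> tool name used in this template.
ROUTES = [
    ("image/", "IMG"),
    ("video/", "VIDEO"),
    ("audio/", "AUDIO"),
    ("application/pdf", "DOCUMENT"),
    ("text/", "DOCUMENT"),
]


def pick_tool(mime_type: str) -> Optional[str]:
    """Return the tool for a MIME type, or None if no route matches."""
    # Normalize case and drop parameters such as "; charset=utf-8"
    mt = mime_type.lower().split(";")[0].strip()
    for prefix, tool in ROUTES:
        if mt.startswith(prefix):
            return tool
    return None  # unknown type: let the agent decide, or reject the file
```

Explicit routing trades some flexibility for predictability: the agent no longer has to infer the tool from the file name or type string.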

🧪 Example Use Case

> "Hola, ¿qué dice este PDF?" ("Hi, what does this PDF say?")

The user uploads a document → the agent routes it to the Gemini DOCUMENT tool → the tool returns the extracted content → the LLM summarizes it in Spanish.

🧰 Tags

multimodal, agent, langchain, groq, gemini, image analysis, audio analysis, document parsing, video analysis, file uploader, chat assistant, LLM tools, memory, AI tools

📂 Files

This template is ready to use as-is in n8n.
Apart from the Groq and Gemini credentials, no external webhooks or integrations are required.

Published 06 Aug 2025