
Build a local RAG chatbot with Ollama, Qwen, BGE-M3 and Postgres PGVector


Overview

Build a fully local RAG chatbot using Ollama that works without tool calling — ideal for smaller open-source models like Qwen that don't support native function calls. This template lets you run a private, self-hosted AI assistant with retrieval-augmented generation using only your own hardware.

How it works

  1. A Webhook receives the user's chat message
  2. A small classifier LLM (Qwen 7B) analyzes the input and decides: is this small talk, or a real question that needs the knowledge base?
  3. For small talk, a dedicated AI agent responds conversationally with chat memory
  4. For real questions, the classifier generates focused sub-queries, which are sent through a loop-based RAG pipeline:
  • Each sub-query is embedded using BGE-M3 and matched against a Postgres PGVector store
  • Results are filtered by a relevance score threshold (>0.4)
  • Chunks are aggregated and deduplicated across all sub-queries
  5. An Answer Generator agent (Qwen 14B) produces a sourced answer using a strict 3-step format: short answer → sources → follow-up question
  6. Both paths use Postgres-backed chat memory for multi-turn conversations
  7. A post-processing step removes <think> tags that some reasoning models produce
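The filtering, deduplication, and post-processing steps above can be sketched as plain JavaScript, the language n8n Code nodes use. This is an illustrative sketch, not the template's actual node code: the function names and the chunk shape (`{ text, score }`) are assumptions.

```javascript
// Keep only retrieved chunks above the relevance threshold (0.4 in the template)
function filterByScore(chunks, threshold = 0.4) {
  return chunks.filter((c) => c.score > threshold);
}

// Deduplicate chunks gathered across all sub-queries, keyed on their text
function dedupeChunks(chunks) {
  const seen = new Set();
  return chunks.filter((c) => {
    if (seen.has(c.text)) return false;
    seen.add(c.text);
    return true;
  });
}

// Strip <think>…</think> blocks that some reasoning models emit before the answer
function stripThink(answer) {
  return answer.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}
```

In the workflow these operations run after the retrieval loop, so the Answer Generator only ever sees relevant, unique chunks, and the client only ever sees the final answer text.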

Set up steps

  1. Install Ollama and pull the required models:
  • ollama pull qwen2.5:7b (classifier + small talk)
  • ollama pull qwen3:14b (answer generation)
  • ollama pull bge-m3 (embeddings)
  2. Set up PostgreSQL with the pgvector extension enabled
  3. Create your vector store — ingest your documents into the PGVector store using BGE-M3 embeddings (you can use n8n's built-in document loaders for this)
  4. Configure credentials in n8n:
  • Ollama connection (default: http://localhost:11434)
  • PostgreSQL connection for both chat memory and vector store
  5. Customize the webhook path and connect it to your frontend or API client
  6. Optional: Adjust the relevance score threshold, swap models for larger/smaller ones, or modify the system prompts to match your use case
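Once the workflow is active, a frontend or API client talks to it over the Webhook. The sketch below shows one way to build that request from JavaScript; the port, webhook path ("/webhook/chat"), and payload field names (sessionId, chatInput) are assumptions — match them to your Webhook node's settings and whatever keys your workflow reads.

```javascript
// Build a request for the chat webhook (hypothetical path and payload shape —
// adjust to your own Webhook node configuration).
function buildChatRequest(sessionId, message, base = "http://localhost:5678") {
  return {
    url: `${base}/webhook/chat`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        sessionId,          // reused across turns for Postgres-backed chat memory
        chatInput: message,
      }),
    },
  };
}

// Usage from a modern Node.js or browser context:
// const { url, options } = buildChatRequest("demo-session", "What is pgvector?");
// const res = await fetch(url, options);
// console.log(await res.json());
```

Reusing the same sessionId across calls is what gives the chatbot multi-turn memory, since both the small-talk and RAG paths key their Postgres chat memory on it.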