Turn your website docs into a GPT-4.1-mini support chatbot with MrScraper and Pinecone
Description
This n8n template turns any website or documentation portal into a fully functional AI-powered support chatbot — no manual copy-pasting, no static FAQs. It uses MrScraper to crawl and extract your site's content, OpenAI to generate embeddings, and Pinecone to store and retrieve that knowledge at chat time.
The result is a retrieval-augmented chatbot that answers questions using only your actual website content, always cites its sources, and is instructed never to invent policies or pricing.
How It Works
- Phase 1 – URL Discovery: The Map Agent crawls your target domain using include/exclude patterns to discover all relevant documentation or help center pages. It returns a clean, deduplicated list of URLs ready for content extraction.
- Phase 2 – Page Content Extraction: Each discovered URL is processed in controlled batches by the General Agent, which extracts the readable content (title + main text) from every page. Low-quality or near-empty pages are automatically filtered out.
- Phase 3 – Chunking & Embedding: Page text is split into overlapping chunks (default: ~1,100 chars with 180-char overlap) to preserve context at boundaries. Each chunk is sent to OpenAI Embeddings to generate a vector, then stored in Pinecone with metadata including the source URL, page title, and chunk index.
- Phase 4 – Chat Endpoint: A Chat Trigger exposes a webhook endpoint your website or widget can connect to. When a user asks a question, the Support Chat Agent queries Pinecone for the most relevant chunks and generates a grounded answer using GPT-4.1-mini — always with source URLs included and strict anti-hallucination rules enforced.
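The chunking step in Phase 3 is a simple sliding window; the sketch below mirrors the template's defaults (~1,100 characters with a 180-character overlap), though the function name is just for illustration:

```python
def chunk_text(text: str, size: int = 1100, overlap: int = 180) -> list[str]:
    """Split text into overlapping windows so sentences near a chunk
    boundary appear in two adjacent chunks."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# A 2,500-char page yields windows starting at 0, 920, and 1840 → 3 chunks
chunks = chunk_text("x" * 2500)
```

Each resulting chunk is then embedded and upserted to Pinecone individually, so an answer can cite the exact page and position it came from.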
How to Set Up
- Create 2 scrapers in your MrScraper account:
- Map Agent Scraper (for crawling and discovering page URLs)
- General Agent Scraper (for extracting title + content from each page)
- Copy the `scraperId` for each — you'll need these in n8n.
- Set up your Pinecone index:
- Create a Pinecone index with dimensions that match your chosen OpenAI embedding model (e.g. 1536 for `text-embedding-ada-002`)
- Choose a namespace (recommended format: `docs-yourdomain`)
- Add your credentials in n8n:
- MrScraper API token
- OpenAI API key (used for both embeddings and the chat model)
- Pinecone API key
- Configure the Map Agent node:
- Set your target domain or docs root URL (e.g. `https://docs.yoursite.com`)
- Set `includePatterns` to focus on relevant sections (e.g. `/docs/`, `/help/`, `/support/`)
- Optionally set `excludePatterns` to skip noise (e.g. `/assets/`, `/tag/`, `/static/`)
- Configure the General Agent node:
- Enter your General Agent `scraperId`
- Adjust the batch size in the SplitInBatches node (start with 1–5 to stay within rate limits)
- Configure the Pinecone nodes:
- Select your Pinecone index in both the Upsert and Retriever nodes
- Set the correct namespace in both nodes so indexing and retrieval use the same data
- Customise the chatbot system prompt:
- Edit the Support Chat Agent's system message to set the chatbot's name, tone, and rules
- Adjust `topK` in the Pinecone Retriever (default: 8) based on how much context you want per answer
- Connect your chat widget or frontend to the Chat Trigger webhook URL generated by n8n
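Once the Chat Trigger is live, your frontend just POSTs JSON to the webhook URL. A minimal Python sketch, assuming n8n's usual chat payload fields (`sessionId` for per-user memory, `chatInput` for the question, and an `output` key in the response — verify these against your n8n version):

```python
import json
from urllib import request

def build_payload(session_id: str, question: str) -> bytes:
    """Build the JSON body for an n8n Chat Trigger webhook.
    Field names follow n8n's chat payload convention (assumption)."""
    return json.dumps({"sessionId": session_id, "chatInput": question}).encode()

def ask_support_bot(webhook_url: str, session_id: str, question: str) -> str:
    """POST a question to the live workflow and return the bot's reply."""
    req = request.Request(
        webhook_url,
        data=build_payload(session_id, question),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # requires the workflow to be active
        return json.loads(resp.read()).get("output", "")
```

Any chat widget library that can make this POST (with a stable session ID per visitor) can serve as the frontend.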
Requirements
- MrScraper account with API access enabled
- OpenAI account (for embeddings and GPT-4.1-mini chat)
- Pinecone account with an index created and ready
Good to Know
- The overlap between chunks (default 180 chars) is intentional — it prevents answers from being cut off at chunk boundaries and significantly improves retrieval quality.
- The chatbot is configured to cite 1–3 source URLs per answer, so users can always verify the information themselves.
- The anti-hallucination rules in the system prompt instruct the agent to say it can't find an answer rather than guess — making it safe to use for support, pricing, or policy questions.
- Re-indexing is as simple as re-running the workflow. Use a consistent Pinecone namespace and upsert mode to update existing vectors without duplicating them.
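For the re-indexing behaviour above to avoid duplicates, each vector needs a stable ID derived from its source rather than a random one. A sketch of one way to do that (the ID scheme is an assumption, not the template's exact implementation):

```python
import hashlib

def chunk_id(source_url: str, chunk_index: int) -> str:
    """Derive a deterministic vector ID from the page URL and chunk
    position. Re-running the workflow regenerates the same IDs, so
    Pinecone's upsert overwrites existing vectors instead of adding
    duplicates."""
    raw = f"{source_url}#{chunk_index}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]
```

With deterministic IDs, a full re-crawl in the same namespace simply refreshes stale vectors in place; pages that were deleted from the site would still need a separate cleanup pass.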
Customising This Workflow
- Swap the chat model: Replace GPT-4.1-mini with GPT-4o or another OpenAI model for higher-quality answers on complex queries.
- Scheduled re-indexing: Add a Schedule Trigger to automatically re-crawl and re-index your docs whenever content changes.
- Multiple knowledge bases: Use different Pinecone namespaces (e.g. `docs-product`, `docs-api`) and route questions to the right namespace based on user intent.
- Embed on your website: Connect the Chat Trigger webhook to any chat widget library to give your users a live support experience powered entirely by your own documentation.
- Multilingual support: Add a translation node before chunking to index content in multiple languages and serve a global audience.
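The namespace routing mentioned above can start as a simple keyword heuristic applied before the Pinecone query; the namespaces and keywords here are illustrative, and a production setup might use an LLM intent classifier instead:

```python
# Map each Pinecone namespace to keywords that suggest it (illustrative)
ROUTES = {
    "docs-api": ("endpoint", "api", "token", "webhook"),
    "docs-product": ("pricing", "plan", "feature", "billing"),
}

def pick_namespace(question: str, default: str = "docs-product") -> str:
    """Choose which Pinecone namespace to query for a given question,
    falling back to a default when no keyword matches."""
    q = question.lower()
    for namespace, keywords in ROUTES.items():
        if any(k in q for k in keywords):
            return namespace
    return default
```

The chosen namespace is then passed to the Pinecone Retriever so each question only searches the relevant knowledge base.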