Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets


1. Workflow Overview

Best for

Internal Wiki automation workflows
AI RAG automation workflows
Advanced n8n builders looking for reusable templates

Tools used

n8n-nodes-base.extractFromFile, n8n-nodes-base.code, n8n-nodes-base.googleSheets, n8n-nodes-base.filter, n8n-nodes-base.set, n8n-nodes-base.if, n8n-nodes-base.httpRequest, @n8n/n8n-nodes-langchain.vectorStoreSupabase

Source and attribution

This workflow is cataloged by N8N Workflows and links back to its original n8n.io source page by Salman Mehboob.

1.1 Workflow description

Title: Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets
Workflow name: Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets

Quick Overview

This workflow ingests educational PDF URLs from Google Sheets, extracts and chunks their text, generates embeddings with Google Gemini, and stores them in a Supabase pgvector table for retrieval, while also exposing a public chat webhook that answers questions using Gemini and the same Supabase knowledge base.

How it works

1. Runs every hour on a schedule and reads rows from a Google Sheets document, keeping only entries where the status is empty.
2. Processes the queued URLs one at a time and routes each one based on whether it is a seraj-uae.com page or a direct Google Drive file link.
3. For seraj-uae.com URLs, fetches the HTML page and extracts the PDF download link (or an embedded Google Drive file ID) together with the document title.
4. Downloads the PDF either from the source website over HTTP or from Google Drive, waits briefly, and extracts text from the PDF binary.
5. Cleans and validates the extracted text, then splits it into overlapping chunks and generates embeddings using Google Gemini.
6. Inserts the embedded chunks into a Supabase pgvector table with file, title, and source metadata, then marks the Google Sheets row as “Embedded”, or as “Not Text Based PDF” if no text is found.
7. Separately, exposes a public n8n Chat webhook that receives student questions and uses Gemini Flash with Supabase vector retrieval to return answers.
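
The first two steps (keep rows with an empty status, then route by URL type) can be sketched in plain JavaScript. This is a minimal illustration, not the template's actual Filter and If node configuration: the function names `selectQueuedRows` and `classifyUrl` are hypothetical, the sample rows are made up, and treating `drive.google.com` as the only Drive host is an assumption.

```javascript
// Hypothetical helpers; only the column names source_url, status, and
// row_number come from the Setup notes.
function selectQueuedRows(rows) {
  // Keep rows whose status cell is missing, empty, or whitespace-only.
  return rows.filter((row) => String(row.status ?? "").trim() === "");
}

function classifyUrl(sourceUrl) {
  let host;
  try {
    host = new URL(sourceUrl).hostname.toLowerCase();
  } catch {
    return "invalid"; // not a parseable absolute URL
  }
  if (host === "seraj-uae.com" || host.endsWith(".seraj-uae.com")) return "seraj";
  if (host === "drive.google.com") return "drive"; // assumption: Drive links use this host
  return "unknown";
}

// Made-up sample rows for illustration.
const rows = [
  { row_number: 2, source_url: "https://seraj-uae.com/file/123", status: "" },
  { row_number: 3, source_url: "https://drive.google.com/file/d/abc123/view", status: "Embedded" },
  { row_number: 4, source_url: "https://drive.google.com/file/d/xyz789/view", status: "" },
];

const routed = selectQueuedRows(rows).map((row) => ({
  row_number: row.row_number,
  route: classifyUrl(row.source_url),
}));

console.log(routed);
```

Row 3 is skipped because it already has a status, so re-running the schedule does not reprocess finished documents.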

Setup

1. Create a Supabase project with pgvector enabled, create the seraj_documents table, and add the match_seraj_documents function used for vector search.
2. Add Supabase API credentials in n8n and ensure the table name (seraj_documents) and query function name (match_seraj_documents) match your Supabase setup.
3. Add Google Sheets OAuth2 credentials and update the Google Sheet document ID/sheet reference, ensuring columns include source_url, status, and row_number.
4. Add Google Drive OAuth2 credentials for downloading PDFs hosted in Drive.
5. Add a Google Gemini (PaLM) API credential for embeddings and Gemini Flash, then activate the workflow and copy the public chat webhook URL for your website chat widget.
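
To see why the function name has to match, it can help to look at what a vector-search call against Supabase roughly looks like. The sketch below only builds the request; it does not send it. It assumes the function is exposed through Supabase's REST RPC endpoint and accepts `query_embedding`, `match_count`, and `filter` arguments, which is the convention used by common LangChain-style setups but is not confirmed by this catalog entry. `buildMatchRequest`, the project URL, and the key are placeholders.

```javascript
// Hypothetical helper: builds (but does not send) an RPC request descriptor.
function buildMatchRequest({ projectUrl, apiKey, functionName, queryEmbedding, matchCount = 4, filter = {} }) {
  if (!Array.isArray(queryEmbedding) || queryEmbedding.length === 0) {
    throw new Error("queryEmbedding must be a non-empty array of numbers");
  }
  return {
    method: "POST",
    // Strip trailing slashes so the path is not doubled up.
    url: `${projectUrl.replace(/\/+$/, "")}/rest/v1/rpc/${functionName}`,
    headers: {
      apikey: apiKey,
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Assumed argument names; check them against your own SQL function.
    body: JSON.stringify({
      query_embedding: queryEmbedding,
      match_count: matchCount,
      filter,
    }),
  };
}

const request = buildMatchRequest({
  projectUrl: "https://YOUR-PROJECT.supabase.co/", // placeholder
  apiKey: "YOUR_SUPABASE_KEY",                     // placeholder
  functionName: "match_seraj_documents",
  queryEmbedding: [0.1, 0.2, 0.3],                 // toy vector, far shorter than a real embedding
});

console.log(request.url);
```

If the function in your database has a different name or different argument names, the retrieval step fails even though the credentials are valid, which is why the setup asks you to keep the names aligned.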

1.2 Logical Blocks

This catalog entry is organized from the workflow JSON. The node-level section below shows the executable blocks available for review before importing the template.

2. Block-by-Block Analysis

Block 1 - Extract from File

Type / Role: n8n-nodes-base.extractFromFile - extractFromFile
Config choices: Version 1.1

Block 2 - Code in JavaScript

Type / Role: n8n-nodes-base.code - code
Config choices: Version 2

Block 3 - Get row(s) in sheet

Type / Role: n8n-nodes-base.googleSheets - googleSheets
Config choices: Version 4.7

Block 4 - Filter

Type / Role: n8n-nodes-base.filter - filter
Config choices: Version 2.3

Block 5 - All Done

Type / Role: n8n-nodes-base.set - set
Config choices: Version 3.4

Block 6 - Seraj or Drive?

Type / Role: n8n-nodes-base.if - if
Config choices: Version 2.2

Block 7 - Fetch File Page

Type / Role: n8n-nodes-base.httpRequest - httpRequest
Config choices: Version 4.4

Block 8 - Extract PDF URL + Title

Type / Role: n8n-nodes-base.code - code
Config choices: Version 2
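
The catalog does not show this node's code. As a rough sketch of what such an extraction step might do, the example below pulls a page title, a PDF link, and an embedded Drive file ID out of an HTML string with regular expressions. The markup of seraj-uae.com pages is not documented here, so the patterns, the `extractPdfInfo` name, and the sample HTML are all assumptions.

```javascript
// Hypothetical extractor; the real page structure may need different patterns.
function extractPdfInfo(html, pageUrl) {
  // Title: contents of the first <title> element, whitespace-normalized.
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const title = titleMatch ? titleMatch[1].replace(/\s+/g, " ").trim() : null;

  // Embedded Drive file: look for a /file/d/<id> style link anywhere in the page.
  const driveMatch = html.match(/drive\.google\.com\/file\/d\/([A-Za-z0-9_-]+)/);

  // Direct download: first href that ends in .pdf (optionally with a query string).
  const pdfMatch = html.match(/href=["']([^"']+\.pdf(?:\?[^"']*)?)["']/i);

  return {
    title,
    driveFileId: driveMatch ? driveMatch[1] : null,
    // Resolve relative links against the page URL.
    pdfUrl: pdfMatch ? new URL(pdfMatch[1], pageUrl).href : null,
  };
}

// Made-up sample page.
const sampleHtml = `
<html><head><title> Grade 5 Science
 Notes </title></head>
<body><a href="/uploads/science-notes.pdf">Download</a></body></html>`;

const info = extractPdfInfo(sampleHtml, "https://seraj-uae.com/file/123");
console.log(info);
```

Regex-based parsing is fragile against markup changes, so a step like this is worth re-testing whenever the source site is redesigned.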

Block 9 - Download PDF (Seraj)

Type / Role: n8n-nodes-base.httpRequest - httpRequest
Config choices: Version 4.4

Block 10 - Insert to Supabase pgvector

Type / Role: @n8n/n8n-nodes-langchain.vectorStoreSupabase - vectorStoreSupabase
Config choices: Version 1.3

Block 11 - PDF Binary Loader

Type / Role: @n8n/n8n-nodes-langchain.documentDefaultDataLoader - documentDefaultDataLoader
Config choices: Version 1.1

Block 12 - Recursive Text Splitter

Type / Role: @n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter - textSplitterRecursiveCharacterTextSplitter
Config choices: Version 1
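
The splitter's chunk size and overlap are not listed in this catalog entry. To illustrate what "overlapping chunks" means, here is a simplified fixed-window splitter. It is not the recursive character algorithm the node uses (which also tries to break on separators such as paragraphs and sentences); the `chunkWithOverlap` name and the default sizes are assumptions.

```javascript
// Simplified sliding-window chunker, for illustration only.
function chunkWithOverlap(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be >= 0 and smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sample = "abcdefghijklmnopqrstuvwxyz";
const chunks = chunkWithOverlap(sample, 10, 3);
console.log(chunks);
// → [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.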

Block 13 - Update row in sheet

Type / Role: n8n-nodes-base.googleSheets - googleSheets
Config choices: Version 4.7

Block 14 - Loop Over Items

Type / Role: n8n-nodes-base.splitInBatches - splitInBatches
Config choices: Version 3

Block 15 - Extract Drive File ID + Title

Type / Role: n8n-nodes-base.code - code
Config choices: Version 2
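
The node's code is not included in this entry. One plausible way to pull a file ID out of a Google Drive share link is sketched below; it handles the `/file/d/<id>/...` path form and the `?id=<id>` query form. The `extractDriveFileId` name and the accepted host list are assumptions, and title handling is left out because the catalog does not say where the title comes from for Drive links.

```javascript
// Hypothetical helper; returns null when no ID can be found.
function extractDriveFileId(link) {
  let url;
  try {
    url = new URL(link);
  } catch {
    return null; // not a parseable absolute URL
  }
  // Assumption: only these hosts are treated as Drive links.
  const host = url.hostname.toLowerCase();
  if (host !== "drive.google.com" && host !== "docs.google.com") return null;

  // Form 1: https://drive.google.com/file/d/<id>/view
  const pathMatch = url.pathname.match(/\/d\/([A-Za-z0-9_-]+)/);
  if (pathMatch) return pathMatch[1];

  // Form 2: https://drive.google.com/open?id=<id> (also uc?id=<id>)
  return url.searchParams.get("id");
}

console.log(extractDriveFileId("https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"));
console.log(extractDriveFileId("https://drive.google.com/open?id=XYZ789"));
```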

Block 16 - Download PDF (Drive)

Type / Role: n8n-nodes-base.googleDrive - googleDrive
Config choices: Version 3

Block 17 - Drive Link?

Type / Role: n8n-nodes-base.if - if
Config choices: Version 2.3

Block 18 - Wait

Type / Role: n8n-nodes-base.wait - wait
Config choices: Version 1.1

Block 19 - Embeddings Google Gemini

Type / Role: @n8n/n8n-nodes-langchain.embeddingsGoogleGemini - embeddingsGoogleGemini
Config choices: Version 1

Block 20 - text pdf?

Type / Role: n8n-nodes-base.if - if
Config choices: Version 2.3
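
This branch decides whether a PDF produced usable text. A minimal sketch of a clean-then-check step follows. The cleaning rules, the `minChars` threshold, and both function names are assumptions; only the two status strings come from the workflow description.

```javascript
// Hypothetical cleaning rules for text extracted from a PDF.
function cleanExtractedText(raw) {
  return String(raw ?? "")
    .replace(/\u0000/g, "")     // drop null bytes
    .replace(/\r\n?/g, "\n")    // normalize line endings
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")   // trim spaces around line breaks
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line
    .trim();
}

// Scanned (image-only) PDFs typically yield little or no text.
function statusForText(cleaned, minChars = 20) {
  return cleaned.length >= minChars ? "Embedded" : "Not Text Based PDF";
}

const cleaned = cleanExtractedText("  Hello \t world \r\n\r\n\r\n\r\nNext  line\u0000 ");
console.log(JSON.stringify(cleaned));
console.log(statusForText(cleaned));
console.log(statusForText(cleanExtractedText("   \n \n ")));
```

In the workflow itself the “Embedded” status is written only after the Supabase insert, so a function like `statusForText` would pick the branch rather than update the sheet directly.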

Block 21 - Update row in sheet1

Type / Role: n8n-nodes-base.googleSheets - googleSheets
Config choices: Version 4.7

Block 22 - Schedule Trigger

Type / Role: n8n-nodes-base.scheduleTrigger - scheduleTrigger
Config choices: Version 1.3

Block 23 - Sticky Note

Type / Role: n8n-nodes-base.stickyNote - stickyNote
Config choices: Version 1

Block 24 - Sticky Note1

Type / Role: n8n-nodes-base.stickyNote - stickyNote
Config choices: Version 1

Showing the first 24 of 36 workflow blocks. Download the JSON for the full node graph.

3. Summary Table

Workflow	Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets
Complexity	advanced
Nodes	36
Categories	Internal Wiki, AI RAG
Author	Salman Mehboob
Published	11 Jun 2026

4. Reproducing the Workflow from Scratch

1. Download the workflow JSON

Use the JSON export at /data/workflows/16258/16258.json as the source template for this automation.

2. Import the template into n8n

Open n8n, import the downloaded JSON, and review each node before activating the workflow.

3. Configure credentials and variables

Replace placeholder credentials, API keys, webhook URLs, account IDs, and environment-specific values with your own settings.

4. Test with sample data

Run the workflow manually or in a staging workspace, inspect node output, and confirm downstream systems receive the expected data.

5. Activate and monitor

Enable the workflow only after testing, then monitor executions, errors, and rate limits during the first production runs.
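
Before importing, it can be useful to scan the exported JSON for the node types it contains and for the nodes that reference credentials. The sketch below assumes the usual shape of an n8n export (a top-level `nodes` array whose entries have `name`, `type`, and an optional `credentials` object). `summarizeWorkflow` is a hypothetical helper, and the inline sample is a made-up three-node excerpt rather than the real file.

```javascript
// Hypothetical helper: summarize node types and credential usage.
function summarizeWorkflow(workflow) {
  const nodes = Array.isArray(workflow.nodes) ? workflow.nodes : [];
  const countsByType = {};
  const credentialNodes = [];
  for (const node of nodes) {
    countsByType[node.type] = (countsByType[node.type] || 0) + 1;
    const credentialTypes = Object.keys(node.credentials || {});
    if (credentialTypes.length > 0) {
      credentialNodes.push({ name: node.name, credentialTypes });
    }
  }
  return { nodeCount: nodes.length, countsByType, credentialNodes };
}

// Made-up excerpt shaped like an n8n export; IDs and names are placeholders.
const sampleWorkflow = {
  name: "Sample",
  nodes: [
    { name: "Schedule Trigger", type: "n8n-nodes-base.scheduleTrigger", typeVersion: 1.3 },
    {
      name: "Get row(s) in sheet",
      type: "n8n-nodes-base.googleSheets",
      typeVersion: 4.7,
      credentials: { googleSheetsOAuth2Api: { id: "PLACEHOLDER", name: "PLACEHOLDER" } },
    },
    { name: "Sticky Note", type: "n8n-nodes-base.stickyNote", typeVersion: 1 },
  ],
};

const summary = summarizeWorkflow(sampleWorkflow);
console.log(summary.nodeCount, summary.credentialNodes);
```

For the real file, parse the downloaded JSON (for example with `JSON.parse` on the file contents) and pass the result to the same function. Every node listed in `credentialNodes` needs its credential reassigned in your own workspace.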

5. General Notes & Resources

Review imported nodes carefully before activation. This catalog entry is intended to help you inspect the workflow structure, understand required services, and find related templates faster.

Node names, credentials, schedules, webhook paths, and external service limits may need adjustment for your workspace.


Frequently asked questions

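What does Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets do?

It ingests educational PDF URLs listed in Google Sheets, extracts and chunks their text, embeds the chunks with Google Gemini, and stores them in a Supabase pgvector table. It also exposes a public chat webhook that answers questions with Gemini using the same knowledge base.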

What do I need before importing this workflow?

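You need a Supabase project with pgvector enabled, Google Sheets and Google Drive OAuth2 credentials, and a Google Gemini (PaLM) API credential. Review the workflow JSON, configure these credentials in n8n, and test the automation in a safe workspace before using it in production.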

Can I customize this workflow?

Yes. Use the block-by-block analysis and the downloadable JSON to inspect each node, then adjust credentials, prompts, schedules, filters, or destinations for your Internal Wiki or AI RAG use case.


