
Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets


1. Workflow Overview


Best for

  • Internal Wiki automation workflows
  • AI RAG automation workflows
  • Advanced n8n builders looking for reusable templates

Tools used

n8n-nodes-base.extractFromFile, n8n-nodes-base.code, n8n-nodes-base.googleSheets, n8n-nodes-base.filter, n8n-nodes-base.set, n8n-nodes-base.if, n8n-nodes-base.httpRequest, @n8n/n8n-nodes-langchain.vectorStoreSupabase

Source and attribution

This workflow is cataloged by N8N Workflows and links back to its original n8n.io source page by Salman Mehboob.

Original n8n.io source

1.1 Workflow description

Title: Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets
Workflow name: Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets

Quick Overview

This workflow ingests educational PDF URLs from Google Sheets, extracts and chunks their text, generates embeddings with Google Gemini, and stores them in a Supabase pgvector table for retrieval. It also exposes a public chat webhook that answers questions using Gemini and the same Supabase knowledge base.

How it works

  1. Runs every hour on a schedule and reads rows from a Google Sheets document, keeping only entries where the status is empty.
  2. Processes each queued URL one at a time and routes it based on whether it is a seraj-uae.com page or a direct Google Drive file link.
  3. For seraj-uae.com URLs, fetches the HTML page to extract the PDF download link (or embedded Google Drive file ID) and the document title.
  4. Downloads the PDF from either the source website via HTTP or from Google Drive, waits briefly, and extracts text from the PDF binary.
  5. Cleans and validates the extracted text, then splits it into overlapping chunks and generates embeddings using Google Gemini.
  6. Inserts the embedded chunks into a Supabase pgvector table with file/title/source metadata, then sets the Google Sheets row status to “Embedded”. If no text is found in the PDF, the row is marked “Not Text Based PDF” instead.
  7. Separately, exposes a public n8n Chat webhook that receives student questions and uses Gemini Flash with Supabase vector retrieval to return answers.
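
Steps 1 and 2 above (queue filtering and routing) can be sketched outside n8n as follows. This is a minimal Python illustration, not the workflow's own Code node logic: the function names, the sample rows, the example URL paths, and the "unknown" fallback are all assumptions made for the example.

```python
from urllib.parse import urlparse


def pending_rows(rows):
    """Keep only rows whose status cell is empty (step 1)."""
    return [r for r in rows if not str(r.get("status") or "").strip()]


def route(source_url):
    """Classify a URL as 'seraj', 'drive', or 'unknown' (step 2).

    The 'unknown' branch is an assumption; the source does not say how
    the workflow treats URLs that match neither case.
    """
    host = (urlparse(source_url).hostname or "").lower()
    if host == "seraj-uae.com" or host.endswith(".seraj-uae.com"):
        return "seraj"
    if host == "drive.google.com":
        return "drive"
    return "unknown"


# Hypothetical sheet rows using the column names from the Setup section.
rows = [
    {"row_number": 2, "source_url": "https://seraj-uae.com/file/example", "status": ""},
    {"row_number": 3, "source_url": "https://drive.google.com/file/d/FILE_ID/view", "status": "Embedded"},
    {"row_number": 4, "source_url": "https://drive.google.com/file/d/FILE_ID/view", "status": None},
]

queue = [(r["row_number"], route(r["source_url"])) for r in pending_rows(rows)]
print(queue)  # [(2, 'seraj'), (4, 'drive')]
```

Matching on the parsed hostname rather than a substring avoids misrouting a URL that merely mentions one of the domains in its path or query string.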

Setup

  1. Create a Supabase project with pgvector enabled, create the seraj_documents table, and add the match_seraj_documents function used for vector search.
  2. Add Supabase API credentials in n8n and ensure the table name (seraj_documents) and query function name (match_seraj_documents) match your Supabase setup.
  3. Add Google Sheets OAuth2 credentials and update the Google Sheet document ID/sheet reference, ensuring columns include source_url, status, and row_number.
  4. Add Google Drive OAuth2 credentials for downloading PDFs hosted in Drive.
  5. Add a Google Gemini (PaLM) API credential for embeddings and Gemini Flash, then activate the workflow and copy the public chat webhook URL for your website chat widget.
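
To make the role of the vector search function in step 1 concrete, here is a small in-memory Python sketch of what a similarity search does conceptually: score every stored chunk against the query embedding and return the closest few. It is not the SQL that runs in Supabase. The use of cosine similarity, the `content` and `embedding` field names, the `match_count` parameter, and the toy three-dimensional vectors are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_documents(query_embedding, rows, match_count=2):
    """Return the match_count most similar rows as (score, content) pairs."""
    scored = [
        (cosine_similarity(query_embedding, r["embedding"]), r["content"])
        for r in rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:match_count]


# Toy table: real embeddings have hundreds of dimensions, not three.
table = [
    {"content": "photosynthesis notes", "embedding": [1.0, 0.0, 0.0]},
    {"content": "algebra worksheet", "embedding": [0.0, 1.0, 0.0]},
    {"content": "plant biology summary", "embedding": [0.9, 0.1, 0.0]},
]

top = match_documents([1.0, 0.0, 0.0], table)
print([content for _, content in top])
# ['photosynthesis notes', 'plant biology summary']
```

In the real setup the database performs this ranking, so only the top matches travel back to n8n for the chat answer.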

1.2 Logical Blocks

This catalog entry is organized from the workflow JSON. The node-level section below shows the executable blocks available for review before importing the template.

2. Block-by-Block Analysis

Block 1 - Extract from File

Type / Role
n8n-nodes-base.extractFromFile - extractFromFile
Config choices
Version 1.1

Block 2 - Code in JavaScript

Type / Role
n8n-nodes-base.code - code
Config choices
Version 2

Block 3 - Get row(s) in sheet

Type / Role
n8n-nodes-base.googleSheets - googleSheets
Config choices
Version 4.7

Block 4 - Filter

Type / Role
n8n-nodes-base.filter - filter
Config choices
Version 2.3

Block 5 - All Done

Type / Role
n8n-nodes-base.set - set
Config choices
Version 3.4

Block 6 - Seraj or Drive?

Type / Role
n8n-nodes-base.if - if
Config choices
Version 2.2

Block 7 - Fetch File Page

Type / Role
n8n-nodes-base.httpRequest - httpRequest
Config choices
Version 4.4

Block 8 - Extract PDF URL + Title

Type / Role
n8n-nodes-base.code - code
Config choices
Version 2
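
This block corresponds to step 3 of "How it works": pulling the PDF download link and the document title out of a fetched HTML page. The sketch below shows one way to do that with Python's standard library. The actual page markup is not described in the source, so the rule used here (first anchor whose href ends in .pdf, title taken from the title tag) and the sample HTML are assumptions. The embedded Google Drive file ID case is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects the first .pdf link and the <title> text of a page."""

    def __init__(self):
        super().__init__()
        self.pdf_href = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a" and self.pdf_href is None:
            href = dict(attrs).get("href") or ""
            # Ignore any query string when checking the file extension.
            if href.lower().split("?")[0].endswith(".pdf"):
                self.pdf_href = href

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_pdf_url_and_title(html, page_url):
    """Return (absolute_pdf_url_or_None, title) for a fetched page."""
    parser = PdfLinkParser()
    parser.feed(html)
    pdf_url = urljoin(page_url, parser.pdf_href) if parser.pdf_href else None
    return pdf_url, parser.title.strip()


# Hypothetical page content and URL.
sample = (
    "<html><head><title> Grade 5 Science </title></head>"
    '<body><a href="/uploads/unit1.pdf?dl=1">Download</a></body></html>'
)
result = extract_pdf_url_and_title(sample, "https://seraj-uae.com/file/example")
print(result)
# ('https://seraj-uae.com/uploads/unit1.pdf?dl=1', 'Grade 5 Science')
```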

Block 9 - Download PDF (Seraj)

Type / Role
n8n-nodes-base.httpRequest - httpRequest
Config choices
Version 4.4

Block 10 - Insert to Supabase pgvector

Type / Role
@n8n/n8n-nodes-langchain.vectorStoreSupabase - vectorStoreSupabase
Config choices
Version 1.3

Block 11 - PDF Binary Loader

Type / Role
@n8n/n8n-nodes-langchain.documentDefaultDataLoader - documentDefaultDataLoader
Config choices
Version 1.1

Block 12 - Recursive Text Splitter

Type / Role
@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter - textSplitterRecursiveCharacterTextSplitter
Config choices
Version 1
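
The idea of overlapping chunks (step 5 of "How it works") can be illustrated with a plain fixed-size splitter. Note that this is a simplification: the node in this block is a recursive character splitter, which also tries to break on natural separators, and the workflow's real chunk size and overlap are not stated in this entry. The defaults below are placeholders.

```python
def split_with_overlap(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts (chunk_size - overlap) characters after the
    previous one, so neighbours share `overlap` characters.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality.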

Block 13 - Update row in sheet

Type / Role
n8n-nodes-base.googleSheets - googleSheets
Config choices
Version 4.7

Block 14 - Loop Over Items

Type / Role
n8n-nodes-base.splitInBatches - splitInBatches
Config choices
Version 3

Block 15 - Extract Drive File ID + Title

Type / Role
n8n-nodes-base.code - code
Config choices
Version 2
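
As a rough illustration of what a node like this has to do, the following Python sketch extracts a file ID from two common Google Drive link shapes: `/file/d/<id>/...` paths and `?id=<id>` query parameters. The function name and the sample IDs are made up for the example, title extraction is left out, and the workflow's own JavaScript may handle more or fewer URL forms.

```python
import re
from urllib.parse import parse_qs, urlparse

# Drive file IDs are matched here as letters, digits, '_' and '-'.
_PATH_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_drive_file_id(url):
    """Return the Drive file ID found in url, or None if there is none."""
    parsed = urlparse(url)
    match = _PATH_ID.search(parsed.path)
    if match:
        return match.group(1)
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None


# Sample links with a made-up ID.
print(extract_drive_file_id("https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"))
print(extract_drive_file_id("https://drive.google.com/uc?export=download&id=1AbC_dEf-123"))
print(extract_drive_file_id("https://seraj-uae.com/file/example"))
# 1AbC_dEf-123
# 1AbC_dEf-123
# None
```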

Block 16 - Download PDF (Drive)

Type / Role
n8n-nodes-base.googleDrive - googleDrive
Config choices
Version 3

Block 17 - Drive Link?

Type / Role
n8n-nodes-base.if - if
Config choices
Version 2.3

Block 18 - Wait

Type / Role
n8n-nodes-base.wait - wait
Config choices
Version 1.1

Block 19 - Embeddings Google Gemini

Type / Role
@n8n/n8n-nodes-langchain.embeddingsGoogleGemini - embeddingsGoogleGemini
Config choices
Version 1

Block 20 - text pdf?

Type / Role
n8n-nodes-base.if - if
Config choices
Version 2.3
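
The check behind this branch (step 5 cleans and validates the text, step 6 marks PDFs with no text) might look like the Python sketch below. The cleaning rules and the `min_chars` threshold are assumptions; the entry only says that a PDF with no extractable text is marked "Not Text Based PDF".

```python
import re


def clean_text(raw):
    """Normalize extracted PDF text: drop NULs, collapse runs of whitespace."""
    text = raw.replace("\x00", "")
    text = re.sub(r"[ \t]+", " ", text)     # runs of spaces/tabs become one space
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()


def route_extracted_text(raw, min_chars=50):
    """Decide what happens next for one PDF.

    Returns ('embed', cleaned_text) when enough text survived cleaning,
    otherwise ('mark_status', 'Not Text Based PDF'). min_chars is a
    hypothetical threshold, not a value taken from the workflow.
    """
    cleaned = clean_text(raw)
    if len(cleaned) >= min_chars:
        return ("embed", cleaned)
    return ("mark_status", "Not Text Based PDF")


print(route_extracted_text("   \n\n "))
# ('mark_status', 'Not Text Based PDF')
print(route_extracted_text("x" * 60)[0])
# embed
```

Scanned PDFs typically yield little or no text from extraction, which is why a length check after cleaning is a reasonable proxy for "text based".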

Block 21 - Update row in sheet1

Type / Role
n8n-nodes-base.googleSheets - googleSheets
Config choices
Version 4.7

Block 22 - Schedule Trigger

Type / Role
n8n-nodes-base.scheduleTrigger - scheduleTrigger
Config choices
Version 1.3

Block 23 - Sticky Note

Type / Role
n8n-nodes-base.stickyNote - stickyNote
Config choices
Version 1

Block 24 - Sticky Note1

Type / Role
n8n-nodes-base.stickyNote - stickyNote
Config choices
Version 1

Showing the first 24 of 36 workflow blocks. Download the JSON for the full node graph.

3. Summary Table

Workflow: Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets
Complexity: advanced
Nodes: 36
Categories: Internal Wiki, AI RAG
Author: Salman Mehboob
Published: 11 Jun 2026

4. Reproducing the Workflow from Scratch

  1. Download the workflow JSON

    Use the JSON export at /data/workflows/16258/16258.json as the source template for this automation.

  2. Import the template into n8n

    Open n8n, import the downloaded JSON, and review each node before activating the workflow.

  3. Configure credentials and variables

    Replace placeholder credentials, API keys, webhook URLs, account IDs, and environment-specific values with your own settings.

  4. Test with sample data

    Run the workflow manually or in a staging workspace, inspect node output, and confirm downstream systems receive the expected data.

  5. Activate and monitor

    Enable the workflow only after testing, then monitor executions, errors, and rate limits during the first production runs.

5. General Notes & Resources

Review imported nodes carefully before activation. This catalog entry is intended to help you inspect the workflow structure, understand required services, and find related templates faster.

Node names, credentials, schedules, webhook paths, and external service limits may need adjustment for your workspace.

Frequently asked questions

What does Build a RAG knowledge base from PDFs with Gemini, Supabase and Google Sheets do?

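This workflow ingests educational PDF URLs from Google Sheets, extracts and chunks their text, generates embeddings with Google Gemini, and stores them in a Supabase pgvector table for retrieval. It also exposes a public chat webhook that answers questions using Gemini and the same Supabase knowledge base.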

What do I need before importing this workflow?

Review the workflow JSON, configure any required credentials in n8n, and test the automation in a safe workspace before using it in production.

Can I customize this workflow?

Yes. Use the block-by-block analysis and the downloadable JSON to inspect each node, then adjust credentials, prompts, schedules, filters, or destinations for your Internal Wiki or AI RAG use case.