Auto-update knowledge base with Drive, LlamaIndex & Azure OpenAI embeddings
This workflow auto-ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from ~30 minutes to under 2 minutes per doc.
Why Use This Workflow?
Cost Reduction: Avoids paying a monthly cloud fee just to store the knowledge base, because embeddings are kept in an in-memory vector store.
Ideal For
- Knowledge Managers / Documentation Teams: Automatically keep product docs and SOPs in sync when source files change on Google Drive.
- Support Teams: Ensure the searchable KB is always up-to-date after doc edits, speeding agent onboarding and resolution time.
- Developer / AI Teams: Populate an in-memory vector store for experiments, rapid prototyping, or local RAG demos.
How It Works
- Trigger: Google Drive Trigger watches a specific document or folder for updates.
- Data Collection: The updated file is downloaded from Google Drive.
- Processing: The file is uploaded to LlamaIndex cloud via an HTTP Request to create a parsing job.
- Intelligence Layer: Workflow polls LlamaIndex job status (Wait + Monitor loop). If parsing status equals SUCCESS, the result is retrieved as markdown.
- Output & Delivery: Parsed markdown is loaded into LangChain's Default Data Loader, passed to Azure OpenAI embeddings (deployment "3small"), then inserted into an in-memory vector store.
- Storage & Logging: Vector store holds embeddings in memory (good for prototyping). Optionally persist to an external vector DB for production.
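
To make the last two steps concrete, here is a minimal sketch of what an in-memory vector store does conceptually: it keeps text and embedding pairs in process memory and ranks them by cosine similarity. This is an illustrative stand-in, not n8n's or LangChain's implementation; the class name, method names, and the two-dimensional toy vectors are all assumptions.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical toy store: everything lives in memory and is lost on exit."""

    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def insert(self, text: str, vector: list[float]) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def search(self, query: list[float], top_k: int = 1) -> list[str]:
        # Rank stored texts by similarity to the query vector, best first.
        ranked = sorted(
            zip(self.texts, self.vectors),
            key=lambda pair: cosine(query, pair[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


store = InMemoryVectorStore()
store.insert("refund policy", [1.0, 0.0])   # toy vectors, not real embeddings
store.insert("shipping times", [0.0, 1.0])
best_match = store.search([0.9, 0.1])
```

Because nothing is written to disk, a store like this suits prototyping; a persistent vector DB would replace it in production.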
Setup Guide
Prerequisites
| Requirement | Type | Purpose |
|---|---|---|
| n8n instance | Essential | Import and execute the workflow |
| Google Drive OAuth2 | Essential | Watch and download documents from Google Drive |
| LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown |
| Azure OpenAI Account | Essential | Generate embeddings (deployment configured to model name "3small") |
| Persistent Vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search |
Installation Steps
- Import the workflow JSON file into your n8n instance.
- Configure credentials:
  - Azure OpenAI: Provide the endpoint and API key, and set the deployment name.
  - LlamaIndex API: Create an HTTP Header Auth credential in n8n. Header Name: Authorization. Header Value: Bearer YOUR_API_KEY.
  - Google Drive OAuth2: Create OAuth 2.0 credentials in Google Cloud Console, enable the Drive API, and configure the Google Drive OAuth2 credential in n8n.
- Update environment-specific values:
  - Replace the workflow's Google Drive fileId with the ID of the file or folder you want to watch (do not commit real IDs to public repositories).
- Customize settings:
  - Polling interval (Wait node): adjust for faster or slower job status checks.
  - Target file or folder: set on the Google Drive Trigger node.
  - Embedding model: change the Azure OpenAI deployment if needed.
- Test execution:
  - Save changes and trigger a sample file update on Drive. Verify each node runs and the vector store receives embeddings.
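
If you want to test the LlamaIndex credential outside n8n, the header described above can be reproduced in a few lines. This is a sketch only: the environment variable name `LLAMA_CLOUD_API_KEY` and the function name are assumptions chosen for illustration, and the key is read from the environment so it never appears in the script.

```python
import os


def build_auth_header(env_var: str = "LLAMA_CLOUD_API_KEY") -> dict[str, str]:
    """Build the Authorization header in the 'Bearer <key>' form used by the credential."""
    api_key = os.environ.get(env_var, "").strip()
    # Reject a missing key or the unreplaced placeholder from the setup steps.
    if not api_key or api_key == "YOUR_API_KEY":
        raise ValueError(f"Set {env_var} to a real API key before calling the API")
    return {"Authorization": f"Bearer {api_key}"}
```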
Technical Details
Core Nodes
| Node | Purpose | Key Configuration |
|---|---|---|
| Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to specific file or folder; configure OAuth2 credential |
| Download Knowledge Document (Google Drive) | Downloads file binary | Operation: download; ensure OAuth2 credential is selected |
| Parse Document via LlamaIndex (HTTP Request) | Uploads file to LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use HTTP Header Auth credential |
| Monitor Document Processing (HTTP Request) | Polls parsing job status | GET /parsing/job/{{jobId}}; check status field |
| Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS |
| Retrieve Parsed Content (HTTP Request) | Fetches parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown |
| Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as document source for embeddings |
| Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small |
| Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use memory store for prototyping; switch to DB for persistence |
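
The three LlamaIndex requests in the table differ only in their path. The sketch below builds those paths from a job ID; the base URL is deliberately a placeholder (the source lists only the paths, so take the real host and prefix from the workflow's HTTP Request nodes), and the function names are hypothetical.

```python
from urllib.parse import quote

# Placeholder, not a real endpoint: copy the actual base URL from the workflow.
BASE_URL = "https://example.invalid/api"


def upload_url(base: str = BASE_URL) -> str:
    """POST multipart/form-data target for creating a parsing job."""
    return f"{base.rstrip('/')}/parsing/upload"


def job_status_url(job_id: str, base: str = BASE_URL) -> str:
    """GET target for polling the status of a parsing job."""
    return f"{base.rstrip('/')}/parsing/job/{quote(job_id, safe='')}"


def job_result_url(job_id: str, base: str = BASE_URL) -> str:
    """GET target for fetching the parsed markdown once the job has succeeded."""
    return f"{job_status_url(job_id, base)}/result/markdown"
```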
Workflow Logic
- On Drive change, the file binary is downloaded and sent to LlamaIndex.
- The workflow enters a monitor loop: Monitor Document Processing fetches the job status and the If node checks it. If the status is not SUCCESS, the Wait node delays before the next check.
- When parsing completes, the workflow retrieves markdown, loads documents, creates embeddings via Azure OpenAI, and inserts data into an in-memory vector store.
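
The monitor loop can be sketched in plain Python as follows. The status fetch is passed in as a callable so the example stays self-contained. The `max_polls` cap is an addition for safety and is not part of the workflow as described, which simply keeps re-checking; the function and parameter names are hypothetical.

```python
import time
from typing import Callable


def wait_for_parse(
    get_status: Callable[[], str],
    wait_seconds: float = 60.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the job reports SUCCESS; return how many polls it took."""
    for attempt in range(1, max_polls + 1):
        # Equivalent of the If node: branch on the job's status field.
        if get_status() == "SUCCESS":
            return attempt
        # Equivalent of the Wait node: pause before the next status check.
        if attempt < max_polls:
            sleep(wait_seconds)
    raise TimeoutError(f"Job did not reach SUCCESS after {max_polls} polls")


# Simulated job that succeeds on the third check (no real API call, no real sleep).
statuses = iter(["PENDING", "PENDING", "SUCCESS"])
polls_needed = wait_for_parse(lambda: next(statuses), sleep=lambda _: None)
```

Capping the number of polls means a job that never completes fails fast instead of looping indefinitely.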
Customization Options
Basic Adjustments:
- Poll Delay: Adjust the Wait node interval (default: 1 minute) to balance speed vs. API quota.
- Target Scope: Switch the trigger from a single file to a folder to auto-handle many docs.
- Embedding Model: Swap Azure deployment for a different model name as needed.
Advanced Enhancements:
- Persistent Vector DB Integration: Replace vectorStoreInMemory with Pinecone or Milvus for production search.
- Notification: Add Slack or email nodes to notify when parsing completes or fails.
- Summarization: Add an LLM summarization step to generate chunk-level summaries.
Scaling option:
- Batch uploads and chunking to reduce embedding calls; use a queue (Redis or n8n queue patterns) and horizontal workers for high throughput.
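
As a rough illustration of the chunking and batching idea, the sketch below splits parsed markdown into overlapping character chunks and groups them so several chunks can go into one embedding request. The chunk size, overlap, and batch size are arbitrary example values, and both helper names are hypothetical. It is the batching step, not the chunking, that reduces the number of calls.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of up to chunk_size characters with overlapping edges."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks


def batched(items: list[str], batch_size: int) -> list[list[str]]:
    """Group items into lists of at most batch_size, one list per embedding request."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


chunks = chunk_text("a" * 2500)          # three chunks: 1000, 1000, and 700 characters
embedding_requests = batched(chunks, 2)  # two requests instead of three
```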
Performance & Optimization
| Metric | Expected Performance | Optimization Tips |
|---|---|---|
| Execution time (per doc) | ~10s–2min (depends on file size & LlamaIndex processing) | Chunk large docs; run embeddings in batches |
| API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase poll interval; consolidate requests |
| Error handling | Retries via Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits |
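
The exponential backoff and retry limit suggested in the table could look like the schedule below, which might replace a fixed wait interval. All numbers are illustrative defaults rather than values taken from the workflow, and the function name is hypothetical.

```python
def backoff_delays(
    base_seconds: float = 5.0,
    factor: float = 2.0,
    max_seconds: float = 60.0,
    max_retries: int = 6,
) -> list[float]:
    """Return one delay per retry, growing by factor and capped at max_seconds."""
    return [min(base_seconds * factor**i, max_seconds) for i in range(max_retries)]


delays = backoff_delays()  # [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
```

The length of the list doubles as the retry limit: once the delays are exhausted, the job can be treated as failed and a notification sent.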
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Authentication errors | Invalid/missing credentials | Reconfigure n8n Credentials; do not paste API keys directly into nodes |
| File not found | Incorrect fileId or permissions | Verify the Drive fileId and OAuth scopes; make sure the authenticated Google account has access to the file |
| Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase Wait node interval, monitor LlamaIndex dashboard, add retry limits |
| Embedding failures | Model/deployment mismatch or quota limits | Confirm Azure deployment name (3small) and subscription quotas |
Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store