Process OCR documents from Google Drive into a searchable knowledge base with OpenAI & Pinecone
How it works
This workflow automates a full RAG ingestion pipeline. When a new OCR JSON file is added to a Google Drive folder, the workflow extracts lesson metadata, parses and cleans the Arabic text, generates semantic chunks, creates AI embeddings, and stores them in a Pinecone vector index. After processing, the file is automatically moved to an archive folder to prevent duplicates.
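The cleaning and chunking stage can be illustrated outside n8n with a minimal Python sketch. The workflow's actual node code is not shown here, so everything below is an assumption for illustration: the function names `clean_arabic` and `chunk_sentences`, the choice to strip diacritics and tatweel, the 500-character limit, and the use of simple sentence-boundary packing in place of whatever semantic chunking the workflow applies.

```python
import re
import unicodedata

# Arabic diacritics (harakat), superscript alef, and tatweel (kashida).
# OCR output often contains these inconsistently, which hurts matching.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Split after Latin or Arabic sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def clean_arabic(text: str) -> str:
    """Normalize presentation forms, drop diacritics, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences keeps each chunk readable on its own, which tends to matter more for retrieval quality than hitting an exact size. Whether stripping diacritics is appropriate depends on the source material, so treat that step as optional.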
Set up steps
Follow the sticky notes inside the workflow for detailed instructions.
- Connect your Google Drive credentials.
- Replace the input folder ID and archive folder ID with your own.
- Connect your OpenAI account for embeddings.
- Connect your Pinecone API key and select your index.
The workflow is ready to run once credentials and folder paths are configured.
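For readers who want to see what the embedding and storage step amounts to, here is a hedged Python sketch of building vector records from text chunks. It is not the workflow's own implementation: `build_records`, the `embed` callable, the metadata keys, and the hashed ID scheme are all hypothetical names chosen for this example. The deterministic IDs are an extra safeguard assumed here; the workflow itself avoids duplicates by moving processed files to the archive folder.

```python
import hashlib
from typing import Callable, Sequence


def build_records(
    file_id: str,
    chunks: Sequence[str],
    embed: Callable[[list[str]], list[list[float]]],
    metadata: dict,
) -> list[dict]:
    """Return one {"id", "values", "metadata"} record per chunk.

    `embed` is any function mapping a list of texts to a list of vectors;
    it is injected so the sketch runs without network access.
    """
    vectors = embed(list(chunks))
    if len(vectors) != len(chunks):
        raise ValueError("embed() must return one vector per chunk")

    records = []
    for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
        # Assumption: hash of file ID and chunk position, so re-running
        # the same file overwrites records instead of adding new ones.
        record_id = hashlib.sha256(f"{file_id}:{i}".encode("utf-8")).hexdigest()[:32]
        records.append(
            {
                "id": record_id,
                "values": list(vector),
                "metadata": {**metadata, "chunk_index": i, "text": chunk},
            }
        )
    return records
```

With the official Python clients, `embed` could wrap a call to OpenAI's `client.embeddings.create(...)`, and the returned list could be passed to a Pinecone index via `index.upsert(vectors=records)`. The embedding model and index name are left open because the source does not specify them.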