AI Website Scraper & Company Intelligence
Description
This workflow automates the process of transforming any website URL into a structured, intelligent company profile.
It's triggered by a form, allowing a user to submit a website and choose between a "basic" or "deep" scrape.
The workflow extracts key information (mission, services, contacts, SEO keywords), stores it in a structured Supabase database, and archives a full JSON backup to Google Drive.
It also features a secondary AI agent that automatically finds and saves competitors for each company, building a rich, interconnected database of company intelligence.
Quick Implementation Steps
Import the Workflow: Import the provided JSON file into your n8n instance.
Install Custom Community Node:
You must install the community node from:
https://www.npmjs.com/package/n8n-nodes-crawl-and-scrape
Firecrawl n8n documentation: https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n
Install Additional Nodes:
`n8n-nodes-crawl-and-scrape` and `n8n-nodes-mcp` (Firecrawl MCP).
Set up Credentials:
Create credentials in n8n for the Firecrawl API, Supabase, Mistral AI, and Google Drive.
Configure API Key (CRITICAL):
- Open the Web Search tool node.
- Go to Parameters → Headers and replace the hardcoded Tavily AI API key with your own.
Configure Supabase Nodes:
- Assign your Supabase credential to all Supabase nodes.
- Ensure table names (e.g., `companies`, `competitors`) match your schema.
Configure Google Drive Nodes:
- Assign your Google Drive credential to the `Google Drive2` and `save to Google Drive1` nodes.
- Select the correct Folder ID.
Activate Workflow:
Turn on the workflow and open the Webhook URL in the “On form submission” node to access the form.
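For the Configure API Key step above, it can help to see what a request carrying your own key looks like outside n8n. The following is a minimal sketch, not the workflow's actual node configuration: the endpoint URL, the `Authorization: Bearer` header scheme, and the `build_tavily_request` helper are assumptions to verify against Tavily's API reference. It only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumed endpoint and auth scheme; confirm both against Tavily's API reference.
TAVILY_SEARCH_URL = "https://api.tavily.com/search"


def build_tavily_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a search request carrying the caller's own key."""
    if not api_key or api_key.startswith("YOUR_"):
        # Guard against an empty or placeholder key left over from the template.
        raise ValueError("Set a real Tavily API key before running the workflow")
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        TAVILY_SEARCH_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In the workflow itself, the same header value goes into Parameters → Headers of the Web Search tool node. Keeping the key in an n8n credential or environment variable rather than hardcoding it avoids leaking it when the workflow JSON is shared.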
What It Does
Form Trigger
Captures user input: “Website URL” and “Scraping Type” (basic or deep).
Scraping Router
A Switch node routes the flow:
- Deep Scraping → AI-based MCP Firecrawler agent.
- Basic Scraping → Crawlee node.
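The routing decision above can be sketched in a few lines of standalone Python. This is an illustration of the logic only, not the Switch node's configuration: the branch labels in `ROUTES` and the `normalize_url` and `route_submission` helpers are hypothetical, and the form field names are taken from the Form Trigger description.

```python
from urllib.parse import urlparse

# Hypothetical labels for the two branches; the workflow's node names may differ.
ROUTES = {
    "deep": "firecrawl_ai_agent",
    "basic": "crawl_and_scrape",
}


def normalize_url(raw: str) -> str:
    """Add a scheme if the user omitted one and reject inputs with no host."""
    candidate = raw.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a usable website URL: {raw!r}")
    return candidate


def route_submission(form: dict) -> tuple[str, str]:
    """Return (branch, url) for a form submission, mirroring the Switch node."""
    scraping_type = form.get("Scraping Type", "").strip().lower()
    if scraping_type not in ROUTES:
        raise ValueError(f"Unknown scraping type: {scraping_type!r}")
    return ROUTES[scraping_type], normalize_url(form.get("Website URL", ""))
```

Raising on an unknown scraping type, rather than silently falling through, makes a disconnected or misnamed Switch output easier to spot.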
Deep Scraping (Firecrawl AI Agent)
- Uses Firecrawl and Tavily Web Search.
- Extracts a detailed JSON profile: mission, services, contacts, SEO keywords, etc.
Basic Scraping (Crawlee)
- Uses the `Crawl and Scrape` node to collect raw text.
- A Mistral-based AI extractor structures the data into JSON.
Data Storage
- Stores structured data in Supabase tables (`companies`, `company_basicprofiles`).
- Archives a full JSON backup to Google Drive.
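The two storage targets take different shapes of the same profile: a flat row for Supabase and the complete JSON document for Google Drive. The sketch below shows one way to derive both. The column names in `COMPANY_COLUMNS`, the filename pattern, and both helper functions are assumptions for illustration; adjust them to your own schema.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical column names; align them with your own `companies` table.
COMPANY_COLUMNS = ("name", "website", "mission")


def to_company_row(profile: dict) -> dict:
    """Keep only the scalar fields that map to columns in the companies table."""
    return {col: profile.get(col) for col in COMPANY_COLUMNS}


def to_backup_file(profile: dict, scraped_at: datetime) -> tuple[str, bytes]:
    """Return (filename, content) for the full JSON archive."""
    host = urlparse(profile["website"]).netloc or "unknown"
    filename = f"{host}_{scraped_at:%Y%m%dT%H%M%SZ}.json"
    content = json.dumps(profile, indent=2, ensure_ascii=False).encode("utf-8")
    return filename, content


profile = {
    "name": "Acme",
    "website": "https://acme.example",
    "mission": "Make everything",
    "seo_keywords": ["anvils", "rockets"],
}
row = to_company_row(profile)  # list fields such as seo_keywords are left out
name, content = to_backup_file(profile, datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(name)  # acme.example_20240102T030405Z.json
```

The backup keeps every field, including arrays that the flat row drops, so the archive can be used to rebuild or re-import the database later.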
Automated Competitor Analysis
- Runs after a deep scrape.
- Uses Tavily web search to find competitors (e.g., from Crunchbase).
- Saves competitor data to Supabase, linked by `company_id`.
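Linking by `company_id` means every competitor row carries the key of the company it was found for. A minimal sketch of that step follows, assuming each search result has already been reduced to a dict with `name` and `website` keys; that input shape, the string-typed key, and the deduplication by domain are illustrative assumptions, not the workflow's confirmed behavior.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lowercased host without a leading 'www.', used as a dedupe key."""
    host = urlparse(url if "://" in url else "https://" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def to_competitor_rows(company_id: str, company_url: str, found: list[dict]) -> list[dict]:
    """Build one row per unique competitor, each carrying the parent company_id."""
    seen = {domain_of(company_url)}  # never list the company as its own competitor
    rows = []
    for item in found:
        domain = domain_of(item["website"])
        if not domain or domain in seen:
            continue
        seen.add(domain)
        rows.append(
            {"company_id": company_id, "name": item["name"], "website": item["website"]}
        )
    return rows
```

Filtering out the company's own domain and repeated domains keeps the `competitors` table free of self-references and duplicates, which web search results tend to produce.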
Who's It For
- Sales & Marketing Teams: Enrich leads with deep company info.
- Market Researchers: Build structured, searchable company databases.
- B2B Data Providers: Automate company intelligence collection.
- Developers: Use as a base for RAG or enrichment pipelines.
Requirements
- n8n instance (self-hosted or cloud)
- Supabase Account: With tables like `companies`, `competitors`, `social_links`, etc.
- Mistral AI API Key
- Google Drive Credentials
- Tavily AI API Key
- (Optional) Custom Nodes: `n8n-nodes-crawl-and-scrape`
How It Works
Flow Summary
- Form Trigger: Captures “Website URL” and “Scraping Type”.
- Switch Node:
  - `deep` → MCP Firecrawler (AI Agent).
  - `basic` → Crawl and Scrape node.
- Scraping & Extraction:
  - Deep path: Firecrawler → JSON structure.
  - Basic path: Crawlee → Mistral extractor → JSON.
- Storage:
  - Save JSON to Supabase.
  - Archive in Google Drive.
- Competitor Analysis (Deep Only):
  - Finds competitors via Tavily.
  - Saves to Supabase `competitors` table.
- End: Finishes with a `No Operation` node.
How To Set Up
- Import workflow JSON.
- Install community nodes (especially `n8n-nodes-crawl-and-scrape` from npm).
- Configure credentials (Supabase, Mistral AI, Google Drive).
- Add your Tavily API key.
- Connect Supabase and Drive nodes properly.
- Fix disconnected “basic” path if needed.
- Activate workflow.
- Test via the webhook form URL.
How To Customize
- Change LLMs: Swap Mistral for OpenAI or Claude.
- Edit Scraper Prompts: Modify system prompts in AI agent nodes.
- Change Extraction Schema: Update JSON Schema in extractor nodes.
- Fix Relational Tables: Add an `Items` node before Supabase inserts for arrays (social links, keywords).
- Enhance Automation: Add email/Slack notifications, or replace the form trigger with a Google Sheets trigger.
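The relational-table fix above comes down to turning one array field into one row per element before inserting. In standalone Python the transformation looks roughly like this; the `split_out` helper, the field names, and the column names are hypothetical, and in n8n the equivalent work is done by the node placed before the Supabase insert.

```python
def split_out(company_id: str, profile: dict, field: str, column: str) -> list[dict]:
    """Turn profile[field] (a list) into one row per element for a child table."""
    values = profile.get(field) or []
    if not isinstance(values, list):
        values = [values]  # tolerate a single scalar where a list was expected
    return [{"company_id": company_id, column: value} for value in values]


profile = {"seo_keywords": ["crm", "sales"], "social_links": "https://x.example/acme"}
keyword_rows = split_out("c-1", profile, "seo_keywords", "keyword")
link_rows = split_out("c-1", profile, "social_links", "url")
```

Here `keyword_rows` holds two rows and `link_rows` holds one, each tagged with the same `company_id`, so they can be inserted into their child tables without the array-typed column errors that a direct insert would cause.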
Add-ons
- Automated Trigger: Run on new sheet rows.
- Notifications: Email or Slack alerts after completion.
- RAG Integration: Use the Supabase database as a chatbot knowledge source.
Use Case Examples
- Sales Lead Enrichment: Instantly get company + competitor data from a URL.
- Market Research: Collect and compare companies in a niche.
- B2B Database Creation: Build a proprietary company dataset.
Troubleshooting Guide
| Issue | Possible Cause | Solution |
|---|---|---|
| Form Trigger 404 | Workflow not active | Activate the workflow |
| Web Search Tool fails | Missing Tavily API key | Replace the placeholder key |
| FIRECRAWLER / find competitor fails | Missing MCP node | Install n8n-nodes-mcp |
| Basic scrape does nothing | Switch node path disconnected | Reconnect “basic” output |
| Supabase node error | Wrong table/column names | Match schema exactly |
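For the last row of the table, a quick way to find mismatched names is to compare the keys you are about to insert with the columns each table defines. The sketch below assumes a hand-maintained `EXPECTED_COLUMNS` mapping with hypothetical column names; replace it with the columns from your own Supabase schema.

```python
# Hypothetical schema; replace with the columns your Supabase tables define.
EXPECTED_COLUMNS = {
    "companies": {"name", "website", "mission"},
    "competitors": {"company_id", "name", "website"},
}


def unknown_columns(table: str, row: dict) -> set[str]:
    """Return the keys in row that the table does not define."""
    if table not in EXPECTED_COLUMNS:
        raise KeyError(f"Unknown table: {table!r}")
    return set(row) - EXPECTED_COLUMNS[table]


# Column names are case-sensitive here, so "Mission" is reported as unknown.
print(unknown_columns("companies", {"name": "Acme", "Mission": "Make everything"}))
```

An empty result means every key has a matching column; anything returned is a name to correct in the extractor's JSON Schema or in the Supabase node's field mapping.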
Need Help or More Workflows?
Want to customize this workflow for your business or integrate it with your existing tools?
Our team at Digital Biz Tech can tailor it precisely to your use case, from automation logic to AI-powered enhancements.
For more such offerings, visit us: https://www.digitalbiz.tech