Generate production database schemas from Excel and CSV with OpenAI and LangChain
Overview
This workflow automatically converts CSV or Excel files into a production-ready database schema using AI and rule-based validation.
It analyzes uploaded data, detects column types, relationships, and data quality, then generates a normalized schema. The output includes SQL DDL scripts, ERD diagrams, a data dictionary, and a load plan.
This removes most of the manual schema design work and speeds up database setup from raw data.
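The column type detection and basic statistics mentioned above can be illustrated with a minimal Python sketch. It is not the workflow's actual node code: the helper names (`infer_type`, `profile_column`), the null markers, the boolean spellings, and the date formats are all assumptions chosen for illustration, and every cell is assumed to arrive as a string, as it would from a CSV reader.

```python
from datetime import datetime

# All three constants are illustrative assumptions, not values taken from the workflow.
NULL_TOKENS = {"", "null", "none", "na", "n/a", "nan"}
BOOL_TOKENS = {"true", "false", "yes", "no"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def _parses(convert, text):
    """True if convert(text) succeeds without raising ValueError."""
    try:
        convert(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    return any(_parses(lambda t: datetime.strptime(t, fmt), text) for fmt in DATE_FORMATS)


def _non_null(values):
    """Strip whitespace and drop cells that look like null markers."""
    return [v.strip() for v in values if v.strip().lower() not in NULL_TOKENS]


def infer_type(values):
    """Return the first type, from narrowest to widest, that fits every non-null cell."""
    present = _non_null(values)
    if not present:
        return "string"
    checks = (
        ("boolean", lambda v: v.lower() in BOOL_TOKENS),
        ("integer", lambda v: _parses(int, v)),
        ("float", lambda v: _parses(float, v)),
        ("date", _is_date),
    )
    for name, check in checks:
        if all(check(v) for v in present):
            return name
    return "string"


def profile_column(values):
    """Compute null count, uniqueness ratio, and a primary key hint for one column."""
    present = _non_null(values)
    distinct = set(present)
    return {
        "type": infer_type(values),
        "null_count": len(values) - len(present),
        "unique_ratio": len(distinct) / len(present) if present else 0.0,
        # A column is a primary key candidate only if it has no nulls and no repeats.
        "pk_candidate": bool(present)
        and len(present) == len(values)
        and len(distinct) == len(present),
    }
```

Calling `profile_column` on each column of the extracted rows would give the kind of per-column summary that a schema generator can use to pick SQL types and key candidates.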
How It Works
- File Upload (Webhook)
  - Accepts CSV or XLSX files via webhook endpoint
  - Initializes workflow configuration (thresholds, retry limits)
- File Extraction
  - Detects file format (CSV or Excel)
  - Extracts rows into structured JSON
  - Merges extracted datasets
- Data Cleaning & Profiling
  - Removes duplicates and normalizes values
  - Detects data types (integer, float, date, boolean, string)
  - Computes column statistics (nulls, uniqueness, distributions)
  - Generates file hash and sample dataset
- Column Profiling Engine
  - Identifies potential primary keys
  - Detects cardinality and uniqueness levels
  - Suggests foreign key relationships based on value overlap
- AI Schema Generation
  - Uses an AI agent to design normalized tables
  - Assigns SQL data types based on real data
  - Defines primary keys, foreign keys, constraints, and indexes
- Validation Layer
  - Ensures schema matches actual data
  - Validates:
    - Data types
    - Primary key uniqueness
    - Foreign key overlap (>70%)
    - Constraint consistency
  - Detects circular dependencies
- Revision Loop
  - If validation fails:
    - Sends feedback to AI agent
    - Regenerates schema
    - Retries up to configured limit
- Schema Output Generation
  - Generates:
    - SQL DDL scripts
    - ERD (Mermaid format)
    - Data dictionary
    - Load plan with dependency graph
- Load Plan Engine
  - Computes optimal table insertion order
  - Detects circular dependencies
  - Suggests batching strategy
- Combine & Explain
  - Merges all outputs
  - Optional AI explanation of schema decisions
- Response Output
  - Returns structured JSON via webhook:
    - SQL schema
    - ERD summary
    - Data dictionary
    - Load plan
    - Optional explanation
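Two of the steps above, the foreign key overlap check and the table insertion order, lend themselves to a short Python sketch. This is an illustration under stated assumptions rather than the workflow's own logic: overlap is assumed to mean the share of distinct non-null child values that also occur in the referenced column, the function names and data layout are hypothetical, and ordering uses the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

FK_OVERLAP_THRESHOLD = 0.70  # mirrors the ">70%" rule listed in the validation step


def fk_overlap(child_values, parent_values):
    """Share of distinct non-null child values that also appear in the parent column.

    Assumption: the workflow may define overlap differently.
    """
    child = {v for v in child_values if v is not None}
    if not child:
        return 0.0
    return len(child & set(parent_values)) / len(child)


def validate_foreign_keys(foreign_keys, data):
    """Return feedback messages for foreign keys that do not exceed the threshold.

    foreign_keys: list of (child_table, child_column, parent_table, parent_column)
    data: {table_name: {column_name: [values]}}
    """
    problems = []
    for child_t, child_c, parent_t, parent_c in foreign_keys:
        ratio = fk_overlap(data[child_t][child_c], data[parent_t][parent_c])
        if ratio <= FK_OVERLAP_THRESHOLD:
            problems.append(
                f"{child_t}.{child_c} -> {parent_t}.{parent_c}: overlap {ratio:.0%}"
            )
    return problems


def load_order(tables, foreign_keys):
    """Order tables so that referenced (parent) tables are loaded before their children.

    Raises ValueError when the foreign keys form a circular dependency.
    """
    graph = {table: set() for table in tables}
    for child_t, _, parent_t, _ in foreign_keys:
        if child_t != parent_t:  # a self-reference does not constrain table order
            graph[child_t].add(parent_t)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # graphlib stores the detected cycle as the second element of args
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
```

In a revision loop like the one described, the strings returned by `validate_foreign_keys` could serve as the feedback sent back to the AI agent, and the `ValueError` from `load_order` would flag a schema that cannot be loaded in a single pass.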
Setup Instructions
- Configure OpenAI credentials (used by the AI agent)
- Adjust thresholds if needed (FK overlap, retries, confidence)
- Activate the workflow and copy the webhook URL
- Send a POST request with a CSV or XLSX file to the webhook URL to execute the workflow
- Review the generated outputs
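The POST request from the setup steps could be sent with a small Python script like the one below. It is a sketch under assumptions: the URL is a placeholder, the form field name `file` is a guess that should be checked against the Webhook node's configuration, and the upload is assumed to be accepted as `multipart/form-data`. Only the standard library is used.

```python
import json
import urllib.request
import uuid

# Placeholder: replace with the webhook URL copied from your own workflow.
WEBHOOK_URL = "https://your-n8n-host.example/webhook/your-path"
# Assumed form field name; the Webhook node's settings determine the real one.
FILE_FIELD = "file"


def build_upload_request(url, filename, content, content_type="text/csv"):
    """Build a multipart/form-data POST request carrying one file (bytes)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and decode the JSON body returned by the webhook."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a real network call, so it is left commented out):
# with open("customers.csv", "rb") as handle:
#     result = send(build_upload_request(WEBHOOK_URL, "customers.csv", handle.read()))
# print(list(result))  # inspect the top-level keys of the returned JSON
```

For an XLSX upload, pass `content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"`. A generous timeout is used because the AI generation and revision steps may take a while to respond.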
Use Cases
- Auto-generate database schema from CSV/Excel files
- Data migration and onboarding pipelines
- Rapid database prototyping
- Reverse engineering datasets
- AI-assisted data modeling
Requirements
- n8n (latest version recommended)
- OpenAI API credentials
- LangChain nodes enabled
- CSV or XLSX input file