Extract invoice data from PDFs to JSON with Gemini AI and XML transformation
This n8n workflow converts PDF invoices into structured, ready-to-use JSON using AI and an XML transformation, without writing any code.
🚀 How it works
Upload form → The user uploads a PDF file.
Text extraction → The PDF content is extracted as plain text.
XML schema definition → A standard invoice structure is defined with fields such as:
- Invoice number
- Customer and issuer details
- Items with description, quantity, and price
- Totals and taxes
- Bank account details
AI (Gemini) → The model rewrites the PDF text as valid XML that follows the predefined schema.
XML cleanup → Extra tags, line breaks, and unnecessary formatting are removed from the model's output.
JSON conversion → The XML is transformed into a clean, structured JSON object, ready for integrations, APIs, or storage.
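The workflow itself needs no code, but the last two steps can be illustrated outside n8n. The following is a minimal Python sketch, not the workflow's actual implementation: the tag names (`invoice`, `invoice_number`, `items`, `item`, `total`), the sample response, and the helper names `clean_xml` and `xml_to_dict` are all assumptions made for illustration. It assumes the model wrapped its XML in a Markdown code fence, which is one common kind of extra formatting.

```python
import json
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # Markdown code-fence marker

# Hypothetical model output: XML wrapped in a code fence, with line breaks.
RAW_RESPONSE = FENCE + """xml
<invoice>
  <invoice_number>INV-001</invoice_number>
  <items>
    <item>
      <description>Widget</description>
      <quantity>2</quantity>
      <price>9.50</price>
    </item>
    <item>
      <description>Gadget</description>
      <quantity>1</quantity>
      <price>20.00</price>
    </item>
  </items>
  <total>39.00</total>
</invoice>
""" + FENCE


def clean_xml(raw: str) -> str:
    """Strip code-fence lines and the whitespace between tags."""
    # Remove lines that consist only of a fence marker (optionally with a language tag).
    text = re.sub(r"^[ \t]*`{3}\w*[ \t]*$", "", raw, flags=re.MULTILINE)
    # Collapse line breaks and indentation between adjacent tags.
    text = re.sub(r">\s+<", "><", text)
    return text.strip()


def element_to_data(elem: ET.Element):
    """Recursively convert an element into a string, dict, or list of values."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_data(child)
        if child.tag in result:
            # Repeated tags (for example several <item> elements) become a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_dict(xml_text: str) -> dict:
    """Parse cleaned XML and return a JSON-serializable dict keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_data(root)}


if __name__ == "__main__":
    cleaned = clean_xml(RAW_RESPONSE)
    print(json.dumps(xml_to_dict(cleaned), indent=2))
```

In this sketch every leaf value stays a string, so quantities and prices would still need to be cast to numbers before being sent to an ERP or database. A single `<item>` would also come out as a dict rather than a one-element list, which downstream consumers would have to handle.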
✨ Benefits
- Transforms unstructured PDFs into normalized JSON data.
- No coding required, only n8n nodes.
- Adapts to different invoice formats with minimal adjustments.
- Leverages AI to interpret complex textual content.
🛠️ Use cases
- Automating invoice data capture.
- Integration with ERPs, CRMs, or databases.
- Generating financial reports from PDFs.