Automate invoice processing with OCR, GPT-4 & Salesforce opportunity creation
$20/month : Unlimited workflows
2500 executions/month
THE #1 IN WEB SCRAPING
Scrape any website without limits
HOSTINGER 🎉 Early Black Friday Deal
DISCOUNT 20% Try free
DISCOUNT 20%
Self-hosted n8n
Unlimited workflows - from $4.99/mo
#1 hub for scraping, AI & automation
6000+ actors - $5 credits/mo
PDF Invoice Extractor (AI)
End-to-end pipeline: Watch Drive ➜ Download PDF ➜ OCR text ➜ AI normalize to JSON ➜ Upsert Buyer (Account) ➜ Create Opportunity ➜ Map Products ➜ Create OLI via Composite API ➜ Archive to OneDrive.
Node by node (what it does & key setup)
1) Google Drive Trigger
- Purpose: Fire when a new file appears in a specific Google Drive folder.
- Key settings:
- Event:
fileCreated - Folder ID:
google drive folder id - Polling:
everyMinute - Creds:
googleDriveOAuth2Api
- Event:
- Output: Metadata
{ id, name, ... }for the new file.
2) Download File From Google
- Purpose: Get the file binary for processing and archiving.
- Key settings:
- Operation:
download - File ID:
={{ $json.id }} - Creds:
googleDriveOAuth2Api
- Operation:
- Output: Binary (default key:
data) and original metadata.
3) Extract from File
- Purpose: Extract text from PDF (OCR as needed) for AI parsing.
- Key settings:
- Operation:
pdf - OCR: enable for scanned PDFs (in options)
- Operation:
- Output: JSON with OCR text at
{{ $json.text }}.
4) Message a model (AI JSON Extractor)
- Purpose: Convert OCR text into strict normalized JSON array (invoice schema).
- Key settings:
- Node:
@n8n/n8n-nodes-langchain.openAi - Model:
gpt-4.1(orgpt-4.1-mini) - Message role: system (the strict prompt; references
{{ $json.text }}) jsonOutput:true- Creds:
openAiApi
- Node:
- Output (per item):
$.message.content→ the parsed JSON (ensure it’s an array).
5) Create or update an account (Salesforce)
- Purpose: Upsert Buyer as Account using an external ID.
- Key settings:
- Resource:
account - Operation:
upsert - External Id Field:
tax_id__c - External Id Value:
={{ $json.message.content.buyer.tax_id }} - Name:
={{ $json.message.content.buyer.name }} - Creds:
salesforceOAuth2Api
- Resource:
- Output: Account record (captures
Id) for downstream Opportunity.
6) Create an opportunity (Salesforce)
- Purpose: Create Opportunity linked to the Buyer (Account).
- Key settings:
- Resource:
opportunity - Name:
={{ $('Message a model').item.json.message.content.invoice.code }} - Close Date:
={{ $('Message a model').item.json.message.content.invoice.issue_date }} - Stage:
Closed Won - Amount:
={{ $('Message a model').item.json.message.content.summary.grand_total }} - AccountId:
={{ $json.id }}(from Upsert Account output) - Creds:
salesforceOAuth2Api
- Resource:
- Output: Opportunity
Idfor OLI creation.
7) Build SOQL (Code / JS)
- Purpose: Collect unique product codes from AI JSON and build a SOQL query for PricebookEntry by
Pricebook2Id. - Key settings:
pricebook2Id(hardcoded in script): e.g.,01sxxxxxxxxxxxxxxx- Source lines:
$('Message a model').first().json.message.content.products
- Output:
{ soql, codes }
8) Query PricebookEntries (Salesforce)
- Purpose: Fetch
PricebookEntry.Idfor eachProduct2.ProductCode. - Key settings:
- Resource:
search - Query:
={{ $json.soql }} - Creds:
salesforceOAuth2Api
- Resource:
- Output: Items with
Id,Product2.ProductCode(used for mapping).
9) Code in JavaScript (Build OLI payloads)
- Purpose: Join lines with PBE results and Opportunity
Id➜ build OpportunityLineItem payloads. - Inputs:
- OpportunityId:
={{ $('Create an opportunity').first().json.id }} - Lines:
={{ $('Message a model').first().json.message.content.products }} - PBE rows: from previous node items
- OpportunityId:
- Output:
{ body: { allOrNone:false, records:[{ OpportunityLineItem... }] } } - Notes:
- Converts discount_total ➜ per-unit if needed (currently commented for standard pricing).
- Throws on missing PBE mapping or empty lines.
10) Create Opportunity Line Items (HTTP Request)
- Purpose: Bulk create OLIs via Salesforce Composite API.
- Key settings:
- Method:
POST - URL:
https://<your-instance>.my.salesforce.com/services/data/v65.0/composite/sobjects - Auth:
salesforceOAuth2Api(predefined credential) - Body (JSON):
={{ $json.body }}
- Method:
- Output: Composite API results (per-record statuses).
11) Update File to One Drive
- Purpose: Archive the original PDF in OneDrive.
- Key settings:
- Operation:
upload - File Name:
={{ $json.name }} - Parent Folder ID:
onedrive folder id - Binary Data:
true(from the Download node) - Creds:
microsoftOneDriveOAuth2Api
- Operation:
- Output: Uploaded file metadata.
Data flow (wiring)
- Google Drive Trigger → Download File From Google
- Download File From Google
- → Extract from File
- → Update File to One Drive
- Extract from File → Message a model
- Message a model
- → Create or update an account
- Create or update an account → Create an opportunity
- Create an opportunity → Build SOQL
- Build SOQL → Query PricebookEntries
- Query PricebookEntries → Code in JavaScript
- Code in JavaScript → Create Opportunity Line Items
Quick setup checklist
- 🔐 Credentials: Connect Google Drive, OneDrive, Salesforce, OpenAI.
- 📂 IDs:
- Drive Folder ID (watch)
- OneDrive Parent Folder ID (archive)
- Salesforce Pricebook2Id (in the JS SOQL builder)
- 🧠 AI Prompt: Use the strict system prompt; jsonOutput = true.
- 🧾 Field mappings:
- Buyer tax id/name → Account upsert fields
- Invoice code/date/amount → Opportunity fields
- Product
namemust equal your Product2.ProductCode in SF.
- ✅ Test: Drop a sample PDF → verify:
- AI returns array JSON only
- Account/Opportunity created
- OLI records created
- PDF archived to OneDrive
Notes & best practices
- If PDFs are scans, enable OCR in Extract from File.
- If AI returns non-JSON, keep “Return only a JSON array” as the last line of the prompt and keep
jsonOutputenabled. - Consider adding validation on
parsing.warningsto gate Salesforce writes. - For discounts/taxes in OLI:
- Standard OLI fields don’t support per-line discount amounts directly; model them in UnitPrice or custom fields.
- Replace the Composite API URL with your org’s domain or use the Salesforce node’s Bulk Upsert for simplicity.