Mask PII in documents for GDPR-safe AI processing with Postgres and Claude
Overview
This workflow implements a privacy-preserving AI document processing pipeline that detects, masks, and securely manages Personally Identifiable Information (PII) before any AI processing occurs.
Organizations often need to analyze documents such as invoices, forms, contracts, or reports using AI. However, sending documents containing personal data directly to AI models can create serious privacy, compliance, and security risks.
This workflow solves that problem by automatically detecting sensitive information, replacing it with secure tokens, and storing the original values in a protected vault database.
Only the masked version of the document is sent to the AI model for analysis. If required, a controlled PII re-injection mechanism can restore original values after processing.
The workflow also records all operations in an audit log, making it suitable for environments with strict compliance requirements, such as GDPR-regulated data handling, financial services, healthcare, or enterprise document processing systems.
How It Works
1. Document Upload
A webhook receives a document (typically a PDF) and triggers the workflow.
2. OCR Text Extraction
The OCR Extract node extracts the text content from the document so it can be analyzed for sensitive information.
3. PII Detection
Multiple detectors analyze the text to identify different types of sensitive data:
- Email addresses (regex detection)
- Phone numbers (multi-pattern detection)
- Identification numbers such as PAN, SSN, or bank account numbers
- Physical addresses detected using an AI model
Each detection includes:
- detected value
- location in the text
- confidence score
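The regex-based detectors can be sketched roughly as follows (n8n Code nodes run JavaScript). The patterns, detector names, and confidence values here are illustrative assumptions, not the workflow's exact node code:

```javascript
// Minimal sketch of regex-based PII detection. Patterns and confidence
// scores are illustrative; the real workflow uses multiple patterns per type.
const DETECTORS = [
  { type: "EMAIL", confidence: 0.95, pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { type: "PHONE", confidence: 0.85, pattern: /\+?\d[\d\s().-]{7,}\d/g },
];

function detectPII(text) {
  const findings = [];
  for (const { type, confidence, pattern } of DETECTORS) {
    for (const match of text.matchAll(pattern)) {
      findings.push({
        type,
        value: match[0],                     // detected value
        start: match.index,                  // location in the text
        end: match.index + match[0].length,
        confidence,                          // confidence score
      });
    }
  }
  return findings;
}
```

Each finding carries exactly the three fields listed above, so downstream nodes can consolidate and tokenize without re-scanning the text.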
4. Detection Consolidation
All detected PII results are merged into a single dataset. The workflow resolves overlapping detections and removes duplicates to produce a clean list of sensitive values.
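The consolidation step can be sketched as follows; the highest-confidence finding wins when two detections overlap. This is an assumed resolution strategy for illustration, not the workflow's exact logic:

```javascript
// Sketch of detection consolidation: sort by confidence, keep a finding only
// if it does not overlap one already kept, then restore text order.
function consolidate(findings) {
  const sorted = [...findings].sort(
    (a, b) => b.confidence - a.confidence || a.start - b.start
  );
  const kept = [];
  for (const f of sorted) {
    const overlaps = kept.some(k => f.start < k.end && k.start < f.end);
    if (!overlaps) kept.push(f); // drops duplicates and lower-confidence overlaps
  }
  return kept.sort((a, b) => a.start - b.start);
}
```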
5. Tokenization and Secure Vault Storage
Each detected PII value is replaced with a secure token, for example:
<<EMAIL_7F3A>>
<<PHONE_A12B>>
The original values are securely stored in a Postgres vault table.
This ensures sensitive data is never exposed to AI models.
6. Masked AI Processing
The masked document is sent to an AI model for structured analysis.
Possible AI tasks include:
- Document classification
- Data extraction
- Document summarization
- Entity extraction
Since all sensitive data has been tokenized, the AI processes the document without seeing any real personal data.
7. Controlled PII Re-Injection
After AI processing, the workflow can optionally restore original values from the vault.
The Re-Injection Controller determines which fields are permitted to have their original PII restored, based on defined permissions.
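The permission check can be sketched as an allow-list over the AI result's fields; the field names and token format here are assumptions for the example:

```javascript
// Sketch of controlled re-injection: only allow-listed fields get tokens
// swapped back to original values; everything else stays masked.
function reinjectPII(aiResult, vaultRows, allowedFields) {
  const byToken = new Map(vaultRows.map(r => [r.token, r.original_value]));
  const restored = {};
  for (const [field, value] of Object.entries(aiResult)) {
    if (allowedFields.has(field) && typeof value === "string") {
      restored[field] = value.replace(/<<[A-Z]+_[0-9A-F]{4}>>/g,
        token => byToken.get(token) ?? token);
    } else {
      restored[field] = value; // tokens remain masked
    }
  }
  return restored;
}
```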
8. Compliance Audit Logging
All events are recorded in an audit table, including:
- PII detection
- token generation
- AI processing
- PII restoration
This provides traceability and compliance reporting.
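An audit entry can be built from the findings at each stage, matching the audit table columns described in the setup section. The event names and the actor label are illustrative assumptions:

```javascript
// Sketch of one audit record; columns mirror the example audit table
// structure. Event names and the actor value are assumed for illustration.
function auditEntry(documentId, findings, event) {
  return {
    document_id: documentId,
    pii_types_detected: [...new Set(findings.map(f => f.type))],
    token_count: findings.length,
    ai_access_confirmed: event === "ai_processing",
    re_injection_events: event === "pii_restoration" ? 1 : 0,
    timestamp: new Date().toISOString(),
    actor: "n8n-workflow", // assumed actor label
  };
}
```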
Setup Instructions
1. Configure Postgres Database
Create two tables in your database.
PII Vault Table
Example structure:
- token: the generated placeholder, e.g. <<EMAIL_7F3A>>
- original_value: the detected PII value
- type: the PII category (email, phone, etc.)
- document_id: identifier of the source document
- created_at: timestamp of tokenization
This table securely stores original PII values mapped to tokens.
Audit Log Table
Example structure:
- document_id: identifier of the processed document
- pii_types_detected: PII categories found in the document
- token_count: number of tokens generated
- ai_access_confirmed: whether the masked document was sent to the AI model
- re_injection_events: count of PII restorations performed
- timestamp: when the event occurred
- actor: the user or system that triggered the event
This table records workflow activity for compliance tracking.
2. Configure AI Model Credentials
This workflow supports multiple AI models:
- Anthropic Claude (used for AI document processing)
- Ollama local models (used for address detection)
Configure credentials in n8n before running the workflow.
3. Configure Webhook Trigger
The workflow starts when a document is sent to the webhook:
POST /webhook/gdpr-document-upload
Upload a PDF file to this endpoint to trigger processing.
4. Configure Alert Notifications (Optional)
Replace the placeholder alert webhook URL with your monitoring or alerting system.
Example use cases:
- Slack alert
- monitoring system
- incident notification
Alerts are triggered if masking fails.
Use Cases
This workflow is useful for many privacy-sensitive automation scenarios.
GDPR-Compliant Document Processing
Safely process documents containing personal data without exposing PII to AI models.
AI-Powered Document Analysis
Use AI to summarize or extract data from documents while maintaining privacy.
Enterprise Data Redaction Pipelines
Automatically detect and tokenize sensitive data before sending documents to downstream systems.
Financial Document Processing
Process invoices, contracts, and financial reports securely.
Healthcare Document Automation
Analyze patient documents while ensuring sensitive data is protected.
Requirements
To run this workflow you need:
- n8n
- Postgres database
- Anthropic Claude API access
- Ollama (optional for local AI address detection)
- Webhook endpoint for document uploads
Optional integrations:
- Monitoring or alert system
- Compliance audit database
Key Features
- Automated PII detection and tokenization
- AI-safe document processing
- Secure vault storage for sensitive data
- Controlled PII restoration
- Full audit logging
- Works with multiple AI models
- Designed for GDPR and enterprise compliance
Summary
This workflow creates a secure bridge between sensitive documents and AI systems.
By automatically detecting, masking, and securely storing personal data, it enables organizations to safely apply AI to document processing tasks without exposing sensitive information.
The combination of tokenization, secure vault storage, controlled re-injection, and audit logging makes this workflow suitable for privacy-sensitive industries and enterprise automation pipelines.