Deduplicate data records using JavaScript array methods

Important notice

This workflow is provided as-is. Please review and test before using in production.

Overview

How It Works – Data Deduplication in n8n

This tutorial shows how to remove duplicate records from a dataset using JavaScript inside n8n's Code nodes. It simulates real-world data cleaning by generating sample user data with intentional duplicates (records that share an email address) and walks through the deduplication step by step.

The process includes:

- Creating sample data with duplicates.
- Filtering out duplicates using filter() and findIndex() based on email.
- Displaying cleaned results with simple statistics for before-and-after comparison.

This is ideal for scenarios like CRM imports, ETL processes, and general data hygiene.
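The three stages above can be sketched in plain JavaScript as follows. This is a minimal illustration rather than the workflow's actual node code: the user names, ids, and addresses are made up, and the helper name dedupeByEmail is hypothetical. A record is kept only when findIndex() returns its own position, meaning no earlier record has the same email.

```javascript
// Hypothetical sample data: 6 users, 2 of which repeat an earlier email.
const users = [
  { id: 1, name: 'Alice', email: 'alice@example.com' },
  { id: 2, name: 'Bob', email: 'bob@example.com' },
  { id: 3, name: 'Alice (duplicate)', email: 'alice@example.com' },
  { id: 4, name: 'Carol', email: 'carol@example.com' },
  { id: 5, name: 'Bob (duplicate)', email: 'bob@example.com' },
  { id: 6, name: 'Dave', email: 'dave@example.com' },
];

// Keep a record only if it is the first one in the array with its email.
function dedupeByEmail(records) {
  return records.filter(
    (record, index) =>
      records.findIndex((other) => other.email === record.email) === index
  );
}

const uniqueUsers = dedupeByEmail(users);

// Simple before-and-after statistics.
const stats = {
  originalCount: users.length,
  uniqueCount: uniqueUsers.length,
  removedCount: users.length - uniqueUsers.length,
};
console.log(stats);

// An n8n Code node returns an array of items, each wrapped as { json: ... }.
const items = uniqueUsers.map((user) => ({ json: user }));
```

Because findIndex() rescans the array for every record, this approach does quadratic work in the number of records. That is fine for small lists like this one but worth keeping in mind for larger imports.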

⚙️ Set-Up Steps

🔹 Step 1: Manual Trigger
Node: When clicking 'Test workflow'
Purpose: Initiates the workflow manually for testing.

🔹 Step 2: Generate Sample Data
Node: Create Sample Data (Code node)
What it does:

- Creates 6 users, including 2 intentional duplicates (by email).
- Outputs data as usersJson with metadata (totalCount, message).
- Mimics real-world messy datasets.

🔹 Step 3: Deduplicate the Data
Node: Deduplicate Users (Code node)
What it does:

- Parses usersJson.
- Uses .filter() + .findIndex() to keep only the first instance of each email.
- Logs total, unique, and removed counts.
- Outputs clean user list as separate items.

🔹 Step 4: Display Results
Node: Display Results (Code node)
What it does:

- Outputs a structured summary: unique users, status, and timestamp.
- Prepares results for review or downstream use.

📈 Sample Output

Original count: 6 users
Deduplicated count: 4 users
Duplicates removed: 2 users

🎯 Learning Objectives

You'll learn how to:

- Use .filter() and .findIndex() in n8n Code nodes
- Clean JSON data within workflows
- Create simple, effective deduplication pipelines
- Output structured summaries for reporting or integration

🧠 Best Practices

- Validate input format (e.g., JSON schema)
- Handle null or missing fields gracefully
- Use logging for visibility
- Add error handling for production use
- Use pagination/chunking for large datasets
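Several of these practices can be combined in one hardened variant, sketched below under stated assumptions. It is not the workflow's own code: it swaps the .filter() + .findIndex() scan for a single pass over a Map, and the helper names (normalizeEmail, dedupeByEmailSafe) and the trim-and-lowercase normalization are hypothetical choices made for illustration. Records without a usable email are set aside rather than dropped silently.

```javascript
// Hypothetical helper: treating emails as case-insensitive and trimming
// whitespace is an assumption, not something the workflow specifies.
function normalizeEmail(email) {
  return typeof email === 'string' ? email.trim().toLowerCase() : '';
}

function dedupeByEmailSafe(records) {
  // Validate the input shape before doing any work.
  if (!Array.isArray(records)) {
    throw new TypeError('Expected an array of user records');
  }

  const seen = new Map(); // normalized email -> first record with that email
  const rejected = [];    // records with a null, missing, or empty email

  for (const record of records) {
    const key = normalizeEmail(record?.email);
    if (key === '') {
      rejected.push(record);
      continue;
    }
    if (!seen.has(key)) {
      seen.set(key, record);
    }
  }

  const unique = [...seen.values()];
  const removed = records.length - unique.length - rejected.length;
  return { unique, rejected, removed };
}

// Usage with made-up records, including a null entry and a missing email.
const result = dedupeByEmailSafe([
  { email: 'a@example.com' },
  { email: ' A@Example.com ' },
  { email: null },
  null,
  { email: 'b@example.com' },
]);

// Log the counts for visibility.
console.log(
  `unique=${result.unique.length} rejected=${result.rejected.length} removed=${result.removed}`
);
```

A Map preserves insertion order, so the first occurrence of each email is still the one that survives, matching the behaviour of the filter-based approach. For very large datasets the same function could be applied per chunk, although duplicates that span chunks would then need a shared Map.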