Skip to main content

Extract product details from search result URLs with MrScraper and Google Sheets

Workflow preview

Extract product details from search result URLs with MrScraper and Google Sheets preview
Open on n8n.io

Overview

Description

This n8n template automates the extraction of structured product data from search results pages — using a pre-built list of search/listing URLs stored in Google Sheets as the starting point. Instead of crawling a domain from scratch, this workflow picks up exactly where your research left off: you bring the URLs, it brings the data.

Perfect for price monitoring, product research, competitor analysis, or building any kind of structured product database from e-commerce or directory sites.


How It Works

  • Phase 1 – Load Search Page URLs: The workflow reads your pre-prepared list of search or listing page URLs directly from a Google Sheets tab. This gives you full control over which pages get scraped without any crawling step.
  • Phase 2 – Scrape Listing Pages: Each URL is looped through the Listing Agent, which navigates the search results page and extracts all individual product/detail page URLs. Duplicates are automatically removed.
  • Phase 3 – Scrape Detail Pages: Every detail URL is then processed by the General Agent, which extracts structured fields such as title, price, description, attributes, and more. Nested JSON is automatically flattened into clean, spreadsheet-ready rows.
  • Phase 4 – Export & Notify: All scraped records are appended or upserted into a Google Sheets output tab. A Gmail notification is sent on completion with a run summary.

How to Set Up

  1. Create 2 scrapers in your MrScraper account:
  • Listing Agent Scraper (for extracting detail URLs from search/listing pages)
  • General Agent Scraper (for extracting structured data from each detail page)
  • Copy the scraperId for each — you'll need these in n8n.
  1. Enable AI Scraper API access in your MrScraper account settings.

  2. Prepare your Google Sheet with search URLs:

  • Create a sheet with a column containing the listing/search page URLs you want to scrape
  • This is your input — add as many URLs as needed
  1. Add your credentials in n8n:
  • MrScraper API token
  • Google Sheets OAuth2 (for both input and output sheets)
  • Gmail OAuth2
  1. Configure the "Get List Search Page" node:
  • Connect to the spreadsheet and sheet tab containing your input URLs
  1. Configure the Listing Agent node:
  • Enter your Listing scraperId
  • Adjust maxPages based on how many result pages to scrape per URL
  1. Configure the General Agent node:
  • Enter your General scraperId
  1. Configure the output Google Sheets node:
  • Enter your output spreadsheet and sheet tab URL
  • Set a unique match key (recommended: url) for upsert to avoid duplicates on re-runs
  1. Configure Gmail:
  • Set recipient email, subject line, and message body for your run notification

Requirements

  • MrScraper account with API access enabled
  • Google Sheets (OAuth2 connected) — one sheet for input URLs, one for output data
  • Gmail (OAuth2 connected)

Good to Know

  • Unlike a full crawl workflow, this template is input-driven — you control exactly which pages are scraped by managing your Google Sheets input list.
  • You can run this on a schedule and simply add new URLs to the input sheet whenever you want fresh data scraped.
  • The Flatten Object node handles deeply nested JSON automatically, so most sites won't require any manual field mapping.
  • Use upsert mode with a unique key in the output sheet to safely re-run the workflow without creating duplicate rows.

Customising This Workflow

  • Scheduled monitoring: Replace the manual trigger with a Schedule Trigger and keep your input sheet updated to run automated price or availability checks.
  • Multi-category tracking: Organise your input sheet by category or source site and add a Filter node to process specific subsets on each run.
  • Add data enrichment: Insert an AI node after Phase 3 to summarise descriptions, classify products, or translate content before saving to Sheets.
  • Slack or webhook alerts: Swap the Gmail node for a Slack or webhook notification if you prefer real-time alerts in a different channel.