Yelp Business Scraper by URL via Scrape.do API with Google Sheets Storage
Overview
This n8n workflow automates scraping comprehensive business information from Yelp using individual business URLs. It integrates Scrape.do for professional web scraping with anti-bot bypass and Google Sheets for centralized data storage, delivering detailed business intelligence for market research, competitor analysis, and lead generation.
Workflow Components
1. 📥 Form Trigger
| Property | Value |
|---|---|
| Type | Form Trigger |
| Purpose | Initiates the workflow with user-submitted Yelp business URL |
| Input Fields | Yelp Business URL |
| Function | Captures target business URL to start the scraping process |
2. 🔍 Create Scrape.do Job
| Property | Value |
|---|---|
| Type | HTTP Request (POST) |
| Purpose | Creates an async scraping job via Scrape.do API |
| Endpoint | https://q.scrape.do/api/v1/jobs |
| Authentication | X-Token header |
Request Parameters:
- `Targets`: array containing the Yelp business URL
- `Super`: `true` (uses residential/mobile proxies for a better success rate)
- `GeoCode`: `us` (targets US-based content)
- `Device`: `desktop`
- `Render`: JavaScript rendering enabled with the `networkidle2` wait condition
Function: Initiates comprehensive business data extraction from Yelp with headless browser rendering to handle dynamic content.
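The request body the node sends can be sketched as follows. The helper name `buildJobRequest` and the environment-variable fallback are illustrative; the endpoint, headers, and field names come from the parameters above.

```javascript
// Sketch of the job-creation request, assuming the field names above
// (Targets, Super, GeoCode, Device, Render) match your Scrape.do plan.
function buildJobRequest(yelpUrl) {
  return {
    method: "POST",
    url: "https://q.scrape.do/api/v1/jobs",
    headers: {
      "X-Token": process.env.SCRAPEDO_TOKEN || "YOUR_SCRAPEDO_TOKEN",
      "Content-Type": "application/json",
    },
    body: {
      Targets: [yelpUrl],                    // one business URL per form submission
      Super: true,                           // residential/mobile proxies
      GeoCode: "us",                         // US-based content
      Device: "desktop",
      Render: { WaitUntil: "networkidle2" }, // headless browser, full page load
    },
  };
}

const req = buildJobRequest("https://www.yelp.com/biz/example-business-city");
```

In the n8n HTTP Request node the same shape is entered as JSON body parameters rather than built in code.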
3. 🔧 Parse Yelp HTML
| Property | Value |
|---|---|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured business data from raw HTML |
| Mode | Run once for each item |
Function: Parses the scraped HTML content using regex patterns and JSON-LD extraction to retrieve:
- Business name
- Overall rating
- Review count
- Phone number
- Full address
- Price range
- Categories
- Website URL
- Business hours
- Image URLs
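A minimal sketch of the JSON-LD pass in the Code node, covering a subset of the fields above. Real Yelp markup varies, so the regex and field paths are a starting point, not the exact production parser.

```javascript
// Illustrative JSON-LD extraction for a Yelp business page.
// Field names mirror the sheet columns; the regex is a simplified
// stand-in for the workflow's full set of patterns.
function parseYelpHtml(html) {
  const out = { name: null, overall_rating: null, reviews_count: null, phone: null, address: null };
  const m = html.match(/<script type="application\/ld\+json">([\s\S]*?)<\/script>/);
  if (m) {
    try {
      const ld = JSON.parse(m[1]);
      out.name = ld.name || null;
      out.phone = ld.telephone || null;
      if (ld.aggregateRating) {
        out.overall_rating = String(ld.aggregateRating.ratingValue);
        out.reviews_count = String(ld.aggregateRating.reviewCount);
      }
      if (ld.address) {
        out.address = [ld.address.streetAddress, ld.address.addressLocality,
                       ld.address.addressRegion, ld.address.postalCode]
          .filter(Boolean).join(", ");
      }
    } catch (e) {
      // Malformed JSON-LD: leave nulls for the regex fallbacks to fill.
    }
  }
  out.scraped_at = new Date().toISOString();
  return out;
}
```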
4. 📊 Store to Google Sheet
| Property | Value |
|---|---|
| Type | Google Sheets Node |
| Purpose | Appends scraped business data to a spreadsheet for analysis |
| Operation | Append rows |
| Target | "Yelp Scraper Data - Scrape.do" sheet |
Data Mapping:
- Business Name, Overall Rating, Reviews Count
- Business URL, Phone, Address
- Price Range, Categories, Website
- Hours, Images/Videos URLs, Scraped Timestamp
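The mapping above can be sketched as a fixed 12-column row builder. The `toSheetRow` helper is illustrative; the column names follow the sheet header described in the setup steps.

```javascript
// Illustrative mapping from a parsed business object to the 12-column
// row the Google Sheets node appends.
const COLUMNS = [
  "name", "overall_rating", "reviews_count", "url", "phone", "address",
  "price_range", "categories", "website", "hours", "images_videos_urls", "scraped_at",
];

function toSheetRow(business) {
  // Missing fields become empty strings so column alignment is preserved.
  return COLUMNS.map((col) => business[col] ?? "");
}
```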
Workflow Flow
Form Input → Create Scrape.do Job → Parse Yelp HTML → Store to Google Sheet
│ │ │ │
▼ ▼ ▼ ▼
User submits API creates job JavaScript code Data appended
Yelp URL with JS rendering extracts fields to spreadsheet
Configuration Requirements
API Keys & Credentials
| Credential | Purpose |
|---|---|
| Scrape.do API Token | Required for Yelp business scraping with anti-bot bypass |
| Google Sheets OAuth2 | For data storage and export access |
| n8n Form Webhook | For user input collection |
Setup Parameters
| Parameter | Description |
|---|---|
| `YOUR_SCRAPEDO_TOKEN` | Your Scrape.do API token (appears in 3 places) |
| `YOUR_GOOGLE_SHEET_ID` | Target spreadsheet identifier |
| `YOUR_GOOGLE_SHEETS_CREDENTIAL_ID` | OAuth2 authentication reference |
Key Features
🛡️ Anti-Bot Bypass Technology
- Residential Proxy Rotation: 110M+ proxies across 150 countries
- WAF Bypass: Handles Cloudflare, Akamai, DataDome, and PerimeterX
- Dynamic TLS Fingerprinting: Authentic browser signatures
- CAPTCHA Handling: Automatic bypass for uninterrupted scraping
🌐 JavaScript Rendering
- Full headless browser support for dynamic Yelp content
- `networkidle2` wait condition ensures complete page load
- Custom wait times for complex page elements
- Real device fingerprints for detection avoidance
📊 Comprehensive Data Extraction
| Field | Description | Example |
|---|---|---|
| `name` | Business name | "Joe's Pizza Restaurant" |
| `overall_rating` | Average customer rating | "4.5" |
| `reviews_count` | Total number of reviews | "247" |
| `url` | Original Yelp business URL | "https://www.yelp.com/biz/..." |
| `phone` | Business phone number | "(555) 123-4567" |
| `address` | Full street address | "123 Main St, New York, NY 10001" |
| `price_range` | Price indicator | "$$" |
| `categories` | Business categories | "Pizza, Italian, Delivery" |
| `website` | Business website URL | "https://joespizza.com" |
| `hours` | Operating hours | "Mon-Fri 11:00-22:00" |
| `images_videos_urls` | Media content links | "https://s3-media1.fl.yelpcdn.com/..." |
| `scraped_at` | Extraction timestamp | "2025-01-15T10:30:00Z" |
🗂️ Centralized Data Storage
- Automatic Google Sheets export
- Organized business data format with 12 data fields
- Historical scraping records with timestamps
- Easy sharing and collaboration
Use Cases
📈 Market Research
- Competitor business analysis
- Local market intelligence gathering
- Industry benchmark establishment
- Service offering comparison
🎯 Lead Generation
- Business contact information extraction
- Potential client identification
- Market opportunity assessment
- Sales prospect development
📊 Business Intelligence
- Customer sentiment analysis through ratings
- Competitor performance monitoring
- Market positioning research
- Brand reputation tracking
📍 Location Analysis
- Geographic business distribution
- Local competition assessment
- Market saturation evaluation
- Expansion opportunity identification
Technical Notes
| Specification | Value |
|---|---|
| Processing Time | 15-45 seconds per business URL |
| Data Accuracy | 95%+ for publicly available business information |
| Success Rate | 99.98% (Scrape.do guarantee) |
| Proxy Pool | 110M+ residential, mobile, and datacenter IPs |
| JS Rendering | Full headless browser with networkidle2 wait |
| Data Format | JSON with structured field mapping |
| Storage Format | Structured Google Sheets with 12 predefined columns |
Setup Instructions
Step 1: Import Workflow
- Copy the JSON workflow configuration
- Import into n8n: Workflows → Import from JSON
- Paste configuration and save
Step 2: Configure Scrape.do
Get your API token:
- Sign up at Scrape.do
- Navigate to Dashboard → API Token
- Copy your token
Update workflow references (3 places):
- `🔍 Create Scrape.do Job` node → Headers → `X-Token`
- `📡 Check Job Status` node → Headers → `X-Token`
- `📥 Fetch Task Results` node → Headers → `X-Token`
Replace `YOUR_SCRAPEDO_TOKEN` with your actual API token.
Step 3: Configure Google Sheets
Create target spreadsheet:
- Create new Google Sheet named "Yelp Business Data" or similar
- Add header row with columns:
  `name | overall_rating | reviews_count | url | phone | address | price_range | categories | website | hours | images_videos_urls | scraped_at`
- Copy the Sheet ID from the spreadsheet URL (the long string between `/d/` and `/edit`)
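The Sheet ID rule can be shown as a small helper (illustrative, not part of the workflow):

```javascript
// Extracts the Sheet ID from a full Google Sheets URL: the segment
// between /d/ and the next slash (typically followed by /edit).
function extractSheetId(sheetUrl) {
  const m = sheetUrl.match(/\/d\/([a-zA-Z0-9_-]+)/);
  return m ? m[1] : null;
}
```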
Set up OAuth2 credentials:
- In n8n: Credentials → Add Credential → Google Sheets OAuth2
- Complete the Google authentication process
- Grant access to Google Sheets
Update workflow references:
- Replace `YOUR_GOOGLE_SHEET_ID` with your actual Sheet ID
- Update `YOUR_GOOGLE_SHEETS_CREDENTIAL_ID` with your credential reference
Step 4: Test and Activate
Test with sample URL:
- Use a known Yelp business URL (e.g., `https://www.yelp.com/biz/example-business-city`)
- Submit it through the form trigger
- Monitor execution progress in n8n
- Verify data appears in Google Sheet
Activate workflow:
- Toggle workflow to "Active"
- Share form URL with users
Sample Business Data
The workflow captures comprehensive business information including:
| Category | Data Points |
|---|---|
| Basic Information | Name, category, location |
| Performance Metrics | Ratings, review counts, popularity |
| Contact Details | Phone, website, address |
| Visual Content | Photos, videos, gallery URLs |
| Operational Data | Hours, services, price range |
Advanced Configuration
Batch Processing
Modify the input to accept multiple URLs by updating the job creation body:
{
"Targets": [
"https://www.yelp.com/biz/business-1",
"https://www.yelp.com/biz/business-2",
"https://www.yelp.com/biz/business-3"
],
"Super": true,
"GeoCode": "us",
"Render": {
"WaitUntil": "networkidle2",
"CustomWait": 3000
}
}
Enhanced Rendering Options
For complex Yelp pages, add browser interactions:
{
"Render": {
"BlockResources": false,
"WaitUntil": "networkidle2",
"CustomWait": 5000,
"WaitSelector": ".biz-page-header",
"PlayWithBrowser": [
{ "Action": "Scroll", "Direction": "down" },
{ "Action": "Wait", "Timeout": 2000 }
]
}
}
Notification Integration
Add alert mechanisms:
- Email notifications for completed scrapes
- Slack messages for team updates
- Webhook triggers for external systems
Error Handling
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Invalid URL | URL is not a valid Yelp business page | Ensure URL format: https://www.yelp.com/biz/... |
| 401 Unauthorized | Invalid or missing API token | Verify X-Token header value |
| Job Timeout | Page too complex or slow | Increase CustomWait value |
| Empty Data | HTML parsing failed | Check page structure, update regex patterns |
| Rate Limiting | Too many concurrent requests | Reduce request frequency or upgrade plan |
Troubleshooting Steps
- Verify URLs: Ensure Yelp business URLs are correctly formatted
- Check Credentials: Validate Scrape.do token and Google OAuth
- Monitor Logs: Review n8n execution logs for detailed errors
- Test Connectivity: Verify network access to all external services
- Check Job Status: Use Scrape.do dashboard to monitor job progress
Performance Specifications
| Metric | Value |
|---|---|
| Processing Time | 15-45 seconds per business URL |
| Data Accuracy | 95%+ for publicly available information |
| Success Rate | 99.98% (with Scrape.do anti-bot bypass) |
| Concurrent Processing | Depends on Scrape.do plan limits |
| Storage Capacity | Unlimited (Google Sheets based) |
| Proxy Pool | 110M+ IPs across 150 countries |
Scrape.do API Reference
Async API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/jobs` | POST | Create new scraping job |
| `/api/v1/jobs/{jobID}` | GET | Check job status |
| `/api/v1/jobs/{jobID}/{taskID}` | GET | Retrieve task results |
| `/api/v1/me` | GET | Get account information |
Job Status Values
| Status | Description |
|---|---|
| `queuing` | Job is being prepared |
| `queued` | Job is in queue waiting to be processed |
| `pending` | Job is currently being processed |
| `rotating` | Job is retrying with different proxies |
| `success` | Job completed successfully |
| `error` | Job failed |
| `canceled` | Job was canceled by user |
For complete API documentation, visit: Scrape.do Documentation
Support & Resources
- Scrape.do Documentation: https://scrape.do/documentation/
- Scrape.do Dashboard: https://dashboard.scrape.do/
- n8n Documentation: https://docs.n8n.io/
- Google Sheets API: https://developers.google.com/sheets/api
This workflow is powered by Scrape.do - Reliable, Scalable, Unstoppable Web Scraping