Skip to main content
L

Le Thua Phu

1
Workflow

Workflows by Le Thua Phu

Workflow preview: Extract Website URLs from Sitemap.XML for SEO Analysis
Free intermediate

Extract Website URLs from Sitemap.XML for SEO Analysis

## Overview This n8n workflow automates the process of crawling a website's sitemap to extract URLs, which is particularly useful for SEO analysis, website auditing, or content monitoring. By leveraging n8n's nodes, the workflow fetches the sitemap from a specified URL, processes the XML data, and extracts individual URLs, which can then be converted into a downloadable file or integrated with tools like Google Sheets. ## How It Works The workflow operates in a sequential manner, utilizing a series of nodes to fetch, parse, and process sitemap data: 1. **Trigger**: Initiates when the user clicks "Test workflow" (`Manual Trigger` node). 2. **Set URL**: Defines the base domain (e.g., `https://phu.io.vn/`) for the sitemap (`Set URL` node). 3. **Crawl Sitemap**: Fetches the main sitemap file (`sitemap.xml`) from the specified domain using an HTTP request (`Crawl sitemap` node). 4. **Parse XML**: Converts the sitemap XML into a JSON format for easier processing (`XML` node). 5. **Split Sitemap**: Extracts individual sitemap entries (e.g., `<sitemap>` tags) from the parsed data (`Split Out` node). 6. **Crawl Sub-Sitemap**: Fetches each sub-sitemap URL listed in the main sitemap (`Crawl sitemap 2` node). 7. **Parse Sub-Sitemap XML**: Converts the sub-sitemap XML into JSON (`XML 2` node). 8. **Split URLs**: Extracts individual URLs (e.g., `<url>` tags) from the sub-sitemap (`Split Out 2` node). 9. **Convert to File**: Saves the extracted URLs into a file for download or further use (`Convert to File` node). This workflow supports both single sitemap files and sitemap indexes that reference multiple sub-sitemaps, ensuring comprehensive URL extraction. ## How to Use To implement this workflow in n8n, follow these steps: 1. **Set Up n8n**: Ensure you have an active n8n instance (Cloud, npm, or self-hosted). Refer to the [n8n documentation](https://docs.n8n.io/choose-n8n/) for setup instructions. 2. **Import Workflow**: Copy the JSON from the provided `Extract Website URLs from Sitemap.XML for SEO Analysis.json` file and import it into your n8n instance via the workflow editor. 3. **Configure the Domain**: - In the `Set URL` node, update the `Domain` parameter with the target website's base URL (e.g., `https://example.com/`). - Alternatively, in the `Crawl sitemap` node, directly paste the full sitemap URL if known (e.g., `https://example.com/sitemap.xml`). 4. **Test the Workflow**: - Click "Test workflow" to execute the `Manual Trigger` node. - Verify that the workflow fetches the sitemap and processes the URLs correctly. 5. **Download or Integrate**: - The `Convert to File` node generates a file containing the extracted URLs. - Optionally, replace this node with a Google Sheets node to append URLs to a spreadsheet. Refer to the [Google Sheets node documentation](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.googlesheets/) for setup. 6. **Save and Activate**: Save the workflow and activate it for production use if needed, using a trigger like a schedule or webhook (see [Trigger Node](https://docs.n8n.io/glossary/#trigger-node-n8n)). ## Requirements - **n8n Instance**: An active n8n instance (version 1.0 or later recommended) on n8n Cloud, npm, or self-hosted (Docker). See [Choose your n8n](https://docs.n8n.io/choose-n8n/) for details. - **Technical Knowledge**: Basic understanding of n8n's editor UI and node configuration. Familiarity with XML sitemaps is helpful but not mandatory. - **Permissions**: For self-hosted setups, ensure the n8n process has network access to fetch the sitemap URL. For Docker deployments, verify permissions as outlined in the [n8n v1.0 migration guide](https://docs.n8n.io/1-0-migration-checklist/#docker). - **Optional**: If integrating with Google Sheets, valid Google Sheets credentials are required (see [Credentials](https://docs.n8n.io/credentials/)). - **Timeout Configuration**: The HTTP Request nodes (`Crawl sitemap` and `Crawl sitemap 2`) have a 10-second timeout. Adjust the `timeout` parameter in the node settings if dealing with slow-responding servers. ## FAQ **Q: What happens if the sitemap is large or contains many sub-sitemaps?** A: The workflow handles sitemap indexes by splitting and processing each sub-sitemap individually. For very large sitemaps, ensure your n8n instance has sufficient resources (memory and CPU) to avoid performance issues. See [Scaling n8n](https://docs.n8n.io/hosting/scaling/) for optimization tips. **Q: Can I use this workflow with a specific sitemap URL instead of a domain?** A: Yes, in the `Crawl sitemap` node, replace the `url` parameter (`{{ $json.Domain }}sitemap.xml`) with the direct sitemap URL (e.g., `https://example.com/sitemap.xml`). Update the node’s notes for clarity. **Q: Why am I getting a timeout error?** A: The HTTP Request nodes have a default timeout of 10 seconds. If the target server is slow, increase the `timeout` value in the `options` parameter of the `Crawl sitemap` or `Crawl sitemap 2` nodes. **Q: How can I save the URLs to Google Sheets instead of a file?** A: Replace the `Convert to File` node with a Google Sheets node. Configure it with your Google Sheets credentials and map the `loc` field from the `Split Out 2` node to the desired spreadsheet column. Refer to the [Google Sheets node documentation](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.googlesheets/). **Q: Is this workflow compatible with older n8n versions?** A: The workflow uses nodes compatible with n8n version 1.0 and later. For older versions, check for deprecated features (e.g., MySQL support) in the [n8n v1.0 migration guide](https://docs.n8n.io/1-0-migration-checklist/). **Q: Can I automate this workflow to run periodically?** A: Yes, replace the `Manual Trigger` node with a `Schedule Trigger` node to run the workflow at set intervals. See [Trigger Nodes](https://docs.n8n.io/glossary/#trigger-node-n8n) for configuration details. For further assistance, consult the [n8n Community Forum](https://community.n8n.io/) or submit an issue on the [n8n GitHub repository](https://github.com/n8n-io/n8n). ## Need help customizing? Contact me for consulting and support or add me on [Facebook](https://www.facebook.com/lethuaphu) or [email](mailto:[email protected]).

L
Le Thua Phu
Market Research
5 Jun 2025
6438
0