API schema extractor
Important notice
This workflow is provided as-is. Please review and test before using in production.
Overview
This workflow automates the process of discovering and extracting APIs from various services, followed by generating custom schemas. It works in three distinct stages: research, extraction, and schema generation, with each stage tracking progress in a Google Sheet.
Jim Le deserves major kudos for helping to build this sophisticated three-stage workflow that cleverly automates API documentation processing using a smart combination of web scraping, vector search, and LLM technologies.
How it works
Stage 1 - Research:
- Fetches pending services from a Google Sheet
- Uses Google search to find API documentation
- Employs Apify for web scraping to filter relevant pages
- Stores webpage contents and metadata in Qdrant (vector database)
- Updates progress status in Google Sheet (pending, ok, or error)
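The status tracking in Stage 1 can be sketched in plain Python. This is an illustration only, not the workflow's actual n8n nodes: the column names `service` and `research_status`, the in-memory `sheet` list, and the `fake_research` stand-in are all assumptions made for the example.

```python
from typing import Callable


def run_research(rows: list[dict], research: Callable[[str], None]) -> list[dict]:
    """Run `research` for every pending row and record the outcome in place."""
    for row in rows:
        if row.get("research_status") != "pending":
            continue  # row was already processed in an earlier run
        try:
            research(row["service"])
            row["research_status"] = "ok"
        except Exception:
            # Any failure (search, scraping, vector store) flags the row for review.
            row["research_status"] = "error"
    return rows


def fake_research(service: str) -> None:
    """Hypothetical stand-in for the search, scrape, and store steps."""
    if service == "BrokenService":
        raise RuntimeError("no documentation found")


# Hypothetical sheet contents; the real sheet's columns may differ.
sheet = [
    {"service": "ExampleService", "research_status": "pending"},
    {"service": "BrokenService", "research_status": "pending"},
    {"service": "DoneService", "research_status": "ok"},
]
run_research(sheet, fake_research)
```

Because only rows marked pending are picked up, re-running the stage skips services that already finished and leaves failed ones visible as error.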
Stage 2 - Extraction:
- Processes services that completed research successfully
- Queries vector store to identify products and offerings
- Further queries for relevant API documentation
- Uses Gemini (LLM) to extract API operations
- Records extracted operations in Google Sheet
- Updates progress status (pending, ok, or error)
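Before extracted operations are written to the sheet, the LLM's reply usually needs parsing and basic validation. The sketch below assumes the model was asked to return a JSON array of objects with `method`, `path`, and `description` keys; that response shape and the function name `parse_operations` are assumptions for illustration, not the workflow's confirmed format.

```python
import json

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in
ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}


def parse_operations(llm_text: str) -> list[dict]:
    """Parse an LLM reply into a list of validated API operations."""
    text = llm_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag) and the closing fence.
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of operations")
    operations = []
    for item in items:
        if not isinstance(item, dict):
            continue
        method = str(item.get("method", "")).upper()
        path = str(item.get("path", "")).strip()
        if method not in ALLOWED_METHODS or not path.startswith("/"):
            continue  # skip entries that do not look like HTTP operations
        operations.append({
            "method": method,
            "path": path,
            "description": str(item.get("description", "")).strip(),
        })
    return operations


# Hypothetical reply: one valid operation and one with an unknown method.
reply = (
    FENCE + "json\n"
    '[{"method": "get", "path": "/users", "description": "List users"},'
    ' {"method": "FETCH", "path": "/x"}]\n' + FENCE
)
parsed = parse_operations(reply)
```

A reply that is not valid JSON raises an exception here, which a caller could map to the error status for that service.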
Stage 3 - Generation:
- Takes services with successful extraction
- Retrieves all API operations from the database
- Combines and groups operations into a custom schema
- Uploads final schema to Google Drive
- Updates final status in sheet with file location
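One way to picture the "combine and group" step is to bucket operations by the first segment of their path. The grouping rule, the output layout, and the name `build_schema` below are assumptions chosen for the example; the workflow's real custom schema may be organised differently.

```python
import json
from collections import defaultdict


def build_schema(service: str, operations: list[dict]) -> dict:
    """Group operations by their first path segment into one schema document."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for op in operations:
        segments = [s for s in op["path"].split("/") if s]
        resource = segments[0] if segments else "root"
        groups[resource].append({
            "method": op["method"],
            "path": op["path"],
            "description": op.get("description", ""),
        })
    return {
        "service": service,
        # Sort resources and operations so the output is stable across runs.
        "resources": {
            name: sorted(ops, key=lambda o: (o["path"], o["method"]))
            for name, ops in sorted(groups.items())
        },
    }


# Hypothetical operations as they might be read back from the sheet.
ops = [
    {"method": "GET", "path": "/users/{id}", "description": "Get a user"},
    {"method": "GET", "path": "/users", "description": "List users"},
    {"method": "POST", "path": "/orders", "description": "Create an order"},
]
schema = build_schema("ExampleService", ops)
schema_json = json.dumps(schema, indent=2)  # text that could be uploaded as a file
```

Sorting keeps the generated file deterministic, so repeated runs over the same operations produce identical output.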
Ideal for:
- Development teams needing to catalog multiple APIs
- API documentation initiatives
- Creating standardized API schema collections
- Automating API discovery and documentation
Accounts required:
- Google account (for Sheets and Drive access)
- Apify account (for web scraping)
- Qdrant database
- Gemini API access
Set up instructions:
- Prepare your Google Sheets document with the services information. Here's an example of a Google Sheet; you can copy it and change or remove the values under the columns. Also, make sure to update the Google Sheets nodes with the correct Google Sheet ID.
- Configure the Google Sheets OAuth2 credentials, then the credentials for the required third-party services (Apify, Qdrant) and Gemini.
- Ensure the Google account has the permissions needed to upload the final schema to Google Drive.