Route AI queries cost-efficiently with GPT-4o-mini, GPT-4o and confidence scoring


Overview

This workflow implements a cost-optimized AI routing system using n8n. It decides whether a low-cost model's answer is good enough or whether the request should be escalated to a higher-quality model, based on a confidence score assigned to the response.

The goal is to minimize LLM usage costs while maintaining high answer quality.

A query is first processed by a cheaper model. The response is then evaluated by a confidence-scoring AI agent. If the response quality is insufficient, the workflow automatically escalates the request to a more capable model.
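The routing idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual implementation: the names `route_query`, `RoutedAnswer`, and `score_confidence` are hypothetical, the models are passed in as plain callables so the sketch stays independent of any SDK, and the default threshold of 0.7 is an assumed value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedAnswer:
    answer: str
    model_used: str
    confidence: float
    escalated: bool


def route_query(
    query: str,
    cheap_model: Callable[[str], str],
    expensive_model: Callable[[str], str],
    score_confidence: Callable[[str, str], float],
    threshold: float = 0.7,  # assumed value, not taken from the workflow
) -> RoutedAnswer:
    """Answer with the cheap model first; escalate only if confidence is low."""
    draft = cheap_model(query)
    confidence = score_confidence(query, draft)
    if confidence >= threshold:
        # The cheap answer is considered good enough.
        return RoutedAnswer(draft, "gpt-4o-mini", confidence, escalated=False)
    # Confidence is below the threshold: reprocess with the more capable model.
    return RoutedAnswer(expensive_model(query), "gpt-4o", confidence, escalated=True)
```

Because the expensive model is only called on the low-confidence path, the cost of a request depends on how often the evaluator rejects the cheap answer.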

This approach is useful for building scalable AI systems where most queries can be answered cheaply, while complex queries still receive high-quality responses.


How It Works

  1. Webhook Trigger
     • Receives a user query from an external application.
  2. Workflow Configuration
     • Defines parameters such as:
       • confidence threshold
       • cheap model cost
       • expensive model cost
  3. Cheap Model Response
     • The query is first processed using GPT-4o-mini to minimize cost.
  4. Confidence Evaluation
     • An AI agent analyzes the response quality.
     • It evaluates accuracy, completeness, clarity, and relevance.
  5. Structured Output Parsing
     • The evaluator returns structured data including:
       • confidence score
       • explanation
       • escalation recommendation
  6. Decision Logic
     • If the confidence score is below the configured threshold, the workflow escalates the request.
  7. Expensive Model Escalation
     • The query is reprocessed using GPT-4o for a higher-quality answer.
  8. Cost Calculation
     • Token usage is analyzed to estimate:
       • total cost
       • cost difference between models
  9. Final Response Formatting
     • The workflow returns:
       • AI response
       • model used
       • confidence score
       • escalation status
       • estimated cost
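The parsing, decision, and cost steps above could look roughly like the following Python sketch. Several details are assumptions made for illustration: the JSON key names (`confidence`, `explanation`, `shouldEscalate`), the 0-1 confidence scale, and all helper names are hypothetical, and the per-1,000-token prices are supplied by the caller rather than taken from any real price list.

```python
import json
from dataclasses import dataclass


@dataclass
class Evaluation:
    confidence: float            # assumed to be on a 0-1 scale
    explanation: str
    recommends_escalation: bool


def parse_evaluation(raw: str) -> Evaluation:
    """Parse the evaluator's JSON reply. Key names are hypothetical."""
    data = json.loads(raw)
    return Evaluation(
        confidence=float(data["confidence"]),
        explanation=str(data["explanation"]),
        recommends_escalation=bool(data["shouldEscalate"]),
    )


def should_escalate(evaluation: Evaluation, confidence_threshold: float) -> bool:
    """Escalate when the score falls below the configured threshold."""
    return evaluation.confidence < confidence_threshold


def estimate_cost(tokens: int, cost_per_1k_tokens: float) -> float:
    """Estimated cost for a token count at a per-1,000-token price."""
    return tokens / 1000 * cost_per_1k_tokens


def cost_difference(tokens: int, cheap_per_1k: float, expensive_per_1k: float) -> float:
    """How much more the same token count costs on the expensive model."""
    return estimate_cost(tokens, expensive_per_1k) - estimate_cost(tokens, cheap_per_1k)
```

Keeping the threshold check separate from parsing means the evaluator's own recommendation can be logged or compared against the threshold decision without changing the routing rule.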

Setup Instructions

  1. Create an OpenAI credential in n8n.

  2. Configure the following nodes:
     • Cheap Model (GPT-4o-mini)
     • Expensive Model (GPT-4o)
     • OpenAI Chat Model used by the confidence evaluator agent

  3. Adjust configuration values in the Workflow Configuration node:
     • confidenceThreshold
     • cheapModelCostPer1kTokens
     • expensiveModelCostPer1kTokens

  4. Deploy the workflow and send requests to the Webhook URL.

Example webhook payload:

{
 "query": "Explain how photosynthesis works."
}
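
One way to send this payload from Python is sketched below using only the standard library. The URL is a placeholder to be replaced with the one shown on your Webhook node, and the sketch assumes the webhook accepts a JSON body via POST and replies with JSON; `build_request` and `send_query` are hypothetical helper names.

```python
import json
import urllib.request

# Placeholder: replace with the URL shown on your Webhook node.
WEBHOOK_URL = "https://example.com/webhook/placeholder"


def build_request(query: str, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build a POST request carrying the example payload shape."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query: str, url: str = WEBHOOK_URL) -> dict:
    """Send the query and decode the reply, assuming a JSON response."""
    with urllib.request.urlopen(build_request(query, url), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_query("Explain how photosynthesis works.")` against a deployed workflow should return the formatted response described under Final Response Formatting, though the exact field names depend on how that node is configured.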