Evaluation metric example: categorization

Important notice

This workflow is provided as-is. Please review and test before using in production.

Overview

AI evaluation in n8n

This is a template for n8n's evaluation feature.

Evaluation is a technique for building confidence that your AI workflow performs reliably. It works by running a test dataset of varied inputs through the workflow.

By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
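The idea can be sketched in plain JavaScript: score each input, then aggregate the scores to see overall performance. The dataset rows, field names, and the `categorize` stand-in below are hypothetical, not part of the template.

```javascript
// Hypothetical test dataset: each row pairs an input with its expected answer.
const dataset = [
  { ticket: "App crashes on login", expectedCategory: "bug" },
  { ticket: "How do I export my data?", expectedCategory: "question" },
];

// Stand-in for the AI step; a real workflow would call a model here.
function categorize(ticket) {
  return ticket.includes("crash") ? "bug" : "question";
}

// Score each input (1 = match, 0 = mismatch), then aggregate into accuracy.
const scores = dataset.map((row) =>
  categorize(row.ticket) === row.expectedCategory ? 1 : 0
);
const accuracy = scores.reduce((a, b) => a + b, 0) / scores.length;
```

A per-input score like this shows exactly which inputs fail, while the aggregate gives a single number to track over time.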

How it works

This template shows how to calculate a workflow evaluation metric: whether a category matches the expected one.

The workflow takes support tickets and generates a category and priority, which are then compared with the correct answers in the dataset.

  • We use an evaluation trigger to read in our dataset
  • It is wired up in parallel with the regular trigger, so the workflow can be started from either one
  • Once the category is generated by the agent, we check whether it matches the expected one in the dataset
  • Finally we pass this information back to n8n as a metric
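The comparison step above amounts to a small piece of logic, shown here as a standalone function rather than n8n's exact node configuration. The field names (`actualCategory`, `expectedCategory`, `categoryMatch`) are assumptions for illustration.

```javascript
// Sketch of the comparison step: turn the category match into a numeric metric.
// Field names are assumptions, not the template's actual dataset columns.
function categoryMatchMetric(item) {
  // Normalize case and whitespace so "Bug " and "bug" count as a match.
  const match =
    item.actualCategory.trim().toLowerCase() ===
    item.expectedCategory.trim().toLowerCase();
  // The metric is reported back as a number (1 = match, 0 = mismatch).
  return { ...item, categoryMatch: match ? 1 : 0 };
}
```

In a Code node, logic like this would run per item before the metric field is handed back to n8n's evaluation feature.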