Evaluation metric example: String similarity
AI evaluation in n8n
This is a template for n8n's evaluation feature.
Evaluation is a technique for getting confidence that your AI workflow performs reliably, by running a test dataset containing different inputs through the workflow.
By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
How it works
This template shows how to calculate a workflow evaluation metric: text similarity, measured character-by-character.
The workflow takes images of hand-written codes, extracts the code and compares it with the expected answer from the dataset.
The images look like this:

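A minimal sketch of the character-by-character similarity described above, using Levenshtein distance normalised by the longer string's length (one common definition; the template's built-in string-distance metric may use a different formula):

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions, and substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i;
  for (let j = 0; j < cols; j++) d[0][j] = j;
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution
      );
    }
  }
  return d[rows - 1][cols - 1];
}

// Normalise to a 0..1 score where 1 means an exact match.
function similarity(expected, actual) {
  const maxLen = Math.max(expected.length, actual.length);
  if (maxLen === 0) return 1;
  return 1 - levenshtein(expected, actual) / maxLen;
}
```

For example, `similarity('ABC123', 'ABC124')` scores 5/6: five of the six characters survive unchanged, and one substitution is needed.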
The workflow works as follows:
- We use an evaluation trigger to read in our dataset
- It is wired up in parallel with the regular trigger so that the workflow can be started from either one
- We download the image and use AI to extract the code
- If we’re evaluating (i.e. the execution started from the evaluation trigger), we calculate the string distance metric
- We pass this information back to n8n as a metric
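The evaluation branch of the steps above can be sketched as a plain function. Note the field names `expected_code` and `extracted_code` are illustrative, not the template's actual dataset columns, and the real workflow reports the score through n8n's evaluation nodes rather than a hand-rolled helper:

```javascript
// Hypothetical shape of the metric step: compare the AI-extracted code
// with the dataset's expected answer, position by position, and return
// the fraction of matching characters as the metric value.
// `item` stands in for one n8n item; field names are assumptions.
function toMetric(item) {
  const expected = item.json.expected_code ?? '';
  const actual = item.json.extracted_code ?? '';
  const len = Math.max(expected.length, actual.length);
  let matches = 0;
  for (let i = 0; i < len; i++) {
    if (expected[i] === actual[i]) matches++;
  }
  const score = len === 0 ? 1 : matches / len;
  // n8n records whatever fields the metric node is configured to read.
  return { json: { string_similarity: score } };
}
```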