Evaluations
Automatic per-call scoring across accuracy and experience — so you can trust and improve your agents.
Evaluations score the quality of your agent's calls automatically. Each call is rated on seven dimensions, grouped into accuracy (did it get things right?) and experience (was it a good conversation?), then rolled up into an overall score.
Accuracy
Faithfulness
Sticks to your content and instructions — no making things up.
Tool-call validity
Calls tools correctly, with valid inputs.
Correct finish
Ends the conversation cleanly instead of trailing off.
Experience
Conciseness
Gets to the point without rambling.
Conversation progression
Keeps the conversation moving toward its goal.
Turn-taking
Takes turns naturally, without talking over the caller.
Speakability
Says things that sound natural spoken aloud.
How scoring works
Each dimension is scored from 0 to 1, and the dimension scores are averaged into an overall score. A call passes when its overall score is 0.80 or above. Tool-call validity and correct finish are checked by rules; the other five dimensions are judged by a model.
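The scoring arithmetic can be sketched in Python. This is a minimal sketch, not Intrlume's implementation: the dimension keys (such as tool_call_validity) and the helper names are hypothetical, and it assumes the overall score is an unweighted mean over all seven dimensions, since the page doesn't say whether accuracy and experience are weighted separately. Only the 0 to 1 range and the 0.80 pass threshold come from the text above.

```python
from statistics import mean

# From the docs: a call passes with an overall score of 0.80 or above.
PASS_THRESHOLD = 0.80

# Hypothetical keys for the seven dimensions, grouped as in the docs.
GROUPS = {
    "accuracy": ("faithfulness", "tool_call_validity", "correct_finish"),
    "experience": (
        "conciseness",
        "conversation_progression",
        "turn_taking",
        "speakability",
    ),
}
ALL_DIMENSIONS = tuple(d for dims in GROUPS.values() for d in dims)


def validate(scores: dict[str, float]) -> None:
    """Require every dimension to be present and scored in [0, 1]."""
    missing = [d for d in ALL_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for d in ALL_DIMENSIONS:
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"{d} must be in [0, 1], got {scores[d]}")


def group_scores(scores: dict[str, float]) -> dict[str, float]:
    """Mean score per group (accuracy, experience), for display only."""
    validate(scores)
    return {g: mean(scores[d] for d in dims) for g, dims in GROUPS.items()}


def overall_score(scores: dict[str, float]) -> float:
    # Assumption: plain mean over all seven dimensions,
    # not a mean of the two group scores.
    validate(scores)
    return mean(scores[d] for d in ALL_DIMENSIONS)


def passes(scores: dict[str, float]) -> bool:
    return overall_score(scores) >= PASS_THRESHOLD


if __name__ == "__main__":
    call = {
        "faithfulness": 1.0,
        "tool_call_validity": 1.0,  # rule-checked in the docs
        "correct_finish": 1.0,      # rule-checked in the docs
        "conciseness": 0.75,
        "conversation_progression": 0.75,
        "turn_taking": 0.5,
        "speakability": 0.75,
    }
    print(group_scores(call))
    print(round(overall_score(call), 3), passes(call))
```

In this example the call passes with an overall score of about 0.82, even though its experience group averages only 0.6875. Averaging the two group scores instead would give 0.84375, so the choice of weighting can move a borderline call across the 0.80 line.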
How you use it
Turn it on per agent
Enable evaluation for an agent so its calls are scored automatically.
Run or review
Scores appear on each evaluated call. You can also run a test evaluation, with text or audio, from the editor.
Read the breakdown
See each dimension's score, the call's strengths, and the specific issues to fix. Then refine the agent and check again.
Optional deeper checks (completeness, staying in character, and following the prompt) can run when you want a self-improvement pass. Evaluation is the quality loop at the core of Intrlume: measure, fix, and run again.