
Evaluations

Automatic per-call scoring across accuracy and experience, so you can trust and improve your agents.

Evaluations score the quality of your agent's calls automatically. Each call is rated on several dimensions, grouped into accuracy (did it get things right?) and experience (was it a good conversation?), then rolled up into an overall score.

Accuracy

Experience

How scoring works

Each dimension is scored from 0 to 1, and the dimension scores are averaged into an overall score. A call passes when its overall score is 0.80 or higher. Most dimensions are judged by a model; tool-call validity and correct finish are checked by rules instead.
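The following is a minimal Python sketch of the scoring described above. It assumes the overall score is an unweighted mean and that the 0.80 threshold applies to that overall score. The tool registry, the `book_appointment` and `end_call` tools, the model-judged dimension names, and the logic inside both rule checks are hypothetical stand-ins, not Intrlume's actual rules. Model-judged scores appear as fixed placeholder numbers.

```python
from dataclasses import dataclass
from statistics import mean

PASS_THRESHOLD = 0.80  # a call passes when its overall score is >= 0.80


@dataclass
class ToolCall:
    name: str
    args: dict


# Hypothetical registry: tool name -> names of its required arguments.
TOOL_SCHEMAS = {
    "book_appointment": {"date", "time"},
    "end_call": set(),
}


def tool_calls_valid(calls: list) -> float:
    """Rule check (assumed logic): 1.0 if every call names a known tool and
    supplies all of its required arguments, otherwise 0.0."""
    for call in calls:
        required = TOOL_SCHEMAS.get(call.name)
        if required is None or not required.issubset(call.args):
            return 0.0
    return 1.0


def finished_correctly(calls: list) -> float:
    """Rule check (assumed logic): 1.0 if the final tool call is end_call."""
    return 1.0 if calls and calls[-1].name == "end_call" else 0.0


def overall_score(dimension_scores: dict) -> float:
    """Unweighted mean of per-dimension scores, each required to be in [0, 1]."""
    for name, score in dimension_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} is outside [0, 1]")
    return mean(dimension_scores.values())


def passes(score: float) -> bool:
    return score >= PASS_THRESHOLD


calls = [
    ToolCall("book_appointment", {"date": "2024-05-01", "time": "10:00"}),
    ToolCall("end_call", {}),
]
scores = {
    "tool_call_validity": tool_calls_valid(calls),  # rule-checked
    "correct_finish": finished_correctly(calls),    # rule-checked
    "resolution": 0.7,  # hypothetical placeholder for a model-judged dimension
    "tone": 0.6,        # hypothetical placeholder for a model-judged dimension
}
score = overall_score(scores)
print(round(score, 3), passes(score))  # 0.825 True
```

Under this averaging assumption, one weak dimension can be offset by strong ones, so a passing call can still hide a specific problem worth fixing.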

How you use it

Turn it on per agent

Enable evaluation for an agent so its calls are scored automatically.

Run or review

Scores appear on calls; you can also run a test evaluation (text or audio) from the editor.

Read the breakdown

See each dimension's score, the call's strengths, and the specific issues to fix. Then refine the agent and re-check.
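The breakdown lends itself to a simple triage step. This Python sketch assumes a hypothetical `Breakdown` shape (scores, strengths, issues) and reuses 0.80 as a per-dimension cut-off, although the docs state that threshold only for the overall score. It lists the weakest dimensions first, then compares scores from a re-check after the agent is refined. All scores, strengths, and issues are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Breakdown:
    """Hypothetical shape of one call's evaluation breakdown."""
    scores: dict                                  # dimension -> score in [0, 1]
    strengths: list = field(default_factory=list)
    issues: list = field(default_factory=list)


def weakest_dimensions(b: Breakdown, threshold: float = 0.80) -> list:
    """Dimensions scoring below the threshold, lowest score first."""
    low = [(name, s) for name, s in b.scores.items() if s < threshold]
    return sorted(low, key=lambda item: item[1])


def score_changes(before: Breakdown, after: Breakdown) -> dict:
    """Per-dimension change after a re-check (positive means improved)."""
    return {
        name: round(after.scores[name] - before.scores[name], 3)
        for name in before.scores
        if name in after.scores
    }


before = Breakdown(
    scores={"tool_call_validity": 1.0, "correct_finish": 0.0, "tone": 0.6},
    strengths=["Confirmed the booking details back to the caller"],
    issues=["Kept talking after the task was done instead of finishing"],
)
# Scores from re-checking after refining the agent (illustrative numbers).
after = Breakdown(scores={"tool_call_validity": 1.0, "correct_finish": 1.0, "tone": 0.7})

for issue in before.issues:
    print("fix:", issue)
print(weakest_dimensions(before))  # [('correct_finish', 0.0), ('tone', 0.6)]
print(score_changes(before, after))  # {'tool_call_validity': 0.0, 'correct_finish': 1.0, 'tone': 0.1}
```

Comparing per dimension, rather than only the overall score, shows whether a change fixed one area at the expense of another.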

Optional deeper checks (completeness, staying in character, and following the prompt) can be run when you want a self-improvement pass. Evaluation is the quality loop behind how Intrlume works: measure, fix, and run again.