Evaluations

Automatic per-call scoring across accuracy and experience — so you can trust and improve your agents.

Evaluations score the quality of your agent's calls automatically. Each call is rated on seven dimensions, grouped into accuracy (did it get things right?) and experience (was it a good conversation?), then rolled up into an overall score.

Accuracy

Faithfulness

Sticks to your content and instructions — no making things up.

Tool-call validity

Calls tools correctly, with valid inputs.

Correct finish

Ends the conversation cleanly instead of trailing off.

Experience

Conciseness

Gets to the point without rambling.

Conversation progression

Keeps the conversation moving toward its goal.

Turn-taking

Takes turns naturally, without talking over the caller.

Speakability

Says things that sound natural spoken aloud.

How scoring works

Each dimension is scored from 0 to 1, and the dimension scores are averaged into an overall score. A call passes when its overall score is 0.80 or higher. Most dimensions are judged by a model; tool-call validity and correct finish are checked by rules.
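The arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: it assumes a plain unweighted mean (the text says "averaged" without specifying weights), and the dimension keys, example scores, and function names are hypothetical.

```python
from statistics import mean

# From the text: a call passes at 0.80 and above.
PASS_THRESHOLD = 0.80


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-dimension scores.

    Assumption: every dimension counts equally. Each score must lie in [0, 1].
    """
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    for name, score in dimension_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} is outside the range 0 to 1")
    return mean(dimension_scores.values())


def call_passes(dimension_scores: dict[str, float]) -> bool:
    """A call passes when its overall score reaches the threshold."""
    return overall_score(dimension_scores) >= PASS_THRESHOLD


# Hypothetical scores for one call, keyed by illustrative dimension names.
example_scores = {
    "faithfulness": 0.9,
    "tool_call_validity": 1.0,
    "correct_finish": 1.0,
    "conciseness": 0.7,
    "conversation_progression": 0.8,
    "turn_taking": 0.9,
    "speakability": 0.8,
}

print(round(overall_score(example_scores), 2), call_passes(example_scores))
```

Under these assumptions, one low dimension (conciseness at 0.7 here) does not fail the call on its own, because the pass mark applies to the average rather than to each dimension.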

How you use it

Turn it on per agent

Enable evaluation for an agent so its calls are scored automatically.

Run or review

Scores appear on calls; you can also run a test evaluation (text or audio) from the editor.

Read the breakdown

See the score per dimension, the strengths, and the specific issues to fix — then refine the agent and re-check.
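One way to read a breakdown is to look at the lowest-scoring dimensions first. The sketch below assumes the breakdown is available as a simple mapping of dimension name to score; that shape, the key names, and the `weakest_dimensions` helper are hypothetical and are not part of any documented export or API.

```python
def weakest_dimensions(
    dimension_scores: dict[str, float], n: int = 2
) -> list[tuple[str, float]]:
    """Return the n lowest-scoring dimensions, lowest first.

    Ties keep the order in which the dimensions appear in the mapping,
    because Python's sort is stable.
    """
    ranked = sorted(dimension_scores.items(), key=lambda item: item[1])
    return ranked[:n]


# Hypothetical breakdown for one call.
breakdown = {
    "faithfulness": 0.9,
    "tool_call_validity": 1.0,
    "correct_finish": 1.0,
    "conciseness": 0.7,
    "conversation_progression": 0.8,
    "turn_taking": 0.9,
    "speakability": 0.8,
}

for name, score in weakest_dimensions(breakdown):
    print(f"{name}: {score:.2f}")
```

With these example numbers, conciseness would be the first dimension to refine before re-checking the agent.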

Optional deeper checks — completeness, staying in character, and following the prompt — can run when you want a self-improvement pass. Evaluation is the quality loop behind how Intrlume works: measure, fix, and run again.
