
The AI Agent QA glossary.

Plain-language definitions for the terms that come up when you score human and AI agents: QA, coaching, the architecture underneath, compliance, and the metrics that decide whether any of it worked.

Compliance

Data Privacy Framework (DPF)

The mechanism that succeeded Safe Harbor and Privacy Shield for transferring personal data from the EEA, UK, and Switzerland to the US, used alongside Standard Contractual Clauses. See Data Transfers.

ISO 42001

The international management-system standard for artificial intelligence (published as ISO/IEC 42001:2023), addressing how AI is governed, risk-assessed, and operated responsibly. It is an emerging benchmark for AI-era QA heading into 2026.

PCI DSS

The Payment Card Industry Data Security Standard governing how card data is handled. Relevant to any program that takes payments over the phone or chat.

PHI (Protected Health Information)

Health information protected under HIPAA. In a contact center it must be detected and redacted, and its handling scored, on every relevant interaction.

PII redaction

Removing personally identifiable information from conversations at ingest, before it is stored, so sensitive data is protected from the first moment. See Compliance.
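
A minimal sketch of redaction at ingest, assuming simple regular expressions stand in for real detection: the hypothetical `redact` function masks email addresses, US-style phone numbers, and runs of 13 to 19 digits before a line would be stored. The patterns are illustrative assumptions, not QEval®’s method; production redaction usually pairs rules like these with trained entity recognition.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far
# broader coverage (names, addresses, digits spoken as words, and so on).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    # 13 to 19 digits with optional single space/hyphen separators, ending on a digit.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}

def redact(text: str) -> str:
    """Replace each matched span with a [LABEL] placeholder before storage."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or 555-123-4567."))
# prints: Reach me at [EMAIL] or [PHONE].
```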

SOC 2 Type II

An independent attestation that an organization’s security controls operated effectively over a period of time. It is a baseline expectation in enterprise security due diligence. See Security and Trust.

Metrics

Average Handle Time (AHT)

The average duration of a customer interaction including talk, hold, and after-call work. Lower is usually better, but only when quality and resolution hold.
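
The definition translates directly into arithmetic. A minimal sketch, assuming each interaction records talk, hold, and after-call work in seconds (the `Interaction` fields are hypothetical names):

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    talk_s: float  # seconds spent talking with the customer
    hold_s: float  # seconds the customer spent on hold
    acw_s: float   # after-call work (wrap-up) in seconds

def average_handle_time(interactions: list[Interaction]) -> float:
    """AHT = (total talk + total hold + total after-call work) / number of interactions."""
    if not interactions:
        raise ValueError("AHT is undefined for zero interactions")
    total = sum(i.talk_s + i.hold_s + i.acw_s for i in interactions)
    return total / len(interactions)

calls = [Interaction(300, 60, 90), Interaction(240, 0, 60)]
print(average_handle_time(calls))  # 375.0 seconds
```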

Churn risk

The likelihood that a customer will leave, inferred in part from the sentiment trajectory and signals within a conversation, so at-risk customers can be flagged in time.

Cost of Poor Quality (COPQ)

The downstream cost of quality failures: compliance penalties, churn, repeat contacts, and chargebacks. Full-coverage scoring is meant to catch the drivers before they compound.

Customer Satisfaction (CSAT)

A measure of how satisfied a customer was with an interaction, typically from a post-contact survey. QEval® also predicts CSAT from the conversation itself.

First Contact Resolution (FCR)

The share of issues resolved in a single interaction, with no callback or repeat contact. A strong proxy for both efficiency and customer experience.
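
One way to compute FCR, sketched under the assumption that each contact is logged as an `(issue_id, resolved)` pair. The data shape and function name are hypothetical, and this version ignores the repeat-contact time window (for example seven days) that many teams apply.

```python
from collections import Counter

def first_contact_resolution(contacts: list[tuple[str, bool]]) -> float:
    """Share of issues resolved on their first and only contact.

    contacts: (issue_id, resolved_on_this_contact) for every contact logged.
    """
    if not contacts:
        raise ValueError("no contacts")
    contact_counts = Counter(issue for issue, _ in contacts)
    resolved = {issue for issue, ok in contacts if ok}
    one_and_done = sum(
        1 for issue, n in contact_counts.items() if n == 1 and issue in resolved
    )
    return one_and_done / len(contact_counts)

log = [("A", True), ("B", False), ("B", True), ("C", True)]
print(first_contact_resolution(log))  # issue B needed a second contact, so 2 of 3
```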

Net Promoter Score (NPS)

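A measure of how likely customers are to recommend a brand. Customers rate that likelihood on a 0 to 10 scale, and NPS is the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6), giving a score from -100 to +100. Often the executive-level read on loyalty.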
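
The standard NPS calculation, sketched in Python (the function name is illustrative). Passives (7 or 8) count toward the total but toward neither group.

```python
def net_promoter_score(ratings: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to +100."""
    if not ratings:
        raise ValueError("no survey responses")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be on the 0-10 scale")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)

# 5 promoters, 2 passives, 3 detractors out of 10 responses
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 10]))  # 20.0
```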

Commitments

120-day ROI

A measured customer-average outcome: the point by which the platform has typically paid for itself. It is a documented result, not a contractual provision. See the ROI calculator.

30-day deployment

QEval®’s contractual commitment to be live within thirty days, against an industry norm measured in quarters.

60-day exit

A contractual right to exit within sixty days with no penalty, a commitment most vendors do not publish.

94%+ accuracy SLA

QEval®’s contractual classification-accuracy commitment, written into the agreement rather than asserted in marketing.

Outcome attribution

Connecting a coached behavior change to the business result it produced, so improvement can be tied to outcomes rather than left as unproven activity.

Six Layers of Intelligence

QEval®’s framework for the value in every interaction: Quality and Compliance, Customer, Revenue, Operational, Training, and Strategic intelligence. QA is Layer 1; the other five drive 82% of the value. See Six Layers.

AI agents

AI Agent

An autonomous or assistive system that handles customer conversations, from vendor platforms such as Sierra, Decagon, Agentforce, Ada, and Forethought to in-house GenAI. QEval® scores them the way it scores people.

AI drift

When an AI agent’s behavior degrades over time, for example repeating the same phrasing across turns or sliding away from approved answers. QEval® flags drift as a scored risk signal.
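
Drift detection can take many forms. As one hypothetical proxy for the repeated-phrasing symptom described above, the sketch below measures how often an AI agent’s turns repeat an earlier turn word for word after light normalization. The `repetition_rate` metric and the 0.3 flag threshold are assumptions for illustration, not how QEval® scores drift, and the sketch does not cover sliding away from approved answers.

```python
import re

def normalize(turn: str) -> str:
    """Lowercase and keep only word tokens so punctuation differences don't hide repeats."""
    return " ".join(re.findall(r"[a-z0-9']+", turn.lower()))

def repetition_rate(agent_turns: list[str]) -> float:
    """Share of AI agent turns that repeat an earlier turn after normalization."""
    if not agent_turns:
        return 0.0
    seen = set()
    repeats = 0
    for turn in agent_turns:
        key = normalize(turn)
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / len(agent_turns)

turns = [
    "I can help with that.",
    "Could you confirm your account number?",
    "I can help with that!",
    "I can help with that.",
]
rate = repetition_rate(turns)
print(rate, "flag" if rate > 0.3 else "ok")  # prints: 0.5 flag (threshold is hypothetical)
```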

Containment

The share of conversations an AI agent resolves without handing off to a human. Containment only matters if the contained conversations were actually handled well, which is why it is paired with QA scoring.
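
A sketch of why containment is paired with QA scoring, assuming each conversation carries hypothetical `handed_off` and `qa_pass` flags: the raw rate counts every conversation that stayed with the AI agent, while the second figure counts only those that also passed the scorecard.

```python
def containment_rates(conversations: list[dict]) -> tuple[float, float]:
    """Return (raw containment, containment that also passed QA).

    Each conversation is a dict with hypothetical keys:
      'handed_off': bool, escalated to a human
      'qa_pass': bool, passed the QA scorecard
    """
    if not conversations:
        raise ValueError("no conversations")
    total = len(conversations)
    contained = [c for c in conversations if not c["handed_off"]]
    contained_and_passed = [c for c in contained if c["qa_pass"]]
    return len(contained) / total, len(contained_and_passed) / total

convs = [
    {"handed_off": False, "qa_pass": True},
    {"handed_off": False, "qa_pass": True},
    {"handed_off": False, "qa_pass": False},
    {"handed_off": True, "qa_pass": True},
]
print(containment_rates(convs))  # (0.75, 0.5)
```

In this example, a 75% raw containment rate drops to 50% once the contained conversation that failed QA is excluded.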

Hallucination

When an AI agent states something false or unsupported with confidence. In a contact center this can mean inventing a policy, a price, or an entitlement, which is why it is scored as a compliance and accuracy risk.

Off-script

When an agent, human or AI, departs from the approved flow or required disclosures. Off-script behavior can be acceptable or a compliance miss, which is exactly what scoring is meant to tell apart.

Vendor-neutral scoring

Scoring every human and AI agent on one scorecard regardless of which platform built the agent, so quality, brand voice, and compliance are measured consistently across a mixed estate. See Why QEval®.

Scoring & QA

AI Agent QA

Quality assurance applied to AI agents and chatbots, not just humans. The same scorecard checks an AI agent for compliance, brand voice, resolution, and risks such as hallucination and drift. See AI Agent QA.

Auto QA

Automated quality assurance. Instead of a QA analyst manually reviewing a small random sample, AI scores every interaction against a customer-defined scorecard, then routes the lowest-scoring and outlier conversations to human calibration. See Auto QA.
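
The routing step can be sketched as follows, assuming every conversation already has an automated score on a 0 to 100 scale. The bottom-`k` cut and the two-standard-deviation outlier rule are hypothetical choices for illustration, not QEval®’s actual routing policy.

```python
import statistics

def route_for_calibration(scores: dict[str, float], k: int = 2, z: float = 2.0) -> list[str]:
    """Return conversation ids for human calibration: the k lowest scores plus any
    score more than z population standard deviations from the mean (high or low)."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    lowest = set(sorted(scores, key=scores.get)[:k])
    outliers = {cid for cid, s in scores.items() if sd > 0 and abs(s - mean) > z * sd}
    return sorted(lowest | outliers, key=scores.get)

scores = {"c1": 92, "c2": 88, "c3": 41, "c4": 90, "c5": 75, "c6": 95}
print(route_for_calibration(scores))  # ['c3', 'c5']
```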

Calibration

The process of keeping AI scores aligned with what a supervisor would decide, by reviewing a slice of scored conversations and tuning the model. QEval® targets a calibration gap under two percent.
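
The glossary does not spell out how the calibration gap is measured. Assuming it is the mean absolute difference between AI and supervisor scores on the reviewed slice, expressed as a percentage of the score scale, a minimal sketch looks like this (the function name and sample scores are hypothetical):

```python
def calibration_gap(pairs: list[tuple[float, float]], scale_max: float = 100.0) -> float:
    """Mean absolute AI-vs-supervisor difference, as a percentage of the score scale.

    pairs: (ai_score, supervisor_score) for each reviewed conversation.
    """
    if not pairs:
        raise ValueError("no reviewed conversations")
    mean_abs_diff = sum(abs(ai - human) for ai, human in pairs) / len(pairs)
    return 100.0 * mean_abs_diff / scale_max

reviewed = [(88, 90), (72, 71), (95, 95), (60, 63)]
gap = calibration_gap(reviewed)
print(f"{gap:.2f}% gap, within 2% target: {gap < 2.0}")  # 1.50% gap, within 2% target: True
```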

Full Coverage

Scoring every interaction rather than a 2 to 5 percent manual sample. Full coverage removes sample bias, gives complete compliance defensibility, and targets coaching at the conversations that actually need it.

Sampling

The traditional QA practice of manually reviewing a small random share of interactions, often 2 to 5 percent. The blind spot is everything outside the sample, where most compliance and coaching signal hides.
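
The blind spot can be quantified. Assuming a hypothetical month of 10,000 interactions containing 50 compliance misses and a 3 percent random sample (300 interactions), the expected number of misses a reviewer sees is 50 × 0.03 = 1.5, and the hypergeometric distribution gives the chance that the sample contains none at all. The figures are illustrative, not QEval® data.

```python
from math import comb

def chance_sample_misses_everything(total: int, flagged: int, sample: int) -> float:
    """Hypergeometric probability that a simple random sample of `sample`
    interactions contains none of the `flagged` ones."""
    return comb(total - flagged, sample) / comb(total, sample)

N, K, n = 10_000, 50, 300  # assumed volume, compliance misses, 3% manual sample
print(K * n / N)           # expected misses seen in the sample: 1.5
print(round(chance_sample_misses_everything(N, K, n), 3))
```

With these assumed numbers the probability comes out at roughly 0.22, so about one month in five the sample would surface no misses at all, while full coverage scores all 50.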

Scorecard

The structured set of criteria a conversation is graded against, covering categories such as compliance, empathy, resolution, and brand voice. QEval® applies the same scorecard to both human and AI agents.
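
A scorecard maps naturally onto a small data structure. A minimal sketch, assuming binary pass/fail criteria with relative weights; the criterion names, categories, and weights are hypothetical, and nothing in the scoring depends on whether the transcript came from a human or an AI agent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Criterion:
    name: str
    category: str  # e.g. "compliance", "empathy", "resolution", "brand voice"
    weight: float  # relative importance within the scorecard

def score(scorecard: list[Criterion], results: dict[str, bool]) -> float:
    """Weighted percentage of criteria met; criteria missing from results count as failed."""
    total = sum(c.weight for c in scorecard)
    earned = sum(c.weight for c in scorecard if results.get(c.name, False))
    return 100.0 * earned / total

card = [
    Criterion("recording_disclosure", "compliance", 3.0),
    Criterion("acknowledged_frustration", "empathy", 1.0),
    Criterion("issue_resolved", "resolution", 2.0),
    Criterion("approved_greeting", "brand voice", 1.0),
]
results = {"recording_disclosure": True, "issue_resolved": True, "approved_greeting": True}
print(round(score(card, results), 1))  # 6 of 7 weight points met
```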

Speech analytics

Analysis of voice interactions, including transcription, sentiment, talk and silence patterns, and keyword or phrase detection, used as scoring inputs alongside text channels.

Vision model

A model that scores what an agent or AI did on screen, for processes that play out visually rather than only in words, extending QA beyond the transcript.

Coaching

Auto Coach

AI-generated coaching tied to the specific moment in a scored conversation that needs it, so feedback is concrete and evidence-backed rather than generic.

Coaching lifecycle

The loop from score, to diagnosis, to targeted coaching, to reinforcement, to measured behavior change. QEval® turns scores into action instead of leaving them in a report. See Coaching.

Gamification

Using goals, recognition, and friendly competition to reinforce the behaviors coaching is trying to build, applied carefully so it rewards quality rather than encouraging agents to game the metric.

HI Model

QEval®’s coaching approach that pairs the score with the human side of behavior change, attributing each change up through the Six Layers so coaching can be tied to outcomes, not just activity.

Next best action (NBA)

The single highest-impact thing to coach or do next for a given agent, ranked by predicted impact rather than by whatever the last call happened to surface.
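
A minimal sketch of ranking by predicted impact, assuming each candidate behavior has a hypothetical gap to target and an estimated outcome weight whose product stands in for predicted impact. Because the ranking uses only these inputs, the most recent call has no special pull on the result.

```python
def next_best_action(opportunities: dict[str, tuple[float, float]]) -> str:
    """Pick the behavior with the highest predicted impact.

    opportunities maps a behavior to (pass_rate_gap, outcome_weight), both hypothetical:
    the gap to the team target and how strongly that behavior is estimated to move outcomes.
    """
    return max(opportunities, key=lambda b: opportunities[b][0] * opportunities[b][1])

agent = {
    "verify_identity": (0.05, 3.0),     # impact 0.15
    "confirm_resolution": (0.20, 1.5),  # impact 0.30
    "use_customer_name": (0.40, 0.2),   # impact 0.08
}
print(next_best_action(agent))  # confirm_resolution
```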

Real-Time Agent Assist (RTAA)

Help delivered to the agent during the live conversation, such as surfacing knowledge, checking compliance, and drafting notes, so the agent can focus on the customer while QEval® handles the rest. See RTAA.

Architecture

Classification

The act of labeling part of a conversation, for example tagging a disclosure, an empathy moment, or a risk signal. QEval® runs about 326 million classifications every five minutes.
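
For scale, the quoted rate works out to roughly a million classifications per second:

```latex
\frac{326 \times 10^{6}\ \text{classifications}}{5 \times 60\ \text{s}} \approx 1.09 \times 10^{6}\ \text{classifications per second}
```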

Intent Recognition

Detecting what a customer is actually trying to do, for example cancel, dispute a charge, or escalate, so the interaction can be scored and routed against the right expectations.

Large language model (LLM)

A model trained on large text corpora to understand and generate language. QEval® uses purpose-trained expert models rather than a generic LLM wrapper, and customer data does not enter a third-party foundation model’s training loop.

Mixture of Experts (MoE)

An architecture that routes each scoring task to a purpose-trained expert sub-model rather than relying on one general-purpose model. QEval® uses a proprietary MoE to score at 94%+ accuracy. See the platform.
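
To illustrate the routing idea only, the sketch below uses a toy softmax gate with top-1 routing: the gate scores each expert for an input, and only the highest-scoring expert runs. The experts, keywords, and gate weights are hypothetical and hand-set, whereas a trained MoE learns its gating; this is not QEval®’s proprietary architecture.

```python
import math
from typing import Callable

Expert = Callable[[str], str]

# Hypothetical experts: each is a stand-in for a purpose-trained sub-model.
EXPERTS: dict[str, Expert] = {
    "compliance": lambda text: "disclosure_present" if "recorded" in text.lower() else "disclosure_missing",
    "sentiment": lambda text: "negative" if "frustrated" in text.lower() else "neutral",
}

# Hypothetical gate weights: keyword -> per-expert logit contribution.
# A real MoE learns these; they are hand-set here for illustration.
GATE = {"recorded": {"compliance": 2.0}, "frustrated": {"sentiment": 2.0}}

def gate_probs(text: str) -> dict[str, float]:
    """Softmax over per-expert logits accumulated from the input's words."""
    logits = {name: 0.0 for name in EXPERTS}
    for word in text.lower().split():
        for name, w in GATE.get(word.strip(".,!?"), {}).items():
            logits[name] += w
    z = sum(math.exp(v) for v in logits.values())
    return {name: math.exp(v) / z for name, v in logits.items()}

def route(text: str) -> tuple[str, str]:
    """Top-1 routing: run only the expert the gate ranks highest."""
    probs = gate_probs(text)
    best = max(probs, key=probs.get)
    return best, EXPERTS[best](text)

print(route("This call may be recorded for quality purposes."))
# prints: ('compliance', 'disclosure_present')
```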

Named entity recognition (NER)

Identifying entities such as names, account numbers, and card numbers in a conversation. NER underpins both analytics and the redaction of sensitive data.
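
NER is usually a trained model, but one of its redaction jobs can be sketched with rules. Assuming card numbers are the target, a Luhn checksum (the check-digit scheme payment card numbers use) separates plausible card numbers from other long digit strings such as order numbers. The regex and function names are illustrative assumptions.

```python
import re

# 13 to 19 digits with optional single space/hyphen separators.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    found = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            found.append(digits)
    return found

print(find_card_numbers("Order 1234567890123 paid with 4111 1111 1111 1111."))
# prints: ['4111111111111111']  (the order number fails the checksum)
```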

Natural language understanding (NLU)

The branch of AI that interprets meaning, intent, and sentiment in human language, the foundation under intent recognition and sentiment analysis.

Sentiment analysis

Measuring the emotional trajectory of a conversation across turns, used to predict CSAT, flag churn risk, and pinpoint the moment an interaction turned.
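
Assuming an upstream model has already produced a sentiment score in [-1, 1] for each turn, pinpointing the moment an interaction turned can be as simple as finding the largest turn-to-turn drop. The function is a hypothetical sketch, not QEval®’s method.

```python
from typing import Optional

def turning_point(turn_scores: list[float]) -> Optional[int]:
    """Index of the turn with the largest drop in sentiment versus the previous turn.

    turn_scores: per-turn sentiment in [-1, 1], assumed to come from an upstream model.
    Returns None if sentiment never falls.
    """
    worst_i, worst_drop = None, 0.0
    for i in range(1, len(turn_scores)):
        drop = turn_scores[i - 1] - turn_scores[i]
        if drop > worst_drop:
            worst_i, worst_drop = i, drop
    return worst_i

scores = [0.4, 0.5, 0.3, -0.6, -0.5, 0.1]
print(turning_point(scores))  # 3
```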

Vocabulary Library

The curated set of phrases, intents, and compliance language QEval® recognizes, built and tuned from real contact center conversations. Together with the classification engine, it forms the operational moat behind the scores.

These definitions are written for general understanding of AI Agent QA. For the figures behind QEval, see Why QEval and the platform overview. Industry context here is general background, not QEval results.