The AI Agent QA glossary.
Plain-language definitions for the terms that come up when you score human and AI agents: QA, coaching, the architecture underneath, compliance, and the metrics that decide whether any of it worked.
Compliance
Data Privacy Framework (DPF)
The mechanism that succeeded Safe Harbor and Privacy Shield for transferring personal data from the EEA, UK, and Switzerland to the US, used alongside Standard Contractual Clauses. See Data Transfers.
ISO/IEC 42001
The international management-system standard for artificial intelligence. It addresses how AI is governed, risk-assessed, and operated responsibly, and serves as a forward-looking marker for AI-era QA heading into 2026.
PCI DSS
The Payment Card Industry Data Security Standard governing how card data is handled. Relevant to any program that takes payments over the phone or chat.
PHI (Protected Health Information)
Health information protected under HIPAA. In a contact center it must be detected and redacted, and its handling scored, on every relevant interaction.
PII redaction
Removing personally identifiable information from conversations at ingest, before it is stored, so sensitive data is protected from the first moment. See Compliance.
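Here is a minimal sketch of redaction at ingest. It assumes a transcript arrives as plain text. The hypothetical `redact` function masks email addresses and card-like digit runs before anything would be stored. The patterns are illustrative only. Production redaction usually relies on named entity recognition and on checks such as Luhn validation, not on regular expressions alone.

```python
import re

# Hypothetical, illustrative patterns; not a description of QEval® internals.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def redact(text: str) -> str:
    """Mask sensitive spans so only the redacted text would be persisted."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = CARD_PATTERN.sub("[CARD]", text)
    return text


raw = "My card is 4111 1111 1111 1111 and my email is jo.doe@example.com"
print(redact(raw))  # My card is [CARD] and my email is [EMAIL]
```

Redacting before storage, not after, means the raw values never reach the transcript store.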
SOC 2 Type II
An independent attestation that an organization’s security controls operated effectively over a period of time, a baseline expectation in enterprise security due diligence. See Security and Trust.
Metrics
Average Handle Time (AHT)
The average duration of a customer interaction including talk, hold, and after-call work. Lower is usually better, but only when quality and resolution hold.
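Once the three components are captured per interaction, AHT is simple arithmetic. The sketch below assumes each interaction records talk, hold, and after-call work in seconds. The `Interaction` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    talk_seconds: float
    hold_seconds: float
    after_call_work_seconds: float


def average_handle_time(interactions: list[Interaction]) -> float:
    """Mean of talk + hold + after-call work, in seconds."""
    if not interactions:
        return 0.0
    total = sum(
        i.talk_seconds + i.hold_seconds + i.after_call_work_seconds
        for i in interactions
    )
    return total / len(interactions)


calls = [Interaction(300, 60, 90), Interaction(200, 0, 50)]
print(average_handle_time(calls))  # 350.0
```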
Churn risk
The likelihood that a customer will leave, inferred in part from the sentiment trajectory and signals within a conversation, so at-risk customers can be flagged in time.
Cost of Poor Quality (COPQ)
The downstream cost of quality failures: compliance penalties, churn, repeat contacts, and chargebacks. Full-coverage scoring is meant to catch the drivers before they compound.
Customer Satisfaction (CSAT)
A measure of how satisfied a customer was with an interaction, typically from a post-contact survey. QEval® also predicts CSAT from the conversation itself.
First Contact Resolution (FCR)
The share of issues resolved in a single interaction, with no callback or repeat contact. A strong proxy for both efficiency and customer experience.
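FCR depends on how a repeat contact is defined. This is a minimal sketch under one assumption: an issue counts as resolved on first contact when the same customer does not contact again about the same issue within a hypothetical seven-day window. The tuple layout and the `window_days` parameter are illustrative, not a standard.

```python
from collections import defaultdict


def first_contact_resolution(contacts, window_days=7):
    """contacts: iterable of (customer_id, issue, day_number) tuples.

    An issue is counted as resolved on first contact when no further contact
    about the same issue from the same customer arrives within window_days.
    """
    days_by_issue = defaultdict(list)
    for customer_id, issue, day in contacts:
        days_by_issue[(customer_id, issue)].append(day)
    if not days_by_issue:
        return 0.0
    resolved = 0
    for days in days_by_issue.values():
        days.sort()
        first = days[0]
        repeats = [d for d in days[1:] if d - first <= window_days]
        if not repeats:
            resolved += 1
    return resolved / len(days_by_issue)


log = [
    ("c1", "billing", 1),
    ("c2", "cancel", 2),
    ("c2", "cancel", 4),  # repeat within the window, so not a first-contact fix
    ("c3", "dispute", 5),
]
print(round(first_contact_resolution(log), 3))  # 0.667
```

The window length matters. A shorter window inflates FCR, and a longer one risks counting an unrelated later contact as a repeat.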
Net Promoter Score (NPS)
A measure of how likely customers are to recommend a brand. It is calculated as the percentage of promoters (9 or 10 on a 0 to 10 scale) minus the percentage of detractors (0 to 6). Often the executive-level read on loyalty.
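The standard NPS calculation subtracts the percentage of detractors (0 to 6 on a 0 to 10 scale) from the percentage of promoters (9 or 10). Passives (7 or 8) count only in the denominator, so the result ranges from -100 to 100. A short sketch:

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("NPS needs at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("responses must be on the 0 to 10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


print(net_promoter_score([10, 10, 9, 3]))  # 50.0
```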
Commitments
120-day ROI
A measured customer-average outcome: the point by which the platform has typically paid for itself. It is a documented result, not a contractual provision. See the ROI calculator.
30-day deployment
QEval®’s contractual commitment to be live within thirty days, against an industry norm measured in quarters.
60-day exit
A contractual right to exit within sixty days with no penalty, a commitment most vendors do not publish.
94%+ accuracy SLA
QEval®’s contractual classification-accuracy commitment, written into the agreement rather than asserted in marketing.
Outcome attribution
Connecting a coached behavior change to the business result it produced, so improvement can be tied to outcomes rather than left as unproven activity.
Six Layers of Intelligence
QEval®’s framework for the value in every interaction: Quality and Compliance, Customer, Revenue, Operational, Training, and Strategic intelligence. QA is Layer 1; the other five drive 82% of the value. See Six Layers.
AI agents
AI Agent
An autonomous or assistive system that handles customer conversations, from vendor platforms such as Sierra, Decagon, Agentforce, Ada, and Forethought to in-house GenAI. QEval® scores them the way it scores people.
AI drift
When an AI agent’s behavior degrades over time, for example repeating the same phrasing across turns or sliding away from approved answers. QEval® flags drift as a scored risk signal.
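Repeated phrasing, one of the drift signals named above, can be approximated by counting agent turns that duplicate an earlier turn after case, punctuation, and spacing are normalized. This is a hypothetical heuristic with an arbitrary 0.3 threshold. It is not how QEval® detects drift, and it does not cover an agent sliding away from approved answers.

```python
import re


def normalize(turn: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", turn.lower()).split())


def repetition_rate(agent_turns: list[str]) -> float:
    """Share of agent turns that repeat an earlier turn after normalization."""
    if not agent_turns:
        return 0.0
    unique = {normalize(t) for t in agent_turns}
    return 1 - len(unique) / len(agent_turns)


def flag_repetition(agent_turns: list[str], threshold: float = 0.3) -> bool:
    """Hypothetical drift flag: True when repetition exceeds the threshold."""
    return repetition_rate(agent_turns) > threshold


turns = ["I can help with that.", "i can help   with that",
         "Your refund is on its way."]
print(round(repetition_rate(turns), 2), flag_repetition(turns))  # 0.33 True
```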
Containment
The share of conversations an AI agent resolves without handing off to a human. Containment only matters if the contained conversations were actually handled well, which is why it is paired with QA scoring.
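Pairing containment with QA can be sketched as two rates over the same set of AI conversations. The first is raw containment. The second is a hypothetical quality-adjusted containment that counts only contained conversations whose QA score clears a passing bar. The bar is 80 here, an assumed value.

```python
from dataclasses import dataclass


@dataclass
class BotConversation:
    handed_off: bool  # True if the AI agent escalated to a human
    qa_score: float   # 0 to 100 scorecard result


def containment_rate(convs: list[BotConversation]) -> float:
    """Share of conversations the AI agent kept without a handoff."""
    contained = [c for c in convs if not c.handed_off]
    return len(contained) / len(convs) if convs else 0.0


def quality_adjusted_containment(convs: list[BotConversation],
                                 passing_score: float = 80.0) -> float:
    """Share of all conversations that were contained AND passed QA."""
    good = [c for c in convs
            if not c.handed_off and c.qa_score >= passing_score]
    return len(good) / len(convs) if convs else 0.0


sample = [BotConversation(False, 90), BotConversation(False, 60),
          BotConversation(True, 95), BotConversation(False, 85)]
print(containment_rate(sample), quality_adjusted_containment(sample))  # 0.75 0.5
```

The gap between the two numbers is the share of conversations that were contained but handled poorly.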
Hallucination
When an AI agent states something false or unsupported with confidence. In a contact center this can mean inventing a policy, a price, or an entitlement, which is why it is scored as a compliance and accuracy risk.
Off-script
When an agent, human or AI, departs from the approved flow or required disclosures. Off-script behavior can be acceptable or a compliance miss, which is exactly what scoring is meant to tell apart.
Vendor-neutral scoring
Scoring every human and AI agent on one scorecard regardless of which platform built the agent, so quality, brand voice, and compliance are measured consistently across a mixed estate. See Why QEval®.
Scoring & QA
AI Agent QA
Quality assurance applied to AI agents and chatbots, not just humans. The same scorecard checks an AI agent for compliance, brand voice, resolution, and risks such as hallucination and drift. See AI Agent QA.
Auto QA
Automated quality assurance. A QA analyst does not manually review a small random sample. Instead, AI scores every interaction against a customer-defined scorecard and routes the lowest-scoring and outlier conversations to human calibration. See Auto QA.
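The routing step can be sketched as follows, assuming every conversation already has a 0 to 100 score. The hypothetical `route_for_calibration` function picks the lowest scorers plus any statistical outliers, meaning scores more than two standard deviations from the mean in either direction. The cutoffs are illustrative assumptions, not product settings.

```python
from statistics import mean, pstdev


def route_for_calibration(scores, bottom_n=2, z_threshold=2.0):
    """Return IDs of conversations to send to human calibration.

    scores: dict mapping conversation ID to a 0-100 score.
    """
    if not scores:
        return []
    values = list(scores.values())
    ranked = sorted(scores, key=scores.get)  # lowest score first
    selected = set(ranked[:bottom_n])
    mu, sigma = mean(values), pstdev(values)
    if sigma > 0:
        for conv_id, score in scores.items():
            if abs(score - mu) / sigma > z_threshold:
                selected.add(conv_id)
    return sorted(selected)


scores = {"a": 80, "b": 81, "c": 78, "d": 80, "e": 82,
          "f": 79, "g": 80, "h": 100, "i": 60}
print(route_for_calibration(scores))  # ['c', 'h', 'i']
```

The sketch flags high outliers as well as low ones. An unusually high score can point to a scoring error just as an unusually low one can.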
Calibration
The process of keeping AI scores aligned with what a supervisor would decide. Supervisors review a slice of scored conversations, and the model is tuned where they disagree. QEval® targets a calibration gap, the divergence between AI and supervisor scores, of under two percent.
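The glossary does not say how the calibration gap is measured. One plausible reading is the mean absolute difference between AI and supervisor scores on the reviewed slice, expressed as a percentage of the score scale. The sketch below uses that reading as an assumption.

```python
def calibration_gap(ai_scores: list[float], supervisor_scores: list[float],
                    scale_max: float = 100.0) -> float:
    """Mean absolute AI-vs-supervisor difference, as a percent of the scale.

    Hypothetical definition for illustration; not a documented QEval® metric.
    """
    if len(ai_scores) != len(supervisor_scores) or not ai_scores:
        raise ValueError("need paired, non-empty score lists")
    diffs = [abs(a - s) for a, s in zip(ai_scores, supervisor_scores)]
    return 100.0 * (sum(diffs) / len(diffs)) / scale_max


gap = calibration_gap([90, 80, 70], [88, 81, 70])
print(gap, gap < 2.0)  # 1.0 True
```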
Full Coverage
Scoring every interaction rather than a 2 to 5 percent manual sample. Full coverage removes sample bias, gives complete compliance defensibility, and targets coaching at the conversations that actually need it.
Sampling
The traditional QA practice of manually reviewing a small random share of interactions, often 2 to 5 percent. The blind spot is everything outside the sample, where most compliance and coaching signal hides.
Scorecard
The structured set of criteria a conversation is graded against, covering categories such as compliance, empathy, resolution, and brand voice. QEval® applies the same scorecard to both human and AI agents.
Speech analytics
Analysis of voice interactions, including transcription, sentiment, talk and silence patterns, and keyword or phrase detection, used as scoring inputs alongside text channels.
Vision model
A model that scores what an agent or AI did on screen, for processes that play out visually rather than only in words, extending QA beyond the transcript.
Coaching
Auto Coach
AI-generated coaching tied to the specific moment in a scored conversation that needs it, so feedback is concrete and evidence-backed rather than generic.
Coaching lifecycle
The loop from score, to diagnosis, to targeted coaching, to reinforcement, to measured behavior change. QEval® turns scores into action instead of leaving them in a report. See Coaching.
Gamification
Using goals, recognition, and friendly competition to reinforce the behaviors coaching is trying to build. It is designed carefully so it rewards genuine quality rather than encouraging agents to game the metric.
HI Model
QEval®’s coaching approach that pairs the score with the human side of behavior change, attributing each change up through the Six Layers so coaching can be tied to outcomes, not just activity.
Next best action (NBA)
The single highest-impact thing to coach or do next for a given agent, ranked by predicted impact rather than by whatever the last call happened to surface.
Real-Time Agent Assist (RTAA)
Help delivered to the agent during the live conversation, such as surfacing knowledge, checking compliance, and drafting notes, so the agent can focus on the customer while QEval® handles the rest. See RTAA.
Architecture
Classification
The act of labeling part of a conversation, for example tagging a disclosure, an empathy moment, or a risk signal. QEval® runs about 326 million classifications every five minutes.
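For a sense of scale, the stated volume works out to roughly a million classifications per second:

```latex
\frac{326 \times 10^{6}\ \text{classifications}}{5 \times 60\ \text{s}} \approx 1.09 \times 10^{6}\ \text{classifications per second}
```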
Intent Recognition
Detecting what a customer is actually trying to do, for example cancel, dispute a charge, or escalate, so the interaction can be scored and routed against the right expectations.
Large language model (LLM)
A model trained on large text corpora to understand and generate language. QEval® uses purpose-trained expert models rather than a generic LLM wrapper, and customer data does not enter a third-party foundation model’s training loop.
Mixture of Experts (MoE)
An architecture that routes each scoring task to a purpose-trained expert sub-model rather than relying on one general-purpose model. QEval® uses a proprietary MoE to score at 94%+ accuracy. See the platform.
Named entity recognition (NER)
Identifying entities such as names, account numbers, and card numbers in a conversation. NER underpins both analytics and the redaction of sensitive data.
Natural language understanding (NLU)
The branch of AI that interprets meaning, intent, and sentiment in human language, the foundation under intent recognition and sentiment analysis.
Sentiment analysis
Measuring the emotional trajectory of a conversation across turns, used to predict CSAT, flag churn risk, and pinpoint the moment an interaction turned.
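Finding the moment an interaction turned can be sketched as locating the turn with the steepest drop from the turn before it. The sketch assumes an upstream model already provides a per-turn sentiment score between -1 and 1. The function names and the score range are hypothetical.

```python
from typing import Optional


def net_change(turn_sentiment: list[float]) -> float:
    """Overall trajectory: last turn's sentiment minus the first turn's."""
    return turn_sentiment[-1] - turn_sentiment[0] if turn_sentiment else 0.0


def turning_point(turn_sentiment: list[float]) -> Optional[int]:
    """Index of the turn with the steepest drop from the previous turn."""
    best_index: Optional[int] = None
    best_drop = 0.0
    for i in range(1, len(turn_sentiment)):
        drop = turn_sentiment[i - 1] - turn_sentiment[i]
        if drop > best_drop:
            best_index, best_drop = i, drop
    return best_index


per_turn = [0.2, 0.3, -0.5, -0.4]
print(turning_point(per_turn))  # 2
```

A conversation whose sentiment never falls returns None, meaning there is no turning point to flag.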
Vocabulary Library
The curated set of phrases, intents, and compliance language QEval® recognizes, built and tuned from real contact center conversations. With the classification engine, it is the operational moat behind the scores.