Zendesk AI agent performance report: measure automation ROI

AI agents in Zendesk handle an increasing share of customer conversations. But “AI is responding to tickets” is not the same as “AI is resolving tickets.” If your reporting only tracks that the bot replied, you are measuring activity, not value.

A proper AI agent performance report answers the questions your support leadership actually asks: How many conversations did AI fully resolve? How many escalated to a human — and did the handoff lose context? Is the customer experience better, the same, or worse when AI handles the ticket? And ultimately, is the automation investment reducing cost and freeing human agents for complex work?

This guide walks through the metrics that matter, how to build the report in Zendesk Explore (and its limitations), and how to connect AI performance data to your broader support metrics dashboard.

The metrics that matter for AI agents

Not every metric Zendesk surfaces for AI agents is equally useful. Here is what to track and why, ordered by operational importance.

1. Automated resolution rate

The percentage of conversations that the AI agent resolved without any human intervention. This is the single most important metric for measuring AI value.

How to calculate: Automated resolutions ÷ Total AI agent conversations × 100.

A resolution counts when the customer’s issue is addressed and the conversation ends without transfer to a human agent. Zendesk typically applies a 72-hour window before marking a conversation as resolved.
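
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not Zendesk's own calculation: the `Conversation` record and its two fields are hypothetical and would need to be mapped from whatever your export provides.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical fields; map them from your own export.
    resolved: bool              # the customer's issue was addressed
    transferred_to_human: bool  # the conversation was handed to a human agent


def automated_resolution_rate(conversations: list[Conversation]) -> float:
    """Automated resolutions / total AI agent conversations x 100."""
    if not conversations:
        return 0.0
    automated = sum(
        1 for c in conversations if c.resolved and not c.transferred_to_human
    )
    return automated * 100 / len(conversations)


sample = [
    Conversation(resolved=True, transferred_to_human=False),
    Conversation(resolved=True, transferred_to_human=True),
    Conversation(resolved=False, transferred_to_human=False),
    Conversation(resolved=True, transferred_to_human=False),
]
print(automated_resolution_rate(sample))  # 50.0
```

Only two of the four sample conversations count: one was resolved but needed a human, and one stayed with the bot without being resolved.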

What good looks like: This varies enormously by use case. A bot handling password resets may achieve 80%+ resolution. A bot handling billing disputes may sit at 15%. Compare within categories, not across your entire operation.

2. Bot containment rate

Similar to automated resolution rate, but it measures only whether the conversation stayed with the bot, regardless of whether the issue was actually resolved.

The gap between containment rate and resolution rate reveals a problem: conversations that stayed with the bot but were not actually resolved. These are customers who gave up, asked the same question a different way, or left dissatisfied without escalating. A high containment rate with low resolution rate is a red flag, not a success metric.
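
As a rough sketch, the gap can be computed from three aggregate counts. The function name and the sample numbers below are illustrative assumptions, not figures from Zendesk.

```python
def containment_gap(total: int, contained: int, resolved: int) -> dict[str, float]:
    """Return containment rate, resolution rate, and their gap, all in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= contained <= total:
        raise ValueError("expected 0 <= resolved <= contained <= total")
    containment = contained * 100 / total
    resolution = resolved * 100 / total
    return {
        "containment": containment,
        "resolution": resolution,
        "gap": containment - resolution,  # contained but not resolved
    }


# Illustrative counts only.
print(containment_gap(total=1000, contained=800, resolved=450))
# {'containment': 80.0, 'resolution': 45.0, 'gap': 35.0}
```

In this made-up example, 35% of conversations stayed with the bot without a resolution, which is the cohort worth investigating.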

3. Escalation rate and escalation quality

When AI escalates to a human agent, two things matter:

Escalation rate: What percentage of AI conversations transfer to a human? This is the complement of containment: escalation rate = 100% − containment rate.
Escalation quality: When the handoff happens, does the human agent have enough context to continue without re-asking the customer’s question?

Escalation quality is harder to measure in Explore. Proxy indicators include: first reply time on escalated tickets (longer FRT after escalation suggests context loss), agent handle time on escalated vs. non-escalated tickets, and customer satisfaction scores on escalated conversations.
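
One way to check the first two proxies outside Explore is to compare medians between escalated and non-escalated tickets. The sketch below assumes a hypothetical export with three columns per ticket; the values are made up for illustration.

```python
from statistics import median

# Hypothetical export rows: (escalated_from_ai, first_reply_minutes, handle_minutes)
tickets = [
    (True, 42, 18),
    (True, 55, 25),
    (True, 38, 21),
    (False, 20, 12),
    (False, 25, 15),
    (False, 30, 11),
]


def median_by_group(rows, column):
    """Return (median for escalated tickets, median for non-escalated tickets)."""
    escalated = [row[column] for row in rows if row[0]]
    direct = [row[column] for row in rows if not row[0]]
    return median(escalated), median(direct)


frt_escalated, frt_direct = median_by_group(tickets, 1)
aht_escalated, aht_direct = median_by_group(tickets, 2)
print(frt_escalated, frt_direct)  # 42 25
print(aht_escalated, aht_direct)  # 21 12
```

A consistently higher median on escalated tickets is a hint, not proof, of context loss, because escalated tickets also tend to be harder.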

4. Bot satisfaction (BSAT)

If you have enabled satisfaction surveys for AI-handled conversations, track the BSAT score alongside your standard CSAT for human-handled tickets.

What to watch for:

BSAT significantly below CSAT: AI experience is worse than human experience. Identify which topics or intents drive the gap.
BSAT close to CSAT: The automation is delivering comparable quality. This is the target.
BSAT higher than CSAT on simple topics: Expected for straightforward requests where speed matters more than empathy.
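
The three cases above can be expressed as a simple per-intent check. This is a sketch under stated assumptions: the 0.3-point tolerance and all scores in the sample are illustrative, and you should pick a tolerance that fits your own survey scale.

```python
def classify_bsat_gap(bsat: float, csat: float, tolerance: float = 0.3) -> str:
    """Label BSAT relative to human CSAT on the same intent.

    The tolerance is an assumed cutoff, not a Zendesk setting.
    """
    diff = bsat - csat
    if diff < -tolerance:
        return "below CSAT"
    if diff > tolerance:
        return "above CSAT"
    return "close to CSAT"


# intent -> (BSAT, human CSAT); all values are illustrative.
scores = {
    "Password reset": (4.2, 4.0),
    "Order status": (3.8, 3.9),
    "Billing question": (2.9, 4.1),
}
for intent, (bsat, csat) in scores.items():
    print(f"{intent}: {classify_bsat_gap(bsat, csat)}")
```

Running the check per intent rather than on the overall averages keeps simple and complex topics from cancelling each other out.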

5. Deflection rate vs. resolution rate

Ticket deflection and resolution are not the same thing. Deflection means a ticket was never created. Resolution means the AI handled a created ticket to completion. Your AI agent report should track both:

Deflection: Conversations that ended at the bot without a ticket being created. This reduces queue volume.
Resolution: Tickets that were created, handled by AI, and resolved without human input. This reduces agent workload.

Together, they tell you the total human-effort savings from AI. Separately, they tell you where AI adds value at different points in the customer journey.
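
A minimal sketch of the combined figure, assuming you can count deflected conversations, AI-resolved tickets, and total inbound contacts (deflected conversations included) for the same period. The names and numbers are illustrative.

```python
def human_effort_savings(
    deflected: int, ai_resolved: int, total_contacts: int
) -> tuple[int, float]:
    """Return (contacts handled without human effort, their share in percent).

    total_contacts is assumed to include deflected conversations.
    """
    if total_contacts <= 0:
        raise ValueError("total_contacts must be positive")
    handled = deflected + ai_resolved
    return handled, handled * 100 / total_contacts


# Illustrative counts only.
print(human_effort_savings(deflected=300, ai_resolved=200, total_contacts=2000))
# (500, 25.0)
```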

6. Intent coverage

Zendesk’s intelligent triage assigns intents to incoming requests. Track which intents your AI agent can handle and how well it handles each one.

Build an intent × resolution matrix:

Intent	AI conversations	Resolution rate	BSAT
Password reset	340	88%	4.2
Order status	280	72%	3.8
Billing question	150	31%	2.9
Bug report	90	12%	2.4

This matrix tells you exactly where AI is working and where it is not. Invest in improving AI for high-volume intents with low resolution rates. Accept that some intents (complex billing, bug reports) may always need human agents.
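
One possible way to turn the matrix into a priority list is to rank intents by unresolved volume, meaning AI conversations × (1 − resolution rate). The ranking rule is an assumption for illustration; the input numbers come from the example matrix above.

```python
# (intent, AI conversations, resolution rate in percent), from the example matrix.
matrix = [
    ("Password reset", 340, 88),
    ("Order status", 280, 72),
    ("Billing question", 150, 31),
    ("Bug report", 90, 12),
]


def unresolved_volume(conversations: int, resolution_pct: int) -> float:
    """Conversations the AI agent handled but did not resolve."""
    return conversations * (100 - resolution_pct) / 100


ranked = sorted(
    matrix, key=lambda row: unresolved_volume(row[1], row[2]), reverse=True
)
for intent, conversations, pct in ranked:
    print(f"{intent}: {unresolved_volume(conversations, pct):.1f} unresolved")
# Billing question: 103.5 unresolved
# Bug report: 79.2 unresolved
# Order status: 78.4 unresolved
# Password reset: 40.8 unresolved
```

The ranking is a starting point only. Billing questions and bug reports lead on unresolved volume here, but they may be intents that should stay with human agents, which makes order status the more realistic candidate for improvement.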

How to build the report in Zendesk Explore

Using the prebuilt AI dashboard

Zendesk provides a prebuilt AI agent analytics dashboard (available with the Advanced AI add-on). This dashboard includes:

Total conversations handled by AI agents
Automated resolution count and rate
Escalation count and rate
BSAT scores
Filtering by AI agent, channel, language, and intent

The prebuilt dashboard updates hourly and provides a reasonable starting point. Its main limitation is that it does not easily combine AI metrics with your broader support metrics (backlog, overall FRT, team CSAT) in a single view.

Building a custom Explore report

For deeper analysis, build a custom report:

Dataset: Use the Support: Tickets dataset filtered to tickets where the AI agent was involved. Zendesk tags AI-handled tickets, and the specific tag depends on your configuration.
Segmentation: Create a calculated attribute that categorizes tickets as “AI resolved,” “AI escalated,” or “Human only.” This becomes your primary segmentation for cross-metric analysis.
Metrics to include: COUNT(Tickets) segmented by AI involvement; median resolution time for AI-resolved vs. human-resolved tickets; CSAT/BSAT segmented by AI involvement; and escalation count by intent category.
Time dimension: Use weekly or monthly granularity. Daily is too noisy for AI performance trends; you need enough volume per period to see real patterns.
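
The segmentation and weekly grouping logic can be prototyped in Python before you build the calculated attribute in Explore. This is a sketch, not Explore formula syntax: the two tag names are hypothetical placeholders for whatever tags your configuration applies, and the sample is assumed to contain solved tickets only.

```python
from collections import Counter
from datetime import date

AI_HANDLED_TAG = "ai_agent_handled"      # hypothetical tag name
AI_ESCALATED_TAG = "ai_agent_escalated"  # hypothetical tag name


def ai_segment(tags: set[str]) -> str:
    """Categorize a solved ticket by AI involvement."""
    if AI_ESCALATED_TAG in tags:
        return "AI escalated"
    if AI_HANDLED_TAG in tags:
        return "AI resolved"
    return "Human only"


def week_key(day: date) -> str:
    """ISO year and week, e.g. '2024-W10'."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


# Illustrative solved tickets: (solved date, tags)
tickets = [
    (date(2024, 3, 4), {"ai_agent_handled"}),
    (date(2024, 3, 5), {"ai_agent_handled", "ai_agent_escalated"}),
    (date(2024, 3, 6), {"billing"}),
    (date(2024, 3, 12), {"ai_agent_handled"}),
]

counts = Counter((week_key(day), ai_segment(tags)) for day, tags in tickets)
for (week, segment), n in sorted(counts.items()):
    print(week, segment, n)
```

The escalation check runs first so that a ticket carrying both tags is counted as escalated rather than resolved by AI.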

Limitations of Explore for AI reporting

Explore has gaps when it comes to AI agent analytics:

72-hour resolution delay: Conversations are not marked as “resolved” until 72 hours after the last interaction. Real-time monitoring requires the Zendesk admin dashboard, not Explore.
Limited escalation quality data: Explore tracks whether escalation happened but not how well the handoff preserved context.
No A/B comparison built in: If you want to compare AI performance across different bot configurations or knowledge base versions, you will need to build custom attributes or use an external tool.

Connecting AI metrics to your broader dashboard

AI agent performance should not live in a separate silo. Connect it to your existing ops metrics:

Question	Combine	With
Is AI reducing queue load?	AI deflection + resolution count	Total ticket volume trend
Is AI affecting response speed?	FRT on AI-escalated tickets	Overall first response time
Is AI affecting quality?	BSAT and CSAT on AI conversations	Team-wide CSAT
Is AI reducing cost?	AI resolutions × avg cost per human resolution	Cost per ticket trend
Are the right tickets reaching humans?	Intent distribution of escalated tickets	Escalation rate by category
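
The cost row in the table can be sketched as a short calculation. The gross figure follows the formula in the table; subtracting what you pay for the AI agent to get a net figure is an added assumption, and all numbers are illustrative.

```python
def ai_cost_savings(
    ai_resolutions: int,
    cost_per_human_resolution: float,
    ai_cost: float = 0.0,
) -> dict[str, float]:
    """Gross savings = AI resolutions x average cost per human resolution.

    ai_cost is an optional, assumed input (add-on fees, usage charges)
    that is subtracted to give a net figure.
    """
    gross = ai_resolutions * cost_per_human_resolution
    return {"gross": gross, "net": gross - ai_cost}


# Illustrative numbers only.
print(ai_cost_savings(500, 6.0, ai_cost=1200.0))
# {'gross': 3000.0, 'net': 1800.0}
```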

The most important trend to watch: as AI resolution rate climbs, human agents should see their ticket mix shift toward more complex, higher-touch work. If agent handle time increases while volume decreases, that is the expected outcome — AI is handling the simple tickets, leaving harder ones for humans.

Common mistakes

Treating containment as resolution. A bot that says “I could not help, goodbye” has a high containment rate and a zero resolution rate. Always pair containment with resolution and BSAT.
Comparing BSAT to overall CSAT without controlling for ticket complexity. AI handles the simplest tickets, while the team average includes complex, emotional issues, so the two scores are not measured on comparable work. Compare BSAT to human CSAT on the same intent categories.
Ignoring the “gave up” cohort. Some customers interact with the bot, get no resolution, and leave without escalating or rating the experience. These invisible failures do not show up in BSAT or escalation metrics. Track conversation abandonment rate: the share of conversations where the customer stopped responding without a resolution.
Optimizing for resolution rate at the expense of escalation quality. Making the bot less likely to escalate increases resolution rate on paper, but it also increases the chance of bad resolutions. Set escalation thresholds based on confidence, not on a target resolution percentage.
Not reviewing AI performance by intent. An overall 50% resolution rate might hide a 90% rate on password resets and a 10% rate on billing. Aggregate numbers mask the specific improvements you can make.

What to review and when

Cadence	What to check	Action
Weekly	Resolution rate trend, escalation rate	Spot regressions early
Monthly	BSAT by intent, intent coverage gaps	Prioritize knowledge base improvements
Quarterly	Total cost savings (AI resolutions × cost per human ticket), conversation quality audit	Report ROI, adjust AI strategy

Bring AI performance data into your weekly support ops review alongside backlog, first response time, and CSAT. AI is not a separate program — it is part of how your team delivers support.

FAQ

What Zendesk plan do I need for AI agent reporting? The prebuilt AI analytics dashboard requires the Advanced AI add-on. Basic AI agent functionality is available on Suite Professional and above. Custom Explore reports on AI-handled tickets are possible on any Explore-enabled plan, but the depth of data depends on your AI add-on tier.

How long before I can measure AI agent ROI? Give the bot at least 4–6 weeks after deployment to gather enough data for meaningful analysis. Resolution rate will fluctuate as the bot learns and as you refine its knowledge base. Early numbers are directional, not definitive.

Should I compare AI performance across channels? Yes. AI agents often perform differently on chat (synchronous, short messages) vs. email (asynchronous, longer context). Build your report with a channel filter so you can see where AI adds the most value.

What resolution rate should I target? There is no universal target. Start by measuring your current rate, then set incremental goals. A jump from 25% to 35% automated resolution in a quarter is meaningful. Context matters more than benchmarks — a 30% rate on complex B2B tickets is excellent; a 30% rate on password resets suggests the bot needs work.

Can I track AI agent performance outside Zendesk Explore? Yes. Any tool that reads Zendesk ticket data and AI interaction metadata can build these reports. TicketBoard, for example, surfaces AI resolution and escalation data alongside your standard ops metrics without requiring the Explore interface.
