The Problem With Tracking Conversations Like Pageviews

Published: February 26, 2026 at 12:48 AM EST
9 min read
Source: Dev.to

Here’s why event‑based analytics was never built for conversational AI products – and what to do instead.

Picture this. You’re a PM at an AI startup, six months post‑launch. You open the dashboard on a Monday morning and everything looks… fine?

  • Session count is up 20 % week‑over‑week.
  • Average session length is 4 min 30 s.
  • DAU is climbing.

You screenshot it and drop it in the investor‑update Slack channel.

Then you look at retention.

| Metric | Value |
| --- | --- |
| Week 4 retention | 12 % |
| Week 8 retention | 4 % |

Users are showing up, having conversations, and disappearing. The metrics say engagement is strong. The business says something is very wrong.

Here’s the thing nobody tells you when you ship your first AI product: you’ve been tracking conversations like pageviews, and that’s why your dashboard lies to you every single morning.

Person staring at metrics dashboard looking confused
^ every AI PM on Monday morning when the numbers look good but retention is falling off a cliff

The Pageview Was Built for a World Where Content Sits Still

The pageview metric was invented in the mid‑90s to answer one question: Did someone look at this thing? That’s it.

  • A newspaper prints a story.
  • Did you open it? Click. Pageview logged.
  • The content doesn’t change based on what you do. It just sits there. You either consumed it or you didn’t.

This mental model spread everywhere: clicks, sessions, time‑on‑page, bounce rate, page depth. All of it was built on the same foundational assumption:

The product is a static artifact and the user is moving through it.
Engagement = consumption. More clicks = more engagement. More engagement = more value.

That assumption held for 25 years. It made analytics what it is today.

Then we shipped products where the product itself responds to what the user says.

The entire premise collapsed, and most teams haven’t noticed yet.

  • A conversation is NOT a static artifact.
  • It’s a two‑sided, dynamic exchange with internal structure.
  • The meaning lives in the sequence: what was asked, how it was answered, what happened next, whether the user got what they actually came for.

None of that shows up in a naïve event stream.

What You Lose When You Log conversation_started and conversation_ended

When you instrument your AI product the way you’d instrument a web app, the event log looks like this:

conversation_started   { "user_id": 123, "timestamp": "10:04:12" }
message_sent           { "user_id": 123, "turn": 1 }
message_sent           { "user_id": 123, "turn": 2 }
message_sent           { "user_id": 123, "turn": 3 }
conversation_ended     { "user_id": 123, "duration": "4m32s" }

Looks reasonable. Now consider three very different stories that produce exactly that log.

| Story | Narrative |
| --- | --- |
| A | User asks a coding question, gets a perfect answer on the first try, asks a follow‑up, gets a clarification, says “thanks” and leaves satisfied. Done in 4 min. |
| B | User asks a coding question, gets a wrong answer, rephrases, gets another wrong answer, rephrases again, gets frustrated, closes the tab mid‑sentence. Session ends because the browser closed. |
| C | User gets stuck in a hallucination loop. The AI keeps confidently answering the wrong question. User tries three different phrasings, eventually gives up, goes to Stack Overflow, solves the problem in 90 s, and never returns. |

All three produce the same event log—yet the outcomes are completely different:

  • Story A: product‑market fit.
  • Story B: quality problem.
  • Story C: churn‑in‑progress.

Your dashboard shows three “successful” 4‑minute sessions.
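To make the problem concrete, here is a minimal sketch (the event schema is hypothetical, loosely mirroring the log above) showing why stories A, B, and C collapse into the same record once reduced to a pageview‑style event stream:

```python
def to_event_log(turns, duration):
    """Flatten a conversation into the kind of events most dashboards store."""
    log = [("conversation_started", {})]
    log += [("message_sent", {"turn": i + 1}) for i in range(turns)]
    log.append(("conversation_ended", {"duration": duration}))
    return log

# Three very different outcomes, identical surface shape.
story_a = {"turns": 3, "duration": "4m32s", "outcome": "resolved"}
story_b = {"turns": 3, "duration": "4m32s", "outcome": "frustrated_abandon"}
story_c = {"turns": 3, "duration": "4m32s", "outcome": "hallucination_loop"}

logs = [to_event_log(s["turns"], s["duration"]) for s in (story_a, story_b, story_c)]

# The outcome field never makes it into the log, so all three are identical:
assert logs[0] == logs[1] == logs[2]
```

The `outcome` key exists only in this illustration; in a real event pipeline it is never captured at all, which is the whole problem.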

Surprised Pikachu
^ me realizing all three of those users look identical in Amplitude

The Metrics That Fall Out of Event Tracking Are Measuring the Wrong Thing

Metrics like MAU, DAU, session length, conversation depth all share the same assumption: more activity = more value. They’re proxies for engagement, and engagement was a useful proxy when the product was a website.

For a conversational AI product, that proxy breaks in a specific way.

| Problem | Why It Happens |
| --- | --- |
| Stuck‑in‑a‑loop user | Highly engaged (many messages, long session, high turn count) → flagged as a power user, but actually one bad conversation away from canceling. |
| Efficient power user | Short, efficient conversations → low session time, low turn count → flagged as a casual user, but they just got real product value. |

We track millions of conversations across products using Agnost, and this pattern is consistent: the users with the highest session time are not the happiest users; sometimes they’re the most frustrated ones.

Quick Comparison

| What Most Teams Track | What It Actually Tells You |
| --- | --- |
| Session count – “How many conversations started?” | Task completion rate – “Did the user accomplish what they came for?” |

The right column requires you to understand conversation structure: not just that turns happened, but what kind of turns they were, what the user was trying to do, and whether they succeeded.
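Once conversations carry an outcome label, the right‑column metric is trivial to compute. A hedged sketch, assuming each conversation has already been labeled (for example by an LLM judge or a heuristic; the records here are made up):

```python
conversations = [
    {"id": 1, "outcome": "completed"},
    {"id": 2, "outcome": "abandoned"},
    {"id": 3, "outcome": "completed"},
    {"id": 4, "outcome": "hallucination_loop"},
]

# Task completion rate: share of conversations where the user achieved their goal.
completed = sum(1 for c in conversations if c["outcome"] == "completed")
task_completion_rate = completed / len(conversations)
print(f"Task completion rate: {task_completion_rate:.0%}")  # prints "Task completion rate: 50%"
```

The hard part is not this division; it is producing the `outcome` label reliably, which is exactly what the rest of this post is about.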

The Right Mental Model: A Conversation Is a Task Attempt

Reframe everything:

  • A conversation is not a page view.
  • It’s not even a session in the traditional sense.
  • A conversation is a task attempt.

Your user shows up with an intent, tries to accomplish something, and either succeeds or fails.

The question isn’t “Did they have a conversation?”
It’s “Did they complete the task they came for?”

This is how you evaluate a support agent: you don’t track how many tickets they opened; you track how many tickets they resolved.

What to Track Instead

| Metric | Definition | Why It Matters |
| --- | --- | --- |
| Task Completion Rate | Percentage of conversations that end with the user achieving their goal. | Direct measure of product value. |
| Error / Hallucination Rate | Frequency of AI responses that are factually incorrect or nonsensical. | Early warning of quality issues. |
| Turn‑Efficiency | Average number of turns needed to complete a task. | Indicates how quickly users get value. |
| User Satisfaction (post‑conversation survey or implicit signal) | Rating or sentiment after a conversation ends. | Captures true user happiness. |
| Churn‑Signal Score | Composite of long sessions, repeated re‑phrasing, and negative sentiment. | Predicts users about to leave. |
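The churn‑signal score is the only composite in the table, so here is one possible shape for it. Everything here is an illustrative assumption: the weights, the saturation thresholds, and the sentiment scale (−1 to 1) are made up, not a standard formula.

```python
def churn_signal_score(session_minutes, rephrase_count, sentiment):
    """Combine frustration signals into a 0-1 score (higher = more churn risk)."""
    long_session = min(session_minutes / 15.0, 1.0)  # saturates at 15 minutes
    rephrasing = min(rephrase_count / 3.0, 1.0)      # 3+ rephrases = maxed out
    negativity = max(-sentiment, 0.0)                # only negative sentiment counts
    return 0.3 * long_session + 0.4 * rephrasing + 0.3 * negativity

# Efficient power user: short session, no rephrasing, positive close -> low score
print(churn_signal_score(4, 0, 0.8))
# Stuck-in-a-loop "power user": long session, repeated rephrasing, negative -> high score
print(churn_signal_score(20, 4, -0.6))
```

Note how the two archetypes from the earlier table land on opposite ends of this score, even though session‑time metrics rank them the other way around.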

Takeaway

  1. Stop treating conversations like pageviews.
  2. Shift from activity‑based proxies to outcome‑based metrics.
  3. Instrument your product to capture task intent, success/failure, and quality signals.

When you do, your dashboard will finally tell the truth you need to see every Monday morning. 🚀

Conversation‑Native Analytics: Why the Shift Matters

The conversation is the product. Your analytics should reflect that.

The Problem with Traditional Event‑Based Tracking

Think about how a call center evaluates quality: you don’t track how long each call lasted; you track whether the customer’s problem was solved. The same applies to AI products: the conversation is merely the medium through which the task‑completion attempt happens, yet most dashboards still focus on events (clicks, page views, session length) rather than outcomes.

When you make the shift from “event = success” to “conversation = success,” a lot of things become clearer:

  • Short conversations can be a great outcome (efficient resolution) or a terrible one (user gave up immediately). Context determines which.
  • Long conversations can indicate deep engagement or a frustration spiral. Again, context matters.

You need an analytics layer that understands which is which.

What Conversation‑Native Analytics Actually Looks Like

This isn’t hypothetical. The raw data layer already exists; it just hasn’t been productized for AI teams.

A conversation has a structure:

  1. Intent – what the user wants to achieve.
  2. Sequence of attempts – each turn tries to address that intent.
  3. Outcome – resolution (or failure).

Every turn is evidence about how well that structure is working.
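That three‑part structure maps naturally onto a data model. A minimal sketch (the class and field names are my own, not an Agnost schema):

```python
from dataclasses import dataclass, field
from enum import Enum

class Outcome(Enum):
    RESOLVED = "resolved"
    ABANDONED = "abandoned"
    UNRESOLVED = "unresolved"

@dataclass
class Turn:
    role: str   # "user" or "assistant"
    text: str

@dataclass
class Conversation:
    intent: str                                   # 1. what the user wants to achieve
    turns: list = field(default_factory=list)     # 2. sequence of attempts
    outcome: Outcome = Outcome.UNRESOLVED         # 3. resolution (or failure)

convo = Conversation(intent="fix failing import in pytest")
convo.turns.append(Turn("user", "Why does pytest raise ModuleNotFoundError here?"))
convo.turns.append(Turn("assistant", "Your package likely isn't installed in the test env."))
convo.outcome = Outcome.RESOLVED
```

Once conversations are stored in this shape instead of as loose events, every metric in this post becomes a query over `intent`, `turns`, and `outcome`.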

Practical signals to track

  • Resolution signals

    • Positive:
      • Conversation ends with the user getting what they wanted.
      • Positive sentiment at close.
      • Follow‑up that builds on a previous answer.
      • User returns within 48 h for a related question.
    • Negative (failure):
      • User rephrases the same question three times.
      • Drop‑off mid‑thread.
      • Immediate return to the start of the flow.
  • Repetition detection – Same question phrased differently → AI didn’t understand → quality failure. Should appear in metrics, not be hidden by average session length.

  • Drop‑off by position – Where in the conversation do users give up?

    • Example: 40 % of conversations end after turn 3 with no resolution signal → a specific, fixable problem at turn 3.
  • Intent clusters – Group conversations by what users are trying to do, not by which feature they touched. Reveals hidden patterns (e.g., 30 % of support chats ask the same question your docs don’t answer).
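Of these signals, repetition detection is the easiest to prototype. A sketch under stated assumptions: Jaccard word overlap is a crude stand‑in for semantic similarity (a production system would more likely use embeddings), and the 0.5 threshold is arbitrary.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two messages, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def count_rephrasings(user_messages, threshold=0.5):
    """Count user turns that closely resemble an earlier user turn."""
    rephrasings = 0
    for i, msg in enumerate(user_messages):
        if any(jaccard(msg, earlier) >= threshold for earlier in user_messages[:i]):
            rephrasings += 1
    return rephrasings

msgs = [
    "how do I reset my api key",
    "how can I reset the api key",
    "reset api key how",
]
print(count_rephrasings(msgs))  # prints 2 - a strong "AI didn't understand" signal
```

Two near‑duplicate questions in a row is exactly the quality failure that average session length hides: this conversation would look *more* engaged, not less.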

None of this is magic. It’s just applying the right model to data you already have. The raw conversation data is there; the question is whether your analytics layer knows how to read it.

Visual metaphor

Hackerman meme – person coding intensely
“Building conversation‑native analytics after realizing your entire dashboard has been lying to you.”

Where This Is All Going

The teams that are winning with AI products share one habit:

  • They stopped optimizing for engagement and started optimizing for resolution.
  • They measure success at the conversation level, not the session level.
  • They track:
    • Task‑completion rate.
    • First‑turn resolution rate.
    • Exact drop‑off points in a typical conversation.
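The last two bullets can be computed from the same labeled records. A sketch assuming each conversation record carries its turn count and a resolution flag (the sample data is invented):

```python
from collections import Counter

conversations = [
    {"turns": 1, "resolved": True},
    {"turns": 3, "resolved": False},
    {"turns": 3, "resolved": False},
    {"turns": 2, "resolved": True},
    {"turns": 3, "resolved": False},
]

# First-turn resolution rate: solved in a single turn.
first_turn = sum(1 for c in conversations if c["resolved"] and c["turns"] == 1)
first_turn_resolution_rate = first_turn / len(conversations)

# Drop-off by position: at which turn do unresolved conversations end?
dropoffs = Counter(c["turns"] for c in conversations if not c["resolved"])

print(first_turn_resolution_rate)  # prints 0.2
print(dropoffs.most_common(1))     # prints [(3, 3)] - users give up at turn 3
```

In this toy dataset, the `dropoffs` counter surfaces the “40 % of conversations end after turn 3” pattern from the earlier example as a single, fixable number.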

They’ve adopted tools that treat the conversation as the primary unit of analysis, not an aggregate of events.

The rest of the industry is still screenshotting MAU charts and wondering why retention is hard.

The shift is coming. Conversational AI is forcing a new analytics paradigm the same way mobile forced a new web‑analytics paradigm a decade ago. The first teams to master it will gain a meaningful advantage—not only in product quality but also in diagnosing and fixing problems before they surface in churn numbers.

Wrapping It Up

  • If your core product loop is a conversation, your core analytics primitive should be a conversation, not an event.
  • Event‑based tracking gives you a number that tells you almost nothing about whether your product actually works (like judging a film by counting its frames).
  • Conversation = unit.
  • Resolution = metric.
  • Task completion = outcome you optimize for.

Everything else is noise.

Agnost was built to solve exactly this problem: analytics designed from the ground up for conversational products, so you can stop guessing and start knowing.
👉 Try Agnost here

TL;DR

Event‑based analytics tracks whether conversations happened.
Conversation‑native analytics tracks whether they worked.
You need the second one.

