Engineering Context for Local and Cloud AI: Personas, Content Intelligence, and Zero-Prompt UX

Published: December 20, 2025, 09:55 AM EST
9 min read
Source: Dev.to

Introduction

In the previous article we covered DocuMentor AI’s hybrid architecture, which switches seamlessly between Chrome’s local Gemini Nano and cloud AI. We built a system that automatically routes tasks based on capability and performance constraints.

But having a robust execution layer is only half the battle. The other half? Engineering the context that goes into those models.

Most AI tools give you a powerful model and a blank text box, then expect you to figure out what to ask. It’s like handing someone a professional camera and saying “take a good photo” – technically possible, but the burden is entirely on the user.

DocuMentor takes a different approach: zero‑prompt UX through intelligent context engineering. Users never write prompts. They click a feature (Quick Scan, Deep Analysis, Cheat Sheet), and the extension handles the rest—assembling the right persona elements, extracting the right page sections, and shaping everything into a request the AI can’t misinterpret.

This article breaks down how that works: the philosophy behind zero‑prompt design, the persona system that personalizes every response, and the content‑intelligence layer that knows exactly what to send to the AI.


The Zero‑Prompt Philosophy

Most technical‑documentation tools and AI‑powered browsers present the same UX pattern: a blank chat input with a placeholder like “Ask me anything about this page.”

Figure: The familiar “Ask me anything” blank chat input.

On the surface this seems user‑friendly, but when you don’t know what to ask, a blank box creates three problems:

  1. Cognitive load – Users must think about how to phrase their question.
  2. Intent ambiguity – Small wording changes lead to wildly different answers.
  3. Generic responses – Without context about the user, the AI gives one‑size‑fits‑all answers.

I built DocuMentor to solve my own problem: I spend hours each week scanning documentation, blog posts, and API references trying to extract what I need quickly. Sometimes I want a TL;DR to decide if it’s worth reading. Other times I need a cheat sheet for future reference. And sometimes I just want to know “Should I care about this?”

These are specific, recurring needs. Why should I have to articulate them from scratch every time?

Feature‑First Design

Instead of a blank chat box, DocuMentor exposes four purpose‑built features:

  • Quick Scan – Instant insights: TL;DR, “should I read this?”, related resources, page architecture
  • Deep Analysis – Comprehensive overview, code patterns, video recommendations, learning resources with reasoning
  • Cheat Sheet – Condensed, actionable summary optimized for quick lookup
  • AskMe – Targeted chat: select text or images and ask specific questions

Each feature represents a pre‑crafted intent. Users don’t have to think about how to ask; they just pick the outcome they want. The extension then crafts the prompt, selects the right page sections, and applies the user’s persona.

This isn’t just about convenience. It’s about eliminating ambiguity. When a user clicks Quick Scan, there’s zero room for misinterpretation. The AI knows exactly what format to return, what level of detail to provide, and what the user cares about.
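To make the idea concrete, here is a minimal sketch of what a feature‑first design looks like in code. The names, prompt templates, and output formats below are illustrative, not DocuMentor’s actual internals: each feature carries a pre‑crafted intent, and a click resolves to a fully specified request.

```typescript
// Hypothetical sketch: each feature is a pre-crafted intent, so the user
// never writes a prompt. Templates and names here are illustrative.
type FeatureId = "quickScan" | "deepAnalysis" | "cheatSheet" | "askMe";

interface FeatureIntent {
  instruction: string; // the pre-crafted prompt core
  outputFormat: string; // the shape the AI must return
}

const FEATURE_INTENTS: Record<FeatureId, FeatureIntent> = {
  quickScan: {
    instruction: "Summarize this page and say whether the reader should read it.",
    outputFormat: "tldr + shouldRead + relatedResources + pageArchitecture",
  },
  deepAnalysis: {
    instruction: "Produce a comprehensive overview with code patterns and learning resources.",
    outputFormat: "overview + codePatterns + videos + resources",
  },
  cheatSheet: {
    instruction: "Condense this page into an actionable quick-lookup reference.",
    outputFormat: "cheatSheet",
  },
  askMe: {
    instruction: "Answer the user's question about the selected content.",
    outputFormat: "freeform",
  },
};

// A feature click resolves to a fully specified request - no blank text box.
function buildRequest(feature: FeatureId, pageText: string): string {
  const intent = FEATURE_INTENTS[feature];
  return `${intent.instruction}\nReturn: ${intent.outputFormat}\n---\n${pageText}`;
}
```

Because the instruction and output format are fixed per feature, two users clicking Quick Scan on the same page always issue structurally identical requests; only the page content and persona vary.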


Persona‑Driven Personalization

After building the initial feature set, I realized something critical: none of these features should return generic answers.

A “Should I read this?” recommendation means nothing without knowing who’s asking. A senior AI engineer doesn’t need an intro to neural networks, whereas a junior frontend developer does. Same feature, same page, completely different answers.

That’s when I introduced the persona system—a user profile that shapes every AI response.

What’s in a Persona

  • Role – AI/ML Engineer, Frontend Developer, Backend Engineer, etc.
  • Seniority – Beginner, Intermediate, Senior
  • Skills – Programming languages, frameworks, and concepts, each with a proficiency level (Beginner, Intermediate, Advanced)
  • Learning Goals – What the user wants to master right now (e.g., “Master LangGraph for production AI agents”)
  • Learning Preferences – Text, video, or mixed

Figure: The five components of a DocuMentor persona.

The challenge wasn’t just collecting this information—it was knowing which elements matter for which features.

Mapping Persona Elements to Features

  • Quick Scan – Role, Seniority, Skills, Learning Goals
  • Deep Analysis – Role, Seniority, Skills, Learning Goals, Learning Preferences
  • Cheat Sheet – Role, Seniority, Skills
  • AskMe – All elements (depends on the specific query)

For example, learning preferences are irrelevant for cheat sheets (the user already decided they want text), while skills and goals are critical for “Should I read this?” recommendations. Sending irrelevant persona data adds noise and wastes tokens—especially on local AI with tight context limits.
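A sketch of that projection, under assumed type and field names (the real DocuMentor schema may differ): the persona is filtered down to only the fields the feature needs before any prompt is assembled.

```typescript
// Assumed persona shape - field names are illustrative, not DocuMentor's API.
interface Persona {
  role: string;
  seniority: "Beginner" | "Intermediate" | "Senior";
  skills: { name: string; level: string }[];
  learningGoals: string[];
  learningPreference: "text" | "video" | "mixed";
}

type PersonaField = keyof Persona;

// Which persona elements each feature actually needs (from the mapping table).
const PERSONA_FIELDS: Record<string, PersonaField[]> = {
  quickScan: ["role", "seniority", "skills", "learningGoals"],
  deepAnalysis: ["role", "seniority", "skills", "learningGoals", "learningPreference"],
  cheatSheet: ["role", "seniority", "skills"],
  askMe: ["role", "seniority", "skills", "learningGoals", "learningPreference"],
};

// Project the persona down to the relevant fields before prompting,
// so irrelevant data never burns tokens on the local model.
function personaForFeature(persona: Persona, feature: string): Partial<Persona> {
  const out: Partial<Persona> = {};
  for (const field of PERSONA_FIELDS[feature] ?? []) {
    (out as Record<string, unknown>)[field] = persona[field];
  }
  return out;
}
```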


Persona‑Driven Recommendations

Scenario: You’re a Junior Front‑end Developer learning React and you land on an article about advanced state‑management patterns.

DocuMentor’s “Should I read this?” feature might say:

Yes, read this.
This covers useReducer and Context API patterns that will level up your React skills. It assumes familiarity with useState, which you have. The examples are practical and match your learning goal: mastering React for production apps.


Scenario: You’re a Senior Backend Engineer who knows React but isn’t focused on front‑end work.

DocuMentor’s recommendation:

Skip this.
You already understand these patterns from your React experience. This won’t advance your current goal (mastering distributed systems). If you need a refresher later, the cheat‑sheet feature has you covered.

The same page, the same feature, but completely different recommendations because the persona tells the AI who is asking and why they care.

This isn’t personalization for its own sake. It’s about respecting the user’s time. Generic AI tools waste time by forcing you to read irrelevant content or by hiding important insights. Persona‑driven AI acts like a knowledgeable colleague who knows your background and priorities.


Content Intelligence: Strategic Page Decomposition

Early on I made the naive mistake most AI developers make: I fed the entire page HTML to the model, assuming it could “figure it out.”
That failed spectacularly:

  • Context overflow – Raw HTML easily exceeds Chrome AI’s ~4K‑token limit
  • Noise drowning signal – Ads, navigation, footers, and JavaScript compete with the actual content
  • Hallucinations – Small models like Gemini Nano get confused by irrelevant information

First fix → Content extraction

I used Mozilla’s Readability library (with a custom fallback for pages where Readability fails) to extract clean, readable text.
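To illustrate the kind of cleanup a fallback performs, here is a deliberately simplified sketch. The real extension tries Mozilla’s Readability first; this regex‑based fallback is illustrative only, and its heuristics (which containers count as noise) are assumptions, not DocuMentor’s actual rules.

```typescript
// Simplified fallback extractor for pages where Readability fails.
// Heuristics are illustrative; a production fallback would use DOM APIs.
function fallbackExtract(html: string): string {
  return html
    // Drop the noisiest containers wholesale (scripts, styles, chrome).
    .replace(/<(script|style|nav|footer|aside)[\s\S]*?<\/\1>/gi, "")
    // Strip remaining tags, keeping their text content.
    .replace(/<[^>]+>/g, " ")
    // Collapse the whitespace left behind by removed markup.
    .replace(/\s+/g, " ")
    .trim();
}
```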

Even after cleaning, a new problem emerged: not every feature needs the same information.

  • Summaries & cheat sheets – Full article content
  • Video recommendations – Only a summary of the page
  • “Learn Resources” suggestions – Page links & navigation context (no article body)

Sending everything to every feature wastes tokens, increases latency, and reduces relevance.

Solution: Strategic page decomposition.

DocuMentor’s purpose‑driven sections

  • Main content – Core article text (extracted via Readability)
  • Table of contents – Page structure & hierarchy
  • Page links – URLs embedded in the content
  • Code blocks – Extracted separately for pattern analysis
  • Breadcrumbs & navigation – Metadata about where the page fits in the documentation

Figure: Page sections are strategically routed to different features based on what information is actually relevant.

Feature‑to‑section mapping

  • Summary – Main content
  • Cheat Sheet – Main content + code blocks + page links
  • Video Recommendations – Summary only
  • Learn Resources – Summary + page links + breadcrumbs + navigation
  • Code Patterns (Deep Analysis) – Code blocks + surrounding context
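The routing can be sketched as a small lookup table plus a payload assembler. The section and feature names mirror the tables above, but the shapes and keys are assumptions for illustration; video recommendations map to no raw sections here because they receive a generated summary instead.

```typescript
// Hypothetical shapes for the decomposed page - names are illustrative.
interface PageSections {
  mainContent: string;
  tableOfContents: string[];
  pageLinks: string[];
  codeBlocks: string[];
  breadcrumbs: string[];
}

// Which sections each feature receives. Video recommendations are absent:
// they get a generated summary rather than raw page sections.
const SECTION_MAP: Record<string, (keyof PageSections)[]> = {
  summary: ["mainContent"],
  cheatSheet: ["mainContent", "codeBlocks", "pageLinks"],
  learnResources: ["pageLinks", "breadcrumbs"],
  codePatterns: ["codeBlocks"],
};

// Assemble only the relevant sections into the prompt payload.
function payloadFor(feature: string, page: PageSections): string {
  const keys = SECTION_MAP[feature] ?? [];
  return keys
    .map((key) => {
      const value = page[key];
      const text = Array.isArray(value) ? value.join("\n") : value;
      return `## ${key}\n${text}`;
    })
    .join("\n\n");
}
```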

Concrete example: Video recommendations

A naïve approach would send the full 10 K‑word article to the model, then ask it to find relevant YouTube videos. That would:

  • Burn most of Chrome AI’s token budget on a single feature
  • Slow down the response (the model processes 10 K words before calling the YouTube API)
  • Risk quota errors on low‑VRAM devices

DocuMentor’s optimized flow

  1. Generate a summary of the page (≈200‑300 words).
  2. Send the summary plus the user persona to the AI.
  3. AI creates an optimal YouTube search query based on the topic and the user’s learning goals.
  4. Extension calls the YouTube Data API (outside the AI).
  5. AI ranks the top 10 results for relevance to the user’s goals and the page summary.
  6. Return the top 3 videos with personalized descriptions.

Result: ~10× faster and ~1/10th the tokens compared with sending the full content. Because the AI only sees relevant information (summary + persona), the recommendations are more accurate.
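The six‑step flow above can be sketched as an orchestrator with the AI and YouTube calls injected as functions, so the sequencing is visible on its own. Every name here is illustrative: the real extension’s interfaces and the exact prompts differ, and only step 4 (the API call) happens outside the AI.

```typescript
interface Video {
  id: string;
  title: string;
}

// Injected dependencies - illustrative signatures, not DocuMentor's API.
interface VideoFlowDeps {
  summarize(content: string): Promise<string>; // AI: ~200-300 word summary
  buildQuery(summary: string, persona: string): Promise<string>; // AI: search query
  searchYouTube(query: string): Promise<Video[]>; // plain API call, no AI
  rank(videos: Video[], summary: string, persona: string): Promise<Video[]>; // AI: relevance ranking
}

async function recommendVideos(
  content: string,
  persona: string,
  deps: VideoFlowDeps,
): Promise<Video[]> {
  const summary = await deps.summarize(content); // 1. summarize, not the full page
  const query = await deps.buildQuery(summary, persona); // 2-3. summary + persona -> query
  const results = await deps.searchYouTube(query); // 4. API call outside the AI
  const ranked = await deps.rank(results.slice(0, 10), summary, persona); // 5. rank top 10
  return ranked.slice(0, 3); // 6. return top 3
}
```

Injecting the calls also makes the orchestration testable with stubs, without touching a model or the YouTube Data API.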

This pattern repeats across every feature: content intelligence isn’t about giving the AI more information—it’s about giving it the right information.


Adaptive Prompting Across Providers

One final layer of context engineering: how you shape the request matters as much as what you send.

DocuMentor runs on two AI providers:

  • Gemini Nano (local) – Simple, directive instructions; one reasoning task per prompt; defensive output parsing (it often returns malformed JSON)
  • Gemini 2.0 Flash (cloud) – Rich, multi‑step instructions; tool‑calling support; reliable structured output

The persona and content sections stay the same, but the prompt framing changes based on the model’s reasoning capacity.

Example: Video recommendations

  • Gemini Nano (local) – Sequential decomposition, where each step is a separate AI call: generate search query → call API → rank results → format output
  • Gemini Flash (cloud) – A single tool‑augmented call: the model receives the summary, persona, and one instruction to generate a query, fetch results via the YouTube tool, rank, and format, all in one request

By adapting prompts to each provider’s strengths, DocuMentor maximizes accuracy, speed, and token efficiency across both local and cloud environments.
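A minimal sketch of that split, with illustrative prompt text and a hypothetical `youtube_search` tool name: the same inputs produce three simple sequential prompts for the local model, or one rich tool‑augmented prompt for the cloud model.

```typescript
// Provider-adaptive prompt framing. Prompt wording and the tool name
// "youtube_search" are illustrative assumptions, not DocuMentor's actual text.
type Provider = "nano" | "flash";

function videoPrompts(provider: Provider, summary: string, persona: string): string[] {
  if (provider === "nano") {
    // Local model: sequential decomposition, one directive task per call.
    return [
      `Write a YouTube search query for this topic: ${summary}. Reader: ${persona}.`,
      `Rank these videos by relevance to the topic: ${summary}. Reader: ${persona}.`,
      `Format the top 3 videos with short personalized descriptions.`,
    ];
  }
  // Cloud model: one tool-augmented instruction covering the whole task.
  return [
    `Recommend 3 YouTube videos for this page.\n` +
      `Summary: ${summary}\nReader: ${persona}\n` +
      `Use the youtube_search tool, rank the results, and return personalized descriptions.`,
  ];
}
```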

Users never see this complexity. They click “Video Recommendations,” and the system automatically routes to the appropriate provider and prompt strategy.


What’s Next

This is just the first version of DocuMentor’s context‑engineering system. Two areas I’m exploring for future iterations:

1. User‑customizable feature prompts

Let users add personalized instructions to individual features. For example:

  • “In summaries, always include a brief definition of core concepts.”
  • “For video recommendations, prioritize short tutorials under 15 minutes.”
  • “When suggesting resources, focus on official documentation over blog posts.”

This would let users fine‑tune the experience without overthinking every request.
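Since this is future work, here is only a speculative sketch of how saved per‑feature instructions might merge into the pre‑crafted intent; nothing here exists in DocuMentor today, and all names are hypothetical.

```typescript
// Speculative: per-feature instructions a user saves once, appended to
// the pre-crafted prompt for that feature. Names are hypothetical.
const customInstructions: Record<string, string[]> = {
  summary: ["Always include a brief definition of core concepts."],
  videoRecommendations: ["Prioritize short tutorials under 15 minutes."],
};

function withCustomInstructions(feature: string, basePrompt: string): string {
  const extras = customInstructions[feature] ?? [];
  if (extras.length === 0) return basePrompt; // nothing saved: prompt unchanged
  return `${basePrompt}\nAdditional user preferences:\n- ${extras.join("\n- ")}`;
}
```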

2. Dynamic personas

Right now, personas are static. But a full‑stack developer might want to view a page as a frontend engineer one day and a backend engineer the next, depending on context.

Future versions could let users switch personas per page or even infer persona adjustments based on the content type (e.g., automatically apply a security‑focused lens when reading about authentication).

The goal remains the same: personalization without overthinking. AI should adapt to you, not the other way around.


Final Thoughts

Building effective AI features isn’t just about picking the right model or writing clever prompts. It’s about engineering the context that goes into those prompts:

  • Zero‑prompt UX – Features replace chat boxes, eliminating user guesswork.
  • Persona‑driven personalization – Every response adapts to role, skills, goals, and preferences.
  • Content intelligence – Strategic decomposition ensures features get exactly what they need.

The result: an AI tool that feels less like a chatbot and more like a knowledgeable colleague who understands what you’re trying to accomplish.

If you want to see this in action, try DocuMentor AI on a technical article or documentation page. And if you find it useful, the best way to support this work is to leave a review and share it with someone who might benefit.

I’d also love to hear from you: What other aspects of building DocuMentor would you like to hear about? Drop a comment or reach out—your feedback shapes what I write next.

