Engineering Context for Local and Cloud AI: Personas, Content Intelligence, and Zero-Prompt UX
Source: Dev.to
Introduction
In the previous article we covered how DocuMentor AI’s hybrid architecture seamlessly adapts between Chrome’s local Gemini Nano and cloud AI. We built a system that automatically routes tasks based on capabilities and performance constraints.
But having a robust execution layer is only half the battle. The other half? Engineering the context that goes into those models.
Most AI tools give you a powerful model and a blank text box, then expect you to figure out what to ask. It’s like handing someone a professional camera and saying “take a good photo” – technically possible, but the burden is entirely on the user.
DocuMentor takes a different approach: zero‑prompt UX through intelligent context engineering. Users never write prompts. They click a feature (Quick Scan, Deep Analysis, Cheat Sheet), and the extension handles the rest—assembling the right persona elements, extracting the right page sections, and shaping everything into a request the AI can’t misinterpret.
This article breaks down how that works: the philosophy behind zero‑prompt design, the persona system that personalizes every response, and the content‑intelligence layer that knows exactly what to send to the AI.
The Zero‑Prompt Philosophy
Most technical‑documentation tools and AI‑powered browsers present the same UX pattern: a blank chat input with a placeholder like “Ask me anything about this page.”

On the surface this seems user‑friendly, but for anyone who doesn’t know what to ask, a blank box creates three problems:
- Cognitive load – Users must think about how to phrase their question.
- Intent ambiguity – Small wording changes lead to wildly different answers.
- Generic responses – Without context about the user, the AI gives one‑size‑fits‑all answers.
I built DocuMentor to solve my own problem: I spend hours each week scanning documentation, blog posts, and API references trying to extract what I need quickly. Sometimes I want a TL;DR to decide if it’s worth reading. Other times I need a cheat sheet for future reference. And sometimes I just want to know “Should I care about this?”
These are specific, recurring needs. Why should I have to articulate them from scratch every time?
Feature‑First Design
Instead of a blank chat box, DocuMentor exposes four purpose‑built features:
| Feature | What It Does |
|---|---|
| Quick Scan | Instant insights: TL;DR, “should I read this?”, related resources, page architecture |
| Deep Analysis | Comprehensive overview, code patterns, video recommendations, learning resources with reasoning |
| Cheat Sheet | Condensed, actionable summary optimized for quick lookup |
| AskMe | Targeted chat: select text or images and ask specific questions |
Each feature represents a pre‑crafted intent. Users don’t have to think about how to ask; they just pick the outcome they want. The extension then crafts the prompt, selects the right page sections, and applies the user’s persona.
This isn’t just about convenience. It’s about eliminating ambiguity. When a user clicks Quick Scan, there’s zero room for misinterpretation. The AI knows exactly what format to return, what level of detail to provide, and what the user cares about.
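The feature‑first idea can be sketched as a lookup from feature to pre‑crafted intent. This is a minimal illustration, not DocuMentor’s actual implementation; all names and formats here are assumptions:

```typescript
// Hypothetical sketch: each feature resolves to a pre-crafted intent,
// so the user never writes a prompt. Names are illustrative.
type Feature = "quickScan" | "deepAnalysis" | "cheatSheet" | "askMe";

interface Intent {
  instruction: string;  // what the AI must do
  outputFormat: string; // the structure it must return
}

const INTENTS: Record<Feature, Intent> = {
  quickScan: {
    instruction:
      "Produce a TL;DR, a should-I-read-this verdict, and related resources.",
    outputFormat: "json:{tldr, verdict, resources[]}",
  },
  deepAnalysis: {
    instruction:
      "Produce a comprehensive overview, code patterns, and learning resources with reasoning.",
    outputFormat: "json:{overview, patterns[], resources[]}",
  },
  cheatSheet: {
    instruction:
      "Produce a condensed, actionable reference optimized for quick lookup.",
    outputFormat: "markdown",
  },
  askMe: {
    instruction: "Answer the user's question about the selected text.",
    outputFormat: "markdown",
  },
};

// A click resolves directly to an unambiguous request skeleton.
function intentFor(feature: Feature): Intent {
  return INTENTS[feature];
}
```

Because the intent is fixed at design time, the model never has to guess the output shape from free‑form user wording.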
Persona‑Driven Personalization
After building the initial feature set, I realized something critical: none of these features should return generic answers.
A “Should I read this?” recommendation means nothing without knowing who’s asking. A senior AI engineer doesn’t need an intro to neural networks, whereas a junior frontend developer does. Same feature, same page, completely different answers.
That’s when I introduced the persona system—a user profile that shapes every AI response.
What’s in a Persona
| Component | Description |
|---|---|
| Role | AI/ML Engineer, Frontend Developer, Backend Engineer, etc. |
| Seniority | Beginner, Intermediate, Senior |
| Skills | Programming languages, frameworks, concepts – each with a proficiency level (Beginner, Intermediate, Advanced) |
| Learning Goals | What the user wants to master right now (e.g., “Master LangGraph for production AI agents”) |
| Learning Preferences | Text, video, or mixed |

Figure: The five components of a DocuMentor persona.
The challenge wasn’t just collecting this information—it was knowing which elements matter for which features.
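As a rough sketch, the five components above could be modeled like this; the field names and shapes are my assumptions, not DocuMentor’s actual schema:

```typescript
// Hypothetical persona shape mirroring the table above.
type Seniority = "beginner" | "intermediate" | "senior";
type Proficiency = "beginner" | "intermediate" | "advanced";

interface Skill {
  name: string; // e.g. "React", "LangGraph"
  level: Proficiency;
}

interface Persona {
  role: string; // e.g. "AI/ML Engineer"
  seniority: Seniority;
  skills: Skill[];
  learningGoals: string[]; // e.g. "Master LangGraph for production AI agents"
  learningPreference: "text" | "video" | "mixed";
}

// Example persona used in the scenarios later in this article.
const example: Persona = {
  role: "Frontend Developer",
  seniority: "beginner",
  skills: [{ name: "React", level: "intermediate" }],
  learningGoals: ["Master React for production apps"],
  learningPreference: "mixed",
};
```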
Mapping Persona Elements to Features
| Feature | Relevant Persona Elements |
|---|---|
| Quick Scan | Role, Seniority, Skills, Learning Goals |
| Deep Analysis | Role, Seniority, Skills, Learning Goals, Learning Preferences |
| Cheat Sheet | Role, Seniority, Skills |
| AskMe | All elements (depends on the specific query) |
For example, learning preferences are irrelevant for cheat sheets (the user already decided they want text), while skills and goals are critical for “Should I read this?” recommendations. Sending irrelevant persona data adds noise and wastes tokens—especially on local AI with tight context limits.
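One way to implement this mapping is a per‑feature field whitelist, so irrelevant persona data never reaches the prompt. A minimal sketch, with assumed names:

```typescript
// Sketch: route only the persona fields each feature needs.
// Field and feature names are illustrative.
type PersonaField =
  | "role"
  | "seniority"
  | "skills"
  | "learningGoals"
  | "learningPreference";

const PERSONA_FIELDS: Record<string, PersonaField[]> = {
  quickScan: ["role", "seniority", "skills", "learningGoals"],
  deepAnalysis: ["role", "seniority", "skills", "learningGoals", "learningPreference"],
  cheatSheet: ["role", "seniority", "skills"],
  askMe: ["role", "seniority", "skills", "learningGoals", "learningPreference"],
};

// Strip the persona down to what this feature needs, so irrelevant
// fields never consume context tokens.
function personaSlice(
  persona: Record<PersonaField, unknown>,
  feature: string,
): Partial<Record<PersonaField, unknown>> {
  const slice: Partial<Record<PersonaField, unknown>> = {};
  for (const field of PERSONA_FIELDS[feature] ?? []) {
    slice[field] = persona[field];
  }
  return slice;
}
```

On a ~4K‑token local model, dropping even a few hundred tokens of unused persona text leaves meaningfully more room for page content.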
Persona‑Driven Recommendations
Scenario: You’re a Junior Front‑end Developer learning React and you land on an article about advanced state‑management patterns.
DocuMentor’s “Should I read this?” feature might say:
Yes, read this.
This covers `useReducer` and Context API patterns that will level up your React skills. It assumes familiarity with `useState`, which you have. The examples are practical and match your learning goal: mastering React for production apps.
Scenario: You’re a Senior Backend Engineer who knows React but isn’t focused on front‑end work.
DocuMentor’s recommendation:
Skip this.
You already understand these patterns from your React experience. This won’t advance your current goal (mastering distributed systems). If you need a refresher later, the cheat‑sheet feature has you covered.
The same page, the same feature, but completely different recommendations because the persona tells the AI who is asking and why they care.
This isn’t personalization for its own sake. It’s about respecting the user’s time. Generic AI tools waste time by forcing you to read irrelevant content or by hiding important insights. Persona‑driven AI acts like a knowledgeable colleague who knows your background and priorities.
Content Intelligence: Strategic Page Decomposition
Early on I made the naive mistake most AI developers make: I fed the entire page HTML to the model, assuming it could “figure it out.”
That failed spectacularly:
| Problem | Why it hurts |
|---|---|
| Context overflow | Raw HTML easily exceeds Chrome AI’s ~4K‑token limit |
| Noise drowning signal | Ads, navigation, footers, and JavaScript compete with the actual content |
| Hallucinations | Small models like Gemini Nano get confused by irrelevant information |
First fix → Content extraction
I used Mozilla’s Readability library (with a custom fallback for pages where Readability fails) to extract clean, readable text.
Even after cleaning, a new problem emerged: not every feature needs the same information.
| Feature | Information needed |
|---|---|
| Summaries & cheat sheets | Full article content |
| Video recommendations | Only a summary of the page |
| “Learn Resources” suggestions | Page links & navigation context (no article body) |
Sending everything to every feature wastes tokens, increases latency, and reduces relevance.
Solution: Strategic page decomposition.
DocuMentor’s purpose‑driven sections
- Main content – Core article text (extracted via Readability)
- Table of contents – Page structure & hierarchy
- Page links – URLs embedded in the content
- Code blocks – Extracted separately for pattern analysis
- Breadcrumbs & navigation – Metadata about where the page fits in the documentation

Figure: Page sections are strategically routed to different features based on what information is actually relevant.
Feature‑to‑section mapping
| Feature | Content Sections Used |
|---|---|
| Summary | Main content |
| Cheat Sheet | Main content + code blocks + page links |
| Video Recommendations | Summary only |
| Learn Resources | Summary + page links + breadcrumbs + navigation |
| Code Patterns (Deep Analysis) | Code blocks + surrounding context |
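The decomposition and the routing table above could be expressed together like this. A sketch under assumed names; the real section boundaries and feature keys may differ:

```typescript
// Sketch: the decomposed page, and which sections each feature receives.
interface PageSections {
  mainContent: string;     // extracted via Readability
  tableOfContents: string[];
  pageLinks: string[];
  codeBlocks: string[];
  breadcrumbs: string[];
  summary: string;         // generated once, reused by lightweight features
}

const SECTIONS_FOR: Record<string, (keyof PageSections)[]> = {
  summary: ["mainContent"],
  cheatSheet: ["mainContent", "codeBlocks", "pageLinks"],
  videoRecs: ["summary"],
  learnResources: ["summary", "pageLinks", "breadcrumbs"],
  codePatterns: ["codeBlocks", "mainContent"],
};

// Assemble only the sections a feature actually uses.
function sectionsFor(
  feature: string,
  page: PageSections,
): Partial<PageSections> {
  const out: Partial<PageSections> = {};
  for (const key of SECTIONS_FOR[feature] ?? []) {
    (out as Record<string, unknown>)[key] = page[key];
  }
  return out;
}
```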
Concrete example: Video recommendations
A naïve approach would send the full 10K‑word article to the model, then ask it to find relevant YouTube videos. That would:
- Burn most of Chrome AI’s token budget on a single feature
- Slow down the response (the model processes 10 K words before calling the YouTube API)
- Risk quota errors on low‑VRAM devices
DocuMentor’s optimized flow
1. Generate a summary of the page (≈200‑300 words).
2. Send the summary plus the user persona to the AI.
3. The AI creates an optimal YouTube search query based on the topic and the user’s learning goals.
4. The extension calls the YouTube Data API (outside the AI).
5. The AI ranks the top 10 results for relevance to the user’s goals and the page summary.
6. Return the top 3 videos with personalized descriptions.
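The optimized flow above can be sketched as an orchestrator with the AI and YouTube calls injected as plain functions. Everything here is illustrative; none of these names come from DocuMentor’s actual code:

```typescript
// Sketch of the video-recommendation flow, with the AI and
// YouTube Data API calls injected so the orchestration is testable.
interface Video { id: string; title: string; }

interface Deps {
  summarize: (content: string) => Promise<string>;
  buildQuery: (summary: string, persona: string) => Promise<string>;
  searchYouTube: (query: string) => Promise<Video[]>; // Data API call, outside the AI
  rank: (videos: Video[], summary: string, persona: string) => Promise<Video[]>;
}

async function recommendVideos(
  content: string,
  persona: string,
  deps: Deps,
): Promise<Video[]> {
  const summary = await deps.summarize(content);         // ~200-300 words, not the full article
  const query = await deps.buildQuery(summary, persona); // AI crafts the search query
  const results = await deps.searchYouTube(query);       // top results from the Data API
  const ranked = await deps.rank(results.slice(0, 10), summary, persona);
  return ranked.slice(0, 3);                             // top 3, personalized
}
```

Note that the expensive full‑content pass happens exactly once (the summary); every later step works on the ~300‑word artifact.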
Result: ~10× faster and ~1/10th the tokens compared with sending the full content. Because the AI only sees relevant information (summary + persona), the recommendations are more accurate.
This pattern repeats across every feature: content intelligence isn’t about giving the AI more information—it’s about giving it the right information.
Adaptive Prompting Across Providers
One final layer of context engineering: how you shape the request matters as much as what you send.
DocuMentor runs on two AI providers:
| Provider | Characteristics |
|---|---|
| Gemini Nano (local) | • Simple, directive instructions • One reasoning task per prompt • Defensive output parsing (often returns malformed JSON) |
| Gemini 2.0 Flash (cloud) | • Rich, multi‑step instructions • Tool‑calling support • Reliable structured output |
The persona and content sections stay the same, but the prompt framing changes based on the model’s reasoning capacity.
Example: Video recommendations
| Provider | Prompt strategy |
|---|---|
| Gemini Nano (local) | Sequential decomposition – each step is a separate AI call: 1️⃣ Generate search query → 2️⃣ Call API → 3️⃣ Rank results → 4️⃣ Format output |
| Gemini Flash (cloud) | Single tool‑augmented call – the model receives the summary, persona, and a single instruction to generate a query, fetch results via the YouTube tool, rank, and format all in one request |
By adapting prompts to each provider’s strengths, DocuMentor maximizes accuracy, speed, and token efficiency across both local and cloud environments.
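In code, provider‑adaptive framing can be as simple as branching on the provider and emitting either a sequence of single‑task prompts or one tool‑augmented instruction. A hedged sketch with invented prompt text:

```typescript
// Sketch: same context, different framing per provider.
// Provider names and prompt wording are illustrative.
type Provider = "nano" | "flash";

function framePrompts(provider: Provider, context: string): string[] {
  if (provider === "nano") {
    // Local model: one simple, directive task per call.
    return [
      `Summarize this page in 200-300 words:\n${context}`,
      `From the summary, write one YouTube search query. Return only the query.`,
      `Rank these search results for relevance. Return JSON only.`,
    ];
  }
  // Cloud model: one rich, tool-augmented instruction.
  return [
    `Given this summary and persona, generate a search query, fetch results ` +
    `with the YouTube tool, rank them, and return the top 3 as JSON.\n${context}`,
  ];
}
```

The caller then runs the array either as sequential calls (local) or as a single request (cloud), which is exactly the routing the table above describes.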
How It Works
With Gemini Flash, the model generates the query, calls the YouTube tool, ranks results, and formats the output, all in one request; with Gemini Nano, each of those steps runs as its own call.
Users never see this complexity. They click “Video Recommendations,” and the system automatically routes to the appropriate provider and prompt strategy.
What’s Next
This is just the first version of DocuMentor’s context‑engineering system. Two areas I’m exploring for future iterations:
1. User‑customizable feature prompts
Let users add personalized instructions to individual features. For example:
- “In summaries, always include a brief definition of core concepts.”
- “For video recommendations, prioritize short tutorials under 15 minutes.”
- “When suggesting resources, focus on official documentation over blog posts.”
This would let users fine‑tune the experience without overthinking every request.
2. Dynamic personas
Right now, personas are static. But a full‑stack developer might want to view a page as a frontend engineer one day and a backend engineer the next, depending on context.
Future versions could let users switch personas per page or even infer persona adjustments based on the content type (e.g., automatically apply a security‑focused lens when reading about authentication).
The goal remains the same: personalization without overthinking. AI should adapt to you, not the other way around.
Final Thoughts
Building effective AI features isn’t just about picking the right model or writing clever prompts. It’s about engineering the context that goes into those prompts:
- Zero‑prompt UX – Features replace chat boxes, eliminating user guesswork.
- Persona‑driven personalization – Every response adapts to role, skills, goals, and preferences.
- Content intelligence – Strategic decomposition ensures features get exactly what they need.
The result: an AI tool that feels less like a chatbot and more like a knowledgeable colleague who understands what you’re trying to accomplish.
If you want to see this in action, try DocuMentor AI on a technical article or documentation page. And if you find it useful, the best way to support this work is to leave a review and share it with someone who might benefit.
I’d also love to hear from you: What other aspects of building DocuMentor would you like to hear about? Drop a comment or reach out—your feedback shapes what I write next.