AI Browser Updates: How Far Have We Come?

Published: 1 week ago (December 6, 2025 at 11:29 PM EST)

3 min read

Source: Dev.to

📊 Current Status

✅ Received valuable feedback and suggestions
✅ Developers are starting to follow and explore the project
✅ Cross‑platform support (Mac, Windows) running stably

🎉 What We’ve Built Recently

1. History Playback + Continue Conversations

Previous pain point: History was read‑only, so you couldn’t continue a past conversation.

Now:

✅ Click any historical task to replay the full execution (with typewriter effects)
✅ Play/pause/speed control
✅ Continue the conversation from where you left off
✅ Preview attached files directly

Technical Implementation

We built a PlaybackEngine that breaks message streams into atomic fragments (AtomicFragment), the smallest replayable units, which allows precise control over playback progress and speed. Task data is persisted in IndexedDB for offline viewing. When a task is resumed, we restore the complete execution context (workflow, steps, attachments, etc.) so the conversation continues seamlessly.
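A minimal sketch of the fragment idea, assuming one fragment per character. The post names PlaybackEngine and AtomicFragment but does not show their fields, so the field names, the StoredMessage type, toFragments, and the 30 ms base delay are all hypothetical, and IndexedDB persistence is left out.

```typescript
// Hypothetical shapes: only the names PlaybackEngine and AtomicFragment
// come from the post; every field and method here is an assumption.
interface AtomicFragment {
  messageId: string;
  text: string; // one replayable unit (here: a single character)
}

interface StoredMessage {
  id: string;
  content: string;
}

// Split each stored message into per-character fragments.
function toFragments(messages: StoredMessage[]): AtomicFragment[] {
  return messages.flatMap((m) =>
    Array.from(m.content).map((ch) => ({ messageId: m.id, text: ch }))
  );
}

class PlaybackEngine {
  private readonly fragments: AtomicFragment[];
  private readonly baseDelayMs: number;
  private cursor = 0;
  private speed = 1;

  constructor(fragments: AtomicFragment[], baseDelayMs = 30) {
    this.fragments = fragments;
    this.baseDelayMs = baseDelayMs;
  }

  setSpeed(multiplier: number): void {
    if (multiplier <= 0) throw new RangeError("speed must be positive");
    this.speed = multiplier;
  }

  // Delay the UI should wait before showing the next fragment.
  get delayMs(): number {
    return this.baseDelayMs / this.speed;
  }

  // Jump to an absolute position, clamped to the valid range.
  seek(index: number): void {
    this.cursor = Math.min(Math.max(0, index), this.fragments.length);
  }

  // Emit the next fragment, or undefined when playback is finished.
  next(): AtomicFragment | undefined {
    return this.cursor < this.fragments.length
      ? this.fragments[this.cursor++]
      : undefined;
  }

  // Text shown so far for one message, derived from the cursor position.
  rendered(messageId: string): string {
    return this.fragments
      .slice(0, this.cursor)
      .filter((f) => f.messageId === messageId)
      .map((f) => f.text)
      .join("");
  }
}
```

In this sketch a UI loop would call next() on a timer that waits delayMs between calls. Pausing then amounts to not scheduling the next call, and the typewriter effect falls out of rendering one fragment at a time.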

2. Human Interaction Capability

Scenario: AI encounters situations requiring human decisions.

Solution:

✅ AI can ask questions during execution
✅ After you respond, AI continues
✅ Useful for login confirmations, option selections, etc.

Example:

Task: Help me collect data from a login‑required website

AI: Login required. Are you logged in?
You: Yes, already logged in
AI: Got it, continuing data collection...

Technical Implementation

Using the eko framework’s HumanInteraction message type, the AI can initiate interaction requests during execution. We established a bidirectional communication channel between the main and renderer processes via Electron IPC. When the AI needs to ask a question, the workflow pauses and waits for the user’s response. Once the answer arrives over IPC, the agent resumes execution. The process includes full state management and error handling.
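The pause-and-resume flow can be sketched with a promise that stays pending until the user replies. This is an illustration under assumptions, not the project's code: HumanInteractionBroker, the channel name "human-interaction:ask", and the payload fields are hypothetical, and the transport is injected as a plain function so the sketch runs without Electron.

```typescript
// Hypothetical names throughout: the post does not show its IPC channel
// names or the fields of eko's HumanInteraction message.
type AskPayload = { id: string; question: string };
type SendToRenderer = (channel: string, payload: AskPayload) => void;

class HumanInteractionBroker {
  private nextId = 0;
  private readonly pending = new Map<string, (answer: string) => void>();
  private readonly send: SendToRenderer;

  constructor(send: SendToRenderer) {
    this.send = send;
  }

  // Called by the agent: forwards the question and returns a promise that
  // stays unresolved (pausing the awaiting workflow) until answer() runs.
  ask(question: string): Promise<string> {
    const id = `q${++this.nextId}`;
    const reply = new Promise<string>((resolve) => {
      this.pending.set(id, resolve);
    });
    this.send("human-interaction:ask", { id, question });
    return reply;
  }

  // Called when the user's reply arrives from the renderer.
  // Returns false for unknown or already-answered ids.
  answer(id: string, text: string): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false;
    this.pending.delete(id);
    resolve(text);
    return true;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

In an Electron main process, send could wrap webContents.send and answer could be invoked from an ipcMain.handle callback. Error handling such as timeouts or a closed window is omitted here.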

3. Voice Input Support

Features:

✅ Voice input for tasks (no typing needed)
✅ Offline speech recognition with Vosk
✅ Auto‑switch recognition models based on language

Technical Implementation

Vosk’s local offline engine is used by default, so no internet connection is required, which protects user privacy. The appropriate model (Chinese or English) loads automatically. Support for Microsoft Azure and iFlytek cloud services is planned.
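Automatic model selection could look like the sketch below. The directory names and the resolveModelDir helper are hypothetical, and falling back to English for other languages is an assumption, since the post only says a Chinese or English model is chosen.

```typescript
// Hypothetical model directories; the real paths are not given in the post.
type SupportedLanguage = "zh" | "en";

const MODEL_DIRS: Record<SupportedLanguage, string> = {
  zh: "models/vosk-zh",
  en: "models/vosk-en",
};

// Map a locale tag such as "zh-CN" or "en_US" to a local model directory.
// Assumption: anything that is not Chinese falls back to the English model.
function resolveModelDir(locale: string): string {
  const primary = locale.toLowerCase().split(/[-_]/)[0];
  return primary === "zh" ? MODEL_DIRS.zh : MODEL_DIRS.en;
}
```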

4. Multi‑Language Internationalization

Support:

✅ Chinese/English interface switching
✅ Complete translation coverage
✅ Date/time localization

Technical Implementation

Built on i18next + react-i18next. Translation resources are organized by module (main.json, history.json, agent-config.json, etc.) with namespace isolation. Language switching is driven by Zustand global state, so no page refresh is needed. Date/time formatting uses date-fns locales. Adding a new language only requires new JSON translation files.
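The namespace layout can be sketched as an options object in the shape that i18next's init() accepts. Only the namespace names (main, history, agent-config) come from the post. The translation keys and strings, and the missingNamespaces helper, are made up for illustration, and the resources are inlined here rather than loaded from JSON files.

```typescript
// Hypothetical translation content; in the project these would live in
// main.json, history.json, and agent-config.json per language.
const resources = {
  en: {
    main: { newTask: "New task" },
    history: { replay: "Replay" },
    "agent-config": { title: "Agent configuration" },
  },
  zh: {
    main: { newTask: "新建任务" },
    history: { replay: "回放" },
    "agent-config": { title: "智能体配置" },
  },
};

// Options in the shape accepted by i18next's init(). Each JSON file maps
// to one namespace, so keys in different modules cannot collide.
const i18nOptions = {
  resources,
  lng: "en",
  fallbackLng: "en",
  ns: ["main", "history", "agent-config"],
  defaultNS: "main",
  interpolation: { escapeValue: false }, // React already escapes output
};

// Hypothetical helper: list the namespaces a language bundle still lacks,
// which is a quick check when adding a new language.
function missingNamespaces(
  bundle: Record<string, unknown>,
  namespaces: string[]
): string[] {
  return namespaces.filter((ns) => !(ns in bundle));
}
```

With this layout a component would look up a key through its namespace, for example "history:replay", and a new language is complete once missingNamespaces returns an empty list for its bundle.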

5. Agent Configuration System

Features:

✅ Customize agent prompts
✅ Manage MCP tools (CRUD)
✅ Configure different agent capabilities

This makes AI Browser highly flexible and customizable.
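As an illustration of what CRUD over MCP tools might involve, here is an in-memory sketch. The McpToolConfig shape and the McpToolRegistry class are hypothetical. The transport values are borrowed from the HTTP, stdio, and SSE tool types mentioned in the roadmap, and real storage would need to persist to disk.

```typescript
// Hypothetical config shape; the post does not show how tools are stored.
interface McpToolConfig {
  name: string;
  transport: "http" | "stdio" | "sse";
  enabled: boolean;
}

class McpToolRegistry {
  private readonly tools = new Map<string, McpToolConfig>();

  // Create: reject duplicates so a name always identifies one tool.
  create(tool: McpToolConfig): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, { ...tool });
  }

  // Read: return a copy so callers cannot mutate stored state.
  read(name: string): McpToolConfig | undefined {
    const tool = this.tools.get(name);
    return tool ? { ...tool } : undefined;
  }

  // Update: merge a partial patch; the name itself is not editable.
  update(name: string, patch: Partial<Omit<McpToolConfig, "name">>): boolean {
    const current = this.tools.get(name);
    if (!current) return false;
    this.tools.set(name, { ...current, ...patch });
    return true;
  }

  // Delete: returns false when the tool did not exist.
  delete(name: string): boolean {
    return this.tools.delete(name);
  }
}
```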

6. Toolbox Page

Improvements:

✅ Centralized access to all system features
✅ Clearer navigation
✅ One‑click jump to config, scheduled tasks, history, etc.

🗺️ What’s Next

Phase 1 (Near‑term, 1‑2 weeks)

Task Working Directory Isolation – each task gets an independent working directory to avoid file interference.
Windows Background Running Optimization – reduce resource usage and improve stability.
Generated File Download Support – direct and batch download of AI‑generated files.
Playback Speed Control – fast‑forward/slow‑motion for history playback.

Phase 2 (Mid‑term, 2‑4 weeks)

Performance Optimization – virtual scrolling for long conversations, memory improvements, faster startup.
Multi‑Language Enhancement – auto‑detect system language, dynamic download of offline language packages, configurable online speech recognition (Microsoft, iFlytek).
Theme Customization – dark mode, multiple color schemes, user‑defined colors.

Phase 3 (Long‑term, 1‑2 months)

Visual Workflow Editor – adjust workflow steps, save/import specific workflows for scheduled tasks.
Plugin Marketplace – official MCP tool library (HTTP, stdio, SSE), community plugin sharing, one‑click install/update.
More Agent Support – ShellAgent (command execution), EmailAgent (email send/receive), NotionAgent (Notion operations).

🤔 What We Need

⭐️ Stars – helps the project gain visibility, attract contributors, and motivates continued development.
💬 Feedback and Suggestions – share your use cases, problems, and feature ideas via GitHub Issues or comments.
🤝 Code Contributions – submit PRs for bug fixes, new features, or documentation improvements.

📌 Quick Links

GitHub:
Download:
Configuration Guide:
Issue Tracker:
