AI Browser Updates: How Far Have We Come?
Source: Dev.to
📊 Current Status
- ✅ Received valuable feedback and suggestions
- ✅ Developers are starting to follow and explore the project
- ✅ Cross‑platform support (Mac, Windows) running stably
🎉 What We’ve Built Recently
1. History Playback + Continue Conversations
Previous pain point: History was read‑only, so a finished task could not be continued.
Now:
- ✅ Click any historical task to replay the full execution (with typewriter effects)
- ✅ Play/pause/speed control
- ✅ Continue the conversation from where you left off
- ✅ Preview attached files directly
Technical Implementation
We built a PlaybackEngine that breaks message streams into atomic fragments (AtomicFragment)—the smallest replayable units. This allows precise control over playback progress and speed. Task data is persisted via IndexedDB for offline viewing. When resuming, we restore the complete execution context (workflow, steps, attachments, etc.) to ensure seamless continuation.
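To make the fragment idea concrete, here is a minimal sketch of how such an engine could work. It is an illustration under assumptions, not the project's actual code: the fields of AtomicFragment, the RecordedMessage type, the one-character granularity, and the method names (step, play, pause) are all hypothetical.

```typescript
// Hypothetical shape of one replayable unit; the real AtomicFragment may carry more fields.
interface AtomicFragment {
  messageId: string;
  text: string; // a small slice of the message, here one character
}

// Hypothetical stored message, as it might be read back from persistence.
interface RecordedMessage {
  id: string;
  content: string;
}

// Split each message into single-character fragments (typewriter granularity).
function toFragments(messages: RecordedMessage[]): AtomicFragment[] {
  return messages.flatMap((m) =>
    Array.from(m.content).map((ch) => ({ messageId: m.id, text: ch })),
  );
}

class PlaybackEngine {
  speed = 1; // 2 plays twice as fast, 0.5 at half speed
  private cursor = 0;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private fragments: AtomicFragment[];
  private onFragment: (f: AtomicFragment) => void;
  private baseDelayMs: number;

  constructor(
    fragments: AtomicFragment[],
    onFragment: (f: AtomicFragment) => void,
    baseDelayMs = 30,
  ) {
    this.fragments = fragments;
    this.onFragment = onFragment;
    this.baseDelayMs = baseDelayMs;
  }

  // Fraction of fragments already emitted, between 0 and 1.
  get progress(): number {
    return this.fragments.length === 0 ? 1 : this.cursor / this.fragments.length;
  }

  // Emit the next fragment; returns false once the end is reached.
  step(): boolean {
    if (this.cursor >= this.fragments.length) return false;
    this.onFragment(this.fragments[this.cursor++]);
    return true;
  }

  // Emit fragments on a timer whose delay shrinks as speed grows.
  play(): void {
    if (this.timer !== null) return; // already playing
    const tick = () => {
      if (!this.step()) {
        this.timer = null;
        return;
      }
      this.timer = setTimeout(tick, this.baseDelayMs / this.speed);
    };
    tick();
  }

  // Stop the timer but keep the cursor, so play() resumes from the same place.
  pause(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

Because the cursor only ever advances one fragment at a time, pausing, resuming, and changing speed never lose or duplicate output; the speed setting only changes the delay between fragments.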
2. Human Interaction Capability
Scenario: AI encounters situations requiring human decisions.
Solution:
- ✅ AI can ask questions during execution
- ✅ After you respond, AI continues
- ✅ Useful for login confirmations, option selections, etc.
Example:
Task: Help me collect data from a login‑required website
AI: Login required. Are you logged in?
You: Yes, already logged in
AI: Got it, continuing data collection...
Technical Implementation
Based on the eko framework’s HumanInteraction message type, AI can initiate interaction requests during execution. We established a bidirectional communication channel between the main and renderer processes via Electron IPC. When AI needs to ask a question, the workflow pauses and waits for the user’s response. Once the answer arrives over IPC, the agent resumes execution. The process includes full state management and error handling.
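The pause-and-resume step can be pictured as a promise that stays pending until the user replies. The sketch below is a simplified illustration, not the project's code and not the eko API: HumanInteractionBridge, its methods, and the request id format are hypothetical, and the `send` callback stands in for whatever pushes the question to the renderer (in Electron this would typically be `webContents.send`, with the reply arriving through an `ipcMain` handler).

```typescript
// Callback that settles one pending question with the user's answer.
type Resolver = (answer: string) => void;

// Hypothetical bridge that lives in the main process.
class HumanInteractionBridge {
  private pending = new Map<string, Resolver>();
  private nextId = 0;
  private send: (requestId: string, question: string) => void;

  // `send` is an assumption: it delivers the question to the UI.
  constructor(send: (requestId: string, question: string) => void) {
    this.send = send;
  }

  // Called by the agent. The returned promise settles when the user
  // answers, or rejects if no answer arrives within timeoutMs.
  ask(question: string, timeoutMs = 60000): Promise<string> {
    const requestId = `q${++this.nextId}`;
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`No answer for ${requestId} within ${timeoutMs} ms`));
      }, timeoutMs);
      this.pending.set(requestId, (answer) => {
        clearTimeout(timer);
        this.pending.delete(requestId);
        resolve(answer);
      });
      this.send(requestId, question);
    });
  }

  // Called when the renderer sends the user's reply back.
  // Returns false for unknown or already answered requests.
  answer(requestId: string, text: string): boolean {
    const resolver = this.pending.get(requestId);
    if (!resolver) return false;
    resolver(text);
    return true;
  }

  // Number of questions still waiting for a reply.
  get pendingCount(): number {
    return this.pending.size;
  }
}
```

An agent step would then simply `await bridge.ask("Login required. Are you logged in?")`, which is what suspends the workflow until the reply comes in. The timeout is one possible form of error handling; it keeps a forgotten question from blocking the task forever.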
3. Voice Input Support
Features:
- ✅ Voice input for tasks (no typing needed)
- ✅ Offline speech recognition with Vosk
- ✅ Auto‑switch recognition models based on language
Technical Implementation
Vosk’s local offline engine is used by default, so no internet connection is required, which protects user privacy. The appropriate model (Chinese/English) loads automatically. Future support for Microsoft Azure and iFlytek cloud services is planned.
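Automatic model switching comes down to mapping the current language to a model and loading each model only once. The sketch below shows that logic only and does not call Vosk itself: the model directory names, the English fallback, and the ModelCache class are assumptions for illustration, and the `load` callback stands in for whatever actually constructs a Vosk model from a directory.

```typescript
type SpeechLang = "zh" | "en";

// Hypothetical model directories; real paths depend on how the app bundles its models.
const MODEL_DIRS: Record<SpeechLang, string> = {
  zh: "models/vosk-zh",
  en: "models/vosk-en",
};

// Map a language tag such as "zh-CN" or "en_US" to a supported model language.
// Falling back to English for anything else is an assumption of this sketch.
function resolveSpeechLang(tag: string): SpeechLang {
  const primary = tag.toLowerCase().split(/[-_]/)[0];
  return primary === "zh" ? "zh" : "en";
}

// Loads each model at most once and reuses it on later switches.
class ModelCache<M> {
  private loaded = new Map<SpeechLang, M>();
  private load: (dir: string) => M;

  constructor(load: (dir: string) => M) {
    this.load = load;
  }

  get(tag: string): M {
    const lang = resolveSpeechLang(tag);
    let model = this.loaded.get(lang);
    if (model === undefined) {
      model = this.load(MODEL_DIRS[lang]);
      this.loaded.set(lang, model);
    }
    return model;
  }
}
```

Caching matters here because speech models are comparatively large, so switching back and forth between languages should not reload them from disk each time.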
4. Multi‑Language Internationalization
Support:
- ✅ Chinese/English interface switching
- ✅ Complete translation coverage
- ✅ Date/time localization
Technical Implementation
Built on i18next + react-i18next. Translation resources are organized by module (main.json, history.json, agent-config.json, etc.) with namespace isolation. Language switching is driven by Zustand global state, so no page refresh is needed. Date/time formatting uses date-fns locales. Adding a new language only requires adding new JSON translation files.
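To show what namespace isolation buys, here is a small dependency-free sketch of a namespaced lookup with a fallback language. It deliberately does not use i18next (which provides this behavior through its `ns`, `defaultNS`, and `fallbackLng` options); the `translate` function and the sample strings are made up for illustration, and only the `namespace:key` notation mirrors i18next's default.

```typescript
// One namespace corresponds to one JSON file such as history.json.
type Namespace = Record<string, string>;
// language -> namespace -> key -> translated text
type Resources = Record<string, Record<string, Namespace>>;

// Made-up sample data; real strings live in the per-module JSON files.
const sampleResources: Resources = {
  en: {
    main: { title: "AI Browser" },
    history: { replay: "Replay", continue: "Continue conversation" },
  },
  zh: {
    main: { title: "AI Browser" },
    history: { replay: "回放" }, // "continue" intentionally missing
  },
};

// Resolve "namespace:key" in the requested language, then in the fallback
// language, and finally return the key itself so gaps stay visible in the UI.
function translate(
  res: Resources,
  lang: string,
  key: string,
  fallbackLang = "en",
): string {
  const sep = key.indexOf(":");
  if (sep < 0) return key; // this sketch requires an explicit namespace
  const ns = key.slice(0, sep);
  const name = key.slice(sep + 1);
  return res[lang]?.[ns]?.[name] ?? res[fallbackLang]?.[ns]?.[name] ?? key;
}
```

Because each module owns its own namespace, two modules can both define a key like `title` without colliding, and a new language is just another top-level entry with the same file layout.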
5. Agent Configuration System
Features:
- ✅ Customize agent prompts
- ✅ Manage MCP tools (CRUD)
- ✅ Configure different agent capabilities
This makes AI Browser highly flexible and customizable.
6. Toolbox Page
Improvements:
- ✅ Centralized access to all system features
- ✅ Clearer navigation
- ✅ One‑click jump to config, scheduled tasks, history, etc.
🗺️ What’s Next
Phase 1 (Near‑term, 1‑2 weeks)
- Task Working Directory Isolation – each task gets an independent working directory to avoid file interference.
- Windows Background Running Optimization – reduce resource usage and improve stability.
- Generated File Download Support – direct and batch download of AI‑generated files.
- Playback Speed Control – fast‑forward/slow‑motion for history playback.
Phase 2 (Mid‑term, 2‑4 weeks)
- Performance Optimization – virtual scrolling for long conversations, memory improvements, faster startup.
- Multi‑Language Enhancement – auto‑detect system language, dynamic download of offline language packages, configurable online speech recognition (Microsoft, iFlytek).
- Theme Customization – dark mode, multiple color schemes, user‑defined colors.
Phase 3 (Long‑term, 1‑2 months)
- Visual Workflow Editor – adjust workflow steps, save/import specific workflows for scheduled tasks.
- Plugin Marketplace – official MCP tool library (HTTP, stdio, SSE), community plugin sharing, one‑click install/update.
- More Agent Support – ShellAgent (command execution), EmailAgent (email send/receive), NotionAgent (Notion operations).
🤔 What We Need
- ⭐️ Stars – help the project gain visibility, attract contributors, and motivate continued development.
- 💬 Feedback and Suggestions – share your use cases, problems, and feature ideas via GitHub Issues or comments.
- 🤝 Code Contributions – submit PRs for bug fixes, new features, or documentation improvements.
📌 Quick Links
- GitHub:
- Download:
- Configuration Guide:
- Issue Tracker: