Codex /goal and OpenGUI: long-running tasks need state

Published: May 4, 2026 at 09:55 PM EDT
6 min read
Source: Dev.to

Long‑running agents tend to fail in the second half

The first step is often fine – fix a CI failure, open an app, tap a button, search for a keyword. Models can produce a reasonable first action. The trouble starts around step 10: what has already happened, where the task is stuck, what the original boundary was, and when the task is allowed to stop. Those details slide out of context.

Codex CLI 0.128.0 added /goal. The release note describes a persisted‑goal workflow: app‑server APIs, model tools, runtime continuation, and TUI controls for create, pause, resume, and clear. Simon Willison compared it to OpenAI’s version of a Ralph loop: set a goal for Codex, then let it keep executing, checking, and correcting until the goal is done or the budget runs out.

In the context of long‑running tasks, the change is about where the goal lives. It moves from text in a single prompt to state that can be resumed, paused, cleared, and referenced again later.
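That lifecycle can be pictured as a small record with explicit states. This is a hypothetical sketch of what "goal as state" means, not Codex's actual data model; the field names and transitions here are assumptions.

```typescript
// Hypothetical persisted-goal record -- NOT Codex's real schema.
type GoalStatus = "active" | "paused" | "cleared";

interface PersistedGoal {
  text: string;        // the original goal, kept verbatim
  status: GoalStatus;  // survives across turns, unlike prompt text
  createdAt: number;
}

function createGoal(text: string): PersistedGoal {
  return { text, status: "active", createdAt: Date.now() };
}

function pause(goal: PersistedGoal): PersistedGoal {
  return { ...goal, status: "paused" };
}

function resume(goal: PersistedGoal): PersistedGoal {
  // Only a paused goal resumes; a cleared goal stays cleared.
  return goal.status === "paused" ? { ...goal, status: "active" } : goal;
}

function clear(goal: PersistedGoal): PersistedGoal {
  return { ...goal, status: "cleared" };
}
```

The point of the record is that step 50 can still read the same goal that step 1 created, instead of hoping it survived in the context window.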

Why coding agents need a goal

Take a CI failure. The immediate failure may be one broken test. The agent changes the test, then the implementation, then adjusts a type because the code now feels awkward. Each step can be justified, but the final diff is much larger than the original problem.

Code generation is rarely the hard part here. The run has no stable constraint attached to it. The original goal may have been as small as:

/goal Fix the currently failing tests, keep the diff as small as possible, and finish by running npm test

or:

/goal Address the review comments on this PR, don't touch unrelated files, and give a summary of the changes at the end

That kind of goal carries the target, the boundary, and the acceptance condition. It tells the agent where to go, what not to touch, and when to stop.
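One way to make that concrete is to split the goal into those three parts and check proposed actions against the boundary. This is an illustrative decomposition, not a Codex API; the field names and the `unrelated/` path are assumptions.

```typescript
// Illustrative decomposition of a goal -- an assumption, not a Codex API.
interface Goal {
  target: string;      // where to go
  boundary: string[];  // path fragments the agent must not touch
  acceptance: string;  // when the run is allowed to stop
}

// A proposed edit is checked against the boundary before it runs.
function violatesBoundary(goal: Goal, touchedFile: string): boolean {
  return goal.boundary.some((rule) => touchedFile.includes(rule));
}

const fixTestsGoal: Goal = {
  target: "fix the currently failing tests",
  boundary: ["unrelated/"],          // hypothetical off-limits directory
  acceptance: "npm test passes with a minimal diff",
};
```

With a check like this, "the diff is much larger than the original problem" becomes a detectable condition rather than something a reviewer notices afterwards.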

Without that state, the agent is easily pulled around by the current error. A type looks awkward, so it changes the type. A test is hard to write, so it changes the test. The structure feels messy, so it refactors. Each local move can make sense, while the whole task drifts.

On phones, the hard part is screen state

OpenGUI works on a different kind of long‑running task: letting AI operate a real Android phone.

In a codebase, state can still land in files, tests, and diffs. On a phone, state is a live screen.

For example, ask the phone to:

  1. Open X.
  2. Search for discussions about mobile AI agents.
  3. Collect the main points.
  4. Summarize what people care about.

As a sentence this looks simple, but on the phone it becomes a series of state checks:

  • Is the app open?
  • Is this the home page?
  • Is the search box focused?
  • Did the results finish loading?
  • Did a login prompt, permission prompt, or follow‑recommendation appear?

The loop of screenshot → tap → screenshot can only carry short tasks. If the screen does not change, the system must decide whether the tap missed, the network is slow, the page is loading, or the action has no visible feedback. If the page jumps somewhere else, it must decide whether to go back, retry, or continue from the new page.
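The "did the screen change?" half of that problem can be sketched as a tap-then-poll loop with a timeout. The `tap` and `screenshot` helpers here are hypothetical stand-ins, not OpenGUI's API; comparing screenshot hashes is one assumed way to detect change.

```typescript
// Sketch of screenshot -> tap -> screenshot with a timeout.
// tap() and screenshot() are hypothetical helpers, not OpenGUI's API.
type Outcome = "changed" | "no_change";

async function tapAndWait(
  tap: () => Promise<void>,
  screenshot: () => Promise<string>, // e.g. a hash of the current frame
  timeoutMs = 5000,
  pollMs = 500,
): Promise<Outcome> {
  const before = await screenshot();
  await tap();
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await screenshot()) !== before) return "changed";
    await new Promise((r) => setTimeout(r, pollMs));
  }
  // The caller still has to decide what "no_change" means:
  // missed tap, slow network, still loading, or no visible feedback.
  return "no_change";
}
```

Note that the loop only detects that nothing changed; classifying *why* still needs a separate decision, which is exactly what a bare prompt loop lacks.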

So a goal on mobile has to answer a few concrete questions:

  • Which step is the task on?
  • Does the current screen support the next step?
  • Where to recover after a failure?
  • When can the run end?
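Those four questions map onto a small piece of task state plus a decision function. The shape below is an assumption for illustration, not OpenGUI's schema.

```typescript
// Hypothetical task state answering the four questions above
// (field names are assumptions, not OpenGUI's types).
interface MobileTaskState {
  steps: string[];
  stepIndex: number;            // which step the task is on
  screenSupportsNext: boolean;  // does the current screen allow the next step?
  recoveryStep: number;         // where to resume after a failure
}

function nextMove(s: MobileTaskState): "finish" | "execute" | "recover" {
  if (s.stepIndex >= s.steps.length) return "finish"; // the run can end
  return s.screenSupportsNext ? "execute" : "recover"; // back to a known screen
}
```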

OpenGUI turns the goal into a state flow

I ran OpenGUI and read through the source. It connects the backend graph, device connection, and Android‑side action execution instead of leaving phone automation as a script.

  • Backend entry point: server/apps/backend/src/modules/graph-agent/graph/mobile-agent.graph.ts
  • Plan Supervisor: splits a complex plan into executable subtasks.
  • Executor subgraph: executor.graph.ts runs actions on the device.
  • Result handling: the supervisor receives the execution result and decides whether to continue, retry, re‑plan, or hand off to Summarizer.
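The supervisor's result handling can be sketched as a single decision function over the execution result. The failure categories and retry threshold below are assumptions for illustration, not OpenGUI's actual logic.

```typescript
// Sketch of supervisor result handling: continue, retry, re-plan,
// or hand off to the summarizer. Categories and thresholds are assumptions.
type ExecResult = {
  ok: boolean;
  failure?: "missed_tap" | "timeout" | "unexpected_page";
};
type Decision = "continue" | "retry" | "replan" | "summarize";

function supervise(result: ExecResult, retries: number, subtasksLeft: number): Decision {
  if (result.ok) {
    return subtasksLeft > 0 ? "continue" : "summarize";
  }
  if (result.failure === "unexpected_page") {
    return "replan"; // the screen no longer matches the plan
  }
  return retries < 2 ? "retry" : "replan"; // transient failures get retried
}
```

The key property is that every failure flows back through this one point, so a bad tap becomes input to the next decision instead of silently derailing the run.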

On Android, actions are applied to the real device:

  • client/core_accessibility/.../GestureService.kt executes GUI actions such as taps and typing.
  • The device keeps a WebSocket connection to the backend; client/core_network/.../StandbySocketManager.kt handles the standby connection.
  • Feishu/Lark, Telegram, and REST API can act as remote task entry points, turning the phone from a local demo device into a worker that can receive work.

OpenGUI spreads the goal across several pieces of state:

  • Plan document: the high‑level plan written by the model
  • Current subtask: the specific action being executed
  • Device screenshot: visual feedback after each action
  • Execution result: success/failure, error classification
  • Failure classification: the reason for a failure (e.g., missed tap, network timeout)
  • Final summary: human‑readable report of what was accomplished
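Gathered into one record, those pieces of state might look like the type below. The field names are illustrative assumptions, not OpenGUI's actual types.

```typescript
// The pieces of state above gathered into one record.
// Field names are illustrative, not OpenGUI's actual types.
interface RunState {
  plan: string;                // high-level plan written by the model
  currentSubtask: string;      // specific action being executed
  lastScreenshot: string;      // visual feedback, e.g. a file path or hash
  lastResult: { ok: boolean }; // success/failure of the last action
  failureReason?: "missed_tap" | "network_timeout";
  summary?: string;            // filled in only at the end of the run
}

const run: RunState = {
  plan: "search X for mobile-agent discussions and summarize",
  currentSubtask: "tap the search box",
  lastScreenshot: "frame-0012.png", // hypothetical path
  lastResult: { ok: false },
  failureReason: "missed_tap",
};
```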

After each device action, the backend gets fresh device state and decides the next move. A simple script assumes the page will follow the expected order; OpenGUI assumes the page may change, so the executor must keep reporting real state back to the backend.

The cost

Putting the goal into a graph makes the system heavier.

  • Maintain task state.
  • Keep WebSocket connections alive.
  • Handle device standby.
  • Send execution results and screenshots back.
  • Design state transitions for continue, retry, cancel, and summarize.
  • Model calls and screenshot analysis also cost money.

The longer the task runs, the more that cost becomes an engineering concern rather than a small detail.

But on mobile it is hard to avoid this cost. Real apps show pop‑ups, hang on loading screens, misread taps, and send users to completely different pages. A prompt loop alone quickly turns into a screenshot‑based while true.

OpenGUI puts that complexity into the system. A bad tap becomes an execution result for the supervisor to consume. The device keeps reporting state. It behaves more like a worker than a screen being clicked. The design is heavier, but it gives long‑running tasks a place to be debugged, recovered, and reviewed.

Potential first use cases

  • Community research (e.g., gathering opinions from mobile forums)
  • Mobile flow testing (automated UI regression)
  • Ops tasks on phones (e.g., bulk configuration changes)
  • App‑level demonstrations or tutorials

Overview

Only workflows that web automation cannot reach require a dedicated execution system. These tasks may not need the most powerful model, but they must be able to:

  • Keep following the goal continuously
  • Detect failures promptly
  • Send state updates back to the controller

In coding agents, Codex uses the /goal command to store the goal as recoverable state. On mobile devices, OpenGUI ties together task progress, device feedback, and failure handling into a cohesive state flow. A long‑running agent therefore needs to:

  1. Track the overall run, not just the next step
  2. Maintain persistent state across interruptions
  3. React to errors and adjust its plan accordingly

References

  • OpenAI Codex 0.128.0 release
  • Simon Willison – “Codex Goals” (30 Apr 2026)
  • OpenGUI (Core‑Mate)