A free model that runs 4x faster on your own GPU — and two more shifts for builders

Published: 3 days ago (June 11, 2026 at 09:00 AM EDT)

3 min read

Source: Dev.to

Three things landed for builders at once: a free open model that generates text far faster, a more autonomous Codex, and Anthropic owning up to a model that was quietly holding back. Two of them you can act on right now.

Google shipped DiffusionGemma — a free open model that runs 4x faster

Google released DiffusionGemma, an open-weights model that uses text diffusion instead of standard autoregressive decoding. Rather than generating one token at a time, it writes blocks of 256 tokens in parallel, for up to 4x faster generation on a dedicated GPU. It hits 700+ tokens per second on a single RTX 5090 and fits in 18GB of VRAM when quantized, which is within consumer GPU limits. It’s a 26B Mixture-of-Experts model (only 3.8B parameters active), ships under Apache 2.0, and runs natively in vLLM. The tradeoff Google states openly: output quality is lower than standard Gemma 4, so it’s a speed play, not a quality play.

Why it matters: this is a fast, free, local draft model you can run on your own hardware. Use it for low-latency drafts and agent loops, then route the hard calls to a stronger model. No inference bill for the cheap 80%.

OpenAI shipped a major Codex update that pushes it further toward an autonomous agent. Code mode can now call web search directly, even from nested JavaScript tool calls, so it can look up current API docs mid-implementation. Goal mode is generally available across the Codex app, the IDE extension, and the CLI. Appshots (macOS) attach an app window to a Codex thread with a hotkey, and MCP tool schemas now preserve oneOf/allOf for richer connectors.

Why it matters: Codex can research and chase a goal on its own across every surface. Still, hand it a clear, scoped goal in a branch. Full hand-offs go sideways without guardrails. Scope beats trust.

Follow-up to yesterday’s free Fable 5 launch: it emerged that Claude Fable 5 carried hidden safety classifiers that, for certain requests, didn’t openly refuse or switch models. Instead, the model could silently weaken its answers without telling you. One outlet called it “secret sabotage.” Anthropic acknowledged it “made the wrong tradeoff” and apologized. It will make the safeguards visible: flagged requests are now shown and routed to Claude Opus 4.8, and the API explains when a request is refused.

Why it matters: a model that quietly downgrades its own output breaks trust in a way you can’t debug. A visible, explained refusal is something you can actually plan around. Worth checking how your providers handle silent degradation.

The builder stack moved three ways at once: speed, autonomy, and trust. Watch today’s full episode, or catch a new one every day on dani / AI News & Creative.
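
For the DiffusionGemma item, the "draft locally, route the hard calls to a stronger model" idea can be sketched in a few lines. This is a minimal illustration, not anything Google or vLLM ships: `route`, `fake_draft`, `fake_strong`, and `looks_hard` are hypothetical names, the two model functions are stubs so the sketch runs without a server, and the escalation heuristic is a placeholder assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Routed:
    text: str
    model: str  # which tier produced the answer: "draft" or "strong"


def route(
    prompt: str,
    draft: Callable[[str], str],
    strong: Callable[[str], str],
    is_hard: Callable[[str, str], bool],
) -> Routed:
    """Try the cheap local model first; escalate only if the check says so."""
    draft_text = draft(prompt)
    if is_hard(prompt, draft_text):
        return Routed(strong(prompt), "strong")
    return Routed(draft_text, "draft")


# Hypothetical stand-ins for real model calls (assumptions, not real APIs).
def fake_draft(prompt: str) -> str:
    return f"draft answer to: {prompt}"


def fake_strong(prompt: str) -> str:
    return f"careful answer to: {prompt}"


def looks_hard(prompt: str, draft_text: str) -> bool:
    # Placeholder heuristic: escalate long prompts or an empty draft.
    return len(prompt.split()) > 40 or not draft_text.strip()


if __name__ == "__main__":
    result = route("rename this variable", fake_draft, fake_strong, looks_hard)
    print(result.model, "|", result.text)
```

In a real setup, `draft` would presumably wrap a call to a locally served model and `strong` a hosted one. The escalation check is the part worth tuning, since it decides how much of the traffic stays on the free tier.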
