Running Gemma 4 Locally with Ollama and OpenCode

Published: 0 month ago (April 5, 2026 at 08:08 PM EDT)

4 min read

Source: Dev.to

Source: Dev.to

First steps

The usual first step with getting Gemma 4 running on Ollama is to pull the model:

ollama pull gemma4:e4b

See the available models and select the correct version for your system.
The e4b variant is a good starting point if your hardware can support it.

Verify the model is available:

ollama list

Testing

Run the model to ensure it works as expected:

ollama run gemma4:e4b

Ask a simple question or just say “Hello”, then use /bye to exit.

Immediately run ollama ps. You should see something like this:

NAME          ID              SIZE   PROCESSOR   CONTEXT   UNTIL
gemma4:e4b    c6eb396dbd59    10 GB  100% GPU    4096      4 minutes from now

Pay close attention to the CONTEXT value.
If you see 4096, Ollama is using the default 4 K context window. This will bite you when you try to work with the model in OpenCode (e.g., the model repeatedly says “Just let me know what you want to do”). The cause is that system prompts consume most of the available context, truncating your prompt.

Increase the context window

OpenCode can specify a context size, but it doesn’t work reliably with Ollama‑based models. Instead, set a larger context window inside Ollama:

ollama run gemma4:e4b

# Inside the Ollama prompt, run:
 /set parameter num_ctx 32768
 /save gemma4:e4b-32k
 /bye

Then confirm the new model exists:

ollama list

Notes about num_ctx:

It should be a power of 2 (e.g., 32768 for a 32 K window).
Smaller windows (16 K) may be sufficient; larger ones (64 K, 128 K) increase VRAM/memory usage.
Choose a size that fits your hardware.

The /save model name is arbitrary – add a suffix like -32k to indicate the context size.

Using the model in OpenCode

Assuming you’re already familiar with OpenCode, add the new model to opencode.json (either globally at ~/.config/opencode/opencode.json or in your project folder):

{
  "$schema": "https://opencode.ai/config.json",
  "default_agent": "plan",
  "compaction": {
    "auto": true,
    "prune": true,
    "reserved": 8192
  },
  "provider": {
    "ollama": {
      "name": "Ollama",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:e4b-32k": {
          "name": "gemma4:e4b-32k",
          "_launch": true,
          "id": "gemma4:e4b-32k",
          "tool_call": true,
          "options": {
            "temperature": 0.1
          },
          "maxTokens": 16384
        },
        "ministral-3:8b-OC32k": {
          "_launch": true,
          "id": "ministral-3:8b-OC32k",
          "name": "Ministral 3 8B (32k)",
          "options": {
            "temperature": 0.1
          },
          "maxTokens": 16384
        }
      }
    }
  }
}

Important fields

Field	Meaning
name	Friendly label shown in the OpenCode TUI (e.g., “Gemma 4 (32k)”).
id	Identifier used on the command line, e.g., `opencode run PROMPT --model gemma4:e4b-32k`.
_launch	Usually set to `true`; required when you run `ollama launch opencode`.
tool_call	Must be `true` for Gemma to handle tool calls; otherwise it stops at the “task” tool.
options	Model‑specific options (e.g., `temperature`).
maxTokens	Maximum output length you expect from a prompt.

Running OpenCode

With opencode.json configured, launch OpenCode:

cd path/to/project
opencode

The OpenCode TUI appears. Open the model‑selection menu (Ctrl‑P or Ctrl‑X m), filter by “ollama”, and you should see gemma4:e4b-32k. Select it.

You are now using the local Gemma model. Interact with it as needed—ask questions, run tasks, or build your projects. Performance depends on your hardware; a system with ~16 GB VRAM typically sees a 1‑2 second response delay after the model is loaded.

Observations

I’ve only just started using Gemma 4 for real‑world tasks. It is capable, but it sometimes needs a bit more explicit guidance than other models. Adjusting the context window and tuning options (e.g., temperature) helps achieve smoother interactions.

The default Zen Big Pickle model is not an apples‑to‑apples comparison either. Still, so far the model is capable. The real test will be when I try to use it with my market analysis project.

Conclusion

I hope this has been helpful. Let me know if my information is incorrect or can be improved.

What’s your experience with running LLMs locally? Let me know in the comments.

Running Gemma 4 Locally with Ollama and OpenCode

First steps

Testing

Increase the context window

Using the model in OpenCode

Important fields

Running OpenCode

Observations

Conclusion

Related posts

Google Announces Gemma 4 Open AI Models, Switches To Apache 2.0 License

Making OpenClaw remember what it's doing after compaction

The trick to AI coding memory isn't a bigger instruction file — it's smaller, layered knoledge

Gemma 4 VRAM Requirements: The hardware guide I wish I had