Running Gemma 4 Locally with Ollama and OpenCode
Source: Dev.to
First steps
The usual first step with getting Gemma 4 running on Ollama is to pull the model:
ollama pull gemma4:e4bSee the available models and select the correct version for your system.
The e4b variant is a good starting point if your hardware can support it.
Verify the model is available:
ollama listTesting
Run the model to ensure it works as expected:
ollama run gemma4:e4bAsk a simple question or just say “Hello”, then use /bye to exit.
Immediately run ollama ps. You should see something like this:
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gemma4:e4b c6eb396dbd59 10 GB 100% GPU 4096 4 minutes from nowPay close attention to the CONTEXT value.
If you see 4096, Ollama is using the default 4 K context window. This will bite you when you try to work with the model in OpenCode (e.g., the model repeatedly says “Just let me know what you want to do”). The cause is that system prompts consume most of the available context, truncating your prompt.
Increase the context window
OpenCode can specify a context size, but it doesn’t work reliably with Ollama‑based models. Instead, set a larger context window inside Ollama:
ollama run gemma4:e4b
# Inside the Ollama prompt, run:
/set parameter num_ctx 32768
/save gemma4:e4b-32k
/byeThen confirm the new model exists:
ollama listNotes about num_ctx:
- It should be a power of 2 (e.g., 32768 for a 32 K window).
- Smaller windows (16 K) may be sufficient; larger ones (64 K, 128 K) increase VRAM/memory usage.
- Choose a size that fits your hardware.
The /save model name is arbitrary – add a suffix like -32k to indicate the context size.
Using the model in OpenCode
Assuming you’re already familiar with OpenCode, add the new model to opencode.json (either globally at ~/.config/opencode/opencode.json or in your project folder):
{
"$schema": "https://opencode.ai/config.json",
"default_agent": "plan",
"compaction": {
"auto": true,
"prune": true,
"reserved": 8192
},
"provider": {
"ollama": {
"name": "Ollama",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"gemma4:e4b-32k": {
"name": "gemma4:e4b-32k",
"_launch": true,
"id": "gemma4:e4b-32k",
"tool_call": true,
"options": {
"temperature": 0.1
},
"maxTokens": 16384
},
"ministral-3:8b-OC32k": {
"_launch": true,
"id": "ministral-3:8b-OC32k",
"name": "Ministral 3 8B (32k)",
"options": {
"temperature": 0.1
},
"maxTokens": 16384
}
}
}
}
}Important fields
| Field | Meaning |
|---|---|
| name | Friendly label shown in the OpenCode TUI (e.g., “Gemma 4 (32k)”). |
| id | Identifier used on the command line, e.g., opencode run PROMPT --model gemma4:e4b-32k. |
| _launch | Usually set to true; required when you run ollama launch opencode. |
| tool_call | Must be true for Gemma to handle tool calls; otherwise it stops at the “task” tool. |
| options | Model‑specific options (e.g., temperature). |
| maxTokens | Maximum output length you expect from a prompt. |
Running OpenCode
With opencode.json configured, launch OpenCode:
cd path/to/project
opencodeThe OpenCode TUI appears. Open the model‑selection menu (Ctrl‑P or Ctrl‑X m), filter by “ollama”, and you should see gemma4:e4b-32k. Select it.
You are now using the local Gemma model. Interact with it as needed—ask questions, run tasks, or build your projects. Performance depends on your hardware; a system with ~16 GB VRAM typically sees a 1‑2 second response delay after the model is loaded.
Observations
I’ve only just started using Gemma 4 for real‑world tasks. It is capable, but it sometimes needs a bit more explicit guidance than other models. Adjusting the context window and tuning options (e.g., temperature) helps achieve smoother interactions.
The default Zen Big Pickle model is not an apples‑to‑apples comparison either. Still, so far the model is capable. The real test will be when I try to use it with my market analysis project.
Conclusion
I hope this has been helpful. Let me know if my information is incorrect or can be improved.
What’s your experience with running LLMs locally? Let me know in the comments.