Part 2 — GenAI Is Not Magic: Understanding LLMs Like a Systems Engineer

Published: December 28, 2025, 09:39 PM EST
3 min read
Source: Dev.to

From Software Engineer to GenAI Engineer: A Practical Series for 2026

Large language models are often introduced as something fundamentally new—a breakthrough, a leap, a category shift. From a systems perspective, they’re more familiar: probabilistic components with clear constraints, predictable failure modes, and operational costs. Seeing them this way makes much of the confusion around GenAI disappear.

Determinism vs. Non‑Determinism

Traditional software systems are deterministic: given the same input, you expect the same output. When that doesn’t happen, something is wrong.

LLMs break this assumption by design. Even with the same prompt, model, and data, outputs can vary. This is not a bug; it’s a property of how these models generate text. For engineers, correctness can no longer be defined as strict equality. It must be defined in terms of acceptability, bounds, and constraints.
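One way to make "acceptability, bounds, and constraints" concrete is to test outputs the way you would validate untrusted input rather than asserting equality. A minimal sketch, assuming a hypothetical task whose output should be JSON with a `summary` field (the field name and length bound are illustrative):

```python
import json

def is_acceptable(raw_output: str, max_len: int = 500) -> bool:
    """Check bounds and structure, not exact equality."""
    if len(raw_output) > max_len:
        return False
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return "summary" in data and isinstance(data["summary"], str)

# Two runs of the same prompt: textually different, both acceptable.
run_a = '{"summary": "Order shipped on time."}'
run_b = '{"summary": "The order was shipped on schedule."}'

assert run_a != run_b                              # strict equality fails
assert is_acceptable(run_a) and is_acceptable(run_b)  # acceptability holds
```

The test suite stops asking "did we get the same string?" and starts asking "is this string within bounds?"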

Tokens, Context, and Cost

LLMs don’t operate on raw text; they operate on tokens. From a systems point of view, tokens behave more like memory than strings:

  • Context is finite
  • Cost scales with token count
  • Latency grows as context grows
  • Truncation happens silently

When context becomes a constrained resource, prompt design shifts from wording to resource management.
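The budgeting mindset above can be sketched with rough numbers. The chars-per-token ratio (~4 for English) and the per-token price below are illustrative assumptions; real tokenizers and pricing vary by model:

```python
CHARS_PER_TOKEN = 4                   # rough heuristic for English text
PRICE_PER_1K_INPUT_TOKENS = 0.001     # hypothetical USD figure

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def estimate_cost(text: str) -> float:
    return estimate_tokens(text) / 1000 * PRICE_PER_1K_INPUT_TOKENS

def fits_context(text: str, context_window: int = 8192,
                 reserved_for_output: int = 1024) -> bool:
    # Input must leave room for the model's reply; otherwise the
    # provider truncates, and it happens silently.
    return estimate_tokens(text) <= context_window - reserved_for_output
```

Once these checks exist, "should this document go in the prompt?" becomes a budget question answered before the call, not a surprise on the invoice.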

Hallucinations: Expected Behavior, Not Bugs

Hallucinations aren’t random. An LLM generates the most likely continuation of a sequence based on its training. When it lacks information, it fills the gap with something statistically plausible. This is expected behavior for a component optimized for fluency, not truth.

  • Asking the model to “be accurate” doesn’t work.
  • Confidence is not a signal of correctness.
  • Grounding and validation must live outside the model.

Hallucinations aren’t fixed by better prompts; they’re constrained by system design.
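A minimal sketch of validation living outside the model: check that every numeric claim in an answer actually appears in the source document it was asked about. The regex-based check is illustrative; production grounding uses richer techniques (citations, entailment checks):

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def grounded_numbers(answer: str, source: str) -> bool:
    """Reject answers containing figures absent from the source."""
    source_numbers = set(NUMBER.findall(source))
    for num in NUMBER.findall(answer):
        if num not in source_numbers:
            return False  # plausible-sounding but ungrounded: retry or reject
    return True

source = "Q3 revenue was 42.5 million, up from 38 million in Q2."
assert grounded_numbers("Revenue grew from 38 to 42.5 million.", source)
assert not grounded_numbers("Revenue grew from 38 to 45 million.", source)
```

Note that the model never sees this check; it sits in the system around it, which is exactly the point.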

Temperature as a Systems Lever

Temperature is often described as a “creativity dial,” but that framing is misleading.

  • Lower temperatures reduce variance.
  • Higher temperatures increase variance.

In production systems, temperature is a reliability control: higher variance increases risk, lower variance increases repeatability. Treating temperature as an aesthetic choice instead of a systems lever is a common early mistake.
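Treating temperature as a reliability control might look like an explicit variance budget per use case. The task names and values below are illustrative defaults, not recommendations:

```python
TEMPERATURE_BY_TASK = {
    "extraction": 0.0,      # repeatability matters: minimize variance
    "classification": 0.0,
    "summarization": 0.3,
    "brainstorming": 0.9,   # here, variance is the point
}

def temperature_for(task: str) -> float:
    # Fail closed: unknown tasks default to low variance.
    return TEMPERATURE_BY_TASK.get(task, 0.0)
```

The useful part is not the numbers but the shape: temperature is set per task, reviewed like configuration, and defaults to the safe end.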

Context Window Size: An Architectural Constraint

The context window isn’t just a model feature; it’s an architectural constraint that determines:

  • How much information the model can reason over at once.
  • Whether retrieval is required.
  • How often summarization happens.
  • How state is carried forward.

When the context window is exceeded, the system degrades quietly rather than failing loudly. Good architectures are designed around this limit.

Limits of Prompt Engineering

Prompt engineering works well early on because it’s cheap and flexible. It stops working when:

  • Prompts grow uncontrollably.
  • Behavior becomes brittle.
  • Changes introduce side effects.
  • Multiple use cases collide.

At that point, prompts are no longer instructions; they’re configuration. Like any configuration, they need versioning, validation, and isolation.
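Prompts-as-configuration can be as simple as a versioned record with validated variables. The schema below is a minimal sketch; real setups typically store these in files with review and rollout like any other config:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptConfig:
    name: str
    version: str
    template: str
    required_vars: tuple[str, ...]

    def render(self, **variables: str) -> str:
        missing = [v for v in self.required_vars if v not in variables]
        if missing:
            # Validation: fail before the model call, not after.
            raise ValueError(f"missing variables: {missing}")
        return self.template.format(**variables)

summarize_v2 = PromptConfig(
    name="ticket-summary",          # isolation: one config per use case
    version="2.1.0",                # versioning: changes are trackable
    template="Summarize this ticket in one sentence:\n{ticket}",
    required_vars=("ticket",),      # validation: inputs are checked
)
```

Each bullet above maps to a field: versioning (`version`), validation (`required_vars`), and isolation (one named config per use case).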

A Practical View of an LLM

An LLM can be thought of as a non‑deterministic function that:

  • Accepts a bounded context.
  • Produces a probabilistic output.
  • Optimizes for likelihood, not correctness.
  • Incurs cost and latency proportional to input size.

Framed this way, LLMs stop feeling mysterious and become components with trade‑offs that can be reasoned about.

Treating LLMs as System Components

When LLMs are treated as system components:

  • Raw output is no longer trusted.
  • Validation layers become necessary.
  • Retries and fallbacks are expected.
  • Critical logic moves outside the model.

This is where GenAI engineering starts to resemble backend engineering again.
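The pattern above (validate, retry, fall back) is ordinary backend engineering once written down. A sketch, where `call_model` is a stand-in for a real client:

```python
import json
from typing import Callable

def is_valid(raw: str) -> bool:
    try:
        return "summary" in json.loads(raw)
    except json.JSONDecodeError:
        return False

def call_with_retries(
    call_model: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], bool],
    max_attempts: int = 3,
    fallback: str = '{"summary": "unavailable"}',
) -> str:
    for _ in range(max_attempts):
        raw = call_model(prompt)
        if validate(raw):
            return raw  # raw output is trusted only after validation
    return fallback  # critical path never depends on a good model reply

# Simulated flaky model: fails once, then returns valid JSON.
replies = iter(["not json", '{"summary": "ok"}'])
result = call_with_retries(lambda p: next(replies), "summarize...", is_valid)
```

Every bullet appears here: the output is validated before trust, retries are bounded, a fallback exists, and the decision about what counts as valid lives outside the model.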

The next post will explore why prompt engineering alone doesn’t scale and why it’s more useful to treat prompts as configuration rather than a skillset.
