Unsigned Sizes: A Five Year Mistake

Published: May 2, 2026 at 02:40 PM EDT

Source: Hacker News

“A quick note for readers who don’t follow C3: it’s a systems language in the C tradition. Specifics below are C3’s, but the trade‑offs apply to any language that has to pick a type for sizes and lengths.”

Why C3 is moving to signed integers by default

C3 is moving to signed integers by default. But why? Isn’t unsigned more correct, at least for sizes? Let’s try to answer that.

The bugs of unsigned

Since the early days, C3 has been using unsigned sizes. While the name of the unsigned type has changed over time – from usize to usz (after the unification with the uptrdiff type) – its position as the default has been unchallenged.

However, unsigned has known pitfalls, the most well‑known being:

for (uint x = 10; x >= 0; x--)   // Infinite loop!
{

}

That bug is so easy to run into that C3 explicitly rejects x >= 0 for unsigned types outside of macros.

Another classic C bug is:

uint a = 0;
int  b = -1;

if (a > b) { … }

In C, both operands are promoted to unsigned, turning b into a huge unsigned value; a > b is then false even though 0 > -1 mathematically. For this reason, C3 implements safe unsigned/signed comparisons that don’t convert both sides to a common type and give the correct result regardless of the operand types.

C, of course, allows implicit conversions between unsigned and signed. While this is a source of bugs, I felt that with some safety measures it could mostly be kept.

It’s easy to think that the bugs above are unrelated quirks: the loop that never terminates, the broken comparison, the conversions that need to be fixed just‑so… they all stem from one earlier decision – that unsigned should be the default for sizes. Most of this post is really about that decision.

A pertinent question

You might reasonably ask, “Why not just require that signed/unsigned conversion be explicit?”

The reason, it turns out, lies with unsigned sizes.

If sizes are unsigned – as they are in C, C++, Rust, Zig, and C3 – then anything involving indexing into data will need to be either all unsigned or require casts. With C’s loose semantics the problem is largely swept under the rug, but for Rust it meant you’d regularly need to cast back and forth when dealing with sizes.

There are two approaches to casts:

  1. Liberally sprinkle them throughout the codebase with the idea that “it’s an explicit conversion, so it’s obvious what happens”.
  2. Minimize casts, using them only to signal that something out of the ordinary is happening: “here be dragons”.

The former is easier to define, but it essentially silences warnings. For example, suppose the code originally cast a u16 to u32. Later the variable type changes to u64; the cast now silently truncates the value. Casts become a way to “silence all warnings”.

The main idea of “it’s an explicit conversion” is also undermined when casts are inserted mechanically wherever the compiler demands them, rather than after a careful examination of each case.

On the other hand, minimizing casts is more challenging: we need rules that correctly allow safe implicit casts while requiring explicit casts for unsafe ones.

C3 takes the second approach – casts should mean something. But why did it allow unsigned ↔ signed conversions at all? Isn’t that unsafe?

It turns out that as long as you only use addition, subtraction, and multiplication, the conversion is mostly safe if signed integers are two’s‑complement. Since conversions would need to happen often (remember: unsigned sizes!), the trade‑off to make them implicit was natural.

The best‑laid plans

C3 has largely kept the current conversion semantics since 2021, and they worked reasonably well for five years without triggering any serious undesirable behavior – until an innocent question about (foo + a) % 2 turned those assumptions on their head.

To remove foot‑guns, C3 changed the rule so that int + uint promotes to int instead of unsigned. This made many cases silently signed, which tended to be the correct thing in most situations.

But consider (foo + a) % 2 where foo happens to be greater than INT_MAX. Suddenly we get incomprehensible results; to get the intended value you must write (foo + a) % 2U instead.

This was unacceptable, not because it was hard to fix, but because it was so surprising. Almost everywhere else you could simply ignore whether an underlying conversion was signed or unsigned – it just worked. But / and %? Here the solution broke down. Because it “just worked” elsewhere, it was fairly opaque which sub‑expression was signed or unsigned. The convenience turned a minor issue into a big one.

The immediate reaction was to patch it: issue an error on “unsigned / signed” and “unsigned % signed”. However, more issues were lurking in the shadows.

The tricky wrap

If you write a ring buffer, how do you make sure that calculating offsets wraps correctly?

The naïve solution is:

index = (start + offset) % length;

This works as long as offset is positive. What about negative values? A common simple solution is:

index = ((start + offset) % length + length) % length;

Since offset can be negative, the operands must be signed; barring extremely large offsets (which would cause signed overflow), this works.

Now remember how we started with unsigned sizes? Using unsigned everywhere leads to code that looks like this:

index = ((start - offset_back) % length + length) % length;

That is completely wrong – but also hard to detect: the unsigned subtraction wraps modulo 2^n rather than modulo length, so unless length happens to divide 2^n the result is off. It will sometimes come out right, but mostly not.

The correct code for unsigned arithmetic should be something like:

index = (start + length - (offset_back % length)) % length;

Regardless of the rules we apply to unsigned ↔ signed conversions, there is simply no way for the compiler to tell us that the first “offset_back” example is broken for unsigned.

The unsigned size

It seems hard to solve the problem with unsigned, so perhaps we’re making a faulty assumption.

Look back in time: C was originally designed around signed integers, with the int type at its core. This all changed when the type of sizeof was standardized to the unsigned size_t.

That single change single‑handedly made unsigned types part of everyday C code.

With this new shiny thing in hand, people started using unsigned to encode “this value can’t be negative”, and talked about how unsigned helped because it let them represent larger values.

That didn’t mean it was without problems. In fact, the problems were so significant that in the 90s Java decided to drop unsigned types entirely in its design. Java’s reaction was perhaps a little extreme, but it did achieve the goal of making a large set of common bugs – related to unsigned – just go away.

Go should give us pause: it’s a low‑level language, created as a reaction to problems in C++, by people who knew exactly what unsigned sizes cost – and they picked signed sizes.

With any bounded integers, problems arise when we close in on the boundaries. For a 32‑bit signed int those boundaries sit at roughly ±2 billion; for an unsigned 32‑bit integer they sit at 0 and ≈4 billion. The dangerous unsigned boundary – zero – lies right next to the values programs actually use, while the signed boundaries are billions away. There is simply no contest.

This is exactly why we see problems for things like the case with %.

But what about the range? While it’s true that you get twice the range, surprisingly often the code in the range above INT_MAX is quite bug‑ridden. Any code doing something like

(2U * index) / 2U

in this range will have quite the surprise coming. It’s worse than that: overflow for signed values generally produces an invalid, negative number – but unsigned overflow often produces a plausible number, just the wrong one. Not to mention that on modern 64‑bit machines you’ll run out of memory before you can use a full signed 64‑bit integer.

Isn’t it valuable to be in the right range by design?

The answer seems to be no, judging from work on verification frameworks: unsigned encodes modulo‑2^n behaviour, not an actual value range. It might be argued that you can make unsigned overflow an error (as Rust does in debug builds), but that removes useful properties of unsigned arithmetic:

(a + b) - c   ==   a + (b - c)   // true when unsigned arithmetic wraps

If overflow is not allowed, the equality no longer holds – a trap in itself.

So we have unsigned quite frequently used, more or less by historical accident. It’s error‑prone and silently hides errors. Maybe the solution isn’t trying to make it more ergonomic?

Signed first

As you might have anticipated, C3 has adopted signed integers for sizes and lengths. Since unsigned is now much rarer, implicit conversions between unsigned and signed are no longer needed. Comparisons between unsigned and signed? Also gone.

While making this change I also started removing unrelated uint and ulong usages, and discovered code that seemed suspicious or just plain wrong. The code also got cleaner with just int and signed sizes everywhere. This is where I realized I had internalized the cost of using unsigned: after a while working in C or C++, you develop the habit of watching for unsigned pitfalls and reaching for patterns that are less obvious but work for both unsigned and signed variables.

I’m a bit embarrassed by how long it took me to make this change – a testament to how deeply ingrained the habit was. I had simply assumed unsigned sizes were the way to go, and that the task was merely to improve ergonomics and eliminate as many pitfalls as possible. This despite both Go and Java showing the way with signed sizes.

Even after deciding on the change, converting from unsigned to signed felt awkward and wrong at first, as if I was doing something forbidden – that’s how far gone I was. But seeing how each change both made the code easier to reason about and more correct, I couldn’t deny the evidence.

Some notes on the changes in C3

  • This change was discussed in the C3 Discord before it was implemented and got the affectionate name “iszmageddon”, a reference to the isz type (roughly corresponding to ssize_t) becoming the default type of sizes.
  • To more clearly promote the signed size, it was renamed just sz, giving version 0.8.0 the asymmetric pair sz / usz. This makes it easy to remember which one is preferred. Consequently the change was renamed “szmageddon”.
  • Originally the implicit conversion between signed ↔ unsigned was mainly left intact, but it was later completely dropped.

Discuss this article on Hacker News.
