I Assumed Java Streams Had Minimal Overhead. They Didn’t
Source: Dev.to
Background
Many of us have heard the mantra that idiomatic code comes at little to no cost and that the benefit to legibility vastly outweighs any performance impact. I recently evaluated this claim more rigorously. By examining the JIT output, I was surprised at how much additional assembly was generated to support the abstractions I use.
Benchmark Setup
I created a simple benchmark to compare a classic for loop with the Streams API when aggregating 1,000,000 double values.
// values is a double[] holding the 1,000,000 inputs
// Classic for‑loop
double retVal = 0;
for (double val : values) {
retVal += val;
}
// Streams version
double result = Arrays.stream(values).sum();
The benchmark was written with JMH, which runs each implementation enough times to let the JIT optimize the code.
- JDK: OpenJDK 17 (HotSpot)
- Architecture: ARM64
- JMH configuration: 5 warm‑up iterations, 5 forks
- Setup: The input array was pre‑generated, and the results were consumed to prevent dead‑code elimination.
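The two workloads and the pre-generated input might look roughly like the plain-Java sketch below. It is not the original benchmark source: the class name `SumWorkloads`, the fixed seed, and the use of `Random.nextDouble()` are assumptions made for illustration, since the post does not say how the values were produced. With JMH on the classpath, methods like these would typically be annotated with `@Benchmark` and the array built in a `@Setup` method of a `@State` class; returning the sum from each benchmark method is one common way to have JMH consume the result.

```java
import java.util.Arrays;
import java.util.Random;

public class SumWorkloads {
    // Assumed values: the post states 1,000,000 doubles but not the seed or distribution.
    public static final int SIZE = 1_000_000;
    public static final long SEED = 42L;

    // Builds the input once, outside the code that would be measured.
    public static double[] generateValues(int size, long seed) {
        Random random = new Random(seed);
        double[] values = new double[size];
        for (int i = 0; i < size; i++) {
            values[i] = random.nextDouble();
        }
        return values;
    }

    // Classic loop: a single running total, one addition per element.
    public static double loopSum(double[] values) {
        double retVal = 0;
        for (double val : values) {
            retVal += val;
        }
        return retVal;
    }

    // Streams version of the same aggregation.
    public static double streamSum(double[] values) {
        return Arrays.stream(values).sum();
    }

    public static void main(String[] args) {
        double[] values = generateValues(SIZE, SEED);
        double loop = loopSum(values);
        double stream = streamSum(values);
        // Printing the results keeps both computations observable.
        System.out.println("loop     = " + loop);
        System.out.println("stream   = " + stream);
        System.out.println("abs diff = " + Math.abs(loop - stream));
    }
}
```

The two totals may differ in their last few bits. The `DoubleStream.sum()` documentation leaves the order of additions unspecified and permits techniques such as compensated summation, so the Streams version is not guaranteed to perform exactly the same arithmetic as the loop.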
Results
On my machine, the for loop achieved roughly five times the throughput (operations per second) of the Streams implementation. Disassembly of the JIT‑generated code showed that the Streams version produced hundreds more instructions and significantly more memory loads and stores, which is consistent with its lower throughput.
Discussion
While this is a toy benchmark and not representative of all real‑world scenarios, it isolates the cost of the abstraction itself. The Streams API offers superior readability, but that legibility comes with a measurable overhead in this context.
Takeaway
Not every piece of code needs to be hand‑optimized, and the readability of streams can be valuable. However, when you are working in a performance‑critical section, even the choice of iteration strategy (for loop vs. Streams) can make a substantial difference.