- 1. Basic truth: Stream vs For — this is not an equal battle
- 2. Why for is faster
- 3. Why Stream is Slower
- 4. Boxing — the main killer of performance
- 5. When Stream is NOT worse
- 6. When Stream is severely losing
- 7. JIT and why the results "float"
- 8. CPU cache and call order
- 9. Practical Rule for Selection
- Use for if:
- Use Stream if:
- 10. Final Conclusion
- ⚔️ Stream vs For in Java — maximum detailed comparative table
- ⚡ Stream vs Loop - Code Example with Comments
Stream vs For in Java: how to write the fastest code possible
In Java, performance is often determined not by the "beauty of the code," but by how the code interacts with memory, the JIT compiler, and the CPU cache. Let's analyze why a plain for loop is often faster than a Stream, and how to write truly fast code.
1. Basic truth: Stream vs For — this is not an equal battle
When comparing Stream and for, it is important to understand that Stream is an abstraction over iteration.
- for: direct indexed access to the array
- Stream: pipeline + lambdas + additional calls
Each layer of abstraction adds overhead unless the JIT manages to optimize it away.
2. Why for is faster
The optimal loop looks like this:
long sum = 0;
for (int i = 0; i < data.length; i++) {
    int x = data[i];
    if (x > 100) {
        sum += x;
    }
}
Reasons for high speed:
- No object allocation
- No lambda calls
- No pipeline
- Linear memory access (cache-friendly)
- The JIT optimizes such loops easily (unrolling, bounds-check elimination, and in some cases vectorization)
3. Why Stream is Slower
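Stream looks simple:
Arrays.stream(data)
    .filter(x -> x > 100)
    .sum();
But under the hood it involves:
- creation of an IntPipeline
- a lambda call per element
- Spliterator-based traversal instead of a plain indexed loop
- a chain of operations that passes every element from stage to stage
Even with JIT optimizations, some overhead usually remains.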
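To make the "chain of operations" more tangible, here is a deliberately simplified, hand-rolled sketch of a two-stage filter-then-sum chain. The class `PipelineSketch` and its method are hypothetical names invented for this illustration; this is not how `java.util.stream` is actually implemented, only a rough model of why each element may pass through several indirect calls.

```java
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

// Hypothetical illustration only: a hand-rolled stand-in for a two-stage
// chain (filter -> sum). This is NOT the real java.util.stream machinery.
class PipelineSketch {

    static long filterAndSum(int[] data, IntPredicate predicate) {
        // Mutable holder: lambdas can capture only effectively final locals.
        long[] total = {0};

        // Last stage: accumulate every element it receives.
        IntConsumer sumStage = x -> total[0] += x;

        // First stage: forward only the elements that pass the predicate.
        IntConsumer filterStage = x -> {
            if (predicate.test(x)) {
                sumStage.accept(x);
            }
        };

        // Traversal: each element goes through several indirect calls
        // unless the JIT manages to inline them.
        for (int x : data) {
            filterStage.accept(x);
        }
        return total[0];
    }

    public static void main(String[] args) {
        int[] data = {50, 150, 100, 200};
        System.out.println(filterAndSum(data, x -> x > 100)); // prints 350
    }
}
```

When the JIT inlines all of these calls, the chain can collapse into something close to the plain loop; when it does not, every element pays for the extra hops.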
4. Boxing — the main killer of performance
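The most common mistake is keeping numeric data in a boxed collection:
List<Integer>
Problem:
- every int is wrapped in an Integer object (boxing)
- extra load on the GC
- poor cache locality, because the list stores references to objects scattered across the heap
Correct:
int[] data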
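The difference can be sketched as follows. `BoxingDemo` and its method names are assumptions made up for this example; the point is only to show where unboxing happens, not to serve as a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {

    // Boxed version: each element is an Integer reference that must be
    // dereferenced and unboxed before it can be added.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v; // implicit unboxing via intValue()
        }
        return sum;
    }

    // Primitive version: the ints sit next to each other in one array.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] primitive = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add(i); // autoboxing: int -> Integer
        }
        // Same arithmetic result, very different memory layout.
        System.out.println(sumBoxed(boxed) == sumPrimitive(primitive)); // prints true
    }
}
```

Both methods return the same total; what changes is the number of objects on the heap and how far apart the values live in memory.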
5. When Stream is NOT worse
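Stream can be almost as fast if:
- a primitive stream (IntStream) is used
- the chain of operations is simple
- there is no collect()
- there is no boxing
long sum = Arrays.stream(data)
    .filter(x -> x > 100)
    .asLongStream() // IntStream.sum() returns int and can overflow, so widen first
    .sum();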
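One caveat with primitive streams is worth a small sketch: `IntStream.sum()` returns an `int`, so assigning it to a `long` does not protect against overflow. The class name `SumOverflowDemo` below is hypothetical; the stream calls themselves are standard JDK methods.

```java
import java.util.Arrays;

class SumOverflowDemo {

    // Accumulates in int: silently wraps around on overflow.
    static int intSum(int[] data) {
        return Arrays.stream(data).sum();
    }

    // Widens every element to long before summing.
    static long longSum(int[] data) {
        return Arrays.stream(data).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = {Integer.MAX_VALUE, 1};
        System.out.println(intSum(data));  // prints -2147483648
        System.out.println(longSum(data)); // prints 2147483648
    }
}
```

When a stream and a loop are compared, both should accumulate in the same type; otherwise they neither compute the same value nor do the same work.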
6. When Stream is severely losing
- List<Integer> (boxing)
- complex pipeline chains
- collect() into a collection
- small arrays (overhead > work)
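As a rough illustration of the collect() point, the sketch below filters the same array twice: once into a boxed `List<Integer>` and once into a primitive `int[]`. `CollectDemo` and the `threshold` parameter are names assumed for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectDemo {

    // Boxes every surviving element and collects it into a list.
    static List<Integer> filterBoxed(int[] data, int threshold) {
        return Arrays.stream(data)
                .filter(x -> x > threshold)
                .boxed()                        // int -> Integer for each element
                .collect(Collectors.toList());
    }

    // Stays primitive from start to finish.
    static int[] filterPrimitive(int[] data, int threshold) {
        return Arrays.stream(data)
                .filter(x -> x > threshold)
                .toArray();
    }

    public static void main(String[] args) {
        int[] data = {50, 150, 100, 200};
        System.out.println(filterBoxed(data, 100));                      // prints [150, 200]
        System.out.println(Arrays.toString(filterPrimitive(data, 100))); // prints [150, 200]
    }
}
```

If the caller only needs the numbers, the `toArray()` variant avoids one `Integer` allocation per retained element.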
7. JIT and why the results "float"
The Java Virtual Machine dynamically:
- compiles hot methods
- inlines calls
- optimizes loops
- deoptimizes and recompiles code when its assumptions stop holding
Therefore:
Stream can be faster than the loop in one run and slower in another.
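Because single measurements fluctuate, it helps to time many rounds and look at the spread instead of one number. The following is a minimal sketch under stated assumptions: `RepeatedTiming`, `timeRounds`, `median`, and the `blackhole` field are hypothetical helpers written for this example, and the round counts are arbitrary. A dedicated harness such as JMH is the more rigorous option.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

class RepeatedTiming {
    // Written on every round so the measured work is less likely
    // to be eliminated as dead code.
    static volatile long blackhole;

    /** Runs the task `rounds` times and returns each round's duration in nanoseconds. */
    static long[] timeRounds(LongSupplier task, int rounds) {
        long[] nanos = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            blackhole = task.getAsLong();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Middle element of the sorted samples (upper middle for an even count). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    static long loopSum(int[] data) {
        long sum = 0;
        for (int x : data) {
            if (x > 100) {
                sum += x;
            }
        }
        return sum;
    }

    static long streamSum(int[] data) {
        return Arrays.stream(data).filter(x -> x > 100).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = new int[2_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Warm-up rounds: results are discarded.
        timeRounds(() -> loopSum(data), 20);
        timeRounds(() -> streamSum(data), 20);

        long[] loop = timeRounds(() -> loopSum(data), 30);
        long[] stream = timeRounds(() -> streamSum(data), 30);
        System.out.println("loop   median ns: " + median(loop));
        System.out.println("stream median ns: " + median(stream));
        System.out.println("loop   samples:   " + Arrays.toString(loop));
        System.out.println("stream samples:   " + Arrays.toString(stream));
    }
}
```

Printing all samples next to the median makes the "floating" visible: individual rounds can differ noticeably even after warm-up.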
8. CPU cache and call order
The order in which the two variants run affects:
- cache warm-up
- branch prediction
- memory prefetch
This explains why the measured results may swap places.
9. Practical Rule for Selection
Use for if:
- performance-critical hot path
- working with arrays
- minimal latency is important
Use Stream if:
- readability is important
- micro-performance is not critical
- simple data processing
10. Final Conclusion
The main principle of Java performance:
The closer to memory, the faster the code.
In Java, performance is determined not by style (for vs stream), but by:
- memory
- allocations
- boxing/unboxing
- cache locality
- JIT inline decisions
⚠️ Stream is worse when:
📦 List<Integer> (boxing)
🔗 many pipeline operations
🧠 complex lambdas
🚫 the logic needs an early exit (break/continue) that a pipeline expresses poorly
⚡ small datasets (overhead > work)
⚡ When Stream is NOT worse
Stream can be nearly as fast if:
🔢 primitive stream (IntStream)
🧩 simple chain
🚫 no collect()
🚫 no boxing
⚙️ JIT inlines everything
Stream is convenience. For is control and speed.
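The early-exit point can be illustrated with a short sketch. The class `EarlyExitDemo`, its method names, and the use of `-1` as a "not found" marker are assumptions made for this example; `takeWhile` requires Java 9 or later.

```java
import java.util.Arrays;

class EarlyExitDemo {

    // Loop: leaves as soon as the first match is found.
    static int firstAboveLoop(int[] data, int threshold) {
        for (int x : data) {
            if (x > threshold) {
                return x;
            }
        }
        return -1; // hypothetical "not found" marker
    }

    // Stream: findFirst() is a short-circuiting terminal operation.
    static int firstAboveStream(int[] data, int threshold) {
        return Arrays.stream(data)
                .filter(x -> x > threshold)
                .findFirst()
                .orElse(-1);
    }

    // Loop: sum elements until the first negative value, then break.
    static long sumUntilNegativeLoop(int[] data) {
        long sum = 0;
        for (int x : data) {
            if (x < 0) {
                break;
            }
            sum += x;
        }
        return sum;
    }

    // Stream: takeWhile (Java 9+) covers this simple case.
    static long sumUntilNegativeStream(int[] data) {
        return Arrays.stream(data)
                .takeWhile(x -> x >= 0)
                .asLongStream()
                .sum();
    }

    public static void main(String[] args) {
        int[] data = {5, 7, -1, 100};
        System.out.println(firstAboveLoop(data, 50));     // prints 100
        System.out.println(firstAboveStream(data, 50));   // prints 100
        System.out.println(sumUntilNegativeLoop(data));   // prints 12
        System.out.println(sumUntilNegativeStream(data)); // prints 12
    }
}
```

Simple exits map well onto `findFirst` and `takeWhile`. Once the exit condition depends on accumulated state or several loop variables, the plain loop usually stays easier to write and to reason about.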
⚔️ Stream vs For in Java — maximum detailed comparative table
| Criterion | For loop | Stream | Comment (what really happens inside the JVM) |
|---|---|---|---|
| 🏎️ Execution speed | Very high | Average / high (depends on the case) | For is closer to machine code. Stream adds pipeline overhead, even after JIT optimizations. |
| 🧠 Abstraction overhead | Minimal | High | Stream = Spliterator + pipeline stages + lambda chain. Each layer is potential overhead. |
| 📦 Boxing / Unboxing | No (with int[] / long[]) | Often present (if List<Integer>) | Boxing = creating Integer objects → load on GC + cache miss. |
| 💾 Cache locality | Excellent | Average / poor | For walks the array linearly → CPU prefetch is effective. A stream over a primitive array also reads it linearly, but boxed elements and intermediate objects hurt locality. |
| ⚙️ JIT optimization | Highly optimizable | Optimizable, but more complex | Loop can be easily inlined and vectorized (SIMD). Stream requires analysis of the call chain. |
| 🔥 Inlining | Almost always | Partially | Stream pipeline can hinder complete inlining of the chain. |
| 🧩 Lambda overhead | No | Yes | The lambda is linked once via invokedynamic; after that each element goes through a functional-interface call that the JIT can inline, but not always. |
| 🚀 SIMD / Vectorization | Often possible | Rarely | JVM finds it easier to vectorize a simple loop than a Stream pipeline. |
| 🧾 Readability | Average | High | Stream reads better when the data processing logic is complex. |
| ⚡ Small datasets | Very fast | Slower (overhead outweighs the work) | Stream setup overhead does not pay off with small data. |
| 📊 Large datasets | Very fast | Almost comparable | When the work dominates the overhead, the difference shrinks. |
| 🧪 GC pressure | Low | Medium / high | Stream may create intermediate objects → more GC cycles. |
| 🔁 Short-circuit (break/continue) | Full control | Limited | Stream poorly models complex break/continue scenarios. |
| 🧱 Pipeline complexity | Linear code | Chain of operations | Stream links its stages into a chain that is set up for every terminal operation → fixed setup cost per pipeline. |
| ⚠️ Predictability | Very high | Average | Loop behavior is stable. Stream depends on JIT optimizations. |
⚡ Conclusion:
- For = control + maximum performance
- Stream = expressiveness + convenience
⚡ Stream vs Loop - Code Example with Comments
import java.util.Arrays;

public class Test {
    static int[] data;

    public static void main(String[] args) {
        int size = 20_000_000;
        data = new int[size];
        // =========================
        // INIT DATA (once)
        // =========================
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }
        // =========================
        // WARMUP (very important for JVM)
        // =========================
        // JVM has NOT fully optimized the code yet
        // JIT (Just-In-Time compiler) now:
        // - analyzes hot methods
        // - may replace bytecode with native code
        // - performs inline optimizations
        for (int i = 0; i < 5; i++) {
            streamSum(); // warming up Stream pipeline
            loopSum(); // warming up the usual loop
        }
        // =========================
        // TEST ORDER 1: STREAM -> LOOP
        // =========================
        System.out.println("=== ORDER 1: STREAM -> LOOP ===");
        long t1 = System.nanoTime();
        // Stream pipeline:
        // Arrays.stream -> IntPipeline -> lambda filter -> sum
        // intermediate objects are created (even if optimized by JIT)
        long r1 = streamSum();
        long t2 = System.nanoTime();
        // Loop:
        // direct access to the array
        // without objects, without pipeline
        long t3 = System.nanoTime();
        long r2 = loopSum();
        long t4 = System.nanoTime();
        System.out.println("Stream result = " + r1);
        System.out.println("Stream time = " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("Loop result = " + r2);
        System.out.println("Loop time = " + (t4 - t3) / 1_000_000 + " ms");
        // =========================
        // TEST ORDER 2: LOOP -> STREAM
        // =========================
        System.out.println("\n=== ORDER 2: LOOP -> STREAM ===");
        long t5 = System.nanoTime();
        // now Loop runs FIRST
        // CPU cache + branch predictor can already be "warmed up"
        long r3 = loopSum();
        long t6 = System.nanoTime();
        long t7 = System.nanoTime();
        // Stream is now second
        // it may win or lose due to cache state
        long r4 = streamSum();
        long t8 = System.nanoTime();
        System.out.println("Loop result = " + r3);
        System.out.println("Loop time = " + (t6 - t5) / 1_000_000 + " ms");
        System.out.println("Stream result = " + r4);
        System.out.println("Stream time = " + (t8 - t7) / 1_000_000 + " ms");
        // =========================
        // IMPORTANT NOTE
        // =========================
        // Even if the code is the same:
        // - CPU cache contents change
        // - JIT may already have compiled and inlined the loop
        // - branch predictor gets trained
        // - GC may have run between measurements
    }
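    // =========================
    // STREAM VERSION
    // =========================
    static long streamSum() {
        return Arrays.stream(data)
                // the lambda is linked once via invokedynamic;
                // the per-element call may then be:
                // - inlined by the JIT
                // - or left as a regular interface call
                .filter(x -> x > 100)
                // NOTE: IntStream.sum() returns int, so with 20_000_000 elements
                // the total overflows (see the output below); add .asLongStream()
                // before .sum() to get the same long result as loopSum()
                .sum();
    }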
    // =========================
    // LOOP VERSION
    // =========================
    static long loopSum() {
        long sum = 0;
        // direct for-each:
        // - no objects
        // - no pipeline
        // - minimal overhead
        for (int x : data) {
            if (x > 100) {
                sum += x;
            }
        }
        return sum;
    }
}
// Output
//=== ORDER 1: STREAM -> LOOP ===
//Stream result = 542889414
//Stream time = 14 ms
//Loop result = 199999989994950
//Loop time = 9 ms
//=== ORDER 2: LOOP -> STREAM ===
//Loop result = 199999989994950
//Loop time = 9 ms
//Stream result = 542889414
//Stream time = 15 ms
// ---------------- Summary
//✔ JVM has already warmed up
//✔ order is not important
//✔ results are stable
//✔ Loop is faster than Stream in this case
//✘ the Stream result differs from the Loop result: IntStream.sum() accumulates
//  in int and overflows (199999989994950 mod 2^32 = 542889414), so the two
//  variants are not doing identical work