Stream vs For in Java: how to write the fastest code possible

In Java, performance is often determined not by the "beauty of the code," but by how it interacts with memory, the JIT compiler, and the CPU cache. Let's analyze why a plain for loop is often faster than Stream, and how to write truly fast code.


1. Basic truth: Stream vs For is not an equal battle

When comparing Stream and for, it is important to understand: Stream is an abstraction over iteration.

  • for — direct access to memory
  • Stream — pipeline + lambdas + additional calls

Each layer of abstraction adds overhead.


2. Why for is faster

The optimal loop looks like this:


long sum = 0;
for (int i = 0; i < data.length; i++) {
    int x = data[i];
    if (x > 100) {
        sum += x;
    }
}

Reasons for high speed:

  • No objects
  • No lambda calls
  • No pipeline
  • Linear memory access (cache-friendly)
  • JIT easily optimizes and vectorizes

3. Why Stream is slower

Stream looks simple:


Arrays.stream(data)
    .filter(x -> x > 100)
    .sum();

But under the hood, this is what happens:

  • creation of IntPipeline
  • lambda calls
  • iterator model
  • chain of operations

Even with JIT optimizations, there is still overhead.
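A loose mental model of that extra layer can be sketched in plain Java: the filtering condition becomes a separate object, and every element goes through an interface call instead of an inlined comparison. The class and method names below are illustrative, not the real Stream internals:

```java
import java.util.function.IntPredicate;

// Simplified model of the indirection a Stream pipeline adds.
public class PipelineModel {

    // The predicate is a separate object; each element costs a call to
    // test() instead of an inlined `x > 100` comparison. The JIT can
    // often inline this, but it is one more thing it has to prove safe.
    static long filteredSum(int[] data, IntPredicate filter) {
        long sum = 0;
        for (int x : data) {
            if (filter.test(x)) {
                sum += x;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {50, 150, 200};
        System.out.println(filteredSum(data, x -> x > 100)); // 350
    }
}
```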


4. Boxing — the main killer of performance

The most common mistake:


List<Integer>

Problems:

  • every int → Integer (boxing)
  • load on GC
  • poor cache locality

Correct:


int[] data
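To make the difference concrete, here is a small sketch (class and method names are made up for illustration) comparing the boxed and primitive versions of the same sum:

```java
import java.util.List;

// Illustrative demo contrasting boxed and primitive sums.
public class BoxingDemo {

    // Every element of a List<Integer> is an Integer object on the heap;
    // summing forces an unboxing per element and pressures the GC.
    static long boxedSum(List<Integer> data) {
        long sum = 0;
        for (Integer x : data) { // unboxing happens here
            if (x > 100) {
                sum += x;
            }
        }
        return sum;
    }

    // int[] stores values contiguously: no objects, no unboxing,
    // and linear access that the CPU prefetcher handles well.
    static long primitiveSum(int[] data) {
        long sum = 0;
        for (int x : data) {
            if (x > 100) {
                sum += x;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(boxedSum(List.of(50, 150, 200)));       // 350
        System.out.println(primitiveSum(new int[]{50, 150, 200})); // 350
    }
}
```

Both methods compute the same result; the primitive version simply does it without any heap allocation per element.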

5. When Stream is NOT worse

Stream can be almost as fast if:

  • IntStream is used
  • a simple chain of operations
  • no collect()
  • no boxing

long sum = Arrays.stream(data)
    .filter(x -> x > 100)
    .asLongStream() // widen before summing: IntStream.sum() returns int and can overflow
    .sum();

6. When Stream loses badly

  • List<Integer> (boxing)
  • long, complex pipeline chains
  • collect() into a collection
  • small arrays (overhead > useful work)
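The collect() and boxing points can be shown side by side. In this sketch (names are illustrative), the first pipeline boxes every surviving element and builds an intermediate list, while the second stays in primitives the whole way:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative demo of where the extra Stream cost comes from.
public class CollectCost {

    // Boxed pipeline: boxed() wraps each surviving int in an Integer,
    // and collect() allocates and fills an intermediate List.
    static List<Integer> keepLarge(int[] data) {
        return Arrays.stream(data)
                .filter(x -> x > 100)
                .boxed()
                .collect(Collectors.toList());
    }

    // Primitive pipeline: values stay as int/long the whole way,
    // no boxing and no intermediate collection.
    static long sumLarge(int[] data) {
        return Arrays.stream(data)
                .filter(x -> x > 100)
                .mapToLong(x -> x) // widen before summing to avoid int overflow
                .sum();
    }

    public static void main(String[] args) {
        int[] data = {50, 150, 200, 300};
        System.out.println(keepLarge(data)); // [150, 200, 300]
        System.out.println(sumLarge(data));  // 650
    }
}
```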

7. JIT and why the results "float"

The Java Virtual Machine dynamically:

  • compiles hot methods
  • inlines calls
  • optimizes loops
  • changes behavior at runtime

Therefore:

Stream can be faster than the loop in one run and slower in another.

8. CPU cache and call order

The order of execution affects:

  • cache warm-up
  • branch prediction
  • memory prefetch

This explains why the results may swap places.


9. Practical Rule for Selection

Use for if:

  • performance-critical hot path
  • working with arrays
  • minimal latency is important

Use Stream if:

  • readability is important
  • micro-performance is not critical
  • simple data processing

10. Final Conclusion

The main principle of Java performance:

Closer to memory — faster code.
In Java, performance is determined not by style (for vs stream), but by:
- memory
- allocations
- boxing/unboxing
- cache locality
- JIT inline decisions

Stream is convenience. For is control and speed.

⚔️ Stream vs For in Java — maximum detailed comparative table

| Criterion | For loop | Stream | Comment (what really happens inside the JVM) |
|---|---|---|---|
| 🏎️ Execution speed | Very high | Average/high (case-dependent) | For is closer to machine code. Stream adds pipeline overhead, even after JIT optimizations. |
| 🧠 Abstraction overhead | Minimal | High | Stream = Iterator + Spliterator + pipeline + lambda chain. Each layer is potential overhead. |
| 📦 Boxing/unboxing | No (with int[]/long[]) | Often present (with List<Integer>) | Boxing creates Integer objects → GC pressure + cache misses. |
| 💾 Cache locality | Excellent | Average/poor | A loop walks the array linearly → CPU prefetch is effective. A pipeline can break locality. |
| ⚙️ JIT optimization | Highly optimizable | Optimizable, but harder | A loop is easily inlined and vectorized (SIMD). Stream requires analysis of the whole call chain. |
| 🔥 Inlining | Almost always | Partial | The Stream pipeline can prevent complete inlining of the chain. |
| 🧩 Lambda overhead | None | Yes | Lambdas can be inlined, but not always; sometimes invokedynamic dispatch remains. |
| 🚀 SIMD/vectorization | Often possible | Rarely | The JVM finds it easier to vectorize a simple loop than a Stream pipeline. |
| 🧾 Readability | Average | High | Stream reads better for complex data-processing logic. |
| ⚡ Small datasets | Very fast | Slower (overhead outweighs the work) | Stream overhead does not pay off on small data. |
| 📊 Large datasets | Very fast | Almost comparable | When the real work dominates the overhead, the gap shrinks. |
| 🧪 GC pressure | Low | Medium/high | Stream may create intermediate objects → more GC work. |
| 🔁 Short-circuit (break/continue) | Full control | Limited | Stream models complex break/continue scenarios poorly. |
| 🧱 Pipeline complexity | Linear code | Chain of operations | Stream builds an operation chain → scheduling overhead. |
| ⚠️ Predictability | Very high | Average | Loop behavior is stable. Stream depends on JIT decisions. |
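The short-circuit row deserves a concrete illustration. A loop can break out at any point; the closest Stream equivalents are built-in short-circuiting operations such as findFirst(). The class and method names below are illustrative:

```java
import java.util.Arrays;

public class ShortCircuit {

    // Loop version: break/early return gives full control over when to stop.
    static int firstOver(int[] data, int threshold) {
        for (int x : data) {
            if (x > threshold) {
                return x; // stop scanning immediately
            }
        }
        return -1;
    }

    // Stream version: findFirst() is a short-circuiting terminal operation,
    // but more complex break/continue patterns have no direct Stream analogue.
    static int firstOverStream(int[] data, int threshold) {
        return Arrays.stream(data)
                .filter(x -> x > threshold)
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        int[] data = {10, 40, 120, 300};
        System.out.println(firstOver(data, 100));       // 120
        System.out.println(firstOverStream(data, 100)); // 120
    }
}
```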

⚡ Conclusion:

- For = control + maximum performance
- Stream = expressiveness + convenience

🔥 In Java, speed is determined not by style, but by:
- memory
- allocations
- boxing/unboxing
- cache locality
- JIT inline decisions

⚡ Stream vs Loop - Code Example with Comments


import java.util.Arrays;

public class Test {

    static int[] data;

    public static void main(String[] args) {

        int size = 20_000_000;
        data = new int[size];

        // =========================
        // INIT DATA (once)
        // =========================
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }

        // =========================
        // WARMUP (very important for JVM)
        // =========================
        // During warmup the JVM has not yet fully optimized the code.
        // The JIT (Just-In-Time) compiler:
        // - detects hot methods
        // - compiles their bytecode to native code
        // - performs inlining optimizations
        for (int i = 0; i < 5; i++) {
            streamSum(); // warming up Stream pipeline
            loopSum();   // warming up the usual loop
        }

        // =========================
        // TEST ORDER 1: STREAM -> LOOP
        // =========================
        System.out.println("=== ORDER 1: STREAM -> LOOP ===");

        long t1 = System.nanoTime();

        // Stream pipeline:
        // Arrays.stream -> IntPipeline -> lambda filter -> sum
        // intermediate objects are created (even if optimized by JIT)
        long r1 = streamSum();

        long t2 = System.nanoTime();

        // Loop:
        // direct access to the array
        // without objects, without pipeline
        long t3 = System.nanoTime();

        long r2 = loopSum();

        long t4 = System.nanoTime();

        System.out.println("Stream result = " + r1);
        System.out.println("Stream time   = " + (t2 - t1) / 1_000_000 + " ms");

        System.out.println("Loop result   = " + r2);
        System.out.println("Loop time     = " + (t4 - t3) / 1_000_000 + " ms");


        // =========================
        // TEST ORDER 2: LOOP -> STREAM
        // =========================
        System.out.println("\n=== ORDER 2: LOOP -> STREAM ===");

        long t5 = System.nanoTime();

        // now Loop runs FIRST
        // CPU cache + branch predictor can already be "warmed up"
        long r3 = loopSum();

        long t6 = System.nanoTime();

        long t7 = System.nanoTime();

        // Stream is now second
        // it may win or lose due to cache state
        long r4 = streamSum();

        long t8 = System.nanoTime();

        System.out.println("Loop result   = " + r3);
        System.out.println("Loop time     = " + (t6 - t5) / 1_000_000 + " ms");

        System.out.println("Stream result = " + r4);
        System.out.println("Stream time   = " + (t8 - t7) / 1_000_000 + " ms");


        // =========================
        // IMPORTANT MOMENT
        // =========================
        // Even if the code is the same:
        // - CPU cache changes
        // - JIT may already "inline" the loop
        // - branch predictor gets trained
        // - GC may have occurred between measurements
    }

    // =========================
    // STREAM VERSION
    // =========================
    static long streamSum() {
        return Arrays.stream(data)
                // lambda -> can be:
                // - inlined
                // - or dispatched via invokedynamic
                .filter(x -> x > 100)
                // widen to long before summing: IntStream.sum() returns int
                // and overflows on 20 million elements
                .asLongStream()
                .sum();
    }

    // =========================
    // LOOP VERSION
    // =========================
    static long loopSum() {

        long sum = 0;

        // direct for-each:
        // - no objects
        // - no pipeline
        // - minimal overhead
        for (int x : data) {
            if (x > 100) {
                sum += x;
            }
        }

        return sum;
    }
}

// Sample output
//=== ORDER 1: STREAM -> LOOP ===
//Stream result = 199999989994950
//Stream time   = 14 ms
//Loop result   = 199999989994950
//Loop time     = 9 ms

//=== ORDER 2: LOOP -> STREAM ===
//Loop result   = 199999989994950
//Loop time     = 9 ms
//Stream result = 199999989994950
//Stream time   = 15 ms
// ---------------- Summary
//✔ JVM has already warmed up
//✔ order is not important
//✔ results are stable
//✔ Loop is faster than Stream in this case

