Low-level mechanisms | Go ↔ Java
In this article, we examine the key low-level mechanisms of Go and compare them to similar tools in Java. The article is intended for Java developers who want a deeper understanding of Go, as well as for Go developers who wish to see how familiar mechanisms are structured in Java. We cover atomic operations, memory management, how the scheduler works, unsafe tools, and much more. For each topic there is an explanation of what happens under the hood, code examples in Go and Java, usage tips, and practical applications.
atomic.CompareAndSwap
atomic.CompareAndSwap - lock-free atomic 🔐⚛️, compares and swaps value without races
Compare-And-Swap (CAS) is an atomic operation that allows safely modifying a variable's value without using locks. It compares the current value with the expected one and, if they match, writes a new value. In Go, it is implemented in the sync/atomic package; in Java, through classes from java.util.concurrent.atomic. Under the hood, CAS is a CPU instruction (e.g., CMPXCHG) that executes atomically at the processor level, meaning no other thread can interfere while the operation runs. Additionally, CAS acts as a memory barrier, preventing the CPU from reordering operations. However, CAS is not perfect. Under high contention, a "spin loop" arises, where a thread repeatedly retries the operation, which can put a heavy load on the CPU. There is also the ABA problem: a value may change A→B→A, and CAS will not notice. In Java, this is solved via AtomicStampedReference; in Go, through additional version fields.
import "sync/atomic"
var counter int32 = 0
for {
    old := atomic.LoadInt32(&counter) // read current value
    // try to update the value atomically
    if atomic.CompareAndSwapInt32(&counter, old, old+1) {
        break // success
    }
    // else repeat (spin)
}
// Java
import java.util.concurrent.atomic.AtomicInteger;
AtomicInteger counter = new AtomicInteger(0);
while (true) {
    int old = counter.get(); // read value
    // CAS operation
    if (counter.compareAndSet(old, old + 1)) {
        break; // success
    }
    // else repeat
}
ASCII diagram of CAS:
Thread1: read A
Thread2: change A -> B
Thread2: change B -> A
Thread1: CAS(A -> C) ✔ (ABA problem)
Description: thread 1 thinks the value has not changed,
but in reality, it changed twice.
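The "additional version fields" approach mentioned above can be sketched in Go by packing a version counter and the value into a single uint64, so one CAS swaps both at once. This is a minimal sketch; the names setValue/value/version are illustrative, not a standard API:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// packed stores the value in the low 32 bits and a version counter in
// the high 32 bits, so a single CAS updates both atomically.
var packed uint64

// setValue spins until it installs the new value with a bumped version.
// Even if the value later returns to what a reader saw (A -> B -> A),
// the version differs, so the reader's stale CAS fails.
func setValue(v uint32) {
	for {
		old := atomic.LoadUint64(&packed)
		next := ((old>>32)+1)<<32 | uint64(v)
		if atomic.CompareAndSwapUint64(&packed, old, next) {
			return
		}
	}
}

func value() uint32   { return uint32(atomic.LoadUint64(&packed)) }
func version() uint32 { return uint32(atomic.LoadUint64(&packed) >> 32) }

func main() {
	setValue(1) // A
	setValue(2) // B
	setValue(1) // A again, but the version is now 3
	fmt.Println(value(), version()) // 1 3
}
```

This is essentially what AtomicStampedReference does in Java, except the "stamp" here lives in the same machine word as the value.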
Use CAS only for simple operations, such as counters or flags. Under the hood, each failed attempt is an expensive operation involving memory access and re-executing CPU instructions. Under high contention, this can lead to significant CPU load and reduced performance. It is also important to remember the ABA problem – if you are working with complex structures, it is better to use locks or additional version control mechanisms. CAS is good where operations are short and the likelihood of conflict is low.
CAS is widely used in high-load systems: in Java — in ConcurrentHashMap, atomic counters, lock-free data structures; in Go — in runtime and sync packages. The main advantage is the absence of locks and high performance under low contention. Disadvantages include complexity of implementation, the possibility of livelock, and increased CPU load during contention. In real systems (e.g., financial systems or message queues), CAS is used to minimize delays and increase throughput.
stack vs heap escape
stack vs heap escape - data escapes from the stack 🏃♂️➡️🏔️, the variable moves from the stack to the heap
Escape analysis is a compiler mechanism that determines where to place variables: on the stack or in the heap. In Go, this happens at compile time. If a variable is used only within a function, it is placed on the stack. If it "escapes" (for example, returned as a pointer), it goes to the heap. In Java, the situation is different: logically all objects are created in the heap, but the JIT compiler can optimize them and place them on the stack or even completely eliminate them (scalar replacement). This means that at runtime Java can behave similarly to Go. Under the hood, the stack is just a pointer that moves up/down, which is very fast. The heap requires the work of the garbage collector (GC), which adds overhead. Therefore, minimizing heap allocation is key to high performance.
func create() *int {
    x := 42   // stack first
    return &x // escape → heap
}
// Java
public Integer create() {
    Integer x = 42; // heap (but can be optimized by JIT)
    return x;
}
ASCII diagram:
[Stack] -> fast access
[Heap ] -> GC, allocations, pauses
Go:
x -> stack
&x -> heap
Java:
x -> heap (logically)
JIT → stack (optionally)
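Escape decisions can be verified with the Go compiler itself: building with `go build -gcflags="-m"` prints them. A minimal sketch contrasting an escaping and a non-escaping variant:

```go
package main

import "fmt"

// escapes returns a pointer to a local, so the compiler moves x to the
// heap ("-m" reports: moved to heap: x).
func escapes() *int {
	x := 42
	return &x
}

// staysOnStack returns the value itself; no pointer leaves the
// function, so x can stay on the stack and costs no GC work.
func staysOnStack() int {
	x := 42
	return x
}

func main() {
	fmt.Println(*escapes(), staysOnStack()) // 42 42
}
```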
In Go, try to write code so that objects do not "escape" to the heap. This reduces the load on the GC and improves latency. Under the hood, each heap allocation requires the participation of the garbage collector, which can stop program execution (stop-the-world). In Java, avoid creating a large number of temporary objects, especially in hot loops. Although JIT can optimize such cases, this cannot be fully relied upon.
Escape analysis is used when optimizing high-load services. In Go, it directly impacts performance because it reduces pressure on the GC. In Java, the effect is indirect, through a smaller number of objects in the heap. It is applied in backend services, data processing, and microservices. The upside is less GC work and better performance; the downside is the need to write less obvious code and to account for compiler implementation details.
sync.Pool
sync.Pool - recycling objects from the pool ♻️, reuses objects reducing memory allocations
sync.Pool is a mechanism for reusing objects in Go that helps reduce the number of allocations and the load on the GC. It is important to understand that this is not a full-fledged pool, but a cache: objects can be removed by the garbage collector at any moment. Under the hood, sync.Pool uses local caches for each P (processor), which minimizes locks. This makes Get/Put operations very fast. In Java, an analog can be implemented through ThreadLocal or custom object pools. However, the JVM uses TLAB (Thread Local Allocation Buffer), which makes allocations in the heap very fast, reducing the need for pools.
import "sync"
var pool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 1024)
    },
}
buf := pool.Get().([]byte)
pool.Put(buf)
// Java
ThreadLocal<byte[]> pool = ThreadLocal.withInitial(() -> new byte[1024]);
byte[] buf = pool.get(); // get buffer
// use buffer
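In practice, pooled objects are usually reset before being returned, so the next Get does not observe stale data. A sketch with bytes.Buffer (the render function is an illustrative name, not a standard API):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render borrows a buffer, builds a string, and returns the buffer to
// the pool reset, so it carries no data into the next use.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // clear before putting back
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
	fmt.Println(render("java"))   // hello, java
}
```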
Use sync.Pool only for short-lived objects. Under the hood, the GC can clean the pool at any moment, so you cannot rely on it as a guaranteed storage. This is an optimization, not a resource management mechanism. In Java, one should not abuse the object pool, as the JVM already optimizes allocations.
sync.Pool is used for buffers, serialization, processing network data. This helps reduce the number of allocations and the load on the GC. In Java, similar tasks are often solved through ThreadLocal. Advantages — reduced GC and increased performance. Disadvantages — management complexity and lack of storage guarantees. In high-load systems, this is critical for latency.
goroutine stack splitting
goroutine stack splitting - the goroutine stack grows like an accordion 🪗, dynamically increasing the goroutine stack as needed
A goroutine is a lightweight thread in Go that starts with a small stack (~2KB) and can grow dynamically. Historically this was done by stack splitting (segmented stacks); since Go 1.3, when the stack runs out, the runtime instead allocates a new, larger contiguous stack and copies the data into it. In Java, each platform thread has a fixed stack (usually around 1MB) allocated at creation, which makes creating a large number of threads expensive (virtual threads in recent Java versions relax this). Under the hood, Go uses an M:N scheduler: many goroutines are distributed over a small number of OS threads. This allows thousands and even millions of goroutines to run. In Java, platform threads use a 1:1 model: each Thread corresponds to an OS thread.
go func() {
    var arr [10000]int // may cause stack growth
    _ = arr
}()
// Java
new Thread(() -> {
    int[] arr = new int[10000]; // heap, but stack is fixed
}).start();
ASCII diagram:
Go (M:N):
G1 G2 G3 G4
\ | | /
M1 M2 (OS threads)
Java (1:1):
Thread1 -> OS thread
Thread2 -> OS thread
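The difference in scale can be felt directly: launching 100,000 goroutines is routine in Go, while the same number of 1MB-stack platform threads in Java would reserve on the order of 100 GB of stack space. A minimal sketch:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runMany starts n goroutines that each increment a shared counter.
// With ~2KB starting stacks, 100,000 goroutines are cheap to create.
func runMany(n int) int64 {
	var sum int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&sum, 1)
		}()
	}
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(runMany(100_000)) // 100000
}
```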
In Go, do not be afraid to create thousands of goroutines — the runtime is optimized for that. However, avoid blocking operations. In Java, use thread pools instead of creating a large number of threads. Under the hood, each Thread is an expensive OS resource.
Goroutines are used in network servers, request processing, background tasks. In Java, the equivalent is ExecutorService. The advantages of Go are high scalability and low overhead. Disadvantages — complexity of debugging and management. In Java — stability and predictability, but a higher cost of threads.
memory consistency model
memory consistency model - memory with its own notion of time ⏳🧠, rules for the visibility of changes between threads
The memory model defines how changes made by one thread (or goroutine) become visible to others. This is a critically important concept because modern processors and compilers can reorder instructions (instruction reordering) for optimization. Go uses the Go Memory Model, which guarantees correctness only when using synchronization primitives: channels, mutex, atomic operations. Java uses the Java Memory Model (JMM), which introduces the concepts of happens-before, volatile, synchronized. Under the hood, the CPU uses caches and write buffers, so writing to a variable may not be immediately visible to other threads. Memory barriers (e.g., with volatile or atomic) force the processor to synchronize the cache with main memory. Without proper synchronization, race conditions and "invisible" changes are possible. For example, one thread may never see an update to a variable if there is no happens-before relationship.
import "sync/atomic"
var flag int32 = 0
// writer goroutine
atomic.StoreInt32(&flag, 1) // guarantees visibility
// reader goroutine
if atomic.LoadInt32(&flag) == 1 {
    // safely read
}
// Java
public class Example {
    // volatile guarantees visibility + ordering
    volatile int flag = 0;

    public void writer() {
        flag = 1; // write with memory barrier
    }

    public void reader() {
        if (flag == 1) {
            // guaranteed to see the update
        }
    }
}
ASCII diagram:
Thread1 (write) Thread2 (read)
flag = 1 if flag == 1
| ^
v |
CPU cache ---------> Memory barrier ---------> CPU cache
Description:
without memory barrier, data may stay in the cache and not reach main memory.
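Atomics are not the only way to get a happens-before edge in Go: a send on a channel happens before the corresponding receive completes, so a plain write made before the send is guaranteed visible after the receive. A sketch:

```go
package main

import "fmt"

// publish writes msg with a plain (non-atomic) store, then signals on
// a channel. The send happens before the receive, so the caller is
// guaranteed to observe the write - no atomics or mutexes needed.
func publish() string {
	var msg string
	done := make(chan struct{})
	go func() {
		msg = "ready"      // plain write...
		done <- struct{}{} // ...published by the channel send
	}()
	<-done // receive: happens-after the send
	return msg
}

func main() {
	fmt.Println(publish()) // ready
}
```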
Never rely on "intuitive" thread behavior. Without synchronization, program behavior is undefined. Under the hood, the CPU may reorder instructions, and the compiler may optimize code in ways that defy your expectations. Use atomic operations, mutexes, or volatile/synchronized. In Go, it is especially important to use channels or the sync package, since without them there are no happens-before guarantees. In Java, volatile is not just "visibility": it is also a memory barrier that prohibits reordering.
The memory model is critical in multithreaded applications: servers, queues, caches. For example, double-checked locking without volatile in Java breaks. In Go, incorrect use of shared variables leads to race condition. The benefits of proper synchronization are correctness and predictability. The downsides are overhead due to memory barriers. However, this overhead is significantly less than the cost of bugs in production. In high-load systems (e.g., message brokers), a proper understanding of the memory model is a prerequisite.
scheduler preemption
scheduler preemption - forcibly taking the CPU away ⚡🧵, the scheduler interrupts a goroutine to switch
Preemption is the ability of the scheduler to interrupt a running task to give the CPU to other tasks. In Go, starting from version 1.14, asynchronous preemption is implemented: the runtime can stop a goroutine at almost any moment. This matters for fairness and prevents a single goroutine from hogging the CPU. In Java, preemption is managed by the JVM and the operating system: threads are scheduled by the OS scheduler, and the JVM only provides hints (for example, Thread.yield()). Under the hood, the Go runtime uses signals and special safe points to interrupt goroutines, which makes the system more responsive. In Java, the park/unpark mechanism and OS-level scheduling are used.
import "runtime"
func worker() {
    for {
        // infinite loop
        runtime.Gosched() // yield CPU to other goroutines
    }
}
// Java
public class Example {
    public void worker() {
        while (true) {
            // infinite loop
            Thread.yield(); // hint scheduler
        }
    }
}
ASCII diagram:
Go scheduler:
[G1] [G2] [G3]
| | |
----M-----
CPU
Java:
Thread1 -> OS scheduler
Thread2 -> OS scheduler
Do not rely on yield/Gosched as a mechanism for controlling logic. This is just a hint to the scheduler. Under the hood, there is no guarantee that another thread will get the CPU. In Go, it is better to use channels and synchronization. In Java, use ExecutorService and blocking queues. Preemption is a tool for fairness, not for controlling logic.
Preemption is used for load balancing. In Go, it allows running thousands of goroutines without starvation. In Java, it depends on the OS scheduler. Pros — even distribution of CPU. Cons — overhead of context switching. In high-load systems (e.g., web servers), proper scheduler setup is critical for stability.
unsafe basics
unsafe basics - bypassing safety rules 🚧⚠️, direct memory access without guarantees
unsafe is the ability to bypass the type system and work directly with memory. In Go, this is the unsafe package, in Java — sun.misc.Unsafe (now being replaced by VarHandle). Under the hood, unsafe allows you to read and write memory directly, bypassing type checks and GC. This provides maximum performance but fully shifts responsibility to the developer. Any mistake can lead to memory corruption, leaks, or program crashes. Also, unsafe can break between runtime versions, as it depends on internal implementation.
import "unsafe"
var x int = 10
ptr := unsafe.Pointer(&x)
p := (*int)(ptr)
*p = 42 // directly change memory
// Java
import sun.misc.Unsafe;
import java.lang.reflect.Field;
Field f = Unsafe.class.getDeclaredField("theUnsafe");
f.setAccessible(true);
Unsafe unsafe = (Unsafe) f.get(null);
int[] arr = new int[1];
long offset = unsafe.arrayBaseOffset(int[].class);
// direct access to array memory
unsafe.putInt(arr, offset, 42);
Use unsafe only in extreme cases. Under the hood, you disable type safety and interfere with the operation of GC. This can lead to hard-to-detect bugs. In Java, sun.misc.Unsafe is already considered deprecated — use VarHandle. In Go, unsafe is often used only in runtime and system libraries.
unsafe is used in high-performance data structures, serialization, runtime. The plus — maximum performance. The minus — high risk of errors. It is rarely used in production and only by experienced developers. For example, in systems like Netty or low-latency frameworks.
cache locality
cache locality - data lives nearby in cache 🧠📦, fast access to nearby memory
Cache locality is the principle of placing data in memory so that the CPU can process it efficiently. The processor works with cache lines (~64 bytes), and if data is located sequentially, it loads faster. In Go and Java, this is equally important. Under the hood, the CPU uses L1/L2/L3 caches. If data is located randomly, a cache miss occurs — an expensive operation. It is also important to consider false sharing — when different threads work with different variables, but they are in the same cache line.
arr := make([]int, 1000)
for i := 0; i < len(arr); i++ {
    arr[i] = i // sequential access
}
// Java
int[] arr = new int[1000];
for (int i = 0; i < arr.length; i++) {
    arr[i] = i; // sequential access
}
ASCII diagram of cache line:
| x | y | z | w | (64 bytes)
Thread1 → x
Thread2 → y
→ false sharing → slowdown
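The padding fix mentioned above can be sketched as a struct padded out to a full cache line; behaviour is unchanged, only the memory layout differs. The 64-byte line size is an assumption that holds on most x86-64 CPUs:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// padded places each counter on its own 64-byte cache line, so two
// goroutines bumping different counters do not invalidate each
// other's line (no false sharing).
type padded struct {
	n int64
	_ [56]byte // 8 (int64) + 56 = 64 bytes
}

// runCounters increments two independent counters from two goroutines.
func runCounters(iters int) (int64, int64) {
	var c [2]padded
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				atomic.AddInt64(&c[i].n, 1)
			}
		}(i)
	}
	wg.Wait()
	return c[0].n, c[1].n
}

func main() {
	a, b := runCounters(1_000_000)
	fmt.Println(a, b) // 1000000 1000000
}
```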
Structure data so that access is sequential. Under the hood, the CPU loads entire cache lines, so sequential access is faster. Avoid false sharing — use padding or separate data between threads.
Cache locality is critical in high-performance systems: games, databases, data processing. Pros — significant acceleration. Cons — design complexity. In Java, this is often solved through arrays, in Go — through slices and structures.
allocation cost optimization
allocation cost optimization - memory allocation savings 💸🧠, reduces costs for object creation
Memory allocations are among the most expensive operations in high-load systems. Although modern runtimes (Go and the JVM) are heavily optimized, each allocation still creates pressure on the GC, increases latency, and can lead to pauses. In Go, allocations are divided into stack and heap. Stack allocation is almost free: it is just a pointer shift. Heap allocation requires work from the allocator and subsequent processing by the GC, which is why Go developers actively monitor escape analysis to minimize heap use. In Java, all objects are logically created in the heap, but thanks to TLAB (Thread Local Allocation Buffer) allocations are very fast: just a pointer bump within a thread-local area. However, the GC still has to handle the objects, especially short-lived ones (young generation). Under the hood, Go uses an allocator built from mcache/mcentral/mheap, while the JVM uses a generational GC (Eden, Survivor, Old). The main goal of optimization is to reduce the number of objects and their lifespan.
func process() {
    // bad: create a new slice every time
    data := make([]int, 0, 1000)
    for i := 0; i < 1000; i++ {
        data = append(data, i)
    }
}
// Java
public void process() {
    // bad: create a new list every time
    List<Integer> data = new ArrayList<>(1000);
    for (int i = 0; i < 1000; i++) {
        data.add(i);
    }
}
ASCII allocation scheme:
Go:
stack -> cheap
heap -> GC pressure
Java:
TLAB -> fast alloc
heap -> GC cleanup
Description:
even if allocation is fast, GC still has to process it.
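A reuse-friendly version of the "bad" snippet above preallocates once and truncates with buf[:0], so repeated calls perform no new heap allocations. A sketch; fill is an illustrative name:

```go
package main

import "fmt"

// fill reuses the caller's slice: buf[:0] keeps the backing array and
// resets the length, so append never reallocates while n <= cap(buf).
func fill(buf []int, n int) []int {
	buf = buf[:0]
	for i := 0; i < n; i++ {
		buf = append(buf, i)
	}
	return buf
}

func main() {
	buf := make([]int, 0, 1000) // one allocation up front
	for round := 0; round < 3; round++ {
		buf = fill(buf, 1000) // reuses the same backing array
	}
	fmt.Println(len(buf), cap(buf)) // 1000 1000
}
```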
Minimize allocations in hot code paths. Under the hood, each heap allocation adds work for the garbage collector, which can cause stop-the-world pauses. In Go, try to keep data on the stack and reuse objects (e.g., through sync.Pool). In Java, avoid unnecessary boxing/unboxing (e.g., Integer instead of int) and creating temporary objects in loops. Even if allocation is "fast," the cumulative effect on GC can be critical under high load.
Allocation optimization is used in high-load services, streaming systems, low-latency applications (e.g., trading systems). In Go, this is expressed in the use of buffers, preallocation, and pools. In Java - in the use of primitives, object reuse, and tuning GC. Pros: reduced latency, decreased GC pauses, increased throughput. Cons: code complexity, loss of readability, risk of premature optimization. It is important to understand the balance: do not optimize everything indiscriminately, but only bottlenecks.
runtime.Gosched
runtime.Gosched - voluntary CPU transfer 🤝⚡, goroutine yields the thread to others
runtime.Gosched is a function that lets the current goroutine voluntarily yield the CPU to other goroutines. This is not blocking or sleeping; it is just a signal to the scheduler: "I am ready to yield". Under the hood, the Go scheduler operates on an M:N model: goroutines (G) are distributed across OS threads (M) through processors (P). When Gosched is called, the current goroutine is placed back in the run queue, and the scheduler can choose another. In Java, the equivalent is Thread.yield(), but it is less predictable since it depends on the OS scheduler. Important: Gosched does not guarantee a switch; it is just a hint. In a tight loop, however, it can prevent starvation of other goroutines.
import "runtime"
func worker() {
    for i := 0; i < 10; i++ {
        // doing work
        runtime.Gosched() // yield CPU
    }
}
// Java
public void worker() {
    for (int i = 0; i < 10; i++) {
        // doing work
        Thread.yield(); // hint scheduler
    }
}
ASCII diagram:
[G1 running] -> Gosched -> queue
↓
[G2 starts]
Description:
the goroutine voluntarily goes into the execution queue.
Do not use Gosched as a synchronization mechanism. Under the hood, it is just a hint to the scheduler, and there is no guarantee that another goroutine will actually get the CPU. Use it only in rare cases — for example, when implementing lock-free structures or busy-wait loops. In Java, Thread.yield() is generally considered unreliable and rarely used. It is better to use blocking primitives (channels, locks).
Gosched is used in runtime, low-level libraries, and some lock-free algorithms. For example, if a thread is running in a loop for a long time, Gosched helps not to fully block the CPU. Pros: improvement in fairness. Cons: unpredictability and dependence on the scheduler. It is rarely used in production code, more often in infrastructure components.
runtime.LockOSThread
runtime.LockOSThread - pin the goroutine to a thread 🔗🧵, fixes execution to one OS thread
runtime.LockOSThread is a mechanism that "attaches" the current goroutine to a specific OS thread. Usually, the Go runtime freely moves goroutines between threads, but sometimes a fixed binding is required. This is necessary when interacting with native libraries (C, OpenGL, GUI) that require all calls to occur from a single thread. Under the hood, the Go scheduler stops moving the goroutine between M (OS threads). This breaks the M:N model and can reduce performance. In Java, the equivalent is simply using Thread directly, as each Thread is already bound to an OS thread (1:1 model).
import "runtime"
func main() {
    runtime.LockOSThread() // bind goroutine to OS thread
    // all operations are now on one thread
    doNativeCall()
}
// Java
public static void main(String[] args) {
    // each Thread in Java is already an OS thread
    Thread t = new Thread(() -> {
        // perform native calls
        doNativeCall();
    });
    t.start();
}
ASCII diagram:
Go:
G1 -> M1 (locked)
Java:
Thread1 -> OS thread1
Description:
in Go, this is an exception to the model, in Java — standard behavior.
Use LockOSThread only when necessary. Under the hood, you break the flexibility of the scheduler, and this can lead to decreased scalability. This is justified only when working with C libraries, GUI, or system APIs. In ordinary business logic, this is almost never needed.
Applied in graphical applications (OpenGL), system libraries, integration with C/C++. Pros: correctness of working with APIs that require thread affinity. Cons: loss of scalability, debugging complexity. In Java, this is the standard model, so such problems are fewer, but there is also less flexibility.
runtime.GC
runtime.GC - garbage collection in memory 🧹🧠, automatically cleans up unused objects
runtime.GC in Go triggers the built-in garbage collector, which is responsible for automatically freeing unused memory. Under the hood, Go starts collections based on a memory trigger (heap growth) and a timer trigger (periodic GC), and performs scanning concurrently with the program. Go's GC is concurrent and parallel, which helps minimize pauses during collection. Similarly, HotSpot collectors in Java (e.g., G1 or ZGC) work using compaction, generational areas (Young/Old), and thread-level parallelism. The main difference: Go is focused on low latency and short pauses, while Java is often more flexible in configuration and in tuning for specific workloads.
Example code Go/Java
// Go: Manual garbage collection call
package main

import (
    "fmt"
    "runtime"
)

func main() {
    data := make([]int, 1_000_000) // create an array
    fmt.Println("Before GC:", len(data))
    runtime.GC() // explicit call to garbage collector
    fmt.Println("After GC:", len(data))
}
// Java: Manual garbage collection call
public class GCDemo {
    public static void main(String[] args) {
        int[] data = new int[1_000_000]; // create an array
        System.out.println("Before GC: " + data.length);
        System.gc(); // explicit call to garbage collector
        System.out.println("After GC: " + data.length);
    }
}
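For observing GC behaviour, runtime.ReadMemStats is usually more useful than forcing collections. A sketch that forces one GC and reads live-heap statistics (the helper name is illustrative):

```go
package main

import (
	"fmt"
	"runtime"
)

// liveHeapAfterGC runs one collection and returns the bytes of heap
// memory still reachable afterwards.
func liveHeapAfterGC() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	data := make([]byte, 1<<20) // keep 1 MB live across the GC
	fmt.Println("live heap bytes:", liveHeapAfterGC())
	_ = data // still referenced, so it survives the collection
}
```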
Don't rely on explicit GC calls in production code: it does not guarantee immediate memory cleanup. In Go, runtime.GC() starts garbage collection, but the scheduler may delay some steps. In Java, System.gc() only suggests the JVM to start garbage collection. Under the hood, both mechanisms use multithreading, but Go tries to minimize pauses, while Java may use parallel and generational collection for throughput optimization.
runtime.GC and Java GC are applied in scenarios with intensive creation and destruction of objects, such as web servers, task queues, or systems with dynamic content. Pros: automatic memory management, fewer leaks. Cons: potential pauses, unpredictability of garbage collection runtime. In Go, optimizing allocation hotspots is important to minimize frequent collections, while in Java, choosing the right GC strategy (G1, ZGC, Shenandoah) for specific loads is crucial.
High-Concurrency Patterns
High-Concurrency Patterns - orchestra of thousands of tasks 🎻⚡, architectures for mass parallelism
High-concurrency patterns are approaches and design patterns for applications that allow efficient use of a large number of threads or goroutines. In Go, these are often channelized queues, worker pools, fan-in/fan-out schemes, and lock-free structures. In Java, these are ExecutorService, ForkJoinPool, ConcurrentHashMap, and atomic structures. Under the hood, these patterns manage task queues, minimize locks, and ensure load balancing between threads or CPU cores. The goal is high throughput and low latency with a large number of parallel operations.
Go/Java Code Example
// Go: Worker pool
package main

import (
    "fmt"
    "sync"
)

func worker(id int, jobs <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for j := range jobs {
        fmt.Println("Worker", id, "processing", j)
    }
}

func main() {
    jobs := make(chan int, 5)
    var wg sync.WaitGroup
    for w := 1; w <= 3; w++ {
        wg.Add(1)
        go worker(w, jobs, &wg)
    }
    for j := 1; j <= 5; j++ {
        jobs <- j
    }
    close(jobs)
    wg.Wait()
}
// Java: Worker pool with ExecutorService
import java.util.concurrent.*;

public class WorkerPool {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (int i = 1; i <= 5; i++) {
            final int job = i;
            executor.submit(() -> System.out.println("Worker executing " + job));
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }
}
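The fan-in/fan-out scheme mentioned above can be sketched as a pipeline: several workers consume one input channel (fan-out) and their results are merged into a single output channel (fan-in). The function name fanOutIn is illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutIn starts `workers` goroutines reading from in (fan-out) and
// merges their results into one channel (fan-in), which is closed
// once every worker has finished.
func fanOutIn(in <-chan int, workers int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range in {
				out <- v * v // each worker squares its jobs
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out) // fan-in side closes after all workers exit
	}()
	return out
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 5; i++ {
			in <- i
		}
		close(in)
	}()
	sum := 0
	for v := range fanOutIn(in, 3) {
		sum += v
	}
	fmt.Println(sum) // 1+4+9+16+25 = 55
}
```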
Use high-concurrency patterns to increase throughput, but watch the balance between the number of threads/goroutines and CPU resources. In Go, an excessive number of goroutines can cause scheduler overhead, while in Java, an increased number of threads increases context switches and memory load. Under the hood, both models use task queues and schedulers, but Go is lighter in creating thousands of goroutines due to minimal stack size.
High-concurrency patterns are used in web servers, message brokers, event processing, tasks with high parallel load. Pros: high performance, scalability. Cons: debugging complexity, potential data races, need for proper synchronization. In Go, the focus is on channels and worker pools, in Java — on ExecutorService and concurrent data structures.
GMP Scheduler Model
GMP Scheduler Model - trio for goroutine management 🎛️🧵, Go model: Goroutine-M-P scheduling
GMP (Goroutine, Machine, Processor) is Go's scheduling model. A goroutine (G) is a lightweight task, a machine (M) is an OS thread, and a processor (P) is a scheduling context with a task queue. The GMP scheduler distributes goroutines across Ms via Ps, using work-stealing and, since Go 1.14, asynchronous preemption. In Java, a thread pool over OS threads plays a similar role, but the JVM does not draw such a sharp line between lightweight tasks and OS threads. Under the hood, GMP allows Go to efficiently manage thousands of goroutines, minimizing context switches and CPU idle time. Thread flow diagram:
+-------------------+
| Goroutine Queue P1|
+-------------------+
|
v
+---------+
| Machine|
+---------+
|
v
CPU Core
// Go: demonstration of scheduling multiple goroutines
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            fmt.Println("Goroutine", id, "is running")
        }(i)
    }
    wg.Wait() // wait for all goroutines instead of blocking on stdin
}
// Java: multithreaded execution of tasks
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class GMPDemo {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 5; i++) {
            final int id = i;
            executor.submit(() -> System.out.println("Thread " + id + " is running"));
        }
        executor.shutdown();
    }
}
For effective use of GMP, it is important to properly balance the number of P and M to avoid CPU idle time and unnecessary OS threads. In Java, Thread Pool performs a similar function but manages threads at the JVM level without a dedicated lightweight scheduler. Under the hood, GMP uses task queues and steal logic for optimal loading of all processors.
GMP Scheduler is useful in high-load Go applications, where thousands of goroutines are created (servers, microservices, asynchronous processing). Pros: lightweight goroutines, minimal memory overhead, high scalability. Cons: complexity in tracking execution order. In Java, to achieve similar scales, ThreadPoolExecutor and fork/join framework are used, which requires more resources than lightweight goroutines.
| Term | Go | Java | Comment |
|---|---|---|---|
| atomic.CompareAndSwap | sync/atomic | AtomicInteger / AtomicReference | Atomic operations for lock-free algorithms, work through CPU instructions, prevent race conditions. |
| stack vs heap escape | Escape analysis | All objects in heap, primitives in stack | Memory optimization and GC pressure. In Go, the runtime analyzes escape to reduce allocations in the heap. |
| sync.Pool | sync.Pool | ThreadLocal / Object Pool | Object pool for reuse, reduces GC load and allocations. |
| goroutine stack splitting | Dynamic goroutine stack adjustment | Thread stack is fixed | Go allows small goroutine stacks to grow as needed, Java stack is fixed and does not grow dynamically. |
| memory consistency model | Go memory model | Java Memory Model | Defines visibility of changes between threads, happens-before relationship. |
| scheduler preemption | Goroutine preemption | Thread preemption JVM | Go runtime schedules goroutines execution, can interrupt long operations; JVM uses OS threads preemption. |
| unsafe basics | unsafe package | sun.misc.Unsafe / VarHandle | Allows direct memory manipulation, bypass safety, increases the risk of errors and segmentation faults. |
| cache locality | Data structure optimization | Data layout, padding | Proper data placement speeds access through CPU cache lines, reduces false sharing. |
| allocation cost optimization | stack vs heap + sync.Pool | Object pooling, escape analysis | Allocation optimization reduces load on GC and speeds up the application. |
| runtime.Gosched | runtime.Gosched() | Thread.yield() | Lets the current goroutine/thread voluntarily yield execution, helping the scheduler distribute CPU time. |
| runtime.LockOSThread | runtime.LockOSThread() | Thread affinity / native Thread | Binds goroutine to OS thread, used when calling native libraries where a specific thread is required. |
Conclusion
In this article, we explored the low-level mechanisms of Go and compared them to their counterparts in Java. The main conclusion: the performance and correctness of concurrent code depend on a proper understanding of the memory model, allocations, and scheduler behavior. Atomic operations enable lock-free data structures; escape analysis and sync.Pool minimize GC load; goroutine stack growth and scheduler preemption ensure efficient execution of many lightweight threads. Unsafe tools provide additional capabilities but increase the risk of errors. Understanding cache locality and allocation cost helps optimize applications at the CPU level. For a Java developer, it is important to see how Go gives more direct control over these mechanisms, while a Go developer benefits from knowing how similar concepts are implemented in Java, which helps in writing portable, high-performance code.
// Data flow schema and component interaction
// Memory & Atomicity -> Allocation -> Scheduler -> Runtime/Unsafe
// CAS -> value update -> cache -> goroutine execution -> optional unsafe