- First naive idea: just synchronized
- Second attempt: lock striping
- Problem #1 — Fixed Parallelism
- Problem #2 — resize
- Problem #3 — memory overhead
- Architecture of ConcurrentHashMap (JDK8)
- End-to-End put operation
- When CAS Does Not Help
- When a bucket turns into a tree
- Resize — the most difficult part
- Java Memory Model
- CPU Cache Effects
- Impact on GC
- Real production case
- Main trade-offs
- The most important engineering idea
- Engineering Intuition
- Main conclusion
When HashMap starts killing production: the engineering story of ConcurrentHashMap
Imagine a typical production service.
- 32 CPU cores
- hundreds of threads
- configuration / session / rate limits cache
- tens of thousands of operations per second
And somewhere inside — a regular Map.
At first, everything looks harmless.
Map<String, Session> sessions = new HashMap<>();
Then a couple of threads appear.
Then dozens.
Then production load begins.
And suddenly:
- latency spikes
- CPU goes to 100%
- strange freezes appear
- sometimes — infinite loops in HashMap
This is not a hypothesis. These are real production incidents that occurred in large Java systems before the advent of proper concurrent structures.
It is from such stories that ConcurrentHashMap was born.
But to understand why it is designed this way, one must go through the entire evolution:
HashMap
↓
synchronized Map
↓
Segmented ConcurrentHashMap (JDK7)
↓
CAS + bin locking (JDK8)
↓
tree bins + lock striping + resize migration
And this is one of the most beautiful engineering evolutions in the JVM.
Tree bins are a mechanism in HashMap and ConcurrentHashMap (since Java 8) whereby a bucket with many collisions is automatically transformed from a linked list into a balanced red-black tree. This prevents performance degradation when many elements share the same hash: search, insertion, and deletion drop from linear O(n) to O(log n). The transformation happens when a bucket grows past the TREEIFY_THRESHOLD of 8 entries (provided the table has at least 64 buckets; smaller tables are resized instead), and serves as protection both against bad hashCode implementations and against deliberate hash-collision attacks.
First naive idea: just synchronized
The most obvious protection for HashMap:
Map<K, V> map = Collections.synchronizedMap(new HashMap<>());
Or:
public synchronized V get(K key)
public synchronized V put(K key, V value)
This works.
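A minimal demo (the class name is illustrative) of both why it works and why it scales poorly: every call funnels through one monitor, so no updates are lost, but the threads execute their puts strictly one at a time:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    static int run() throws InterruptedException {
        // Every method of the wrapper synchronizes on a single internal mutex.
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());

        int threads = 8, perThread = 10_000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i); // distinct keys per thread
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();

        // Correct: no lost updates — but every put was serialized on one lock.
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 80000
    }
}
```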
But only until real concurrency appears.
What happens under load
32 threads
200k ops/sec
Each call:
lock(map)
→ read/write
unlock(map)
We get a global lock.
Diagram
Thread1 ─┐
Thread2 ─┤
Thread3 ─┤
Thread4 ─┤ → LOCK(map) → operation → unlock
Thread5 ─┤
Thread6 ─┘
All threads line up in a queue.
| parameter | effect |
|---|---|
| throughput | falls |
| latency | increases |
| CPU | idles |
| contention | huge |
Even read-only get() calls block each other.
This is a fundamental problem.
Second attempt: lock striping
The Java engineers decided:
"What if we divide the map into several independent parts?"
Thus, ConcurrentHashMap JDK7 was born.
Segments
ConcurrentHashMap
├─ Segment 0
├─ Segment 1
├─ Segment 2
└─ Segment 3
Each segment is a separate mini HashMap with its own lock.
segment = hash(key) % segments
Now:
Thread1 → Segment1 → lock
Thread2 → Segment2 → lock
Thread3 → Segment3 → lock
And they do not interfere with each other.
Operation code
Segment<K,V> s = segments[(hash >>> segmentShift) & segmentMask];
s.lock();
try {
s.put(key, value);
} finally {
s.unlock();
}
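The JDK7 idea can be imitated with a toy striped map — a sketch under simplifying assumptions (the StripedMap name and fixed stripe count are made up; the real implementation is far more involved):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy lock striping: N independent (lock, sub-map) pairs, like JDK7 segments.
public class StripedMap<K, V> {
    private final int stripes;
    private final ReentrantLock[] locks;
    private final Map<K, V>[] tables;

    @SuppressWarnings("unchecked")
    public StripedMap(int stripes) {
        this.stripes = stripes;
        this.locks = new ReentrantLock[stripes];
        this.tables = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
            tables[i] = new HashMap<>();
        }
    }

    private int stripe(Object key) {
        return (key.hashCode() & 0x7fffffff) % stripes; // pick a segment by hash
    }

    public V put(K key, V value) {
        int i = stripe(key);
        locks[i].lock();            // lock only one stripe, not the whole map
        try {
            return tables[i].put(key, value);
        } finally {
            locks[i].unlock();
        }
    }

    public V get(K key) {
        int i = stripe(key);
        locks[i].lock();
        try {
            return tables[i].get(key);
        } finally {
            locks[i].unlock();
        }
    }
}
```

Threads whose keys hash to different stripes proceed fully in parallel; only same-stripe operations contend.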
Performance improvement
| threads | synchronizedMap | segmented CHM |
|---|---|---|
| 1 | 1x | 1x |
| 8 | 0.7x | 5x |
| 32 | 0.3x | 18x |
But new problems arose.
Problem #1 — Fixed Parallelism
The number of segments is fixed.
16 segments
But the server:
64 threads
As a result again:
Thread1 → Segment3
Thread2 → Segment3
Thread3 → Segment3
Contention returns.
Problem #2 — resize
Resize — the most difficult part of any hash map.
capacity = capacity * 2
In the segmented model:
resize → lock entire segment
This is expensive.
Problem #3 — memory overhead
Each Segment contains:
- a ReentrantLock
- its own table
- metadata
On large maps, this turns into memory waste.
The JVM engineers decided:
We need a structure without segments.
Thus, ConcurrentHashMap JDK8 was born.
Architecture of ConcurrentHashMap (JDK8)
Node<K,V>[] table
Each bucket is managed separately.
table
├─ bin0
├─ bin1
├─ bin2
└─ bin3
Main idea
| operation | mechanism |
|---|---|
| empty bucket | CAS |
| collision | synchronized |
| resize | cooperative |
End-to-End put operation
map.put("user42", session);
Step 1 — hash
static final int spread(int h) {
return (h ^ (h >>> 16)) & HASH_BITS;
}
Mixing the high bits into the low bits protects against bad hashCode implementations whose entropy lives only in the upper bits.
Step 2 — index
index = (n - 1) & hash
Bit mask is faster than modulo.
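Both steps are easy to check in isolation; HASH_BITS below is the same 0x7fffffff constant as in the JDK source:

```java
public class HashSteps {
    static final int HASH_BITS = 0x7fffffff; // clears the sign bit

    // JDK8's spread(): XOR the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int n = 16;                          // table length, always a power of two
        int h = spread("user42".hashCode());
        // For power-of-two n and non-negative h, (n - 1) & h equals h % n,
        // but the mask is a single AND instead of a division.
        System.out.println(((n - 1) & h) == (h % n)); // true
    }
}
```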
Step 3 — CAS insertion
if (casTabAt(tab, i, null,
new Node<>(hash, key, value)))
break;
CAS = Compare And Swap.
if (table[i] == null)
table[i] = node
But atomically at CPU level.
In JDK 8, CAS goes through Unsafe.compareAndSwapObject, which the JIT compiles down to a hardware atomic instruction (lock cmpxchg on x86).
Why this is important
CAS avoids lock.
Thread1 → CAS success
Thread2 → CAS fail → retry
No kernel switch.
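casTabAt itself is internal JDK code, but the same publish-or-fail pattern can be reproduced with the public AtomicReferenceArray (plain strings stand in for nodes here):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasBinDemo {
    public static void main(String[] args) {
        AtomicReferenceArray<String> table = new AtomicReferenceArray<>(16);
        int i = 3;

        // First "thread": the slot is null, so the CAS publishes the node.
        boolean first = table.compareAndSet(i, null, "nodeA");

        // Second "thread": the expected value (null) no longer matches — the
        // CAS fails, and the real map would fall back to locking the bin.
        boolean second = table.compareAndSet(i, null, "nodeB");

        System.out.println(first + " " + second); // true false
    }
}
```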
When CAS Does Not Help
If the bucket is not empty:
bin0 → node → node → node
Local lock is used:
synchronized (f) {
// modify chain
}
Lock is only for one bucket.
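A simplified sketch of the idea (illustrative, not JDK code): the monitor is the head node of one chain, so two threads collide only if they hit the same bucket:

```java
// Illustrative per-bin locking: synchronize on the head node of one chain.
public class BinLockDemo {
    static final class Node {
        final String key;
        Object val;
        Node next;
        Node(String key, Object val) { this.key = key; this.val = val; }
    }

    Node head; // head of a single bucket's chain

    void put(String key, Object val) {
        Node h = head;
        if (h == null) {               // empty bucket: real CHM would CAS here
            head = new Node(key, val);
            return;
        }
        synchronized (h) {             // lock scope = this one chain only
            for (Node n = h; ; n = n.next) {
                if (n.key.equals(key)) { n.val = val; return; }
                if (n.next == null) { n.next = new Node(key, val); return; }
            }
        }
    }

    Object get(String key) {
        // Reads traverse without locking, as in the real map.
        for (Node n = head; n != null; n = n.next)
            if (n.key.equals(key)) return n.val;
        return null;
    }
}
```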
When a bucket turns into a tree
If:
chain length ≥ 8 (TREEIFY_THRESHOLD)
and the table has at least 64 buckets (MIN_TREEIFY_CAPACITY; smaller tables are resized instead),
the list turns into a Red-Black Tree.
O(n) → O(log n)
Diagram
before
bucket
↓
N → N → N → N → N
after
N
/ \
N N
/ \ / \
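The effect is easy to provoke: a key type with a constant hashCode forces every entry into a single bucket. In JDK 8+ the map quietly treeifies that bucket; treeification itself is an internal detail, so this demo only observes that lookups stay correct instead of degrading:

```java
import java.util.HashMap;
import java.util.Map;

public class TreeBinDemo {
    // Worst-case key: every instance lands in the same bucket.
    // Implementing Comparable lets the tree order keys without hash help.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // constant on purpose!
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000; i++) map.put(new BadKey(i), i);
        // All 1000 entries share one bucket, yet lookups still work,
        // because the bucket is now a red-black tree, not a 1000-node list.
        System.out.println(map.get(new BadKey(777))); // 777
    }
}
```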
Resize — the most difficult part
ConcurrentHashMap performs resize without global pause.
Cooperative migration
old table
├─ bin0
├─ bin1
├─ bin2
└─ bin3
Thread1 → bin0
Thread2 → bin1
Thread3 → bin2
Thread4 → bin3
Threads help move buckets.
ForwardingNode
During migration, each bucket that has already been moved is replaced in the old table with a ForwardingNode.
It is a signal:
"this bucket has already moved — follow me to the new table"
Java Memory Model
volatile Node<K,V>[] table;
This ensures visibility between threads.
put → happens-before → get
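A minimal sketch of the guarantee (the join is only there to make the demo deterministic; the visibility itself comes from the volatile table and the happens-before edge between put and get):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class VisibilityDemo {
    static String run() throws InterruptedException {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

        Thread writer = new Thread(() -> map.put("config", "v2"));
        writer.start();
        writer.join(); // deterministic ordering for the demo

        // The put in the writer thread happens-before this get:
        // the reader can never observe a torn or half-initialized value.
        return map.get("config");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // v2
    }
}
```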
CPU Cache Effects
Every CAS writes a cache line, which invalidates that line in every other core's cache.
If many threads hit one bucket:
cache ping-pong
Performance drops.
Impact on GC
Node
key
value
next
Millions of Node → load on GC scanning.
Real production case
Map<IP, Counter>
load:
150k req/sec
Replacement:
synchronized HashMap
↓
ConcurrentHashMap
| metric | before | after |
|---|---|---|
| latency p99 | 120ms | 8ms |
| CPU | 90% | 40% |
| throughput | 30k rps | 150k rps |
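The pattern behind such a counter map can be sketched with computeIfAbsent plus LongAdder; the class name and IP values are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class RateCounter {
    // One LongAdder per IP; LongAdder spreads contention across internal cells.
    private final ConcurrentMap<String, LongAdder> hits = new ConcurrentHashMap<>();

    public void record(String ip) {
        // Atomic get-or-create: at most one LongAdder is ever kept per key.
        hits.computeIfAbsent(ip, k -> new LongAdder()).increment();
    }

    public long count(String ip) {
        LongAdder a = hits.get(ip);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        RateCounter c = new RateCounter();
        for (int i = 0; i < 5; i++) c.record("10.0.0.1");
        c.record("10.0.0.2");
        System.out.println(c.count("10.0.0.1")); // 5
    }
}
```

computeIfAbsent guarantees the mapping function runs at most once per key, so no counter is ever lost to a get-then-put race.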
Main trade-offs
| solution | plus | minus |
|---|---|---|
| CAS | lock-free fast path | retry loops under contention |
| bucket lock | simple | contention |
| tree bins | protection against hash attack | memory |
| cooperative resize | no stop-the-world | complexity |
The most important engineering idea
ConcurrentHashMap is a combination of three strategies:
- optimistic (CAS)
- pessimistic (locks)
- structural (tree bins)
Each is used only when needed.
Engineering Intuition
If:
- high concurrency
- frequent reads
- rare writes

ConcurrentHashMap is ideal.

But if:
- writes dominate

CAS on hot buckets can turn into a retry storm.
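A common mitigation for write-heavy hot spots is to replace a single CAS target with LongAdder, which stripes increments across internal cells — a sketch contrasting the two (both end with the same count; the difference is contention, not correctness):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class RetryStormDemo {
    static long[] run() throws InterruptedException {
        AtomicLong atomic = new AtomicLong(); // one CAS target: hot under writes
        LongAdder adder = new LongAdder();    // striped cells: CAS traffic spread out

        int threads = 8, perThread = 100_000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    atomic.incrementAndGet(); // every thread retries on the same word
                    adder.increment();        // threads mostly hit different cells
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return new long[] { atomic.get(), adder.sum() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run();
        System.out.println(r[0] + " " + r[1]); // 800000 800000
    }
}
```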
Main conclusion
ConcurrentHashMap works quickly not because it is a completely lock-free structure. In fact, it actively uses locks inside. Its performance is achieved through another, much more important engineering idea — minimizing contention between threads.
Instead of one global lock that turns all operations into a sequential queue, ConcurrentHashMap breaks the work of the structure into many small independent sections. Each thread tries to work with the most isolated part of the data structure, which dramatically reduces the likelihood that two threads will block each other.
For this, several different synchronization strategies are combined inside the map:
- CAS (Compare-And-Swap) — used for fast optimistic operations, for example, for inserting an element into an empty bucket without using a lock. If the operation succeeds, the thread does not enter the critical section at all.
- Fine-grained locks — when collisions do occur, the lock is applied not to the entire map, but only to a specific bucket. This dramatically reduces the scope of the lock and allows other threads to continue working with the remaining parts of the structure.
- Tree bins — if too many elements accumulate in one bucket, the linked list automatically transforms into a balanced red-black tree. This protects the structure from performance degradation when there are many collisions and prevents operations from turning into a linear search.
- Cooperative resize — when increasing the table, the work of moving elements is distributed among threads. Several threads help move buckets into the new table, instead of stopping the entire data structure during the resize.
Thanks to the combination of these mechanisms, ConcurrentHashMap can operate efficiently at a very high degree of parallelism: dozens and hundreds of threads can perform read and write operations practically independently of each other.
This makes ConcurrentHashMap one of the best examples of evolutionary engineering in the JVM — a structure that gradually evolved from simply locking the entire map to a complex system of optimistic operations, local locks, and adaptive data structures.