Go Doesn't Need a Complex Java GC

A look at the language design decisions and garbage collection strategies followed by the Go designers to avoid the need for a "billion dollar" garbage collector.

Jul 24, 2022

∙ Paid

Language design matters. Many developers are under the false impression that implementation and language designs are completely detached from each other. How often do you not hear a developer claim:

There are no fast or slow languages, only fast and slow implementations of a programming language.

That is only a half-truth. Nothing prevents developers from making an arbitrarily slow implementation of a language. Yet, language design sets a practical upper limit for how fast an implementation you can make of that language. Python is an excellent example. Due to its popularity and wide usage, enormous amounts of resources have been spent trying to optimize it. Sadly, Python remains a very slow language. The language has simply been designed in a way that makes optimizing the language for high performance challenging.

Ruby has been even harder to optimize, while for instance Lua has a really fast implementation such as LuaJIT.

Programming language design affects a lot about how you implement a language. In this story, I will contrast the design of Go with Java to explain why Go and Java have ended up with such profoundly different strategies around memory allocation. Both languages use garbage collection, but that is where the similarity ends.

In my previous story, I covered why the design of Java encourage the development of highly sophisticated and complex garbage collectors. In tis story we will instead focus on why Go gets away with a significantly simpler garbage collector and why that is a sensible choice for Go.

Erik Explores

Why Is Java Stuck With Complex GCs?

Java comes with a variety of garbage collectors with different capabilities, complexities, and performance characteristics. However, to get good performance, Java programs usually need a complex garbage collector. Yet other high-performance programming languages, such as Go and Julia, work quite well with much simpler garbage collectors. Why is that…

2 years ago · Erik Engheim

In this article we will cover a number of topics. These are some of the topics:

Future Value Types in Java — Will Java get on equal footing with Go by adding value types?

Custom allocators in Go — Unlike Java you can easily make custom memory allocators in Go.
Problem with Bump Allocators — Java uses a very quick way of allocating memory often dubbed bump allocation. We look at how Go has a more clever tactic.
Escape Analysis in Go and Java — Both languages optimize memory use with this technique yet Go gets more milage out of it. Why?
Memory Fragmentation — Looking at why modern languages no longer need compacting garbage collectors to deal with memory fragmentation.
GC Pauses in Go and Java — We explain why Go manage to get near realtime performance from its garbage collector.
The importance of low latency — Why modern software needs low latency and why Java may have been tuned too much towards high throughput.

Adding Values Types to Java is Not Enough

In my previous article I discussed Project Valhalla, an experimental OpenJDK project, which among many things aims to add value types to Java.

Will Project Valhalla put Java on equal footing with Go in terms of managing memory? Not really. It will make Java more similar to C#.

However, this does not put C# and Java on equal footing with languages like Go and C/C++ in terms of memory management flexibility. Java does not support real pointers. In Go, I can write something like this:

// Go - pointer usage

var ptr *Point = &rect.Min // Store pointer to Min in ptr
*ptr = Point(2, 4)         // replace rect.Min

You can take the address of objects or fields of objects in Go, just like in C/C++ and store it in a pointer. You can then pass this pointer around and use it to modify the fields it points to. This means you can create large value objects in Go and pass it as a pointer to function to optimize performance. With C# the situation is a bit better as it has limited support for pointers. The previous Go example could be written in C# as:

// C# - pointer usage

unsafe void foo() {
   Rect* ptr = &rect.Min;
   *ptr = new Point(2, 4);
}

However C# pointer support comes with a number of caveats which does not apply to Go:

Code using points must be marked as unsafe. This creates code with less safety and more likely to crash.
Only pure value types allocated on the stack (all struct fields must be value types).
Within a fixed scope, where garbage collection has been turned off, using the fixed keyword.

Thus the normal and safe way to use value types in C# is to copy them, as this does not require defining unsafe or fixed code regions. But for larger value types this can create performance problems. Go does not have these issues. You can make pointers to objects managed by the garbage collector in Go. You don’t need to isolate code using pointers in Go, like in C#.

Custom Allocators in Go

With proper pointers, you can do a lot of things which is not possible when you only have value types. One example is to create secondary allocators. Here is my example of an Arena allocator created using Go generics.

// Hold list of free memory blocks
type Arena[T any] struct {
    blocks Stack[*T]
}

// Find a free memory block in Arena and allocate it
func (arena *Arena[T]) Alloc() *T {
    if arena.blocks.IsEmpty() {
        var blocks [32]T     // allocate 32 elements at a time
        for i, _ := range blocks {
            arena.blocks.Push(&blocks[i])
        }
    }
    b, _ := arena.blocks.Top()
    arena.blocks.Pop()
    return b
}

Why are such custom memory allocators useful?

If you look at microbenchmarks for algorithms making binary trees, you will typically see that Java has a big advantage over Go. That is because binary tree algorithms are usually used to test speed of garbage collector in allocating objects. Java is very fast at this because it uses what we call a bump pointer. It just increments a pointer while Go will search for a suitable place in memory to allocate the object. However, with an Arena allocator, you can quickly build a binary tree in Go as well.

import "golang.org/x/exp/constraints"

// Hold key-value pairs, which can be searched in O(log N) time
type Tree[K constraints.Ordered, V any] struct {
    Root      *TreeNode[K, V]
    allocator Arena[TreeNode[K, V]]
}

// Add a new key-value pair to search tree
func (tree *Tree[K, V]) NewNode(key K, value V) *TreeNode[K, V] {
    n := tree.allocator.Alloc()
    n.Key = key
    n.Value = value
    n.left = nil
    n.right = nil    
    
    return n
}

func (tree *Tree[K, V]) Insert(key K, value V) {
    n := tree.NewNode(key, value)    
    if tree.Root == nil {
        tree.Root = n
    } else {
        tree.Root.Insert(n)
    }
}

This code example demonstrates why having real pointers are beneficial. You cannot create a pointer to an element in a contiguous block of memory without them. In the Alloc method we create a contiguous block of 32 elements. We then store a pointer to to each element in this block on a stack holding a list of blocks available for allocation.

var blocks [32]T

for i, _ := range blocks {
    arena.blocks.Push(&blocks[i])
}

Storing these pointers is only possible because I can pick an arbitrary element blocks[i] and get the address to that element &blocks[i]. Java does not give you this possibility.

Problem with Java Bump Allocators

A bump allocator used by the Java GC is similar in that just like an Arena allocator you just increment a pointer to get the next value. Except you don’t have to build it yourself. This may seem more clever. But it causes several problems avoided in Go:

Sooner or later you need to do compaction, which involves moving data around and fixing pointers. An Arena allocator does not have to do that.
In a multithreaded program, a bump allocator requires locks (unless you use thread local storage). That kills their performance advantage, either because locks degrade performance or thread local storage will cause fragmentation, which needs to be compacted later.

Ian Lance Taylor, one of the creators of Go clarifies the problem with bump allocators:

In general it’s likely to be more efficient to allocate memory using a set of per-thread caches, and at that point you’ve lost the advantages of a bump allocator. So I would assert that, in general, with many caveats, there is no real advantage to using a compacting memory allocator for a multi-threaded program today.

Keep reading with a 7-day free trial

Subscribe to Erik Explores to keep reading this post and get 7 days of free access to the full post archives.