The Zig and Go Programming Showdown!

What building a toy assembler in Zig and Go taught me about each language

Oct 28, 2022


The title of this story was supposed to be "Why I have given up on Zig," but instead Zig ended up impressing me more than I expected, while Go disappointed me. That experience illustrates well my relationship with Zig over the last two years: I have a real problem deciding what I think about Zig. At times, it frustrates me, and I am ready to give up. Other times I think, "This stuff is really neat and clever!"

Before we dig in, you might ask: Why am I comparing Zig and Go at all? Aren't these two languages too different to compare? Are they competing in the same space at all? Indeed, they are different, and they don't completely overlap. But for my interests, I see a clear overlap. Allow me to clarify. My preferred language is really Julia, but being designed for JIT compilation makes it non-ideal for creating binaries for small command line utilities. For instance, I want my simple Calcutron-33 toy assembler to be easy to download and install for anybody. Anything requiring the installation of a virtual environment, runtime system or JIT compiler is a no-go.

Another interest I have is programming game engines and microcontrollers. I have a cool marble track built using Fischertechnik, controlled by an Arduino microcontroller which I have programmed in C/C++ with the standard Arduino IDE. Every time I use C/C++ I find that the language gets in my way. I want something better.

Arduino controlled Fischertechnik marble track

Using Julia isn't an option, as it isn't really designed for small systems. Go may not seem like a good choice either since it uses garbage collection. Yet, Go is a very different language from other GC languages such as C# and Java. Because it lets you control memory layout and use real pointers, you can limit your usage of garbage collection. That combination of features made it possible to make TinyGo:

TinyGo brings the Go programming language to embedded systems and to the modern web by creating a new compiler based on LLVM.
You can compile and run TinyGo programs on over 85 different microcontroller boards such as the BBC micro:bit and the Arduino Uno

Thus, since I played with Zig two years ago, I had pretty much decided to ditch Zig and focus on Go. Why not, when Go can cover most systems programming of interest to me? Yet, I kept seeing that you, my readers, seemed more interested in reading my Zig stories than my Go stories. Hence, this month, I decided to take Zig for another spin.

The best way to get a feel for a language is by implementing some non-trivial program. My choice was to finish my toy assembler Zacktron-33 implemented in about 1500 lines of Zig code. The corresponding Go version Calcutron-33 is about 1000 lines of Go code. Keep in mind that the Go version isn't as complete yet (27 October).

Initial Surprises

At first glance, Zig comes across as a much lower level programming language than Go. Here are some of the things which Zig lacks but which are present in Go:

Garbage collector
String type
Abstract interfaces
Builtin support for concurrency (Goroutines e.g.)
Closures
Exceptions (panic in Go)
Runtime reflection

Instead of a string type, Zig uses a primitive array of bytes, just like C. Yet, when working at the function level, the roles are swapped: Go feels like the low-level language and Zig feels like the high-level language. I believe that the reasons are that Zig is more sophisticated than Go in other important areas:

Compile-time evaluation of code, so-called comptime
Enums and union types
Tagged unions (sum type)
Optional types (you must explicitly allow a variable or argument to take a null value)

The end result is that Go excels in what I would call programming "in the large" while Zig excels at programming "in the small." What do I mean by that? All the stuff you would sketch out on a whiteboard such as classes, modules, and relationships between them is the level where Go has the edge. As soon as you drill down to the level of individual functions, Zig starts to shine.

I implemented my assembler in Go after my Zig solution. While writing the Go code I frequently found myself surprised by how clunky it was to handle errors and null values in Go compared to Zig. This kind of stuff is dealt with constantly while doing regular programming. Let me clarify how by walking through one single example of parsing the symbols found in assembly source code files.

Creating a Symbol Table

Most programming languages have to deal with symbols. The Calcutron-33 assembly language I invented is no exception. Let me show an example program in Calcutron-33 assembly. You will notice the labels loop and multiply which represent particular locations in program memory.

loop:
    INP x1        // read input into register x1
    INP x2        // store next input value in register x2
    CLR x3        // clear register x3
    
multiply:
    ADD x3, x1    // add register x1 to register x3
    DEC x2        // decrement register x2
    BGT x2, multiply
    OUT x3        // write register x3 to output
    
    BRA loop      // non-conditional jump to label loop

When parsing this assembly code, I have to keep track of what memory addresses these labels correspond to so that any instruction referencing these symbols will substitute a memory address.

We will explore the Go version first. I am using the Go Scanner type to scan one line at a time in the file. For each line, I use IndexRune to determine if there is a colon on that line. The end of each label is marked by a colon. IndexRune returns the index of the character you search for, or -1 if it cannot be found. Hence, the following code contains an if-statement checking if i >= 0 before recording the address of the symbol in a dictionary (map in Go).

func readSymTable(reader io.Reader) map[string]uint8 {
	scanner := bufio.NewScanner(reader)
	labels := make(map[string]uint8)
	address := 0
	for scanner.Scan() {
		line := strings.Trim(scanner.Text(), " \t")
		n := len(line)

		if n == 0 {
			continue
		}

		if i := strings.IndexRune(line, ':'); i >= 0 {
			labels[line[0:i]] = uint8(address)

			// is there anything beyond the label?
			if n == i+1 {
				continue
			}
		}
		address++
	}
	return labels
}

A couple of challenges stand out with this implementation:

Scanning could fail, but I am not dealing with that in my code: I never call scanner.Err() after the loop. Go didn't force me or remind me to do that in any way.
It is easy to forget to check if the index i >= 0. Nothing forces you to make that check. You simply have to remember to do it.
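
The first gap can be closed by hand. The following is only a minimal sketch of the idea, not a definitive version: the names readSymTableChecked and symbolsFromString are made up for illustration. It relies on the fact that bufio.Scanner reports a read failure through its Err method once Scan has returned false.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSymTableChecked is a hypothetical variant of readSymTable that
// reports scanner failures instead of silently dropping them.
func readSymTableChecked(reader io.Reader) (map[string]uint8, error) {
	scanner := bufio.NewScanner(reader)
	labels := make(map[string]uint8)
	address := 0
	for scanner.Scan() {
		line := strings.Trim(scanner.Text(), " \t")
		if len(line) == 0 {
			continue
		}
		if i := strings.IndexRune(line, ':'); i >= 0 {
			labels[line[:i]] = uint8(address)
			// a label alone on a line does not occupy an address
			if len(line) == i+1 {
				continue
			}
		}
		address++
	}
	// Scan returns false both at end of input and on failure.
	// Err tells the two apart: it is nil after a clean end of input.
	if err := scanner.Err(); err != nil {
		return nil, fmt.Errorf("reading symbols: %w", err)
	}
	return labels, nil
}

// symbolsFromString is a small helper (also hypothetical) for trying
// the function out on source text held in a string.
func symbolsFromString(src string) (map[string]uint8, error) {
	return readSymTableChecked(strings.NewReader(src))
}

func main() {
	labels, err := symbolsFromString("loop:\n    INP x1\nmultiply:\n    ADD x3, x1\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels["loop"], labels["multiply"])
}
```

Nothing in the compiler requires the call to Err, so this is still a convention you have to remember rather than a guarantee.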

The Zig solution avoids the mentioned problems. It uses the readUntilDelimiterOrEof function instead of Go's Scan function to fetch individual lines. A key difference is that the former forces you to handle errors while the latter doesn't. If an error happens during reading, an enum-like value of type error is returned from the readSymTable function. I will explain how this mechanism works in more detail shortly.

Secondly, in Zig we use the mem.indexOf function instead of strings.IndexRune to locate the index of the colon : character. In the following Zig code there is no need to check i >= 0. Instead, a null is returned if the colon is absent. How is that an improvement?

fn readSymTable(allocator: Allocator, 
                   reader: anytype) !StringHashMap(u8) {

    var labels = StringHashMap(u8).init(allocator);
    errdefer labels.deinit();
    var address: u8 = 0;

    var buffer: [500]u8 = undefined;
    while (try reader.readUntilDelimiterOrEof(buffer[0..], '\n')) |tmp_line| {
        const line = mem.trim(u8, tmp_line, " \t");
        const n = line.len;

        if (n == 0) continue;

        if (mem.indexOf(u8, line, ":")) |i| {
            const label = try allocator.dupe(u8, line[0..i]);
            try labels.put(label, address);

            // is there anything beyond the label?
            if (n == i + 1) continue;
        }
        address += 1;
    }

    return labels;
}

Zig does not allow you to do anything with values which might contain a null until you explicitly unwrap that value.

Understanding indexOf

Looking at the function signature for indexOf below we can see that it returns a value of type ?usize. The added question mark indicates that the value could contain a null and must thus be handled in a special way. It is what we call an optional or nullable value.

fn indexOf(comptime T: type, haystack: []const T, needle: []const T) ?usize

usize is essentially an unsigned integer large enough to represent any index into an array. By adding the question mark ? we make the type optional. Had we not done that, then returning null would have produced a compile error. By making the return type optional (also called nullable) we are allowed to return null.

Because optional values are explicitly marked in the type system, the compiler can catch any code trying to use this returned value directly without unwrapping it first.

You cannot use an optional integer directly in any calculation or operation until it is unwrapped. In Zig, you can unwrap an optional value, here named maybe, using an if-statement with bars.

if (maybe) |v| {
    // do stuff with unwrapped v value
}

In Zig, values obtained by unwrapping optionals, errors and many other things are captured with a variable enclosed within vertical bars, such as |i|. It would be impossible to write the following code because i would be of type ?usize, which cannot be used as an index to specify a slice of the line string.

// This Zig code will not compile
const i = mem.indexOf(u8, line, ":");
const label = try allocator.dupe(u8, line[0..i]);

The benefit of the Zig approach is that "forgetting" to check if i is null simply isn't possible.
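
For comparison, Go 1.18 and later offer strings.Cut, which sidesteps the -1 sentinel by returning an explicit found flag. The sketch below is an illustration only, and the helper name labelOf is made up. Note that, unlike Zig's optional, nothing stops you from ignoring the ok result.

```go
package main

import (
	"fmt"
	"strings"
)

// labelOf returns the label at the start of a line, if there is one.
// The name labelOf is hypothetical and exists only for this example.
func labelOf(line string) (label string, ok bool) {
	// strings.Cut splits around the first separator and reports
	// whether the separator was present at all.
	label, _, ok = strings.Cut(line, ":")
	if !ok {
		return "", false
	}
	return label, true
}

func main() {
	for _, line := range []string{"loop:", "ADD x3, x1"} {
		if label, ok := labelOf(line); ok {
			fmt.Printf("%q defines label %q\n", line, label)
		} else {
			fmt.Printf("%q has no label\n", line)
		}
	}
}
```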

Understanding readUntilDelimiterOrEof

To explain how Zig can read lines of text, we need to get into how Zig deals with errors. You can define essentially an enum of error codes you want to use with a function.

const FileOpenError = error {
    AccessDenied,
    OutOfMemory,
    FileNotFound,
};

fn open(filename: []const u8) FileOpenError!File {
   // implementation of a file open function
}

The use of an exclamation mark ! creates a union type, called an error union. The returned values from open can be either a FileOpenError enum value or a File object. When calling this function, you would write something like the following:

var file = open("foobar.text") catch |err| {
    // do some stuff
    return err;
};

If there is no error, the file variable is set to a valid file object, and we carry on executing the next few lines of code. However, if there is an error, the catch clause runs and the error value is captured in the err variable. You might want to try to handle the error here, but often it is better to simply return early from the function with the error value. Because that choice is so common, Zig has a shorthand for writing the catch and return code, through the use of the try keyword:

// will cause a return from enclosing function if open fails
var file = try open("foobar.text");

The Zig compiler will force you to handle error cases. You must either catch an error or forward it by returning it from the enclosing function using try. Now you can make more sense of our readSymTable function.

fn readSymTable(allocator: Allocator, 
                   reader: anytype) !StringHashMap(u8) {

It returns a value of type !StringHashMap(u8). We haven't specified the error enum (error set) used, which means Zig infers it from the errors the function body can produce. If no error occurs, our readSymTable returns a dictionary (hash map) with string keys and 8-bit unsigned integers as values. If we didn't allow the function to return an error, then putting try in front of a call which potentially returns an error value would not compile. It is also a compile error to put try in front of a call to a function which never returns error values.

Observe the while-loop in our readSymTable. Notice how the call to readUntilDelimiterOrEof is prefixed with try to allow a return from readSymTable in case an error occurs while reading the file. However, even if there is no error, there might not be any more lines to read, which is why we need |tmp_line| to unwrap the returned optional value.

while (try reader.readUntilDelimiterOrEof(buffer[0..], 
                                          '\n')) |tmp_line| {
    // process each line
}

I walked you through the readSymTable code example in detail to show how Zig can handle many error conditions which are easily ignored when writing Go code. In my actual implementation of Calcutron-33 in Zig (Zacktron-33) I experienced this benefit repeatedly for almost any function besides the most trivial ones.

The downside of this more advanced type system is complexity. You could see that I had to cover a lot to explain this relatively simple function. It means the readUntilDelimiterOrEof function gets the complex return type !?[]u8. We have the [] to signal it is a slice, the ? to say the type is optional, and the ! to indicate that an error might be returned instead.

fn readUntilDelimiterOrEof(self: Self, 
                            buf: []u8, 
                      delimiter: u8) !?[]u8

Correctness vs. Simplicity

Now, you may get why I have a hard time deciding what approach I like. In Go, it gets tedious to write code checking if the returned error object is nil or not. It is also a check you may not always remember to do. For instance, in the following code, forgetting to store the error object in err would not produce a compile error.

err := instruction.ParseOperands(labels, operands)
if err != nil {
    return 0, err
}

Dealing with errors in Zig is a lot smoother in my experience from working on Zacktron-33 and various other toy projects. But that smoothness and those stronger correctness protections come at a cost. As you saw in my readSymTable walkthrough, there are more concepts to keep straight in your head. Dealing with error objects and optionals at the same time, as in the case of readUntilDelimiterOrEof, made my head spin initially.

Also, in practice Go is not as bad as it may seem. In the typical Go case, the error object is one of two returned values, such as:

machincode, err := instruction.MachineInstruction()
if err != nil {
    return 0, err
}

In these cases you cannot pull out the value you want unless you also pull out the error object. And in Go, if you don't do anything with the returned err object, that causes a compilation error. Thus, just like Zig forces you to unwrap values before you can use them, Go will in practice force you to deal with error objects as well in most cases. Problems arise in Go when the only object returned is an error object. In those cases, it is easy to forget to fetch and check the error object.
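
A small sketch of that last pitfall follows. The function checkRegister and its rule for valid register names are invented purely for illustration. The first call in main compiles even though its error is thrown away.

```go
package main

import (
	"errors"
	"fmt"
)

// checkRegister is a hypothetical function whose only result is an error.
// The rule "x1 through x9 are valid" is an assumption for this example.
func checkRegister(name string) error {
	if len(name) != 2 || name[0] != 'x' || name[1] < '1' || name[1] > '9' {
		return errors.New("invalid register: " + name)
	}
	return nil
}

func main() {
	// Compiles without complaint, although the error is silently dropped.
	checkRegister("y7")

	// The check only happens if you remember to write it.
	if err := checkRegister("y7"); err != nil {
		fmt.Println("error:", err)
	}
}
```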

Nor is Zig error handling perfect. You can only return enum values, while Go lets you return a more complex error object, which could contain a variety of useful information. With Zig, you need to provide a second channel to supply additional details about an error that occurred.
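
To make the Go side of that concrete, here is a minimal sketch of an error value that carries context. The type ParseError and the functions parseLine and failingLine are made up for illustration; errors.As is the standard library call for recovering a specific error type.

```go
package main

import (
	"errors"
	"fmt"
)

// ParseError is a hypothetical error type that carries extra context.
type ParseError struct {
	Line int    // line number where parsing failed
	Text string // the offending source text
	Msg  string // what went wrong
}

func (e *ParseError) Error() string {
	return fmt.Sprintf("line %d: %s: %q", e.Line, e.Msg, e.Text)
}

// parseLine is a stand-in for a real parser: it rejects every
// line that does not start with the ADD mnemonic.
func parseLine(lineNo int, text string) error {
	if len(text) < 3 || text[:3] != "ADD" {
		return &ParseError{Line: lineNo, Text: text, Msg: "unknown instruction"}
	}
	return nil
}

// failingLine returns the line number carried by a ParseError,
// or -1 if err is nil or of some other type.
func failingLine(err error) int {
	var pe *ParseError
	if errors.As(err, &pe) {
		return pe.Line
	}
	return -1
}

func main() {
	err := parseLine(7, "FOO x1")
	fmt.Println(err)
	fmt.Println("failed on line", failingLine(err))
}
```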

Manual Memory Allocation vs. Garbage Collection

Garbage collection is of course really neat to have. However, for systems programming it is also a bit of a hassle. You don't want a garbage collector when implementing an operating system kernel or writing a program for a microcontroller. Game engine developers would also likely prefer to manage memory allocation manually to avoid any sudden frame rate drops due to a garbage collection cycle kicking in. That applies to pretty much any realtime system.

Go, however, does a lot better with realtime systems than you would expect from a language with a garbage collector. Go has been specifically designed to not produce large quantities of garbage, and the language designers have always prioritized low latency.

To add a little relevant anecdote: I have had talks with guys at NASA who told me enthusiastically that they used Go for many realtime systems in house (not on their spacecraft, rovers, or anything like that).

For memory handling I think it is hard to call a winner because Zig makes manual memory management a lot sweeter than I am used to. Below is an example of how we allocate the dictionary to store symbols and their corresponding address. Notice the errdefer statement?

var labels = StringHashMap(u8).init(allocator);
errdefer labels.deinit();

Under normal circumstances, we would want to return the labels dictionary and not release it. However, if an error occurs we would not have managed to initialize the dictionary properly. In these cases, we want to make sure the dictionary is released to avoid leaking memory. The errdefer statement is only executed in cases where an error object is returned. Thus, the labels.deinit() statement is not executed in case a normal return happens.
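
Go has no errdefer, but a similar effect can be approximated with defer and a named error result. The sketch below is an illustration under assumptions: the table type, its release method and the releases counter are all invented. In Go this pattern matters for resources such as files or locks rather than memory, since the garbage collector reclaims memory on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// releases counts calls to release, for demonstration only.
var releases int

// table is a hypothetical resource that must be released explicitly.
type table struct{ labels map[string]uint8 }

func (t *table) release() {
	t.labels = nil
	releases++
}

// buildTable mimics errdefer: the deferred function releases the table
// only when the named result err is non-nil at return time.
func buildTable(fail bool) (result *table, err error) {
	t := &table{labels: make(map[string]uint8)}
	defer func() {
		if err != nil {
			t.release()
		}
	}()

	if fail {
		return nil, errors.New("could not fill table")
	}
	t.labels["loop"] = 0
	return t, nil
}

func main() {
	_, err := buildTable(true)
	fmt.Println(err, "releases:", releases)
}
```

The deferred closure reads err after the return statement has assigned it, which is why the result has to be named.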

Getting that kind of behavior correct in a C program would be much trickier. The designer of Zig, Andrew Kelley, has really put a lot of thought into getting manual memory management to work as smoothly as possible. All memory allocation has to happen with an allocator you supply. That applies to the whole Zig standard library. Thus, while developing a Zig program, you can use a special allocator which detects memory leaks, double frees and many other memory problems. Here is an example from my Zig assembler.

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // get command line arguments
    const args = try process.argsAlloc(allocator);
    defer process.argsFree(allocator, args);

    var filename: []const u8 = undefined;

    if (args.len == 2) {
        filename = args[1];
    } else {
        try stderr.print("Usage: assembler filename\n", .{});
        return;
    }

    // filename is only valid once we know an argument was supplied
    try stderr.print("\nAssembly of {s}:\n", .{filename});
    try assembleFile(allocator, filename, stdout);
}

You will notice that anything that needs memory will ask for an allocator as its first argument. We use defer to deallocate allocated memory once we exit the function scope.

As an old school C developer, I actually quite like this total control over memory usage. It makes you think twice about using solutions which require allocating memory. It directly influenced my design decisions when writing Zig code in a way that led to lower memory usage.

One downside is that things like closures and functional programming with map, filter and reduce become quite difficult. On the other hand, that isn't your main concern in a systems programming language.

Thus, manual memory management, error handling and optionals have been mostly positive experiences, contributing to me liking Zig. So what exactly got me close to giving up?

What Made Me Nearly Give Up on Zig
