Why Does Carbon Need to Fix C++ Syntax?
Isn't C++ syntax perfectly fine? Why did the Carbon guys need to change it? A look at all the problems with C++ syntax and why we need to start fresh.
Judging by the many twitter comments I have read about Carbon, there is a significant number of C++ developers who are very displeased with the syntax of the Carbon programming language. A question I have seen repeatedly asked is:
If they are making a new language for C++ developers, then why did they make it look completely different? C++ syntax is perfectly fine and well known.
Actually, no, C++ syntax is not fine at all, but it is easy to internalize that fact and become blind to it once you spend enough years in the C++ trenches. When Chandler Carruth presented Carbon, he may not have done a good enough job of really explaining why C++ syntax is so problematic. I cannot promise that I will succeed either, but I will make an attempt at digging into the details of why C++ syntax is problematic and why we would want an upgrade. Let us start with a simple comparison:
What you will notice with Carbon is that most statements in Carbon have an introducer keyword such as fn
, var
, let
and class
which says what kind of statement follows. That is a deliberate design by Carbon. Let me quote from the first presentation of Carbon:
I don't know if anyone has worked on a C++ parser, which doesn't get to be a full compiler. It is incredibly difficult. We can make that better.
— Chandler Carruth
Since I began writing C++ code back in 1998 that is something I have seen first hand. While the Java and C# communities got spoiled by amazing IDEs and tools, us C++ developers have for many years had very poorly working tools. Command completion would very regularly break, especially on larger projects. Refactoring tools, in particular, have been limited compared to what Java and C# developers have had at their disposal.
The maintainer of the C++ highlighter for VS Code, explains why making tools for C++ is so hard:
The C++ syntax highlighter, at 19,000 lines, is not only the largest of any language but nearly four times larger than the 2nd largest syntax (Typescript at 5,000 lines).
Impressive? No, because it still can't even highlight custom types in variable declarations! The Rust highlighter does that without breaking a sweat because it's very similar to Carbon.
— Jeffrey Hykin
Even Java and C# are not ideal in terms of parsing, as they borrowed much syntax from C/C++. I noticed the benefits of languages using introducer keywords first time with Go. Go introduces functions with the func
keyword. That makes searching code effortless. I can more easily distinguish between usage and definitions. Here I am looking up calls to the MassFlow
function and the definition of the MassFlow
function.
The problem with C/C++ syntax is that you cannot determine what a statement is until you have parsed several tokens. That makes simply things like searching code with a regular expression more complicated than it needs to be. It makes it a lot harder to write parsers.
The Most Vexing Parse
A concrete example of the difficulty of parsing C++ code is referred to as the "most vexing parse". This peculiar term was first coined by Scott Meyers in his 2001 book Effective STL. The following example will illustrate the problem:
// C++ most vexing parse
void foo(double x) {
int bar(int(x));
}
The second line of code is ambiguous. It could be interpreted as a function declaration written as follows:
// Function bar taking and returning an integer.
int bar(int x);
C++ allows you to put parenthesis around the x
parameter in a function declaration. Thus, a C++ parser cannot easily distinguish between declaring a function bar
and declaring a variable bar
initialized with the value x
converted to an integer.
Here is a more elaborate example:
// C++ Unnamed temporary
struct Timer {};
struct TimeKeeper {
explicit TimeKeeper(Timer t);
int getTime();
};
int main() {
TimeKeeper time_keeper(Timer());
return time_keeper.getTime();
}
The first line in the main
function is ambiguous:
TimeKeeper time_keeper(Timer());
To C++ this line could look like the definition of a function time_keeper
returning TimeKeeper
object and taking a function object as parameter. You don't need to specify the name of a parameter when declaring a C++ function. int foo(int);
is a valid function signature.
This problem cannot be reproduced in Carbon for several reasons:
Carbon classes do not have constructors
Objects are initialized with assignment
The code below is an attempt at reproducing the earlier C++ code in Carbon. I had to invent some class functions Create
and Make
to be stand-ins for the constructors used in the C++ example.
// Carbon
class Timer {
fn Create() -> Self;
};
class TimeKeeper {
fn Make(t: Timer) -> Self;
fn getTime[me: Self]() -> int;
};
fn Main() -> i32 {
let time_keeper: auto = TimeKeeper.Make(Timer.Create());
return time_keeper.get_time();
}
We can notice some things in the Carbon syntax which may not be entirely obvious. Self
refers to the type of the enclosing class. Stuff inside square brackets, such as [me: Self]
refers to anything not passed explicitly. It is stuff that is induced. In this case, it is what C++ developers would know as the this
pointer. Go has a very similar approach. The getTime
method in Go would look like this:
// Go method example
func (me TimeKeeper) getTime() int
The square brackets are also used for other implicit data, such as type parameters. One of the Carbon example codes look like this:
// Carbon code showing function parameters
fn QuickSort[T:! Comparable & Movable](s: Slice(T)) {
if (s.Size() <= 1) {
return;
}
let p: i64 = Partition(s);
QuickSort(s[:p - 1]);
QuickSort(s[p + 1:]);
}
Here we are not specifying the this
(me) object but a type parameter T
which must satisfy both the Comparable
and Movable
interface. Carbon uses a single colon :
to specify the type of object, while :!
is used to indicate that we are specifying the interfaces a type parameter must adhere to.
Parsing Function Pointers in C++
The problem of placing return type first rather than last becomes apparent when you define higher-order functions taking functions as parameters. Look at this code quickly and tell me what the name of the function pointer parameter is?
// C++
int FindFirst(int xs[], int n, bool (*condition)(int x)) {
for(int i = 0; i < n; ++i) {
if (condition(xs[i])) {
return i;
}
}
return -1;
}
The name of the function pointer parameter is conditon
. I think most people will agree that is not something you can quickly read. It shows one of the issues with the C++ syntax.
I was going to show with great fanfare how much better this would look in Carbon, but unfortunately, I cannot find any mention of function pointers in the Carbon language specification. I will instead show what this would look like in Go and speculate on how you might do it in Carbon.
// Go
func FindFirst(xs []int, condition func(int) bool) int {
for i, x := range xs {
if condition(x) {
return i
}
}
return -1
}
The reason the Go code is easier to read is that the name of the arguments come first and the type second. It is consistent. With Carbon, I think they could make this even clearer because they use the colon to separate type and parameter name:
// Carbon - Assumption (similar to Rust)
fn FindFirst(xs: Slice(int), condition: fn(int) -> bool) -> i32
Actually, this code is almost exactly how you would write the FindFirst
function signature in Rust.
Getting Rid of Const References Mess
One of the most annoying things when writing C++ code is dealing with const correctness and references. Consider the geometry example code below.
// C++ class
class Circle {
public:
const Point& center() const;
float radius() const;
void setCenter(const Point& pos);
void setRadius(float radius);
bool inside(const Point& p) const;
bool intersect(const Circle& c) const;
private:
Point center;
float radius;
};
Occasionally, you want to pass an argument, such as the center of a circle, as a const reference, const Point& pos
, but other times you don't. I pass the radius as a copy of a float
. That is more efficient at the machine code level, where a single floating-point value can be passed through a microprocessor register. A reference in contrast is implemented as a pointer, which means getting to the actual desired value cause an extra indirection.
Carbon does away entirely with this problem by leaving it to the compiler to figure out what is best. In Carbon, we implement the previous class as:
// Carbon class
class Circle {
fn center[me: Self]() -> Point;
fn radius[me: Self]() -> f32;
fn setCenter[addr me: Self*](pos: Point);
fn setRadius[addr me: Self*](radius: f32);
fn inside[me: Self](p: Point);
fn intersect[me: Self](c: Circle);
var center: Point;
var radius: float;
}
You will notice that setCenter
and setRadius
look similar. Carbon will figure out the most optimal way to pass the arguments pos
and radius
to their respective functions. No need to worry about references and const. In Carbon, parameters are of the let
type by default, so they are essentially like const.
Every so often, you need to actually be able to modify a value in the caller. Carbon does not have references, so you use pointers instead. Self*
means that the me
argument is a pointer, which allows us to modify me
. One issue with the usage of pointers rather than references is that you have to take the address of an object.
// Carbon - Calling setRadius if it was defined as
// fn setRadius[me: Self*](radius: f32)
var circle: Circle = MakeRandomCircle();
var ptr: Circle* = &circle;
ptr.setRadius(5);
Taking the address every time to be able to mutate a circle object would be cumbersome. For that reason, we use add the addr
keyword in front of me
which instructs Carbon to take the address for us. That is why we can call setRadius
without taking the address:
// Carbon
var circle: Circle = MakeRandomCircle();
circle.setRadius(5);
Designing for Code Readability
Most developers never study or care about usability and user interface design. They should because the principles that apply to how you lay out a user interface has much in common with writing clear and easy to read code.
One important detail to be aware of is how humans read text. The only time you read individual letters in a word are when you are a child learning to read and write. Grownups read words by shape. We look at the shapes of words to identify them.
This fact implies that to quickly identify distinct choices, the shape of words and sentences should be different. Let us take an example to illustrate the issue. Consider a web page with a list of options:
I want to customize tools...
I want to have custom shows...
I want to do custom animations...
This is a bad design because the shape of the sentences are too similar, making visual scanning of options harder.
Non-subscribers can read the rest of the article on Medium.
Keep reading with a 7-day free trial
Subscribe to Erik Explores to keep reading this post and get 7 days of free access to the full post archives.