
This is a story about Doom 3's source code and how beautiful it is. Yes, beautiful. Allow me to explain.
After releasing my video game Dyad I took a little break. I read some books and watched some movies I'd put off for too long. I was working on the European version of Dyad, but that time was mostly waiting for feedback from Sony quality assurance, so I had a lot of free time. After loafing around for a month or so I started to seriously consider what I was going to do next. I wanted to extract the reusable/engine-y parts of Dyad for a new project.
This article originally appeared January 14, 2013.
When I originally started working on Dyad there was a very clean, pretty functional game engine I created from an accumulation of years of working on other projects. By the end of Dyad I had a hideous mess.
In the final six weeks of Dyad development I added over 13k lines of code. MainMenu.cc ballooned to 24,501 lines. The once-beautiful source code was a mess riddled with #ifdefs, gratuitous function pointers, ugly inline SIMD and asm code. I learned a new term: "code entropy." I searched the internet for other projects that I could use to learn how to organize hundreds of thousands of lines of code. After looking through several large game engines I was pretty discouraged; the Dyad source code wasn't actually that bad compared to everything else out there!
Unsatisfied, I continued looking, and found a very nice analysis of id Software's Doom 3 source code by the computer expert Fabien Sanglard.
I spent a few days going through the Doom 3 source code and reading Fabien's excellent article when I tweeted:
It was the truth. I've never really cared about source code before. I don't really consider myself a "programmer." I'm good at it, but for me it's just a means to an end. Going through the Doom 3 source code made me really appreciate good programmers.
To put things into perspective: Dyad has 193k lines of code, all C++. Doom 3 has 601k, Quake III has 229k and Quake II has 136k. That puts Dyad somewhere in between Quake II and Quake III. These are large projects.
When I was asked to write this article, I used it as an excuse to read more source code from other games, and to read about programming standards. After days of research I was confused by my own tweet that started this whole thing: what would "nice looking" (or "beautiful," for that matter) actually mean when referring to source code? I asked some programmer friends what they thought that meant. Their answers were obvious, but still worth stating:
- Code should be locally coherent and single-functioned: One function should do exactly one thing. It should be clear about what it's doing.
- Local code should explain, or at least hint at the overall system design.
- Code should be self-documenting. Comments should be avoided whenever possible. Comments duplicate work when both writing and reading code. If you need to comment something to make it understandable it should probably be rewritten.
There's an idTech 4 coding standard (.doc) that I think is worth reading. I follow most of these standards and I'll try to explain why they're good and why specifically they make the Doom 3 code so beautiful.
Unified Parsing and Lexical Analysis
One of the smartest things I've seen from Doom is the generic use of their lexical analyzer [1] and parser [2]. All resource files are ASCII files with a unified syntax, including scripts, animation files, config files, etc.; everything is the same. This allows all files to be read and processed by a single chunk of code. The parser is particularly robust, supporting a major subset of C++. By sticking to a unified parser and lexer, all other components of the engine needn't worry about serializing data, as there's already code for that. This makes all other aspects of the code cleaner.
Const and Rigid Parameters
Doom's code is fairly rigid, although not rigid enough in my opinion with respect to const [3]. Const serves several purposes which I believe too many programmers ignore. My rule is "everything should always be const unless it can't be." I wish all variables in C++ were const by default. Doom almost always sticks to a "no in-out" parameter policy, meaning all parameters to a function are either input or output, never both. This makes it much easier to understand what's happening with a variable when you pass it to a function. For example:
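A declaration of the shape discussed in this section, sketched with stand-in types so it compiles on its own. It is reconstructed from the description in the surrounding text, not copied from id's headers; the SIDE_ values and the placeholder body are assumptions made for illustration.

```cpp
#include <cstddef>

// Stand-in types: the real engine classes are far richer than this.
struct idPlane { float a, b, c, d; };

// Hypothetical result codes; the source only says the call "returns a SIDE_?".
enum { SIDE_FRONT = 0, SIDE_BACK = 1, SIDE_ON = 2, SIDE_CROSS = 3 };

class idSurface {
public:
    // plane and epsilon are inputs (const); front, back and the two edge
    // lists are outputs; the trailing const promises *this is untouched.
    int Split( const idPlane &plane, const float epsilon,
               idSurface **front, idSurface **back,
               int *frontOnPlaneEdges = NULL,
               int *backOnPlaneEdges = NULL ) const;
};

// Placeholder body only: a real implementation would build the two halves.
int idSurface::Split( const idPlane &, const float,
                      idSurface **front, idSurface **back,
                      int *, int * ) const {
    *front = NULL;
    *back = NULL;
    return SIDE_ON;
}
```

Because Split() is a const member, it can be called through a const reference to a surface.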
This function definition makes me happy!
Just from a few consts I know many things:
- The idPlane that gets passed as an argument will not be modified by this function. I can safely use the plane after this function executes without checking for modifications of the idPlane.
- I know the epsilon won't be changed within the function (although it could easily be copied to another value and scaled, for instance, but that would be counterproductive).
- front, back, frontOnPlaneEdges and backOnPlaneEdges are OUT variables. These will be written to.
- The final const after the parameter list is my favourite. It indicates idSurface::Split() won't modify the surface itself. This is one of my favourite C++ features missing from other languages. It allows me to do something like this:
    void f(const idSurface &s) {
        s.Split(....);
    }
  If Split wasn't declared as Split(...) const; this code would not compile. Now I know that f() won't modify the surface, even if f() passes the surface to another function or calls some idSurface::method(). Const tells me a lot about the function and also hints at a larger system design. Simply by reading this function declaration I know surfaces can be split by a plane dynamically. Instead of modifying the surface, it returns new surfaces, front and back, and optionally frontOnPlaneEdges and backOnPlaneEdges.
The const rule and the no in-out parameter policy are, in my eyes, probably the most important things that separate good code from beautiful code. They make the whole system easier to understand and easier to edit or refactor.
Minimal Comments
This is a stylistic issue, but one beautiful thing that Doom usually does is not over-comment. I've seen way too much code that looks like:
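A small sketch of the over-commented style in question. The Counter class is hypothetical and invented purely for illustration; every comment in it restates what the name already says.

```cpp
// Hypothetical class, invented for illustration only.
class Counter {
public:
    Counter() : count( 0 ) {}

    /**
     * Gets the count.
     *
     * @return the count
     */
    int GetCount() const { return count; }

    /**
     * Increments the count.
     */
    void Increment() { count++; }

private:
    int count;  // the count
};
```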
I find this extremely irritating. I can tell what this method does by its name. If its function can't be inferred from its name, its name should be changed. If it does too much to describe it in its name, make it do less. If it really can't be refactored and renamed to describe its single purpose then it's okay to comment. I think programmers are taught in school that comments are good; they aren't. Comments are bad unless they're totally necessary, and they're rarely necessary. Doom does a reasonable job at keeping comments to a minimum. Using the idSurface::Split() example, let's look at how it's commented:
// splits the surface into a front and back surface, the surface itself stays unchanged
// frontOnPlaneEdges and backOnPlaneEdges optionally store the indexes to the edges that lay on the split plane
// returns a SIDE_?
The first line is completely unnecessary. We learned all that information from the function definition. The second and third lines are valuable. We could infer the second line's properties, but the comment removes potential ambiguity.
Doom's code is, for the most part, judicious with its comments, which makes it much easier to read. I know this may be a style issue for some people, but I definitely think there is a clear "right" way to do it. For example, what would happen if someone changed the function and removed the const at the end? Then the surface *COULD* be changed from within the function and now the comment is out of sync with the code. Extraneous comments hurt the readability and accuracy of code, thus making the code uglier.
Spacing
Doom does not waste vertical space:
Here's an example from t_stencilShadow::R_ChopWinding():
I can read that entire algorithm on 1/4 of my screen, leaving the other 3/4s to understand where that block of code fits relative to its surrounding code. I've seen too much code like this:
This is going to be another point that falls under "style." I programmed for more than 10 years with the latter style, forcing myself to convert to the tighter way while working on a project about six years ago. I'm glad I switched.
The latter takes 18 lines compared to 11 in the first. That's nearly double the number of lines of code for the *EXACT* same functionality. It means that the next chunk of code doesn't fit on the screen for me. What's the next chunk?
That code makes no sense without the previous for loop chunk. If id didn't respect vertical space, their code would be much harder to read, harder to write, harder to maintain and be less beautiful.
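To make the comparison concrete, here is a hypothetical side-classification loop written in both layouts. It is not the R_ChopWinding() code; the function names, the Side enum and the parameters are all made up for this sketch. In this version the loop takes 10 lines in the tight layout and 16 in the loose one.

```cpp
// Hypothetical result codes for the sketch.
enum Side { SIDE_FRONT, SIDE_BACK, SIDE_ON };

// Tight layout: opening braces share a line with their statement.
void ClassifyTight( const float *dists, Side *sides, int *counts, int n, float eps ) {
    for ( int i = 0; i < n; i++ ) {
        if ( dists[i] < -eps ) {
            sides[i] = SIDE_BACK;
        } else if ( dists[i] > eps ) {
            sides[i] = SIDE_FRONT;
        } else {
            sides[i] = SIDE_ON;
        }
        counts[sides[i]]++;
    }
}

// Same logic with every brace on its own line.
void ClassifyLoose( const float *dists, Side *sides, int *counts, int n, float eps )
{
    for ( int i = 0; i < n; i++ )
    {
        if ( dists[i] < -eps )
        {
            sides[i] = SIDE_BACK;
        }
        else if ( dists[i] > eps )
        {
            sides[i] = SIDE_FRONT;
        }
        else
        {
            sides[i] = SIDE_ON;
        }
        counts[sides[i]]++;
    }
}
```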
Another thing that id does that I believe is "right" and not a style issue is they *ALWAYS* use { } even when optional. I think it's a crime to skip the brace brackets. I've seen so much code like:
That is ugly code; it's worse than putting { } on their own line. I couldn't find a single example in id's code where they skipped the { }. Omitting the optional { } makes parsing this while() block more time consuming than it needs to be. It also makes editing it a pain: what if I wanted to insert an if-statement branch within the else if (c > d) path?
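A hypothetical loop of the kind described, first without the optional braces and then with them. The function names and the arithmetic are invented; only the shape (a while() containing an if / else if (c > d) / else chain) follows the text.

```cpp
// Braceless form: legal C++, but each branch silently ends after one statement,
// so adding a second statement to any branch changes the control flow.
int AdvanceWithoutBraces( int a, int b, int c, int d ) {
    while ( a < b )
        if ( a > c )
            a += 2;
        else if ( c > d )
            a += 3;
        else
            a += 1;
    return a;
}

// Braced form: identical behaviour, and every branch can grow safely.
int AdvanceWithBraces( int a, int b, int c, int d ) {
    while ( a < b ) {
        if ( a > c ) {
            a += 2;
        } else if ( c > d ) {
            a += 3;
        } else {
            a += 1;
        }
    }
    return a;
}
```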
Minimal Templates
id did a huge no-no in the C++ world. They re-wrote all required STL [4] functions. I personally have a love-hate relationship with the STL. In Dyad I used it in debug builds to manage dynamic resources. In release I baked all the resources so they could be loaded as quickly as possible and didn't use any STL functionality. The STL is nice because it provides fast generic data structures; it's bad because using it can often be ugly and error prone. For example, let's look at the std::vector<T> class. Let's say I want to iterate over each element:
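A pre-C++11 loop of the kind meant here, as a small self-contained sketch. SumWithIterators is a hypothetical helper name; the iterator type has to be spelled out in full.

```cpp
#include <vector>

// Hypothetical helper: sums a vector using an explicit iterator loop.
int SumWithIterators( const std::vector<int> &values ) {
    int total = 0;
    for ( std::vector<int>::const_iterator it = values.begin(); it != values.end(); ++it ) {
        total += *it;
    }
    return total;
}
```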
That does get simplified with C++11:
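The same hypothetical summing loop in C++11, once with auto replacing the iterator type and once with a range-based for:

```cpp
#include <vector>

// auto lets the compiler deduce the iterator type.
int SumWithAuto( const std::vector<int> &values ) {
    int total = 0;
    for ( auto it = values.begin(); it != values.end(); ++it ) {
        total += *it;
    }
    return total;
}

// The range-based for hides the iterator entirely.
int SumWithRangeFor( const std::vector<int> &values ) {
    int total = 0;
    for ( int value : values ) {
        total += value;
    }
    return total;
}
```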
I personally don't like the use of auto; I think it makes the code easier to write but harder to read. I might come around to the usage of auto in the coming years, but for now I think it's bad. I'm not even going to mention the ridiculousness of some STL algorithms like std::for_each or std::remove_if.
Removing a value from an std::vector is dumb too:
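The usual way to do this is the erase-remove idiom, sketched here inside a hypothetical RemoveAll helper:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper wrapping the erase-remove idiom.
void RemoveAll( std::vector<int> &values, int unwanted ) {
    // std::remove only shifts the kept elements forward and returns the new
    // logical end; erase() is what actually shrinks the vector.
    values.erase( std::remove( values.begin(), values.end(), unwanted ), values.end() );
}
```

Forgetting the second argument to erase() still compiles, but then only a single element is erased, which is exactly the kind of mistake the idiom invites.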
Gee, that's going to be typed correctly by every programmer every time!
id removes all ambiguity: they rolled their own generic containers, string class, etc. They wrote them much less generic than the STL classes, presumably to make them easier to understand. They're minimally templated and use id-specific memory allocators. STL code is so littered with template nonsense that it's impossible to read.
C++ code can quickly get unruly and ugly without diligence on the part of the programmers. To see how bad things can get, check out the STL source code. Microsoft's and GCC's [5] STL implementations are probably the ugliest source code I've ever seen. Even when programmers take extreme care to make their template code as readable as possible it's still a complete mess. Take a look at Andrei Alexandrescu's Loki library, or the boost libraries: these are written by some of the best C++ programmers in the world and great care was taken to make them as beautiful as possible, and they're still ugly and basically unreadable.
id solves this problem by simply not making things overly generic. They have a HashTable<V> and a HashIndex class. HashTable forces key type to be const char *, and HashIndex is an int->int pair. This is considered poor C++ practice. They "should" have had a single HashTable class, and written partial specialization for KeyType = const char *, and fully specialized <int, int>. What id does is completely correct and makes their code much more beautiful.
This can be further examined by contrasting "good C++ practice" for Hash generation and how id does it.
It would be considered by many to be good practice to create a specific computation class as a parameter to the HashTable like so:
this could then be specialized for a particular type:
Then you could pass the ComputeHashForType as a HashComputer for the HashTable:
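The three steps just described (a HashComputer template parameter, a specialization for one key type, and passing ComputeHashForType to the table) can be sketched as follows. The member names, the hash function and the bucket count are assumptions made for the example; only HashTable, HashComputer and ComputeHashForType come from the text.

```cpp
// Primary template: declared but left undefined, so unsupported key types
// fail at compile time.
template< typename T >
struct ComputeHashForType;

// Specialization for C strings (the multiply-by-31 hash is an arbitrary choice).
template<>
struct ComputeHashForType< const char * > {
    static unsigned int ComputeHash( const char *key ) {
        unsigned int hash = 0;
        for ( ; *key != '\0'; key++ ) {
            hash = hash * 31u + static_cast< unsigned char >( *key );
        }
        return hash;
    }
};

// The table receives its hashing policy as a template parameter.
template< typename Key, typename Value, typename HashComputer >
class HashTable {
public:
    enum { NUM_BUCKETS = 64 };

    // Storage is omitted; only the policy call matters for this sketch.
    unsigned int BucketFor( const Key &key ) const {
        return HashComputer::ComputeHash( key ) % NUM_BUCKETS;
    }
};

// Passing ComputeHashForType as the HashComputer.
typedef HashTable< const char *, int, ComputeHashForType< const char * > > StringToIntTable;
```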
This is similar to how I did it. It seems smart, but boy is it ugly! What if there were more optional template parameters? Maybe a memory allocator? Maybe a debug tracer? You'd have a definition like:
Function definitions would be brutal!
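A sketch of what such a heavily parameterized table and its out-of-class member definitions could look like. MemoryAllocator, DebugTracer, Insert() and Num() are hypothetical names chosen for the example, and the policy classes are empty placeholders.

```cpp
// Hypothetical policy classes, present only to give the template something to name.
template< typename T > struct ComputeHashForType {};
struct MemoryAllocator {};
struct DebugTracer {};

template< typename Key, typename Value,
          typename HashComputer = ComputeHashForType< Key >,
          typename Allocator = MemoryAllocator,
          typename Tracer = DebugTracer >
class HashTable {
public:
    HashTable() : numEntries( 0 ) {}
    bool Insert( const Key &key, const Value &value );
    int  Num() const;
private:
    int numEntries;
};

// Every out-of-class member definition has to repeat the full parameter list.
template< typename Key, typename Value, typename HashComputer, typename Allocator, typename Tracer >
bool HashTable< Key, Value, HashComputer, Allocator, Tracer >::Insert( const Key &key, const Value &value ) {
    (void)key;
    (void)value;
    numEntries++;  // storage omitted; the signature is the point here
    return true;
}

template< typename Key, typename Value, typename HashComputer, typename Allocator, typename Tracer >
int HashTable< Key, Value, HashComputer, Allocator, Tracer >::Num() const {
    return numEntries;
}
```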
What does that even mean? I can't even find the method name without some aggressive syntax highlighting. It's conceivable that there'd be more definition code than body code. This is clearly not easy to read and thus not beautiful.
I've seen other engines manage this mess by offloading the template argument specification to a myriad of typedefs. This is even worse! It might make local code easier to understand, but it creates another layer of disconnect between local code and the overarching system logic, making the local code not hint towards system design, which is not beautiful. For example, let's say there was code:
and
and you used both and did something like:
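One way the situation could arise, as a hedged sketch: two typedefs hide which allocator each table uses, and code that inspects global memory afterwards gets a surprising answer. StringHashTable and StringAllocator are named in the text; GlobalAllocator, IntHashTable and the byte-counting scheme are assumptions invented for this example.

```cpp
#include <cstddef>

// Hypothetical allocators: each one only counts the bytes it was asked for.
struct GlobalAllocator {
    static std::size_t &BytesInUse() { static std::size_t bytes = 0; return bytes; }
    static void Track( std::size_t n ) { BytesInUse() += n; }
};

struct StringAllocator {
    static std::size_t &BytesInUse() { static std::size_t bytes = 0; return bytes; }
    static void Track( std::size_t n ) { BytesInUse() += n; }
};

template< typename Key, typename Value, typename Allocator >
class HashTable {
public:
    // Storage is omitted; an insert just reports its size to the allocator.
    void Insert( const Key &, const Value & ) {
        Allocator::Track( sizeof( Key ) + sizeof( Value ) );
    }
};

// The typedefs hide which allocator each table uses.
typedef HashTable< const char *, int, StringAllocator > StringHashTable;
typedef HashTable< int, int, GlobalAllocator > IntHashTable;

// Inserts once into each table and returns what the global allocator saw.
std::size_t GlobalBytesAfterInserts() {
    StringHashTable names;
    IntHashTable ids;
    names.Insert( "player", 1 );
    ids.Insert( 7, 1 );
    return GlobalAllocator::BytesInUse();
}
```

Nothing at the call site says that only the second insert is counted against global memory.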
It's possible that the StringHashTable's memory allocator, StringAllocator, won't contribute to the global memory, which would cause you confusion. You'd have to backtrack through the code, find out that StringHashTable is actually a typedef of a mess of templates, parse through the template code, find out that it's using a different allocator, find that allocator... blah blah, ugly.
Doom does the complete "wrong" thing according to common C++ logic: it writes things as non-generic as possible, using generics only when it makes sense. What does Doom's HashTable do when it needs to generate a hash of something? It calls idStr::GetHash(), because the only type of key it accepts is a const char *. What would happen if it needs a different key? My guess is they'd template the key, just call key.getHash(), and have the compiler enforce that all key types have an int getHash() method.
Remnants of C
I don't know how much of id's original programming team is with the company anymore, but John Carmack at least comes from a C background. All id games before Quake III were written in C. I find many C++ programmers without a strong C background over-C++ize their code. The previous template example was just one case. Three other examples that I find often are:
- over-use set/get methods
- use stringstreams
- excessive operator overloading.
id is very judicious in all these cases.
Often one may create a class:
This is a waste of lines of code and reading time. It takes longer to write and to read compared to:
What if you're often increasing var by some number n?
vs
The first example is much easier to read and write.
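The comparison in this passage, sketched with hypothetical class names: a class that wraps a single member in get/set methods, a plain struct with a public member, and the two ways of increasing var by n.

```cpp
// Accessor-heavy version: two methods wrap a single int and add no behaviour.
class WithAccessors {
public:
    WithAccessors() : var( 0 ) {}
    int  GetVar() const { return var; }
    void SetVar( int value ) { var = value; }
private:
    int var;
};

// Plain version: the member is simply public.
struct WithPublicMember {
    int var;
};

// Increasing var by n through the accessors.
int IncreaseWithAccessors( WithAccessors &a, int n ) {
    a.SetVar( a.GetVar() + n );
    return a.GetVar();
}

// Increasing var by n directly.
int IncreaseWithPublicMember( WithPublicMember &a, int n ) {
    a.var += n;
    return a.var;
}
```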
id doesn't use stringstreams. A stringstream contains probably the most extreme bastardization of operator overloads I've ever seen: <<.
For example:
That's ugly. It does have strong advantages: you can define the equivalent of Java's toString() method per class without touching a class's vtables, but the syntax is offensive, and id chose not to use it. Choosing to use printf() instead of stringstreams makes their code easier to read, and thus I think it's the correct decision.
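For comparison, here is the same message built both ways. Both helper functions and the message text are hypothetical; std::snprintf stands in for printf() so that the result can be returned as a string.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Stream style: every piece of the message is chained with <<.
std::string DescribeWithStream( const char *name, int health ) {
    std::stringstream stream;
    stream << "Player " << name << " has " << health << " health";
    return stream.str();
}

// printf style: the whole message is visible in one format string.
std::string DescribeWithPrintf( const char *name, int health ) {
    char buffer[128];
    std::snprintf( buffer, sizeof( buffer ), "Player %s has %d health", name, health );
    return std::string( buffer );
}
```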
Much nicer!
The syntax for SomeClass's operator << would be ridiculous too:
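A sketch of such an overload. SomeClass is the name used in the text; its members, the output format and the ToString helper are invented for the example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// SomeClass stands in for any user type; its members are invented here.
class SomeClass {
public:
    SomeClass( int x_, int y_ ) : x( x_ ), y( y_ ) {}
    // Declared a friend so the free function can read the private members.
    friend std::ostream &operator<<( std::ostream &out, const SomeClass &value );
private:
    int x;
    int y;
};

// The overload is a free function that takes and returns the stream by reference.
std::ostream &operator<<( std::ostream &out, const SomeClass &value ) {
    return out << "SomeClass(" << value.x << ", " << value.y << ")";
}

// Hypothetical helper showing the overload in use.
std::string ToString( const SomeClass &value ) {
    std::stringstream stream;
    stream << value;
    return stream.str();
}
```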
[Side note: John Carmack has stated that static analysis tools revealed that their common bug was incorrect parameter matching in printf(). I wonder if they've changed to stringstreams in Rage because of this. GCC and clang both find printf() parameter matching errors with -Wall, so you don't need expensive static analysis tools to find these errors.]
Another thing that makes the Doom code beautiful is the minimal use of operator overloads. Operator overloading is a very nice feature of C++. It allows you to do things like:
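A minimal sketch of the uncontroversial kind of overloading: addition, subtraction and scaling on a small vector type. Vec3 and Midpoint are hypothetical; id's own math classes are not reproduced here.

```cpp
// Hypothetical minimal vector type.
struct Vec3 {
    float x, y, z;
    Vec3( float x_, float y_, float z_ ) : x( x_ ), y( y_ ), z( z_ ) {}
};

inline Vec3 operator+( const Vec3 &a, const Vec3 &b ) {
    return Vec3( a.x + b.x, a.y + b.y, a.z + b.z );
}

inline Vec3 operator-( const Vec3 &a, const Vec3 &b ) {
    return Vec3( a.x - b.x, a.y - b.y, a.z - b.z );
}

inline Vec3 operator*( const Vec3 &a, float s ) {
    return Vec3( a.x * s, a.y * s, a.z * s );
}

// The call site reads like the maths it implements.
inline Vec3 Midpoint( const Vec3 &a, const Vec3 &b ) {
    return ( a + b ) * 0.5f;
}
```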
Without overloading these operations would be more time consuming to write and parse. Doom stops here. I've seen code that doesn't. I've seen code that will overload operator "%" to mean dot product or operator Vector * Vector to do piece-wise vector multiplication. It doesn't make sense to make the * operator for cross product because that only exists in 3D, what if you wanted to do:
some_2d_vec * some_2d_vec, what should it do? What about 4d or higher? id's minimal operator overloading leaves no ambiguity to the reader of the code.
Horizontal Spacing
One of the biggest things I learned from the Doom code was a simple style change. I used to have classes that looked like:
According to id's Doom 3 coding standard, they use real tabs that are 4 spaces. Having a consistent tab setting for all programmers allows them to horizontally align their class definitions:
They rarely put the inline functions inside the class definition. The only time I've seen it is when the code is written on the same line as the function declaration. It seems this practice is not the norm and is probably frowned upon. This method of organizing class definitions makes it extremely easy to parse. It might take a little more time to write, since you'd have to re-type a bunch of information when defining the methods:
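A hypothetical Timer class laid out in aligned columns, followed by its out-of-class method definitions. The class is invented for illustration, and spaces are used here in place of real tabs so the alignment survives any viewer.

```cpp
// Return types, names and arguments line up in columns.
class Timer {
public:
                Timer();

    void        Start( int nowMs );
    void        Stop( int nowMs );
    int         ElapsedMs() const;
    bool        IsRunning() const;

private:
    int         startMs;
    int         elapsedMs;
    bool        running;
};

// Definitions live outside the class, so each signature is typed a second time.
Timer::Timer() : startMs( 0 ), elapsedMs( 0 ), running( false ) {
}

void Timer::Start( int nowMs ) {
    startMs = nowMs;
    running = true;
}

void Timer::Stop( int nowMs ) {
    if ( running ) {
        elapsedMs += nowMs - startMs;
        running = false;
    }
}

int Timer::ElapsedMs() const {
    return elapsedMs;
}

bool Timer::IsRunning() const {
    return running;
}
```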
I'm against all extra typing. I need to get stuff done as fast as possible, but this is one situation where I think a little extra typing when defining the class more than pays for itself each time the class definition needs to be parsed by a programmer. There are several other stylistic examples provided in the Doom 3 Coding Standards (.doc) that contribute to the beauty of Doom's source code.
Method Names
I think Doom's method naming rules are lacking. I personally enforce the rule that all method names should begin with a verb unless they can't.
For example:
is much better than:
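As an illustration of the verb-first rule (the type and both method names are hypothetical, not taken from Doom's code), compare a method whose name states the action with one that is only a noun:

```cpp
#include <cmath>

// Hypothetical 2D vector used only to contrast the two naming styles.
struct Vec2 {
    float x, y;

    // Verb-first: the call site reads as an action being performed.
    float ComputeLength() const { return std::sqrt( x * x + y * y ); }

    // Noun-only: could be a stored value, a getter or a computation.
    float Length() const { return std::sqrt( x * x + y * y ); }
};
```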
Yes, it's Beautiful.
I was really excited to write this article, because it gave me an excuse to really think about what beautiful code is. I still don't think I know, and maybe it's entirely subjective. I do think the two biggest things, for me at least, are stylistic indenting and maximum const-ness.
A lot of the stylistic choices are definitely my personal preferences, and I'm sure other programmers will have different opinions. I think the choice of what style to use is up to whoever has to read and write the code, but I certainly think it's something worth thinking about.
I would suggest everyone look at the Doom 3 source code because I think it exemplifies beautiful code, as a complete package: from system design down to how to tab space the characters.
Shawn McGrath is a Toronto-based game developer and the creator of the acclaimed PlayStation 3 psychedelic puzzle-racing game Dyad. Find out more about his game. Follow him on Twitter.
Footnotes
[1] A lexical analyzer converts the characters of source code (in the relevant context) into a series of tokens with semantic significance. Source code may look like:
x = y + 5;
A lexical analyzer (or "lexer" for short) might tokenize that source as such:
x => variable
= => assignment operator
y => variable
+ => addition operator
5 => literal integer
; => end statement
This string of tokens is the first of many steps in converting source code to a running program. Following lexical analysis the tokens are fed into a parser, then a compiler, then a linker, and finally a virtual machine (in the case of compiled languages, a CPU). There can be intermediate steps inserted between those main steps, but the ones listed are generally considered to be the most fundamental.
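A toy lexer that produces the token list above for that one statement. It is a minimal sketch for this footnote only, not how Doom's lexer works; the TokenType names and the Tokenize function are made up.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum TokenType { TOK_VARIABLE, TOK_ASSIGN, TOK_PLUS, TOK_INTEGER, TOK_END_STATEMENT };

struct Token {
    TokenType   type;
    std::string text;
};

// Handles only identifiers, integers and the characters '=', '+' and ';'.
std::vector<Token> Tokenize( const std::string &source ) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while ( i < source.size() ) {
        const unsigned char c = static_cast<unsigned char>( source[i] );
        const std::size_t start = i;
        Token token;
        if ( std::isalpha( c ) ) {
            while ( i < source.size() && std::isalnum( static_cast<unsigned char>( source[i] ) ) ) {
                i++;
            }
            token.type = TOK_VARIABLE;
        } else if ( std::isdigit( c ) ) {
            while ( i < source.size() && std::isdigit( static_cast<unsigned char>( source[i] ) ) ) {
                i++;
            }
            token.type = TOK_INTEGER;
        } else if ( c == '=' || c == '+' || c == ';' ) {
            i++;
            token.type = ( c == '=' ) ? TOK_ASSIGN : ( c == '+' ) ? TOK_PLUS : TOK_END_STATEMENT;
        } else {
            i++;  // whitespace and anything unrecognised is skipped
            continue;
        }
        token.text = source.substr( start, i - start );
        tokens.push_back( token );
    }
    return tokens;
}
```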
[2] A parser is (usually) the next logical step following lexical analysis in machine understanding of language (computer language/source code in this context, but the same would apply for natural language). A parser's input is a list of tokens generated by a lexical analyzer, and its output is a syntactic tree: a "parse tree."
In the example: x = y + 5, the parse tree would look like:
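One plausible shape for that tree, with assignment at the root and the addition as its right child, sketched as a tiny data structure. The Node type and both functions are hypothetical and exist only to make the tree concrete.

```cpp
#include <string>

// Tree being built:
//       =
//      / \
//     x   +
//        / \
//       y   5

// Minimal binary node: leaves hold a name or literal, inner nodes an operator.
struct Node {
    std::string text;
    const Node *left;
    const Node *right;
};

// Renders the tree in prefix form, parenthesising every inner node.
std::string ToPrefix( const Node &node ) {
    if ( node.left == nullptr && node.right == nullptr ) {
        return node.text;
    }
    return "(" + node.text + " " + ToPrefix( *node.left ) + " " + ToPrefix( *node.right ) + ")";
}

// Builds the tree for x = y + 5 by hand and renders it.
std::string ExampleTree() {
    const Node x = { "x", nullptr, nullptr };
    const Node y = { "y", nullptr, nullptr };
    const Node five = { "5", nullptr, nullptr };
    const Node plus = { "+", &y, &five };
    const Node assign = { "=", &x, &plus };
    return ToPrefix( assign );
}
```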
[3] "const" is a C++ keyword that ensures that a variable cannot be changed, or that a method will not change the contents of its class. "const" is short for "constant." It's worth noting that C++ includes a workaround, either via const_cast<T> or a C-style cast: (T *). Using these completely breaks const, and for the sake of argument I prefer to ignore their existence and never use them in practice.
[4] STL stands for "standard template library." It's a set of containers, algorithms, and functions commonly used by C++ programmers. It's supported by every major compiler vendor with varying levels of optimization and error reporting facilities.
[5] GCC - GNU Compiler Collection: a set of compilers supporting multiple programming languages. For the case of this article it refers to the GNU C/C++ compiler. GCC is a free compiler, with full source code available for free, and works on a wide array of computers and operating systems. Other commonly used compilers include: clang, Microsoft Visual C++, IBM XL C/C++, Intel C++ Compiler.
