Coder Spends 1,200 Hours Piecing Together Diablo's Source Code


If not for a few oversights on Blizzard’s part, the original Diablo’s source code would likely be lost to time. However, thanks to those oversights and some serious dedication from one coder, people can now see (and tinker around with) what makes Diablo tick.


Diablo, of course, basically birthed the PC action-RPG genre as we know it today. Released at the tail end of 1996, it’s a classic in every sense of the word. People are, as you might expect, very interested in seeing how such a formative game was built. Source code—the lines of code written by programmers, which you don’t see in the final version—is a missing piece in that puzzle. Most developers do not release their games’ source code, and Blizzard has a reputation for being especially secretive about its games’ innermost workings.

Galaxyhaxz’s “Devilution” project is a labor of love for game preservation and modding. They say it took them more than 1,200 hours over the course of four months, and they made sure that even bugs, flaws, and sloppy code were present and accounted for. Galaxyhaxz wants people to be able to see how Diablo ended up the way it did.

“Devilution helps document the unused and cut content from the final game,” they wrote in the source code’s documentation. “Development of Diablo was rushed near the end—many ideas were scrapped and multiplayer was quickly hacked in. By examining the source, we can see various quirks of planned development.”

They also noted that having access to source code means the game can continually be updated to run on newer hardware—something that’s become more and more of an issue over time.

In rare instances, a game’s developer will release the source code publicly, as id Software did with the original Doom. But Blizzard doesn’t do this. In fact, last year, when a fan came across a disc containing StarCraft’s source code, Blizzard went to great lengths to get it back from him, including giving the fan a free trip to BlizzCon.

Galaxyhaxz has reconstructed what they believe to be Diablo’s original source code from bits and pieces that have inadvertently leaked out of Blizzard in the past. They cited Hellfire, a 1997 expansion to the game made by Synergistic Software, as well as the PlayStation port of the game by Climax Studios.


“A symbolic file was accidentally left on the Japanese port, which contained a layout of everything in the game,” said Galaxyhaxz. “This includes functions, data, types, and more! A beta version of the port also leaked, which contained yet another one of these files.”

Galaxyhaxz further explained that the regular PC release of Diablo actually contains a hidden debug build, which gave them access to debug tools and more code. They then stitched all these strings of information together to recreate Diablo’s source code.


“Combining these aspects not only makes reversing the game much easier, but it makes it far more accurate,” they said. “File names, function names, and even line numbers will be fairly close to the real deal.”

This should make it easier for people to modify the game if they so choose, whether that means improving on flaws that have been annoying players for decades or overhauling the whole thing.


Now, a potential stumbling block: Galaxyhaxz says they’re not sure if Blizzard’s lawyers will come a-knocking on their door or not. They believe that Devilution’s documentation is sufficient to count as an exception to DMCA rules, but it’s still a “grey area.” I reached out to Blizzard for an answer to that looming question, but as of publishing, they had yet to give me a concrete answer.

Assuming it all goes well, though, it’s hard not to wonder what’s next for someone dedicated enough to make all of this happen. For the moment, Galaxyhaxz only knows one thing for certain: it’s not gonna be Diablo II.


“Diablo II is still supported, sold, and maintained by Blizzard,” they said. “Setting the legal implications aside, there’s about 8x as much code, and a chance Blizzard will remaster the game soon anyway.”

Kotaku senior reporter. Beats: Twitch, streaming, PC gaming. Writing a book about streamers tentatively titled "STREAMERS" to be published by Atria/Simon & Schuster in the future.



So people here might be wondering how this was actually done.

Computer programs are normally published as executable code - the stuff a processor actually executes, essentially just a series of numbers. For instance, on Intel/AMD processors, a byte of “3” means “add the contents of a register or memory address to a register, and stick the result into that register”, with the next byte(s) specifying which register and which register or memory address to use. This is basically unreadable by humans. Hardly anyone has written code that way in decades.

We can use a relatively simple program, called a disassembler, to start turning that executable code into something a bit more understandable. This gives assembly code - essentially just turning those arbitrary numbers into text. That “3” (together with the byte after it) might turn into “add eax, ebx”. This still doesn’t tell us what that addition is trying to accomplish. Is it figuring out how much HP you have after leveling up? Is it moving a character around? Is it a loop counter in a draw routine? In very old programs that were written directly in assembly, there would at least be comments describing the purpose, but those are thrown away when the assembly is turned into executable code. And hardly any serious programs have been written directly in assembly in the last twenty years or so, save for really, *really* low-level operating system stuff.
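To make that concrete, here is a toy disassembler, written in Python purely for illustration, that handles exactly one case: opcode 0x03 (a 32-bit ADD of a register or memory operand into a register) where both operands are registers. It is a sketch assuming 32-bit x86 with no prefixes; the function name disassemble_add is made up, and a real disassembler covers hundreds of opcodes plus memory addressing.

```python
# 32-bit x86 register names, indexed by their 3-bit register number.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble_add(code: bytes) -> str:
    """Decode one 'ADD r32, r/m32' instruction (opcode 0x03).

    Only the register-to-register form (ModR/M mod == 0b11) is handled;
    prefixes, memory operands and all other opcodes are out of scope.
    """
    if len(code) < 2 or code[0] != 0x03:
        raise ValueError("not an opcode-0x03 instruction")
    modrm = code[1]              # the ModR/M byte that follows the opcode
    mod = modrm >> 6             # top 2 bits: addressing mode
    reg = (modrm >> 3) & 0b111   # middle 3 bits: destination register
    rm = modrm & 0b111           # low 3 bits: source register when mod == 3
    if mod != 0b11:
        raise NotImplementedError("memory operands are not handled in this sketch")
    return f"add {REGS32[reg]}, {REGS32[rm]}"


print(disassemble_add(bytes([0x03, 0xC3])))  # add eax, ebx
```

The second byte (0xC3 here) is the ModR/M byte: its bit fields pick the two registers, which is why the “3” alone isn’t enough to know what gets added to what.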

The next level up from assembly language is a high-level language like C++, which is what Diablo used. This is halfway between a human language and math - code tends to look like “player.maxHP = 100 + player.level * player.class.hpPerLevel;”, at least when humans write it.

Executable code has no use for those labels anymore. After all, the computer doesn’t need to know that the programmer called it “maxHP”; all the processor needs to know is what memory address it’s in, and that’s just a big number. While there are programs (decompilers) that try to turn assembly back into high-level code, they normally can’t even guess at names. So you wouldn’t get “maxHP”, you’d get “l164926” or something.
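Here is a sketch of that idea in Python (standing in for C++ only to keep it short). Both functions and all their names are hypothetical: the first uses names a programmer might choose, the second uses placeholder names in the style some decompilers generate (sub_ plus an address, a1, v3). They compute the same thing, so the names carry no information the processor needs.

```python
def max_hp(level: int, hp_per_level: int) -> int:
    # Hypothetical source as a programmer might write it.
    return 100 + level * hp_per_level


def sub_401A30(a1: int, a2: int) -> int:
    # Hypothetical decompiler-style output for the same logic:
    # the function is named after a made-up address, and the
    # parameters and locals get generated names.
    v3 = a2 * a1
    return v3 + 100


# Identical behaviour for every input tried; only humans care about the names.
for level in range(1, 100):
    assert max_hp(level, 7) == sub_401A30(level, 7)
print("same results")
```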

However, there are cases where those names *will* get crammed into the executable code. Specifically, things called debugging symbols. These are used by debuggers, tools programmers use to figure out why their program isn’t doing what they want. At their simplest, they let you pause the program, see what line of code it’s currently on, see what all the variables are set to, and then step through it line by line until you realize how you messed up. To do this, they clearly need some way to map what the processor knows (“I am about to execute the instruction starting at address 0xA842EF70”) to what the programmer knows (“I think there’s a problem in the function renderPlayerShadow()”). So, in a debug build of the program, that data gets shoved into a section that’s never directly executed (or into a separate symbol file), but which the debugger knows to look at.
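A debugger’s “which function am I in?” lookup can be sketched as a search over function start addresses sorted in order. The table and the symbolize helper below are entirely made up (only renderPlayerShadow comes from the text), and real symbol formats also record function sizes, source line numbers, variable types and more. This sketch only maps an address to the nearest preceding function name plus an offset.

```python
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x00401000, "main"),
    (0x00401A30, "renderPlayerShadow"),
    (0x00402200, "updateAutomap"),
]


def symbolize(address: int, symbols: list[tuple[int, str]]) -> str:
    """Map a raw instruction address to 'function+offset', like a debugger would.

    Function sizes are ignored, so any address past the last symbol
    is attributed to that last function.
    """
    starts = [start for start, _ in symbols]
    # Index of the last symbol whose start address is <= address.
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return hex(address)  # no symbol covers this address: only the number is left
    start, name = symbols[i]
    return f"{name}+{address - start:#x}"


print(symbolize(0x00401A5C, SYMBOLS))  # renderPlayerShadow+0x2c
```

Passing an empty table, which is roughly the situation with a stripped retail build, leaves only the bare address: symbolize(0x00401A5C, []) returns "0x401a5c".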

That’s wasted space in a final, retail build, and it actually can slow the program down a tiny bit, so normally the executable that gets sent out has that debugging info removed. But, apparently, a few builds of Diablo went out with their debugging information intact: symbol files left on the PlayStation port (and on a leaked beta of it), and a partial debug build accidentally packed into a data file on the PC version.

By piecing things together, we now have a somewhat-readable equivalent to the original code. It is not actually the original code: the original comments (notes left by the programmers to explain what “AutoMapPosBits = (AutoMapScale << 6) / 100;” actually does) are completely gone, and not everything had a debugging symbol. If you look, you’ll still see plenty of variables with names like “int v25”, which isn’t very useful.
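As an illustration of what survives without the comment, here is that line transcribed into Python, assuming AutoMapScale is a non-negative integer (the function name auto_map_pos_bits is made up). The arithmetic is fully recoverable; the reason behind it is not.

```python
def auto_map_pos_bits(auto_map_scale: int) -> int:
    # Transcription of: AutoMapPosBits = (AutoMapScale << 6) / 100;
    # "<< 6" shifts left by 6 bits, i.e. multiplies by 2**6 = 64.
    # Python's // floors while C's integer / truncates toward zero;
    # they agree for the non-negative scales this sketch assumes.
    if auto_map_scale < 0:
        raise ValueError("sketch assumes a non-negative scale")
    return (auto_map_scale << 6) // 100


for scale in (25, 50, 100, 200):
    print(scale, auto_map_pos_bits(scale))
```

This prints 16, 32, 64 and 128 for scales 25, 50, 100 and 200: the result is AutoMapScale percent of 64. Why 64, and what the value is used for, is exactly what the lost comment would have explained.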

But it’s close enough. Once you compile it back into executable code, it will work the same, and enough is labeled and named sensibly that a good programmer will be able to figure the rest out, and build on it. I expect there will be ongoing work to clean up the remaining cruft.