How To Balance An RPG

Editor's note: The following is a guest editorial by Obsidian game designer Josh Sawyer. If you're a professional in the video game industry and you'd like to write about some of your experiences, contact jason@kotaku.com.

Why is game balance important in a single-player game? It's a question players often ask rhetorically, but there are many important reasons why balance should be a priority, even in RPGs that focus on single-player experiences. Balance isn't necessarily about seeing what character builds are more powerful when put head to head, but about understanding the different types of challenges those characters will face when going through the game.

Ideally, each type of character build has its own strengths and weaknesses throughout the game's content, but ultimately ALL character builds should feel viable in different ways. No player wants to spend 40 hours working toward a dead-end build. Similarly, few players want to accidentally discover that their fundamental character concept is an unspoken "easy mode" through the game.

RPGs, especially the RPGs we make at Obsidian, are about choice and consequence. That doesn't just apply to the narrative elements, but also gameplay: character creation, character building, and tactical application of skills and abilities in the wild. If we do our jobs well, players will feel the sting of character weaknesses and the satisfaction of character strengths over the course of the game. Challenge is a tricky thing to balance for a wide range of players, but ideally it builds by giving players short periods of stress and mild frustration caused by a mental obstacle. Players examine the obstacle, consider their options, make choices, and eventually overcome it, transforming stress into a sense of exhilaration at their own ingenuity.

But where does this process all start? For me, it begins with a common question I have with anything involving player choice.

What Sort of Decisions Do We Want the Player to Make?

By this I mean not only the choices players must make at an obvious level (Strength vs. Charisma, fighter vs. rogue, sword vs. axe) but also the criteria that drive those decisions. These criteria could be as broad as deciding between a character class that does a lot of damage in combat vs. a class that is great at navigating conversations. Or they could be as narrow as emphasizing attack speed over damage done on a Critical Hit.

There are two levels at which players generally make these sorts of decisions. The first is aesthetic and conceptual: "Wizards are cool." "Clubs are boring." "Being strong owns."

The second is mechanical/rational: "High damage is important." "Gotta have a healer." "Debuff effects can make a huge difference in fights."

Different players balance these desires differently, but ideally an aesthetic choice will always map to a viable build, and a viable build will map to something players will find cool for their character. When this doesn't happen, it can result in a lot of annoyance from players. They are either forced to play something they conceptually like that is mechanically bad or they have to veer away from their character concept to be mechanically viable. In an RPG, this is undesirable — so say I, at least. That's why this initial stage should only end after you've soberly asked yourself important questions about why players would want to pick any given option you're presenting them.

Take Out The Trash

"Trash" or "trap" options are a time-honored tradition in RPGs, both tabletop and computer. Trash options are choices that are intentionally designed to be bad, or that don't get enough attention during development and testing to actually be viable in the game.

It is now 2014 and, friends, I am here to tell you that trash options are bullshit.

In a computer RPG, any trash option that goes from designer's brain to the shipped product has probably gone through a few dozen cycles of implementation, testing, and revision. In the end, the trash option is the proverbial polished turd. Any seasoned RPG veteran who looks at it in detail realizes it's terrible and avoids it. Those who don't look closely or who aren't system masters may wind up picking it for their character under the mistaken impression that it's a viable choice. Either way, it's a bad option that the team spent a bunch of time implementing, whether out of misguided schadenfreude or a simple lack of attention.


While big RPGs always let a few of these trash options slip through unintentionally, the best way to avoid the problem on a large scale is simply to ask why well-informed players, acting with eyes wide open, would want to pick any given option over a different option in the first place. There should be a good conceptual/aesthetic reason as well as a good mechanical reason. If one of those falls short, keep hammering away until you feel you've justified the option's existence. Sometimes, it's not possible. In those cases, at least you've had the good fortune to realize early in development that you're stuck with trash, whether it doesn't fit aesthetically or doesn't work mechanically, and can justly dump it before more effort goes into it.

As an example from Pillars of Eternity, we have maces and padded armor, two things that generally get short shrift in a lot of RPGs. In most RPGs, maces are slow and do poor damage with few elements in the "+" column. In Pillars of Eternity, they don't do any less damage than other one-handed weapons and they have the advantage of negating a portion of the armor on the target. Swords can do a variety of damage types, spears are inherently accurate, and battle axes do high Crit damage, but maces are a viable mechanical choice among their peers.

Padded armor suffers even worse in most RPGs: in many games, there are literally no worse options than padded. The suits are often aesthetically ugly and mechanically awful—the quintessence of a pure RPG trash option—and if players are forced to wear padded armor at the game's opening, they'll gladly ditch it as soon as anything else becomes available. In Pillars of Eternity, padded armor actually offers reasonably good protection. It can easily be argued that our padded armor is more protective than is realistic, but the first goal is not verisimilitude, but justifying the player's interest.

And, while heavier armor absorbs more damage, the heavier a suit of armor is in Pillars of Eternity, the longer it takes a character to recover from making an attack or casting a spell. A character in mail armor can absorb more damage than a character in padded, but the character in padded armor will perform more actions over a given period of time.

This fundamental tradeoff is both easy to grasp ("take less damage vs. do things faster") and has universal implications for all characters. All characters perform actions, and performing actions more quickly is always better. All characters also need to be protected from damage. By contrast, a tradeoff like damage reduction vs. movement speed would have dramatically different implications for a melee-oriented barbarian than for a long-range wizard.
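To make the shape of that tradeoff concrete, here is a rough sketch in Python. It is not the actual Pillars of Eternity math: the flat damage reduction model, the recovery penalty, and every number below are made-up assumptions chosen only to show how "take less damage vs. do things faster" can be compared side by side.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not the shipped game's numbers.
BASE_RECOVERY = 2.0  # assumed seconds between actions with no armor penalty


@dataclass
class Armor:
    name: str
    damage_reduction: float  # assumed flat damage absorbed per hit
    recovery_penalty: float  # assumed fraction added to recovery time


def actions_in(seconds: float, armor: Armor) -> float:
    """Number of actions a character completes in the given window."""
    return seconds / (BASE_RECOVERY * (1.0 + armor.recovery_penalty))


def damage_taken(hit: float, armor: Armor) -> float:
    """Damage that gets through after flat reduction, never below zero."""
    return max(0.0, hit - armor.damage_reduction)


padded = Armor("padded", damage_reduction=4.0, recovery_penalty=0.10)
mail = Armor("mail", damage_reduction=9.0, recovery_penalty=0.40)

for armor in (padded, mail):
    print(armor.name,
          round(actions_in(30.0, armor), 1),  # actions in a 30 second fight
          damage_taken(20.0, armor))          # damage from a 20 point hit
```

With these invented numbers, the padded character acts more often and the mail character takes less from each hit, which is exactly the kind of difference a player can observe directly in a fight.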

We also intentionally avoided the classic RPG armor tradeoff of damage avoidance (i.e. dodging) vs. straight damage reduction. While it's easy to grasp conceptually, it's mechanically uninteresting and unengaging unless you get into spreadsheet-level minutiae of how the damage reduction curves play out over time. Spreadsheet gaming can be enjoyable on its own, but there should be a more obvious tradeoff that the player can directly observe in-game for the choice to feel meaningful.

Paper Theory

Sooner or later, the practical aspects of design demand some sort of mathematical framework to understand the scope and range of values in a system as well as how they interact. With the goals we've established, we start roughing in formulae and numbers, often in a spreadsheet that allows us to perform a variety of mathematical operations on the numbers for purposes of direct comparison and progression scaling. This process helps us understand the facts of how our system will work. We can see patterns start to emerge — sometimes good, but often bad — and revise them before we start implementing anything.

There is a dangerous tendency to over-design at this stage. We really just want to understand that the fundamentals of our systems are robust before we move to implementation. Even the most well-designed spreadsheet fails to capture the reality of how systems will function within the context of a game environment. It's important to remain flexible with values moving forward.


On Fallout: New Vegas and Pillars of Eternity, we used Excel spreadsheets to experiment with different damage and armor models. Even after implementation, we retained the spreadsheets for future experimentation and adjustments.

WARNING: As we develop systems, we keep a close eye on how many inputs (the specific values that make up an element of a formula) feed into our various formulae. RPGs have a tendency to develop formulae with an excess of inputs, which makes later tuning extremely difficult. To curtail this, we try to keep our initial formulae as straightforward as possible. Not only does it make the system easier to tune, it also generally makes it easier for players to understand and harder to min-max into irrelevance.

Broad Strokes Implementation

After paper design is done and engineering has evaluated the cost of implementing everything on the list, we can start the process of implementation. At this stage, all tuning is done in broad, relative strokes. Differences are over-pronounced to ensure that they are strongly emphasized and easy to understand. If something is supposed to be fast, we make it really fast. If something is supposed to do a lot of damage, we make sure the values are significantly higher than the competition. It's easier to pull back when you've gone too far than it is to creep toward a goal in small increments. Only a small amount of balance tuning is done at this stage; we just want to make sure that the basic design of a class, weapon, spell, or creature is working as designed.

Quality Assurance

To determine if things are working as designed, we use people whose entire job revolves around ensuring that things are, in fact, working as designed. Quality Assurance (QA) runs thorough testing plans to make sure everything is ostensibly working as it is supposed to. Of course, just because something is working as it is supposed to doesn't mean it's fun to use or well-balanced with other options, but it's pointless to make any serious balance efforts if QA hasn't signed off on the core functionality of the content we've designed.

Field Testing

The first real balancing efforts happen after base functionality is established and everything can be seen in the proper interconnected context, i.e. the game. It's all well and good to test ideas out in a "grey box" test level, but proper balance won't start happening until all of the game's various systems are working together within the framework of actual game levels.

In an RPG, many individual systems work together to create scenarios — especially combat scenarios. Character level; class (if a class-based game); skills and skill levels; health, accuracy, defense, and damage reduction progression; equipped and consumable item progression; animation and hit frame timing; movement speed; access to additional party members and their abilities — and that doesn't even include the enemies' side of the equation and everything that goes into setting up their scenarios.

In the course of field testing, devs and QA start to see patterns and outliers emerge. The patterns typically deal with larger systemic issues and the outliers (the "always and nevers") are individual problems. If player damage output generally cannot outpace an increase in enemy health, there's a systemic problem. If a single weapon type continually fails to penetrate enemy armor across a broad spectrum of levels, we're dealing with an outlier.
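As an illustration of the systemic case, the following Python sketch checks whether enemy health is outpacing player damage across levels by tracking hits-to-kill. The progression curves, the level range, and the 1.5x tolerance are all hypothetical stand-ins; a real project would feed in its own spreadsheet values or playtest data.

```python
def hits_to_kill(player_damage: float, enemy_health: float) -> float:
    """Average number of hits needed to drop an enemy."""
    return enemy_health / player_damage


def flag_systemic_drift(levels, player_damage_at, enemy_health_at, tolerance=1.5):
    """Return the levels where fights take much longer than at the first level.

    `tolerance` is an assumed threshold: a level is flagged when its
    hits-to-kill exceeds the baseline by more than that factor.
    """
    baseline = hits_to_kill(player_damage_at(levels[0]), enemy_health_at(levels[0]))
    flagged = []
    for level in levels:
        current = hits_to_kill(player_damage_at(level), enemy_health_at(level))
        if current / baseline > tolerance:
            flagged.append(level)
    return flagged


# Made-up progression: damage grows linearly, enemy health compounds by 20%.
def player_damage(level):
    return 10 + 2 * level


def enemy_health(level):
    return 40 * 1.2 ** (level - 1)


levels = list(range(1, 11))
flagged = flag_systemic_drift(levels, player_damage, enemy_health)
print(flagged)  # levels where enemy health has outpaced player damage
```

If a long run of consecutive levels gets flagged, as happens here from level 8 onward, that points to a pattern in the progression curves rather than one misbehaving weapon.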

System Tuning

System-wide adjustments can affect the feeling of the game on a large scale. If we change attack speeds, the entire pace of combat and movement through environments will change with them. If we change how health progresses with character level, the middle and end portions of the game can shift dramatically in difficulty. If we adjust the frequency of item drops for certain types of ammunition, we can either starve the player's supply or create such a surplus that the drops become a cash crop.

For Pillars, because we were (hopefully) careful about limiting inputs into the system, we have a finite number of dials to adjust in any given subsystem. We always try to start with the biggest dial, i.e. the formulae or values that will have the most far-reaching effects. If that gets us in the right ballpark, we can use the smaller dials for fine-tuning.

Content Outliers - Always and Nevers

As we continue to play through the game, QA and developers will start to detect choices that are outliers. These choices will either always get picked or never get picked. Both present problems that should be corrected.

An option that always gets chosen has some clear and obvious advantage that the other options don't. In essence, the always-picked options are so good that they are making alternatives dramatically less appealing. If the game is balanced around the always-picked option, the game is imbalanced for anything but that option.


A never-picked option is inherently flawed in some way: no one wants to use it due to its mechanical or aesthetic flaws (almost always mechanical — power-gamers have a tendency to overlook aesthetic problems if something's really strong) so it winds up being a trash option. Despite our best intentions and efforts, sometimes it happens.
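One simple way to spot "always and nevers" is to count how often each option shows up in playtest records. The sketch below is only an assumption about how that might look: the pick log, the option names, and the 80% and 2% cutoffs are invented for the example, and a real team would choose thresholds that suit its own data.

```python
from collections import Counter


def find_outliers(picks, options, always_at=0.80, never_at=0.02):
    """Split options into 'always' and 'never' picks based on pick rate.

    `picks` is a list of chosen option names from playtests (hypothetical
    data); `always_at` and `never_at` are assumed cutoff rates.
    """
    if not picks:
        return [], []  # no data, so nothing can be called an outlier yet
    counts = Counter(picks)
    total = len(picks)
    always, never = [], []
    for option in options:
        rate = counts[option] / total
        if rate >= always_at:
            always.append(option)
        elif rate <= never_at:
            never.append(option)
    return always, never


# Invented playtest log: 100 weapon choices across many sessions.
options = ["sword", "mace", "spear", "club"]
picks = ["sword"] * 85 + ["mace"] * 10 + ["spear"] * 5

always, never = find_outliers(picks, options)
print("always:", always)
print("never:", never)
```

Pick rates only say that something is off. Figuring out why the sword crowds everything else out, or why no one touches the club, still takes people playing the game.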

With outliers, we first ensure that the choice is accomplishing the overall goal that it's supposed to — the identifying characteristic that makes a player decide to pick it in the first place. So if padded armor is supposed to be good because the character is faster while wearing it, is the difference in speed pronounced enough that it seems worthwhile? If it isn't, or if it's too pronounced, that's the first thing we tune. After that's done, we try it out to see if that single change removes it from outlier status.

Note that we typically tune one thing at a time. If we tune two things at once, we may not correctly identify what was causing the problem and might even introduce new issues.

If we haven't yet fixed the problem, we next look at the most egregiously offensive aspect of the thing that holds players back. Again, using padded armor as an example, maybe its Damage Threshold is just too low compared to all of the other options. It may well be fast, but it offers as much protection as a wet paper sack. So we bring the Damage Threshold up, test it, and continue the process.

In the end, we not only want to make the option viable, we want to maintain its concept and spirit. We could take a sledgehammer and tune up its attack speed so it attacks incredibly quickly with low-damage hits. That might make it viable, but we've strayed significantly from what most people would expect a sledgehammer to be good at: slow, heavy hits.

N.B.: Tuning before launch can happen in a slightly different way than tuning post-launch. Before we launch, we can easily cut the heads off of the proverbial tall blades of grass. If something is much too powerful, there's no practical problem with simply cutting the power down dramatically. Once a game has gone out into the world, however, players become invested in the way things are and a subset of them can respond with extreme negativity to anything coming down in power. So if we recognize something is too powerful in testing, we're more eager to cut it down before launch than we are to power up something that feels a little weak.

Doubling and Halving

How do we tune? The Sid Meier way: doubling and halving. For many people, the instinct when tuning is to try small incremental adjustments. This is usually not an efficient process. It's almost always much faster to halve or double the values being adjusted and only make smaller tweaks after you've already passed the target. Is this unit too slow? Double its speed. Is this class not gaining health fast enough? Double its health per level. Is this weapon mod doing too much damage? Halve it. Is this encounter too big? Remove half of the combatants.

Once you've confirmed you've gone too far, you can roll back so you've gone just far enough. With enough field testing and adjustment, we can get all of the systems and content into a place where we feel like they are solid and provide a wide variety of viable choices for players. After that, the game has to survive contact with the players.
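The doubling and halving approach can be sketched as a small search loop in Python. Everything here is an assumption made for illustration: in practice the "measure" step is people playing the game rather than a function call, and the stand-in formula, target, and tolerance below are made up. The sketch also assumes the value is positive and that the measured result rises as the value rises.

```python
def tune(value, measure, target, tolerance=0.05, max_steps=50):
    """Double or halve `value` until the target is passed, then narrow in.

    `measure` stands in for a playtest result (hypothetical); `tolerance`
    is the assumed acceptable relative distance from `target`.
    """
    low = high = None  # largest value known too low, smallest known too high
    for _ in range(max_steps):
        result = measure(value)
        if abs(result - target) <= tolerance * target:
            return value
        if result < target:
            low = value
        else:
            high = value
        if high is None:
            value *= 2  # still short of the target: double
        elif low is None:
            value /= 2  # still past the target: halve
        else:
            value = (low + high) / 2  # overshoot confirmed: roll back partway
    return value


# Stand-in for a playtest: pretend the measured result is 3x the unit's speed.
def pretend_playtest(speed):
    return 3.0 * speed


tuned = tune(1.0, pretend_playtest, target=30.0)
print(tuned)
```

Starting from a speed of 1.0, the loop tries 2, 4, 8, and 16, sees that 16 overshoots, and then settles between 8 and 16. Creeping upward in increments of 0.5 from the same starting point would take many more rounds of testing to reach the same place.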

Post-Launch Tuning

Even projects with large development and testing teams can have difficulty catching all issues with systems and content, and it shouldn't be too surprising. The gamers who play the finished product often outnumber the dev and testing teams by a ratio of several hundred to one. There are combinations of choices they will make that the developers could only theorize about and that the testers couldn't hope to test purely due to time constraints.

After a game has launched, there are two ways to handle tuning. The most straightforward is patching. Patching on consoles is generally much more restrictive than patching on PCs. On consoles, patching takes a non-trivial amount of time and money, so the number of chances we have to "get it right" is very low. As a result, we have to be very confident in the changes before we commit them to a patch. If we overdo it, we may not get another chance to compensate.

Unofficially, after all of the patches have been wrapped up and everyone has moved on, there's also the modding community. As a designer, I never want to leave something in such a state that it requires mods to rectify, but it's extraordinarily difficult to catch everything, even with patches. Additionally, sometimes players simply don't like the balance choices we make. We're trying to satisfy hundreds of thousands, sometimes millions, of players. We can't suit everyone's tastes, so modding is a great way for players to experiment with game development and create their own personal "cut" of a game they love.

All These Feels

The most important high-level goal with any choice the player makes is that they feel good. This is an abstract concept, but it's important to understand that games come down to a series of experiences for the player. The reason we tweak or adjust anything isn't simply to achieve a mythic "perfect balance" as a goal in its own right, but to make something balanced enough that the player's experience with that content is satisfying. There are myriad aesthetic and mechanical elements that feed into the player's perception of the options that are available to them. We want players to feel that their choices fit their character concept and are ultimately up to the challenge — without making the challenge irrelevant.


Josh Sawyer is a veteran game designer who has worked on a number of role-playing games including Icewind Dale II, Fallout: New Vegas, and the upcoming Pillars of Eternity. You can follow him on Twitter at @jesawyer.