Bugs in Fallout: New Vegas might have eaten your save file. Maybe they took away a few hours of progress, or forced you to reset a couple of quests. Maybe game-crashing bugs pissed you off to the point where you wished you could get your $60 back. But they probably didn’t cost you a million dollars.
[This article was originally published on April 11, 2013.]
Perhaps you’ve heard the story: publisher Bethesda was due to give developer Obsidian a bonus if their post-apocalyptic RPG averaged an 85 on Metacritic, the review aggregation site. It got an 84 on PC and Xbox 360, and an 82 on PS3.
“If only it was a stable product and didn’t ship with so many bugs, I would’ve given New Vegas a higher score,” wrote a reviewer for the website 1up, which gave New Vegas a B, or 75 on Metacritic’s scale.
“It’s disappointing to see such an otherwise brilliant and polished game suffer from years-old bugs, and unfortunately our review score for the game has to reflect that,” said The Escapist’s review, which gave the game an 80.
If New Vegas had hit an 85, Obsidian would have gotten their bonus. And according to one person familiar with the situation who asked not to be named while speaking to Kotaku, that bonus was worth $1 million. For a team of 70 or so, that averages out to around $14,000 a person. Enough for a cheap car. Maybe a few mortgage payments.
Those sure were some costly bugs.
This is not an anomaly: for years now, video game publishers have been using Metacritic as a tool to strike bonus deals with developers. And for years now, observers have been criticizing the practice. But it still happens. Over the past few months, I’ve talked to some 20 developers, publishers, and critics about Metacritic’s influences, and I’ve found that the system is broken in quite a few ways.
There is something inherently wrong with the way publishers use Metacritic.
Hop into a debate with some video game fans on your favorite message board, and there’s one subject that will always come up: review scores. Which game scored the highest? Which scored the lowest? Which are the best review websites? Which are the worst?
Inevitably, at some point, someone will jump into the fray and say something like “lol review scores mean nothing anyway.” To some people, maybe that’s true. But to the people who make and sell video games, review scores are more important than many casual fans realize. Mostly because of Metacritic.
For the uninitiated: Metacritic is an aggregation website that rounds up review scores for all sorts of media, including video games. The people who run Metacritic take those scores, convert them to a 100-point scale, average them out using a mysterious weighting formula (more on that later), and spit out a number that they call a Metascore, meant to grade the quality of that game. The Metascore for BioShock Infinite, for example, is currently a 94. Aliens: Colonial Marines? 48.
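To make that recipe a little more concrete, here is a minimal Python sketch of a weighted average over scores that have already been converted to a 100-point scale. Metacritic does not publish its weights, so the scores and weights below are invented, the function name `weighted_average` is hypothetical, and the final rounding is an assumption. This illustrates the general idea only; it is not Metacritic's actual formula.

```python
def weighted_average(reviews):
    """Weighted mean of (score, weight) pairs; scores are on a 100-point scale."""
    total_weight = sum(weight for _, weight in reviews)
    return sum(score * weight for score, weight in reviews) / total_weight


# Invented scores and weights, purely for illustration.
# The first outlet is treated as a "heavy hitter" and counts 1.5 times.
reviews = [(90, 1.5), (80, 1.0), (85, 1.0)]

# Rounding to a whole number is an assumption, not a documented rule.
metascore = round(weighted_average(reviews))
print(metascore)  # prints 86
```

With equal weights the same three scores would average exactly 85, so in this made-up case the weighting shifts the result by a single point.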
To people who work in gaming, these Metascores can mean a lot. Say you’re a developer who needs money. You’ve got some ideas to pitch to publishers. You take some meetings. They’re going to ask: just how good have your games been?
“Typically, when you go into pitch meetings and whatnot, publishers are going to want to know your track record as far as Metacritic,” said Kim Swift, a game designer best known for helping create games like Portal and Quantum Conundrum. “As a company, what is your Metacritic average? As an individual, what is your Metacritic average?”
Swift works for Airtight Games, an independent studio that is tied to no publishers. Their Metacritic history: Dark Void, which has a 59 on Metacritic, and last year’s Quantum Conundrum, which sits at 77. [UPDATE: In 2014, a year after this article was originally published, Airtight shut down. Swift now works for Amazon Games.]
In order to survive, studios like Airtight have to negotiate deals with big companies like Capcom and Square Enix. Often that means talking about Metacritic. Sometimes that means wearing their history of Metacritic scores like a scarlet letter.
This is common. An employee of a well-known game studio told me about a recent pitch meeting with a publisher, during which the publisher brought up the studio’s last two Metacritic scores, which were both average. The studio employee asked that I not name the parties involved, but claimed the publisher used the Metascores as leverage against the studio, first to negotiate for less favorable terms, and then to turn down the pitch entirely.
Often, developer bonuses or royalties are tied to game review scores. Fallout: New Vegas is one high-profile example, but it happens fairly often.
“It’s pretty common in the industry these days, actually,” Swift told me. “When you’re negotiating with the publisher for a contract, you build in bonuses for the team based on Metacritic score. So if you get above a 90, then you get X amount for a bonus. If you get below that, you don’t get anything at all or get a smaller amount.”
In other words, a developer’s priority is sometimes not just to make a good game, but to make a game that they think will resonate with reviewers, which could mean anything from artificially extending a game’s length to adding superfluous features that they believe reviewers like.
“When you’re working on a game, part of what you want to do is have a high score,” said Swift. She said she’d never seen a developer change part of a video game just for the sake of raising scores, but the influence is undoubtedly there.
“It’s usually some other thing like, ‘Hey, we could use another couple hours on this game because people perceive a longer game to be a higher value,’” Swift said. “It’s never directly pointing back to, ‘This is gonna improve our score by X number of points.’”
Matt Burns, a longtime game designer who worked for a number of big shooter companies and now makes indies with his company Shadegrown Games, wrote about his personal experiences with Metacritic back in 2008. Burns said he watched firsthand as a development studio worked as hard as possible to make a game that would snag high review scores.
“Armed with the knowledge that higher review scores meant more money for them, game producers were thus encouraged to identify the elements that reviewers seemed to most notice and most like–detailed graphics, scripted set piece battles, ‘robust’ online multiplayer, ‘player choice,’ and more, more of everything,” Burns wrote.
“Like a food company performing a taste test to find out that people basically like the saltiest, greasiest variation of anything and adjusting its product lineup accordingly, the big publishers struggled to stuff as much of those key elements as possible into every game they funded. Multiplayer modes were suddenly tacked on late in development. More missions and weapons were added to bulk up their offering–to be created by outsource partners. Level-based games suddenly turned into open-world games.
“Before you cry in despair, keep in mind that all these people wanted in the end was the best game possible–or, more precisely, the best-reviewed game possible.”
And then there’s this wry joke by Warren Spector, talking about the words that influenced his career during a talk at the DICE conference earlier this year. Powerful words. Legacy. Mentor. And...
While chatting with Obsidian head Feargus Urquhart for the profile I published in December of 2012, I asked him about what had happened with Fallout: New Vegas. For legal reasons, he couldn’t get into the specifics.
“I can’t comment on contracts directly,” he said. “But what I can say is that in general, publishers like to have Metacritic scores as an aspect of contracts. As a developer, that’s challenging for a number of reasons. The first is that we have no control over that, though we do have the responsibility to go make a brilliant game that can hopefully score an 80 or an 85 or a 90 or something like that.”
According to Metacritic’s rating scale, any game above a 75 is considered “good,” but realistically, according to multiple developers I spoke with, publishers expect scores of 85 or higher. Sometimes, Urquhart told me, the demands can get unreasonable.
“A lot of times when we’re talking to publishers–and this is no specific publisher–but there are conversations I’ve had in which the royalty that we could get was based upon getting a 95,” Urquhart said. “I’ve had this conversation with a publisher, and I explained to them, I said, ‘Okay, there are six games in the past five years who have averaged a 95, and all of those have a budget of at least three times what you’re offering me.’ They were like, ‘Well, we just don’t think we should do it if you don’t hit a 95.’”
That’s the developer’s perspective. Now let’s look at this from the other side. Say you’re a publisher. You’re about to sign a seven- or eight-figure deal with a development studio, and you want to make sure they’re not going to hand you a clunker. Why not use Metacritic as a security blanket in order to minimize risks and ensure you get yourself a great game?
Here’s some very reasonable rationalization from a person who worked at a major publisher and asked not to be named:
“Let’s say that [a publisher] wanted to pay $1 million up front (through milestone payments over the course of development), but the developer wanted $1.2 million. If they wouldn’t budge, sometimes we would offer to make up the difference in a bonus, paid out only if the game hit a certain Metacritic [score].
“That conversation could happen during development too. Maybe a developer wanted more time and money in the middle of the production, to make a better game. So the counter was, ‘If you’re so sure it will make the game better, we’re gonna tie the additional funds to the Metacritic score.’ It was a way to minimize risk.”
But a different person who once worked for major publishers (and requested anonymity because he was not authorized to speak on this issue) says that Metacritic scores are just an excuse publishers use in order to deprive developers of the bonuses they deserve.
“Well, generally the whole Metacritic emphasis originated from publishers wanting to dodge royalties,” that person said. “So even if a game sold well, they could withhold payment based off review scores... The big thing about Metacritic is that it’s always camouflaged as a drive for quality but the intent is nothing of the sort.”
Multiple developers I spoke to echoed similar thoughts, although nobody could share hard evidence to back up this theory. I reached out to a number of major publishers including Activision, EA, and Bethesda, but none agreed to comment for this story.
Marc Doyle, the former lawyer who co-founded Metacritic in 2001 and keeps it running every day, told me during a phone conversation last week that he feels no responsibility for what video game publishers or developers do with his website.
“Metacritic has absolutely nothing to do with how the industry uses our numbers,” he said. “Metacritic has always been about educating the gamer. We’re using product reviews as a tool to help them make the most of their time and money.”
But gamers aren’t the only ones who use Metascores. Not by a long shot. Even the massive Japanese publisher Square Enix recently cited Metacritic as one of the factors they used to predict sales for their games.
“Let’s talk about Sleeping Dogs: we were looking at selling roughly 2~2.5 million units in the EUR/ NA market based on its game content, genre and Metacritic scores,” former Square Enix president Yoichi Wada wrote in a recent financial briefing. “In the same way, game quality and Metacritic scores led us to believe that Hitman had potential to sell 4.5~5 million units, and 5~6 million units for Tomb Raider in EUR/ NA and Japanese markets combined.”
“Review scores are a part of our industry and it’s something we pay attention to as developers,” said Swift. And they lead to trends. “Review scores of this year are gonna drastically affect what’s gonna be seen next year,” she said.
Even big retailers like Walmart and Target ask publishers for Metacritic predictions when deciding whether or not to feature certain games. “One of the criteria [retailers] have is, ‘What’s the review score gonna be?’” said Tim Pivnicny, vice president of sales and marketing at Atlus USA. “That comes up a lot... They’re concerned if it’s going to be a good game.”
Metacritic has a significant influence on the way games are produced today. That’s a problem.
When I first heard about the Fallout: New Vegas bonus, I wrote an editorial about how silly it is for publishers to use Metacritic as a measure of quality. Video games are personal experiences, and they can’t be evaluated objectively, especially through some sort of arbitrary numerical score that means different things to different people. (Go ahead and try to explain the qualitative difference between an 81 and an 82.)
That’s the obvious reason. But there are others. For one, people are gaming the system. On both sides of the aisle.
There’s the story of the mocked mock reviewer, for example. Some background: game publishers and developers often hire consultants or game critics to come into their offices, play early copies of games, and write up mock reviews that predict how those games will perform on Metacritic. Often, if possible, publishers and developers will make changes to their games based on what those mock reviews say. Mock reviewers are then ethically prohibited from writing consumer reviews of that game, as they have taken money from the publisher.
One developer–a high-ranking studio employee who we’ll call Ed–told me he hired someone to write a mock review, then just shredded it. Ed didn’t care what was inside. He just wanted to make sure the reviewer–a notoriously fickle scorer–couldn’t review his studio’s game. Ed knew that by eliminating at least that one potentially-negative review score from contention, he could skew the Metascore higher. Checkmate.
(In case you’re wondering, Kotaku writers are prohibited from doing mock reviews or taking any work from the publishers we cover.)
When I asked Metacritic’s Doyle about practices like this, he admitted that he had heard similar stories. He said he works closely with all 140 review publications that he aggregates on Metacritic, and he said he constantly evaluates and examines each one. “Trying to prevent people from gaming the system is something I always think about,” Doyle said.
But it’s still happening.
“Anything we can do to optimize the score, we’re gonna do,” Ed told me.
Sometimes it’s subtle things: lavish review events that force game critics to review games on a studio’s terms; review embargoes that become more flexible when a score is higher; swag that gets sent to offices and discarded oh-so-often, like Gears of War beef jerky and Legos based on sets from Lego City Undercover. So long as Metacritic has an effect on the people who make games, the people who make games will find ways to influence it.
Those most susceptible to pressure from video game publishers may be the smaller websites that need traffic from aggregate sites like Metacritic in order to survive–websites that might make sketchy deals in order to get that traffic. Jeff Rivera, a game journalist who worked as an editor for a group of websites called Advanced Media Network (which later became Kombo.com), told me he saw one of those deals back in 2006.
“We had an agreement with Sega that we would run a week-long special with our top stories on the DS channel being dedicated to Super Monkey Ball,” Rivera said in an e-mail. “I was handling the review, and on the night before we were going to publish, I got an IM from a co-worker asking what I was going to score the game.
“I told him that I didn’t know yet and wondered why I was being asked, as it was something I’d never had happen before. He went on to tell me that PR said that our review would be guaranteed exclusive for a day if my score was to be 8.0 or better.”
Rivera said he had already written his review at that point, and that he had scored the game an 8.1. (The review is no longer online, but it’s still listed under Kombo on GameRankings.)
“I told them that I didn’t know what I would give it, because I didn’t want them feeling like they ‘bought’ my review score,” he said. “More pressure came to divulge my score, and I kept saying that I didn’t know, but that 8.0 was the ball park range.”
When I asked Sega for comment on this story, they sent over a statement: “Sega has a strict internal policy against soliciting high scores in exchange for early reviews and against the practice of influencing reviewers.” But Rivera said this had happened in 2006. I asked Sega when they’d enacted this policy, but the publisher never got back to me.
From conversations I’ve had with developers and other press, it seems like this sort of thing occurs less often these days. But there are always stories and whispers. Developers begging reviewers to change their scores. PR people intentionally sending out late review copies when they know a game is going to be bad, or sending early copies to websites known for handing out higher scores.
If you read about games online, you’re probably familiar with some of the websites on Metacritic: outlets like IGN and GameSpot are well-established publications that pay their writers and have solid reputations. But other names on Metacritic’s large list of publications are less recognizable. Some are run by volunteers; others are lesser-known to American gamers.
In order to give more importance to the bigger websites, Metacritic uses a weighting system that puts more emphasis on the heavy-hitters, making their scores count for more. But Doyle and his team won’t give any details about the system they use. This opacity has led to some controversy over the years: most recently, a Full Sail University study made headlines when the people behind it claimed to have modeled Metacritic’s formula, but their model turned out to be wrong. The event led many to ask: why doesn’t Metacritic just tell us how they weight outlets?
“We’re transparent about everything on Metacritic except for the critic weightings,” Doyle told me. “That may seem like a drastic thing, but I’m just telling you that, in my opinion, it’s not. If you simply stripped out all the weights, it wouldn’t have a huge effect on that number.”
Doyle gave me a few explanations: for one, he said he doesn’t want publishers pressuring the highest-weighted publications. Another reason: Metacritic tweaks the system frequently, and they don’t want to have to talk about it every time they do, potentially embarrassing a publication whose weight they’ve just lowered.
But people find it hard to trust what they don’t understand. And nobody understands how Metascores are computed.
One of Doyle’s other big policies has also been in the news recently: Metacritic’s refusal to change an outlet’s first review score, no matter what happens. It’s a policy they’ve had for a while now, Doyle told me. He enacted it because during the first few years of Metacritic, which launched in 2001, reviewers kept changing their scores for vague reasons that Doyle believes were caused by publisher pressure.
“I decided that if we can, as an aggregator, act as a disincentive for these outside entities, whoever they may be, to pull that kind of stuff, and we can protect our critics by backing up their first published and honest opinion, then we’re gonna do what we can to do that.”
Sometimes, however, this leads to some skewed Metacritic results. In late 2012, GameSpot pulled their review of Natural Selection 2, which had been written by a freelancer. The review contained multiple factual inaccuracies. A different writer then reviewed the game, giving it an 8. But the original score–a 60–remains on Metacritic to this day.
More recently, the website Polygon, which uses an adjustable review scale, gave SimCity a 9.5 out of 10 before it launched. On launch day, when crippling server errors rendered the game unplayable for most, Polygon changed their score to an 8. A few days later, as the catastrophic problems continued, they switched it to a 4. It’s currently a 6.5. Yet anyone who goes to SimCity’s Metacritic page will still see the 9.5.
Still, Doyle stands by his policy.
“Metacritic scores really are that snapshot in time when a game is released, or close to after it’s released,” said Doyle, “when the critics decide, ‘I’ve played this enough, I can evaluate this now fairly, and here’s the score.’”
Another problem for developers: outlier scores. What happens when tons of people like a game, but for one or two reviewers, it just doesn’t click?
“The problem is the scale,” said Obsidian’s Urquhart. “There’s an expectation that a good game is between 80 and 90. If a good game is between 80 and 90, and let’s say an average game is gonna maybe get 50 scores, if you wanna hit that 85 and someone gives you a 35, that just took ten 90s down to 85... Just math-wise, how do you deal with that? Some guy who wants to make a name for himself can absolutely screw the numbers.”
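Urquhart’s back-of-the-envelope math checks out if you treat the Metascore as a plain, unweighted average. That is an assumption, since Metacritic’s real weights are not public. A quick Python check, with a hypothetical helper named `average`:

```python
def average(scores):
    """Plain unweighted mean (an assumption; Metacritic's weights are secret)."""
    return sum(scores) / len(scores)


ten_nineties = [90] * 10             # ten reviews that each score a 90
with_outlier = ten_nineties + [35]   # the same ten, plus a single 35

print(average(ten_nineties))
print(average(with_outlier))
```

Under that assumption, one 35 is enough to pull ten 90s from an average of 90 down to exactly 85.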
One reviewer well known for aberrant scores is Tom Chick, who runs the blog Quarter To Three. Chick is listed as having the lowest Metacritic score on BioShock Infinite (a 60) and Halo 4 (a 20), among others. He uses a 1-5 scale that Metacritic converts into multiples of 20, so Chick’s “I liked this game”–3 out of 5–is converted into a 60, which most Metacritic readers see as a bad score.
But Chick is okay with this system, and when I asked him his thoughts on how Metacritic uses his numbers, he defended the aggregation site.
“An aggregate is only as good as its individual components,” Chick said in an e-mail. “And I feel that a lot of the data fed into Metacritic is of questionable value for how it clusters ratings into a narrow margin between seven and nine. But that’s not a Metacritic problem. That’s an IGN problem, a Game Informer problem, a GameSpot problem. And part of how we get past that problem is by recognizing more varied data. That’s ultimately one of the reasons I’m on Metacritic: I believe a wider range of opinions can add to its value.”
Chick uses a totally different scale than many other websites on Metacritic: Game Informer, for example, describes their 6/10 as follows: “Limited Appeal: Although there may be fans of games receiving this score, many will be left yearning for a more rewarding game experience.” Chick, on the other hand, says his equivalent score, 3 out of 5, means something else entirely. “I believe strongly in using the entire range of a ratings scale, so three stars means that I like a game,” he said. “Quite literally. We have a ratings explanation on Quarter to Three that explains that three stars means ‘I like it.’ It’s that simple.”
Yet Chick’s 60 and Game Informer’s 60 are averaged together. They both affect developer bonuses. They both have an impact on contract negotiations. And they both change the way video games are made.
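Here is a small Python sketch of how two very different verdicts end up as the same number. It assumes a simple linear rescaling to 100 points, which matches the multiples-of-20 conversion described above for Chick’s scale; the helper name `to_100_point` is hypothetical.

```python
def to_100_point(score, scale_max):
    """Linearly rescale a score out of scale_max to a 100-point scale."""
    return score * 100 / scale_max


chick = to_100_point(3, 5)            # three stars: "I like it"
game_informer = to_100_point(6, 10)   # 6/10: "Limited Appeal"

print(chick, game_informer)
print(chick == game_informer)
```

Both reviews come out as 60, and once they are on the same scale the aggregate has no way to tell the two meanings apart.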
“The nature of an aggregate system is that multiple scores are aggregated,” Chick said. “You might as well blame IGN for giving a game a 92 instead of a 96. As for how I feel about a studio losing its bonus because the publisher has set an arbitrary number, that’s not my responsibility. My responsibility is solely to my readers.”
Chick’s message is admirable, and his criticism is always sharp, but his scores illustrate one of the biggest problems with how publishers and developers use Metacritic today: inconsistency. When Chick’s scale is so drastically different than Game Informer’s, how can any outside observer look at an average of the two and think that number has any meaning or significance?
There are other points to think about, too. If one person loves a game, and another person hates a game, is it an *average* game? Or just a game that one person loved and another person hated? If two people score a game 100 and two people score it 0, it’s not worth a 50–it’s just polarizing.
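The point is easy to see in numbers. In this Python sketch, which uses made-up scores, two games share the same average of 50, but the spread of their scores (measured here with the population standard deviation) tells them apart.

```python
from statistics import mean, pstdev

# Made-up review scores for two imaginary games.
polarizing = [100, 100, 0, 0]   # two reviewers loved it, two hated it
middling = [50, 50, 50, 50]     # everyone found it so-so

for name, scores in [("polarizing", polarizing), ("middling", middling)]:
    # Same mean for both, but very different spread.
    print(name, mean(scores), pstdev(scores))
```

Both games average 50, but the spread is 50 points for the first and zero for the second. A single averaged score hides that difference.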
The system doesn’t work. And I’m not the only one who thinks so.
“Metacritic’s usefulness as a consumer aid is clear and obvious,” said longtime critic and Gears of War: Judgment writer Tom Bissell. “That the game industry has internalized its values, however, and uses its metrics, apparently uncritically, as a valuable source of self-appraisal, has to be one of the great mysteries of modern industry. It cannot be a coincidence that the form of modern entertainment most self-conscious about its status as an art form is also so slavishly attached to Metacritic.”
“It bastardizes the editorial process for reviews,” said Justin Kranzl, an ex-game critic and current PR rep for Square Enix. “We’re conditioning readers to skip the copy or the video and just get the score. For people who love a dynamic and varied media landscape–and any self respecting PR person should fall into that category–that’s terrible.”
“I think Metacritic is something only publishers care about,” said Monkey Island designer and longtime game developer Ron Gilbert. “The devs I know only care about it to the extent that a publisher bonus has been tied to the game’s Metacritic score (which is a stupid, stupid, stupid thing to do). I’ve never looked up the Metacritic score for any game I’ve worked on. It’s completely irrelevant to me.”
“Metacritic encourages the fallacy that all opinions should be weighted equally, and that a ‘bad’ review is an unenthusiastic review,” said Bissell. “But that’s not true. There are some games I am *more* likely to play when a certain critic gives them what Metacritic regards as a ‘bad’ review. Metacritic leaves no room to discuss, much less pursue, guilty-pleasure games, noble failure games, or divisive games. Everything’s just a 7, or an 8, or a 6.5. That’s the least interesting conversation I can imagine.”
(Metacritic game hubs do include blurbs from each of the reviews they aggregate.)
“Rating a game is so subjective,” said Airtight’s Kim Swift. “I think one of the scarier things for a developer is when a reviewer opens up with ‘I typically hate this type of game’ and you’re like ‘Oh, crap.’”
“I don’t want to carry that burden... these are people with children and families,” said longtime critic Adam Sessler, an ex-TV host who now produces videos for Revision 3 Games. “It is a horrible feeling that what I’m saying–I’m giving my subjective evaluation of an experience that will not be the same experience as other people are going to have–that somehow withholds food and resources... To me it is noxious in the extreme.”
“In fact I would encourage more outlets to employ scoring scales that are incompatible with Metacritic,” said Kranzl, “and I’m always down to discuss with them different ways of getting there.”
“I’ll say this,” said Sessler. “I have considered not doing this job before because of this, because I think there’s something so morally questionable and repugnant about it.”
“I wish it would go away, but if not Metacritic, then some other service would pop up,” said Gilbert. “We humans love to quantify stuff. I wonder what the Metacritic of the Mona Lisa was. I heard it hung in King Louis XIV’s bathroom for many years.”
Perhaps it’s in our nature to make numbers out of everything. And it’s hard to deny that Metacritic is a useful tool for measuring how a small group of people felt about a game at one particular point.
But it’s not a useful tool for much else. There are too many variables, too many people trying to manipulate the system. There’s too much subjectivity in the review process for anyone to treat it like an objective measure of quality. Video games are designed to be personal experiences, and it is disingenuous for publishers to act like review scores are any more than the quantification of those personal experiences. It’s harmful to everyone. Everyone.
It’s harmful to critics, who have to deal with PR pressure and the guilt of taking money out of people’s pockets.
It’s harmful to developers, whose careers can be tied to the whims of a critic who may be in a bad mood when he or she plays their game.
It’s harmful to publishers, who must be concerned that they have to put so much value on a website that won’t tell anyone how it calculates its review score averages.
Most importantly, it’s harmful to gamers, because it has a palpable negative impact on the way our video games turn out every year. When developers change games because they think that’s what reviewers will want to see, nobody wins.
Metacritic is a useful tool, but video game publishers have turned it into a weapon. And something’s gotta change.