Microsoft is spending half a billion dollars to make sure that when you hear the word Kinect, you think "the future of video games."
But Microsoft believes the real potential of the voice-recognition, motion-based technology lies in how the company plans to use it with cell phones, computers and perhaps in the military and health industries.
"Think about a world where machines understand what people want from them," said Kinect Creative Director Kudo Tsunoda. "You can see that extrapolate out to a host of other devices."
When Microsoft launched the $150 Xbox 360 Kinect add-on this week, it promised the device would make you the controller. That means gamers can play soccer, race on a hoverboard and raft down rapids simply by going through the motions in front of their television. The Kinect also allows you to video chat with friends, flip through the console's menus with a swipe of your hand and zip through movies by plucking at the air in front of your face.
Despite the sometimes magical moments it creates, Kinect is really just a collection of off-the-shelf cameras, sensors and microphones packed into an array that sits atop or under your television. It looks surprisingly like the head of the lovable, titular robot star of Pixar movie Wall-E.
What Microsoft is betting on isn't that plastic peripheral; it's the software and algorithms that Kinect uses to do its job. That job includes not just watching a gamer's movements and translating them into on-screen video game action, but also using the spoken word to take direction, recognizing who is playing a game and, perhaps ultimately, what that person wants from their gaming experience.
"From a broad Microsoft perspective," said Alex Kipman, Microsoft's director of the Kinect technology, "we believe in a more natural way for people to interact with technology. People don't like to hold gadgets in their hands, that includes keyboards and mice.
"The Kinect is the start of that journey."
The first thing you're likely to notice when you play a Kinect game is how, as with the Wii and the PlayStation Move, the game recognizes your movements. Unlike with its competitors, you don't need to hold anything in your hand to play Kinect games, but on some level the experience feels very similar across all of the platforms.
I pointed this out to Kipman, the brains behind Kinect. He was quick to explain that the neatest bit of technology to come out of Kinect has nothing to do with motion tracking. It's the audio work, he said.
"It is totally different than some of the things people have experienced before," he said.
While there already are clocks, phones and other electronics that actively listen for voice commands, most current voice recognition systems are "push to talk." That means you have to push a button to get them to listen to you. And typically, the device is being used in a place where the user is a foot or two in front of the microphone and not moving around.
That's the case whether you're in a car, using PC software or talking on a phone. But it's not the case with Kinect.
"In our world, first of all, people are sitting all over the room," he said. "There are many different people sitting 10 to 12 feet away from the system. People are having fun in the living room, so it's not a quiet environment. There is lots of ambient noises."
There is also plenty of noise pumping out of the speakers, thanks to the Xbox 360. But you don't have to quiet down the room and then push a button to get the Kinect's attention. It's always listening for you.
To get that to work, Microsoft combined several different types of technology.
"The first one is how advanced our multichannel echo cancellation is," he said. "We are able to cancel out all of the noise coming out of the speakers because the Kinect is synced with the Xbox and can cancel it out the instant it comes out.
"The second issue is the ambient noise of people talking. We use sophisticated algorithms to figure out who is talking and where they are so we can hone in on them."
Kipman says Kinect uses its software to essentially form a virtual cone around the mouth of the person speaking and listen to that one spot for commands. And it can do that even if the person is moving.
"Once you combine and fuse all of these things together you get a system that transforms how you interact with it," Kipman said.
In tests at my home, the voice recognition technology struck me as the most impressive feature. I can sit on my couch, more than nine feet from the television and that Wall-E sensor, lean back into the cushions and say in a normal voice "Xbox." Occasionally I have to repeat myself, but typically it hears me. It can even hear me through the roar of Last.fm's music or the sounds of movies or ESPN sports. There are some stumbling points. It does, for instance, seem to have issues recognizing my 9-year-old son. But when it works it's impressive, a little transformational and, yes, a tiny bit creepy.
The best example of how Kinect will impact things not related to video games can be found on the Xbox 360 today. It's called Video Kinect.
On its surface, Video Kinect is a typical video chat program. The difference is that users don't need to wear a headset or sit close to the screen to use it. Nor do they have to sit still. A person can pace in front of the camera and speak in a normal tone, and the Kinect will follow the user using digital zoom and hear what he or she says with its audio algorithms.
"In a way what you see with Video Kinect is where we are transforming not just gaming but entertainment," Kipman said. "You don't have to understand technology, but technology understands you."
Kipman says he receives calls on a daily basis from people in the military, health care, mobile phone and computer spaces about Kinect and its potential in those fields.
"There is a wide amount of interest from many, many industries," he said. "I love the interest, but if you want to create something transformational you need focus. And our focus has really been, let's nail it in the living room on the Xbox 360."
But Microsoft is already looking past this week's launch of Kinect, to the technology's uses outside of gaming.
"I think that both with the gesture tech and the voice recognition there are so many different types of machines with some sort of input paradigm associated with them that this could be applied to," Tsunoda said. "Is it coming to the PC and mobile spaces? Sure, but anything with an input device, keypads or things you are punching numbers into, can use this.
"This is much bigger than something just for the mobile phone and PC."
I asked Kipman and Tsunoda if we would see this technology in computers, cell phones and cars in the next decade.
"If it takes a decade to get there I should fire myself today," Kipman responded.
While the two declined to be more specific about when we might see the technology behind Kinect on phones and computers, they did make it a point to say that the existing cameras and microphones in cell phones and laptops are "within spitting distance" of the tech needed to get the job done.
Microsoft's commitment to extending this technology can be seen in who worked on the project.
"What you see in Kinect is a movement across all of Microsoft," Kipman said, "where people are galvanized behind this vision about transforming the world from what it was to what it will be.
"From Steve Ballmer on down, everyone at Microsoft is waiting to see if the consumers will love this as much as we love it."
And Kipman says that Microsoft is already working with companies it has partnered with in the past. While he declined to name those companies, past efforts have included bringing television to the Xbox 360 with AT&T's U-verse and push-to-talk voice commands to cars with Ford, Kia and Fiat's takes on Windows Embedded Automotive 7. Those are better known to drivers as SYNC, UVO and Blue&Me, respectively.
Microsoft first talked about the notion of bringing television to the Xbox 360 in 2007. The idea was initially pitched as giving gamers the ability to watch and record television on their console while they played games. It has since evolved into something more akin to Google TV, a smart TV platform that grants users access to a more interactive television viewing experience that includes the Internet.
When I met with Microsoft co-founder Bill Gates in 2007, he also talked to me about the importance of giving the Xbox 360 the ability to see and hear. Even back then Gates called vision and audio for the Xbox 360 the future.
"I don't mean just the video conferencing, I mean the actual software analysis," he told me at the time. "There is a lot of work between the Xbox team and Microsoft research on that."
This notion of combining television with a gaming console that is always listening and watching seems to form the basis for Kinect's next big leap.
For instance, Kipman said that using motions and voice to control channel surfing would be a good use of the technology, but added that "there are other massive uses that are not as obvious. You can use those inputs in more subtle ways."
"Imagine a world where we sit in our living room watching a Brazil versus England football game and the Xbox notices you are wearing a Brazilian T-shirt and makes it a more Brazilian experience," he said. "We could offer a two-way discourse as you're watching news."
The Xbox 360 already allows you to use Kinect to watch live and recorded sports with ESPN. That experience includes the ability to hang out with fellow sports fans in avatar form and do things like vote in sports polls.
The key to doing all of that, Kipman says, is to make technology disappear and put the person at the center of the experience.
As Microsoft moves forward to spread the technology first seen with Kinect to other areas, it will continue to extend the technology's use in gaming as well, as it strives to change the way people play games.
"I think this is as big as going from black and white to color," Kipman said. "It changes the discourse for how people treat interactive entertainment, and even making the more linear static kind more interactive."
"Kinect is not an 'or' technology, it's an 'and' technology," he said. "It's an and of identity, of voice, of gestures and the controller. You can have a controller experience augmented with voice, augmented with who I know you are. I can have any number of hybrid experiences."
Imagine a game you play with an Xbox controller that also listens for your voice commands or watches for you to lean or make gestures to do things like fling a grenade or peek around a corner.
"From a creative perspective the stuff I'm most excited about isn't how can we take existing games and update them with Kinect, it's how can we create new experiences that are Kinect enabled," Tsunoda said. "You have games that are very story based, that have you build a relationship with your characters. With Kinect you could do stuff like combine not just motion-based stuff, but voice, like being able to talk to people in the game. Human recognition stuff. You can start building completely unique experiences in game like you do in real life."
Tsunoda says Kinect reminds him of his childhood dreams of having a holodeck to play with.
"I remember dreaming of going into a fully immersive world where you are able to interact with everything as you would in real life," he said. "This is far from a fully immersive world, but I do think that Kinect is one of the very first stepping stones toward making that dream a reality."