Microsoft is spending half a billion dollars to make sure that when you hear the word Kinect you think "the future of video games."
But Microsoft believes the real potential of the voice-recognition, motion-based technology lies in how the company plans to use it with cell phones, computers and perhaps in the military and health industries.
"Think about a world where machines understand what people want from them," said Kinect Creative Director Kudo Tsunoda. "You can see that extrapolate out to a host of other devices."
Microsoft promised when it launched the $150 Xbox 360 Kinect add-on this week that it would make you the controller. That means you can play soccer, race on a hoverboard and raft down rapids, all by simply going through the motions in front of your television. The Kinect also allows you to video chat with friends, flip through the console's menus with a swipe of your hand and zip through movies by plucking at the air in front of your face.
Despite the sometimes magical moments it creates, Kinect is really just a collection of off-the-shelf cameras, sensors and microphones packed into an array that sits atop or under your television. It looks surprisingly like the head of the lovable, titular robot star of Pixar movie Wall-E.
What Microsoft is betting on isn't that plastic peripheral; it's the software and algorithms that Kinect uses to do its job. A job that includes not just watching a gamer's movements and translating them into on-screen video game action, but also using the spoken word to take direction, recognizing who is playing a game and, perhaps ultimately, what that person wants from their gaming experience.
"From a broad Microsoft perspective," Microsoft's director of the Kinect technology, Alex Kipman, added, "we believe in a more natural way for people to interact with technology. People don't like to hold gadgets in their hands, that includes keyboards and mice.
"The Kinect is the start of that journey."
Kinect Isn't Just Watching, It's Listening
The first thing you're likely to notice when you play a Kinect game is how, as with the Wii and the PlayStation Move, the game recognizes your movements. Unlike with its competitors, you don't need to hold anything in your hand to play Kinect games, but on some level the experience feels very similar across all of the platforms.
I pointed this out to Kipman, the brains behind Kinect. He was quick to explain that the neatest bit of technology to come out of Kinect has nothing to do with motion tracking. It's the audio work, he said.
"It is totally different than some of the things people have experienced before," he said.
While there already are clocks, phones and other electronics that actively listen for voice commands, most current voice recognition systems are "push to talk." That means you have to push a button to get them to listen to you. And typically, the device is being used in a place where the user is a foot or two in front of the microphone and not moving around.
That's the case whether you're in a car, using PC software or talking on a phone. But it's not the case with Kinect.
"In our world, first of all, people are sitting all over the room," he said. "There are many different people sitting 10 to 12 feet away from the system. People are having fun in the living room, so it's not a quiet environment. There is lots of ambient noises."
There are also plenty of sounds pumping out of the speakers, thanks to the Xbox 360. But you don't have to quiet down the room and then push a button to get the Kinect's attention. It's always listening for you.
To get that to work, Microsoft combined several different types of technology.
"The first one is how advanced our multichannel echo cancellation is," he said. "We are able to cancel out all of the noise coming out of the speakers because the Kinect is synced with the Xbox and can cancel it out the instant it comes out.
"The second issue is the ambient noise of people talking. We use sophisticated algorithms to figure out who is talking and where they are so we can hone in on them."
Kipman says Kinect uses its software to essentially form a virtual cone around the speaker's mouth and listen to that one spot for commands. And it can do that even if the person is moving.
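For readers curious what those two ideas look like in practice, here is a minimal Python sketch. It is not Microsoft's code, and Kipman did not name the algorithms Kinect uses. It assumes two textbook stand-ins: a single-channel NLMS adaptive filter for the echo cancellation (Kinect's is described as multichannel) and delay-and-sum beamforming for the "virtual cone." The function names, the echo path and the arrival delays are all made up for illustration.

```python
import math
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    reference: samples the console sent to the speakers (known in advance).
    mic: samples the microphone picked up.
    Returns the leftover signal and the learned filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains once the echo is removed
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def delay_and_sum(channels, steering_delays):
    """Advance each microphone channel by its steering delay, then average."""
    n = len(channels[0])
    output = []
    for t in range(n):
        total = 0.0
        for channel, delay in zip(channels, steering_delays):
            i = t + delay
            if 0 <= i < n:
                total += channel[i]
        output.append(total / len(channels))
    return output


def energy(samples):
    return sum(s * s for s in samples)


# Toy echo scenario: the room adds a short, made-up echo of the game audio.
rng = random.Random(0)
game_audio = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
room_echo = [0.6, 0.3, 0.1]  # hypothetical echo path, not a measured room
mic_signal = [
    sum(g * game_audio[t - k] for k, g in enumerate(room_echo) if t - k >= 0)
    for t in range(len(game_audio))
]
cleaned, learned = nlms_echo_cancel(game_audio, mic_signal)

# Toy array scenario: a tone reaches four microphones 0, 4, 8 and 12 samples late.
voice = [math.sin(2 * math.pi * t / 16) for t in range(400)]
arrival = [0, 4, 8, 12]  # hypothetical arrival delays for one talker position
mics = [
    [voice[t - d] if t - d >= 0 else 0.0 for t in range(len(voice))]
    for d in arrival
]
focused = delay_and_sum(mics, arrival)         # steered at the talker
unfocused = delay_and_sum(mics, [0, 0, 0, 0])  # no steering at all
```

In this toy setup the filter can learn the echo because it already knows what the console sent to the speakers, which is the advantage Kipman points to when he says the Kinect is synced with the Xbox. The beamformer, for its part, only reinforces sound whose arrival delays match the spot it is steered at. Following a moving talker would mean re-estimating those delays continuously, which this sketch does not attempt.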
"Once you combine and fuse all of these things together you get a system that transforms how you interact with it," Kipman said.
Tested in my home, the voice recognition technology struck me as the most impressive part of Kinect. I can sit on my couch, more than nine feet from the television and that Wall-E sensor, lean back into the cushions and say in a normal voice, "Xbox." Occasionally I have to repeat myself, but typically it hears me. It can even hear me through the roar of Last.fm's music or the sounds of movies or ESPN sports. There are some stumbling points. It does, for instance, seem to have issues recognizing my 9-year-old son. But when it works, it's impressive, a little transformational and, yes, a tiny bit creepy.