June 1, 2012
Voice recognition software, most of us would probably agree, is a pretty cool thing. But the talking to machines part–be it smartphone, TV screen or dashboard–well, not so much. Asking advice of a device? Reeks of geek. Enunciating each word so you can be understood? How cool can you really be?
But Apple, true to form, has taken this head on by hiring three icons of cool to star in their latest ad campaign for Siri, the voice of the iPhone 4S. There’s Zooey Deschanel (Adorable Cool) and John Malkovich (Cerebral Cool) and Samuel L. Jackson (Ultimate Cool), and all make engaging in wordplay with a phone seem the sport of gods.
Critics, nonetheless, point out that in real life, Siri is neither as responsive nor all-knowing as she’s portrayed in commercials. You, too, I’m sure, are shocked to hear this. Others see the whole thing as ripe for parody–see Zooey’s brother Jooey do a Funny or Die version of Zooey’s and Siri’s rainy day together.
No matter. Siri has become a lead singer in the robot chorus, the “You’ve Got Mail” voice of a new generation.
It is fashionable in some circles to suggest that Siri isn’t Steve Jobs-worthy, that if he were still alive, Jobs would have pulled it off the market or, at the very least, never would have approved such a high-profile ad campaign for so flawed a product.
But as Jobs’ successor, Tim Cook, said earlier this week, iPhone 4S owners like Siri. According to a survey released in March, almost 90 percent say they use it at least once a month. And keep in mind that Siri, one of the very few Apple products said to be in beta when it was released, won’t celebrate her first birthday until October. She’s still learning language and, even more importantly, just beginning to tap the potential of artificial intelligence.
Siri will likely be a centerpiece of Apple TV, expected to make its debut in December. But chances are, the place where talking to machines will go mainstream is in our cars.
Drive, she said
Sure, that’s already happening, but you still have to switch to robot speak if you want to be understood. And even then there’s no guarantee. That will start to change this summer when some new models will come equipped with something called Dragon Drive!
It’s the invention of Nuance Communications, a Massachusetts-based company that’s become a powerhouse in the voice recognition business. (It’s widely believed to be the brains behind Siri.) Nuance and voice recognition in cars took a big leap forward last week when the firm announced that Dragon Drive! will be able to tap into the cloud.
What this means is that the system will dramatically ramp up its computing power and memory capability. And that means that the voice in your dashboard will become more Siri-like and allow you to actually converse with it. No more monosyllabic shouting. The day is coming when you’ll be able to casually mention that you feel like some Allman Brothers and seconds later “Whipping Post” will come pumping through the speakers.
The key is how well we’re able to teach machines context and pragmatics–how language is used in social situations. And that’s tricky business. For starters, even the most sophisticated voice recognition device needs to wait for a human to finish speaking so it’s able to parse and interpret the whole sentence. Then there’s the “theory of mind,” the ability to understand that other people can have different beliefs and intentions than our own. As far as we know, only humans can do this.
A recent study by two Stanford psychologists can give you a sense of what’s involved in helping machines intuit. Researchers Michael Frank and Noah Goodman set up an online experiment in which participants were asked to look at a set of objects and then select which one was being referred to by a particular word. For instance, one group of participants saw a blue square, a blue circle and a red square. The question for that group was: Imagine you are talking to someone and you want to refer to the middle object. Which word would you use, “blue” or “circle”?
The other group was asked: Imagine someone is talking to you and uses the word “blue” to refer to one of these objects. Which object are they talking about?
The responses helped the researchers get a clearer picture of how a listener understands a speaker and how a speaker decides what to say. From that, they developed the kind of mathematical model that can expand and refine a computer’s thought process.
Said Frank: “It will take years of work but the dream is of a computer that really is thinking about what you want and what you mean rather than just what you said.”
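For the curious, here is a toy sketch in Python of the kind of reasoning the experiment probes. It is not the researchers' actual model; it rests on two assumptions of mine: that a speaker favors words that fit fewer objects (so "circle" beats "blue" for the blue circle), and that a listener works backward from that habit using Bayes' rule, treating all three objects as equally likely beforehand. The function names are made up for illustration.

```python
from fractions import Fraction

# The three objects from the experiment, each described by its features.
OBJECTS = {
    "blue square": {"blue", "square"},
    "blue circle": {"blue", "circle"},
    "red square": {"red", "square"},
}


def extension_size(word):
    """Count how many objects the word could truthfully describe."""
    return sum(1 for features in OBJECTS.values() if word in features)


def speaker(obj):
    """Probability of each word a speaker might pick to refer to obj.

    Assumption: a word is favored in proportion to how few objects it fits.
    """
    weights = {w: Fraction(1, extension_size(w)) for w in OBJECTS[obj]}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


def listener(word):
    """Probability of each object given the word (Bayes' rule, uniform prior)."""
    scores = {obj: speaker(obj).get(word, Fraction(0)) for obj in OBJECTS}
    total = sum(scores.values())
    if total == 0:
        raise ValueError(f"no object matches the word {word!r}")
    return {obj: score / total for obj, score in scores.items()}


if __name__ == "__main__":
    for obj, p in listener("blue").items():
        print(f"{obj}: {float(p):.2f}")
```

Run as a script, this prints 0.60 for the blue square, 0.40 for the blue circle and 0.00 for the red square. In other words, a listener who hears "blue" leans toward the square, reasoning that someone who meant the circle would probably have just said "circle."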
A manner of speech
Here are some more recent developments in voice recognition:
- Siri goes silent: IBM tends to be real nervous about corporate secrets getting out, so it now forbids its employees from using public file transfer sites, such as Dropbox. But it also has a ban on the use of Siri in the office because security execs worry that someone, while talking to their phone, could reveal sensitive info that ends up on Apple’s servers.
- Take that, Apple!: Samsung launched its new Galaxy S III smartphone in London this week, and while its big touchscreen is getting a lot of attention, it also features new voice and face recognition software.
- Do what I say, not what I do: And Samsung’s not stopping there. It recently filed a patent application for a robot that understands human speech. The robot would be able to adjust its “listening” capabilities to take into account ambient noise that might interrupt or disrupt commands it’s been given. It would also be able to recognize who’s speaking to it, even if the background noise is very loud.
Infographic bonus: You think your car is computerized now. Wait until it’s completely plugged into the Internet. Get the lowdown on what a connected car can do.
December 2, 2011
Remember that scene in Minority Report when Tom Cruise manipulates 3-D images in mid-air simply by moving his hands? It’s a moment when you forget the plot, the setting, the sci-fi theme and you just sit there and think, “That is soooo cool.”
Flash forward to last fall when Microsoft rolled out its Kinect motion-sensing devices for the Xbox 360. At the time you didn’t hear many people say “This changes everything.” It was mainly seen as Microsoft’s answer to Nintendo, a Wii without the wand that allowed people to play games simply by moving their bodies.
That’s clearly what Microsoft had in mind and it no doubt was supremely tickled when Kinect became the fastest-selling consumer tech product of all time—10 million sold in just four months. But within weeks of its debut, Kinect began morphing into something much bigger. First, hackers started using it to give robots 3-D vision. Then other tinkerers took it in more directions—from creating interactive shadow puppets to adapting it so surgeons in operating rooms could manipulate CT scans by just waving their hands. Sound familiar?
At first Microsoft did the lawyer thing, threatening to “work closely with law enforcement groups” to keep people from tampering with its Kinect. But savvier heads prevailed. Over the past year, it’s done a full 180 on this, first launching a website celebrating what it’s dubbed “The Kinect Effect,” then a month ago releasing a very slick ad showing just how much Kinect has caught the wind. Just two weeks ago, Microsoft announced “Kinect Accelerator,” a program designed to help developers and startups create original products using the Kinect.
And then, earlier this week, word leaked out that the next version of Kinect will be able to read your lips and facial expressions and gauge how you’re feeling by the tone of your voice.
Yet as impressive as all of this sounds, I’m sure some of you may be thinking, “I don’t play video games, don’t own a robot, am not a surgeon and have never dabbled in shadow puppets, so what’s Kinect got to do with me?”
I’ll answer with another question: You’ve used a TV remote, right?
That’s where this is headed, to your living room. No one wants to use a keyboard to control what’s on their TV. A remote’s bad enough. And touching the screen isn’t very practical. But being able to change channels by waving your hand, or calling out a number or even blinking your eyes, well, I’d say we have a winner.
Tell me what you want
The other hot item in the realm of human-machine bonding is Siri, the “personal assistant” that lives inside the iPhone 4S. With its high-end voice recognition software, it carries out your spoken requests. Need to send a text to a friend? Tell Siri. Out-of-town and looking for Mexican food? Ask Siri for recommendations. Wondering if you’ll need an umbrella tomorrow? Siri will be your weathergirl.
This, undoubtedly, is the future of search, but as with Kinect, hackers are broadening Siri’s horizons. One has figured out how to use the software to order his car to start. Another has jury-rigged it so he can tell his thermostat to turn down, his lights to turn off and yes, his TV to turn on.
Here’s more from the world of human-machine relationships:
- There’s something in the air: From Russia comes a technology that one-ups Kinect. It’s called DisplAir and uses an infrared camera, a projector and cold fog to produce 3-D images in thin air that can be controlled with hand movements.
- Please don’t type on my face: Keyboards may be on their way out, but virtual keyboards that can be reflected on almost any surface, and actually work, are coming soon.
- Ah, the touch of cardboard: Researchers in Germany have come up with a way to make clothing, furniture, even cardboard, work like the touch screen of an iPhone.
- You’re so cute when you write with your finger: A Finnish company has developed technology that turns walls into group screen-touching experiences. Already it’s being used in bars in Japan and Hong Kong.
- It’s not just a guy thing: Rebecca Rosen, associate editor at The Atlantic, weighs in on why so many helper devices, such as Siri and GPS, have women’s voices.
Video Bonus: Can’t get enough of the Kinect hacks? Here are a dozen more.
The Question: What would you like to see a body-motion technology like Kinect be able to do?