Sometimes it's hard to appreciate that the countless electronic voices we hear, from the prompt at the self-checkout to the disembodied tone coming from our phones, were provided by a real person. Where do those voices come from? To find out, I asked the original voice of the iPhone assistant Siri, Susan Bennett.
She's a voice actor who, in addition to her iconic iPhone oeuvre, does commercials, sings, and provides voices for many other companies and services. And she explained just how her unique industry works.
How Susan Bennett became the original voice of Siri — and what it says about voice acting
Talking to Susan Bennett is surreal — at one moment she sounds completely normal, except she has the most pleasant voice you've ever heard. But in a flash she can turn on the Siri voice, and you start thinking you're talking to your computer.
Bennett is a native of Burlington, Vermont, who moved to upstate New York when she was young, and her background gave her a neutral American speaking style. After acting and singing at Brown University, she went to twangy Atlanta, where her clear, unaccented voice has given her a unique competitive advantage.
In the 1970s, Bennett broke into voice acting by humanizing a very different computer than Siri, singing the jingle for Tillie the All Time Teller, one of the first ATMs. For decades, Bennett recorded the narration for answering services, PA systems, and other clients that range from big and corporate to small and local. That experience led her to her most recognizable gig.
When Bennett recorded the voice for Siri in 2005, she had no idea it would end up on the iPhone. She recorded it well before the company that built Siri was bought by Apple, and she didn't even know she was the voice of Siri until the product debuted in the App Store in 2010 and then appeared on the iPhone 4S in 2011. But as seamless as Bennett sounds as Siri, capturing her voice was a surprisingly difficult project.
How a digital assistant like Siri is recorded
Siri needs to be able to say just about everything in the English language, and that took a lot of hard work.
"I recorded four hours a day, five days a week for the month of July," Bennett says. For a voice actor, that workload causes a lot of strain. "That's a long time to be talking constantly. Consequently, you get tired."
The original Siri "was to sound otherworldly and have a dry sense of humor," Bennett says. She added that to her take on the character, even as she focused on staying consistent and clear.
Voice acting always requires some technical acumen — as Bennett says, it's about "being able to read 65 seconds' worth of copy in 60 seconds." But recording for a computerized voice like Siri is especially difficult. These marathon vocal sessions didn't involve reading full words or sentences. Instead, she recorded the raw materials for speech — basic sounds.
The technique of using sophisticated computer programs to build words and sentences from basic sounds is called concatenative speech synthesis (Vox sister site The Verge described the process of linking those sounds in 2013). The goal is to include every possible sound (usually drawn from a syllable-long building block) so the sounds can be assembled in every possible combination for every possible word.
To record these, voice actors are forced to recite gibberish-like sentences that include all of the English language's different sounds.
At her home studio, Bennett recorded a few phrases for me. She'd saved an old script for a digital voice that she'd done earlier for Lucent Technologies, including absurd phrases like "oil your mills jewel weed today." Bennett calls it "digital voice poetry," and she suggests you get a glass of wine while listening.
The process can take a while because the goal is to record as many varieties and types of sounds as possible, in order to produce better, more human-sounding speech. For example, actors like Bennett don't just need to record an "s" sound; they need to record the varying "s" sounds in words like "hiss," "snakes," and "rose." Eventually, a computer stitches the sounds together, with the goal of an ever more natural result.
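For readers curious about what "stitching" means in practice, here is a toy sketch of the idea in Python. It is not how Siri or any real product works: the inventory, the position labels, and the function names are all hypothetical, and short lists of numbers stand in for recorded audio. It only illustrates why several versions of the same sound are recorded, since the program picks the variant that matches where the sound falls in a word.

```python
from typing import Dict, List, Tuple

# Hypothetical unit inventory: (sound, position) -> recorded samples.
# Real systems store audio waveforms; short integer lists stand in for them.
Inventory = Dict[Tuple[str, str], List[int]]


def position_of(index: int, length: int) -> str:
    """Label where a sound falls within a word."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def synthesize(sounds: List[str], inventory: Inventory) -> List[int]:
    """Concatenate the best-matching recorded unit for each sound."""
    output: List[int] = []
    for i, sound in enumerate(sounds):
        key = (sound, position_of(i, len(sounds)))
        if key not in inventory:
            # Fall back to any recorded variant of the same sound.
            fallback = [k for k in inventory if k[0] == sound]
            if not fallback:
                raise KeyError(f"no recording for sound {sound!r}")
            key = fallback[0]
        output.extend(inventory[key])
    return output


# Made-up recordings: two variants of "s", one each of "h" and "i".
TOY_INVENTORY: Inventory = {
    ("s", "initial"): [1, 1],
    ("s", "final"): [2, 2],
    ("i", "medial"): [5],
    ("h", "initial"): [7],
}

if __name__ == "__main__":
    # "hiss" as three sounds: the word-final "s" variant is chosen.
    print(synthesize(["h", "i", "s"], TOY_INVENTORY))
```

The more variants the inventory holds, the less often the fallback is needed, which is one way to see why the recording sessions ran so long.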
Bennett thinks some new recordings have probably been incorporated into the current version of Siri, to improve it and provide more options for users. That means the digital assistant you hear on your phone today is likely a mashup of different human voices, including Bennett and others, strung together into one helpful program.
New technology has turned voice acting into a highly competitive business
Still, it's more important than ever that Bennett be able to say she was the original voice of Siri. It serves as a unique marker in a business where there's always new talent trying to get the next gig. And that competitive spirit extends to Bennett's home studio, which would make any audiophile envious.
It's built on rubber feet to absorb sound, and she uses it every day. There's foam on the wall, a desk with a pre-amp and mixer, and a Neumann TLM 193 microphone (average price: $1,599). Sitting on an adjustable stool, she reads her scripts off an iPad and has a computer monitor to see how recording is going.
She's invested seriously in her studio because most of her recording happens at home, as is typical for voice actors. Thanks to worldwide high-quality connections, beginning with ISDN lines and extending to today's fiber-optic broadband, it's possible for actors around the world to record from home and compete with one another. As in so many industries, technology changed everything for voice actors.
"You could choose a talent from anywhere and record that person from anywhere else," Bennett says. "All the people from any city no longer were limited to their local group of actors. They could go anywhere in the world."
She installed her ISDN line in 1996, and to remain competitive, many voice actors did the same. Technology has brought big opportunities to the business, as well as stiffer competition.
But as competitive as voiceover is, voices will always be necessary
Bennett takes care of her voice: drinking tepid water sometimes instead of tea, occasionally having some honey, and avoiding clearing her throat.
But there's no magic strategy to becoming a voice actor, because something about the voice is innate.
"I think that voices are very personal," she says, "and I think that's one of the reasons why people love Siri and all the other digital assistants, because they do bring a bit of humanity to all this machinery we're dealing with."
That's unlikely to change, even as computerized voices become more common. Something about a voice can't be simulated. That's very clear when you talk to Susan Bennett and hear her sound just like Siri. But it's even clearer when she breaks character and starts to laugh.