
What’s next for virtual assistants like Alexa? Maybe buying stuff for you automatically.

Voice tech expert Bret Kinsella says that, eventually, virtual assistants are “just gonna do things on our behalf.”


Amazon Echo Dots Stephen Brashear / Getty Images

When Apple’s Siri and Amazon’s Alexa first came on the scene, they were impressive novelties. Over time, though, they got smarter and more popular — and now we’re entering a new phase of voice tech, says Voicebot editor Bret Kinsella.

On the latest episode of Recode Decode, Kinsella told Recode’s Rani Molla that one of the most interesting trends to watch in his industry is whether virtual assistants will gain more “agency” — that is, the freedom to make decisions without explicit commands from their owners. He cited the example of Google Duplex, an offshoot of Google Assistant that can call restaurants for you if you have a Pixel phone.

“Eventually the voice assistants are gonna get to understand our habits, preferences and likes, and they’re just gonna do things on our behalf,” Kinsella said. “Schedule a hair appointment, find out store hours, those types of things.”

“If it sees something on sale that we bought in the past or they think that we need or we’ve indicated somehow that we need, it might just show up at our door,” he added.

In the near term, however, you have to ask your Alexa or Google Assistant to buy something for you — something that very few Alexa owners are doing, according to an August report in The Information. Kinsella expressed skepticism about that report, noting that many more people had said in Voicebot’s own surveys and others’ that they had used voice shopping at some point.

“The thing that gives me a little bit of a pause around that report is that, consistently, other people or other surveys have shown much higher numbers than that, like 5 or 6X have at least tried it,” he said. “Trial and habits are different things. I don’t know what the number was referring to, but I don’t doubt that there’s some disappointment around how fast this is taking off. It’s a matter of people learning that it’s a thing.”

You can listen to Recode Decode wherever you get your podcasts, including Apple Podcasts, Spotify, Google Podcasts, Pocket Casts and Overcast.

Below, we’ve shared a lightly edited full transcript of Rani’s conversation with Bret.

Rani Molla: I’m here with Bret Kinsella, the editor of Voicebot, a publication that has to do with all things voice technology. I’ve recently been working on a project about voice because it is the future. It’s how we’re going to interact with machines, before the machines take over. Bret, welcome to Recode Decode.

Bret Kinsella: Thank you for having me.

Let’s start with your history. How did you become this authority on voice technology?

Well, I’ve done a number of different things in technology. I started working in tech in 1996-7, depending on how you count, and worked with big consulting companies, learned business strategy, learned technology deployment — you know, companies like Accenture and Sapient. Worked with a bunch of startups over time and eventually a series of events happened.

I wound up starting a small agency and I really focus on post-A round funded startups. Those companies that are trying to go from 200,000 in revenue to 10 million, so they’ve got some traction but they really need to grow. I was doing that and that led me to start working with a company called Zap Media. Two of the co-founders, I had worked with previously, so they asked me to come in. They were an engineering led company and they really needed to establish themselves and that was a voice-based solution. What they had is interactive audio ads.

About when was this?

This was 2013. This was pre-smart speaker era. The focus was on mobile ads and it was really about being able to deliver this interactivity by voice on mobile devices when you’re listening to streaming music.

So if you’re listening to streaming music, you talk to it, or it just responds to you? What do you mean?

Yeah, if you think about it, the advertisements weren’t very effective in some instances because the call to action is incongruous with the way you’re actually engaging with the media. They would say like, “click to convert,” you know, or, “open a video,” or something. It’s in your pocket or your purse and it’s behind the lock screen and all these other things.

As good as that ad format is, it was like five times better when you could just respond by voice in order to execute the call to action. I learned a lot in that in terms of how voice works. We implemented our own voice models and those types of things and they were a client of mine, and I enjoyed doing that work.

Around 2014, the Echo came out and it was a novelty. 2015, Amazon approached that company and said, “Hey, we’d really like you to support the Echo because you understand this space and we’re looking for people who understand the space.” We talked about it as a team — and I have a background in strategy consulting and have done a lot of startup positioning — I said, “Okay, well, let’s research it, let’s see what it’s like.” We did, and ultimately it led to the company really focusing on that.

In fact, it’s basically all their business now. They’ve got over 1,000 Alexa skills and Google Assistant apps published on behalf of big brands. Some really large ones, as well. A lot of media companies as well.

Through that process, before we even got into that, because I’d done that research, I was talking to some people at Advertising Week and they said, “Oh, this is a good topic. I’d love for you to write a story for us.” I said, “Okay, I’ll write something up,” and that was really popular. It got picked up and republished by Huffington Post and I had all these people asking me and saying, “Where’d you get this information?”

I said, “Well, it was kind of a pain in the neck.” There weren’t really a lot of people covering it, and people who were just covering surface level. I knew it at a depth because we’d worked from a technical level, as well as I have this background in businesses and how they adopt technology. So I said, “Okay, just as a service to these people who are asking me, I’ll just throw it up on a website.”

“I’m going to start blogging.”

Yeah, a blog like once a week or something like that. It wasn’t going to be big.

You started Voicebot?

I did.

When was this?

September 15, 2016.

You were doing it once a week. What is it now?

We publish 50 to 70 times a month now. We go really deep. It’s a resource. And if you think about it, we do news, but as much as anything we are a chronicle of what’s going on in the industry. A lot of the things we’ll write just so you can go to our search bar, which is much better than Google if you want to learn about this space because Google biases towards all sorts of things that make it hard for you to find what you really want sometimes. We try to cover the things that we think are really important. In 2016 we could cover everything because there wasn’t that much. Today, it’s different.

Let’s get into the history of voice tech. Maybe briefly give us an overview of how we got to where we are today, where we’re talking to our microwaves.

If you think about voice technology, it started long before technology as we know it. I just gave a talk in Chicago and what we talked about was what’s called the Gutenberg parenthesis. If you think about it, for millennia, all we really had was the oral tradition. We spoke. And then in 1440 in Germany, Johannes Gutenberg introduced the printing press. All of a sudden you could deploy text at scale, and it really became the default for learning, for information sharing.

The economics changed tremendously. We had all this text out there floating around, and oral communication moved to the background. What happened was we saw as we moved into technology, text was a dominant paradigm and it was something that was a lot easier for the engineers to make a machine understand.

We had this whole idea. I talk about the “textual stranglehold” that they had. That’s been fine, but now what we’re seeing is because of some of these breakthroughs — and I’ll walk through just a couple of them — is that for the first time computers understand us in the language we normally communicate as opposed to us modifying our behavior around so that the computers can understand us.

In the 1950s, Bell Labs, I think Audrey was the first one that is like the best known. I think IBM had a much more significant breakthrough in the ’60s where it could understand 16 words. I think it was called the Shoe Box. We walked through all of the ...

Great conversation with that, huh?

It was great. In fact I had a shoe phone, I didn’t think about this, a mobile phone later on, but a whole different story. We moved up and I think the next big era really was around the late ’90s and early 2000s. People will recognize like the term Dragon. Dragon dictation system.


Yeah. That was the first one that was really good, and that actually had some natural language understanding in it. I’ve spent some time with some of the people who developed that. Really tremendous technology. That was what people thought it was. It was really mostly dictation as opposed to control and interaction.

Then to move forward to what I would call the modern era, that’s sort of the pre-modern era and Dragon was probably the pinnacle of achievement in that, in that pre-modern era. Then we had the introduction of Siri in 2011, and that blew people’s minds. Right?

Right. Talking to your phone.

Just amazing. Now, Siri had some issues because it actually couldn’t do at the time everything that they said it could do in the TV commercials.


But, still really amazing. Once they re-architected the platform, worked pretty well in most of those use cases. That’s interesting, but if you, if you look at a commercial around the Siri launch, it’s about what Siri can do today. They didn’t really expand it significantly and there’s a lot of reasons for that.

Amazon again blew everybody’s mind with the introduction of the Echo. I see those as the two points of this modern era. First, we had the phone and then we have the Echo, and then that was obviously followed a couple of years later by Google Assistant. Now we just have, this looks like vertical adoption of voice and we have tremendous advances.

Right. Even in the past few weeks you had the Google hardware event. They rolled out one with the screen, sort of like the Echo Show. Even Facebook has a Portal now, which, any opinions on that?

Yeah. I guess a couple of things. First of all, I would say the broader context is ... I just wrote something recently which talked about the idea of “phase one of the modern era voice is over.” This goes back to what Jeff Bezos said at a Recode conference two years ago. He said, “We’re the first batter in the first inning,” when he was being interviewed by Walt Mossberg.

I think that at the time that was basically true, but we’re not there anymore. This market has matured significantly. There’s a lot of players, and what we’re seeing is the second round of players are coming in. The second wave of devices are coming in, the second round of features. Facebook fits into that. I think it’s an interesting solution that they got, that they put out to market.

Just so everyone knows, the thing that makes the Facebook Portal a little different than the rest of these speakers is that it’s meant for video calls and it kind of follows you around the room.

Yeah. Right. The killer app for that is the camera follows you around the room, and it is a nice piece of engineering to do that.

I’d pay to not have that. I will.

That’s also true. Do you want to ... sometimes you want to get away from it, but the video chat is a really good solution there. I think the folks in Cupertino sorta yawn at this. They’re like, “We’ve had Facetime forever. Why is this better?” It is actually better if you’ve used these types of devices. The Echo Show is really an excellent video chat device. I can tell you from experience with people that I’ve given those to that it’s really nice.

Facebook has this challenge in dealing with voice. Most of their content is visual. That’s how they interact. How do they actually bring their assets to a space that’s mostly voice? Video chat is probably a good point of entry for them.

All right, which makes me think about, you know, so all of these are adding screens now, which in my opinion, that makes it seem like there’s a deficit in voice. There’s something that isn’t enough for voice alone to take care of. That’s why they need to add screens to these or at least for certain platforms. Do you agree?

No. I don’t agree. What I think about is what’s best for the user. It’s not like the visual interfaces that we had didn’t have audio. A lot of the visual interfaces allowed us to do other things. They have text and they have audio as well because it was a richer experience. They didn’t have the capabilities to use voice as an input mechanism in that case until recently.

What I tell people is don’t think voice only. You might, sometimes it’s okay to think of voice first, but it’s not necessarily voice only. Some use cases will be voice only because you’re driving, for example, and you don’t want people looking at a screen, but many other use cases, particularly with complex outputs, you deal with data every day, right?

Right. I mean the visual ...

Data is terrible in an audio environment. That’s a perfect example. In particular, it’s a perfect example because actually getting data often requires a complex input, and voice is actually much better at complex inputs than text.

I want you to give us an overview of where we are right now with voice. You know, we’re about to have a holiday season where all sorts of people are going to pick up smart speakers. What percentage of the United States owns a smart speaker right now?

I just did a national survey of U.S. adults and smart speaker ownership and it looks like about 40 or 57 million people own a device as of September.

It’s like a third?

About. Well, it’s actually, that’s going to be closer to like 24 percent of U.S. adults, so about 250 million in the U.S. When we look at that device ownership, that’s grown significantly. In fact, that’s up more than 10 percent. It’s about, since even the beginning of the year. That’s been growing at a tremendous rate.

I think what we’re going to see in this holiday season is we’re going to see more smart speakers purchased, for sure. That’s something that it’s no longer a novelty where people are just buying them, adding to their home. I think a little over half of the people only have one now, so there’s a lot of those people who are going to buy more.

We’re also starting to see voice be a bigger driver for other types of devices, whether they be headphones or appliances, and we’re going to see a lot more of that this holiday season. These include these multi-modal devices, the interactive displays, that are designed to work with voice but also complement it with visual.
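As a quick check on the ownership share mentioned above, the arithmetic can be sketched as follows. This assumes the 57 million owner figure is the one intended and uses the roughly 250 million U.S. adults quoted in the conversation; both numbers are taken from the transcript, not from the survey itself.

```python
# Rough check of the smart speaker ownership share quoted in the interview.
# Both inputs are the approximate figures spoken in the transcript (assumptions,
# not the survey's published values).
owners_millions = 57    # assumed: U.S. adults who own a smart speaker
adults_millions = 250   # approximate U.S. adult population, as quoted

share = owners_millions / adults_millions
print(f"Ownership share: {share:.1%}")
```

On those rounded inputs the share comes out a little under a quarter of U.S. adults, in the neighborhood of the figure Kinsella cites.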

On these devices right now as they are, what’s working and what isn’t working?

What’s working? Well, a few things are working, for sure. Utilities are working. Any type of utilitarian interaction — information, I want the weather, conversion, or timers, those types of things, those are far and away the things that users say they’re using most frequently. The other thing that’s really working is media. Media is the killer app of smart speakers and it’s not surprising. They’re speakers, right?

Makes sense.

People always say, “Voice will take off when there is a killer app for it.” Actually, smart speakers have been adopted en masse because they’re great for listening to music or talk radio, other things, podcasts, maybe Recode Decode.

You could always listen, you could stream music before this. You could stream it without having to talk to it.

But it’s so easy. I mean, think about it, right? To set it up, and maybe you had a Sonos System, and so you could go into your phone and you could start things, you could search and that’s great. People who didn’t have that had to make sure it connected to the device properly all the time and all these things. Now you just say, “Alexa, play Renegades by X Ambassadors,” and she does.

How many people or what share of Americans are listening to music on their smart speakers?

I think the number right now is around 80 percent of the people who have a smart speaker say they’re listening to some sort of music or talk radio on a monthly basis. That drops a little bit when you go down to weekly or daily, but it’s a significant portion. Almost everybody who buys these tries to listen to music and then they do listen to music.

The interesting thing here I think is that not only are they listening to music or podcasts, but they’re actually listening to more music and podcasts, right?

Yes, yes. There’s some good data from Edison Research which talks about that. The people who own smart speakers report that they are listening to more music after they purchased the device and they’re listening to more radio as well.

That has a whole lot of repercussions for people who are selling this media or advertising against this media for listening to more of it.

Absolutely. If you think about it, particularly for radio, I think it’s important because there’s a lot of data which shows that radios have left U.S. households over the last 15 years. In fact, the ownership is pretty low among millennials in particular, and we expect that trend to continue.

What happened with smart speakers is that they brought radio right back into the home. That was a great solution for radio because they weren’t present, and all of a sudden it was this like you call up your favorite radio station just by saying it in the morning.

Right. I spoke with some people at NPR and they were saying that, thanks to smart speakers, [they] have seen such growth in listenership and all of it’s accretive. It’s not like they’re losing it somewhere else. It’s just they’re gaining it.

Yeah, it looks like it’s accretive. And it’s not just NPR, some other radio organizations that I’ve worked with ...

Spotify was saying that as well, yeah.

Spotify, think about the Cumulus Network. They’ve got over 300 radio stations on today. Smart speaker listening as a percent of all of their streaming grew by 4X over four months in the holiday season into the first quarter of this year.

Okay. Let’s talk about something that’s kind of working, I think. Smart homes. I know that the introduction of smart speakers has made it easier to set up your smart lights or your smart thermostat. How well is that working? I know it’s driving sales of all these gadgets.

Well, it was a bigger deal a year ago than it is now. A lot of the early adopters of things like smart speakers are also early adopters of things like smart home. They already either had smart home devices that they were controlling with their mobile phones, or they were thinking about it. And the smart speaker was probably the best way to say it was the catalyst for them to get into that.

We find that somewhere in the neighborhood of 20 to 30 percent of smart speaker owners have done something with smart home. That means most of the people who have smart speakers have not. That’s a big opportunity, I think, for the smart phone, smart home device makers, but it’s also a big opportunity for consumers because it’s just another utility that you are able to access from the device once you have it in your home.

One thing I’ve heard from people is that what’s made the growth of smart home devices is that it used to be such a pain in the ass, and now you could turn on Alexa and these things coordinate with each other a bit better than they used to. Is that so?

I think that’s true, and I think it’s going to be more true as we go forward. Just a month ago, Amazon introduced the Smart Plug. The Smart Plug is interesting because it basically self-discovers. You don’t really need to configure it. That’s one of the challenges a lot of people had with smart home. You download the app and you configure it and all these other types of things. Amazon’s trying to make that dead simple, that if you stay within that ecosystem and you already have some of our devices, they just discover each other.

Google’s close to that, anyway, because they have this auto-discovery. It’s not quite as simple because of the way they’ve integrated it.

Yeah. One of the issues is it seems to be that a lot of these devices require a certain set of commands, like, “Alexa, turn on my heat.” That’s not quite natural language. Or depending on the device, you have to say a different incantation.

There’s a couple different things with that. First of all, there’s these rules-based systems that was really what we had with voice recognition in the past. It was looking for certain types of keywords or phrases and from that, it was gonna execute it.

With Alexa, you have to allow for a much more natural language interaction. That means it requires the smart home device makers to do some mapping between what we call intents — what the user wants to do — and what the device can do. Yes, there’s a number of things they have to do. The ones who are constraining your language are the ones that aren’t gonna be as successful. The ones that allow you to talk like you would normally want to talk are the ones that are ultimately gonna do much better.
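The mapping Kinsella describes, from several ways of saying something to one intent and then to something a device can do, can be illustrated with a toy sketch. Everything here is hypothetical: the intent names, phrasings, and device actions are invented for illustration, and simple pattern matching stands in for the trained language models that real assistants use.

```python
import re
from typing import Optional, Tuple

# Hypothetical intents, each reachable through several natural phrasings.
# Regular expressions are only a stand-in for a real language-understanding model.
INTENT_PATTERNS = {
    "heat_on": [
        r"\bturn on (the|my) heat\b",
        r"\bit'?s (too )?cold\b",
        r"\bwarm (it|the house) up\b",
    ],
    "lights_off": [
        r"\bturn off (the|my) lights?\b",
        r"\blights? off\b",
    ],
}

# Hypothetical mapping from an intent to (device, action, value).
DEVICE_ACTIONS = {
    "heat_on": ("thermostat", "set_mode", "heat"),
    "lights_off": ("lights", "set_power", "off"),
}

def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrasings match the utterance, if any."""
    text = utterance.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return intent
    return None

def to_device_action(utterance: str) -> Optional[Tuple[str, str, str]]:
    """Map what the user said to what a device can do, or None if unsupported."""
    intent = match_intent(utterance)
    return DEVICE_ACTIONS.get(intent) if intent else None

print(to_device_action("Alexa, turn on my heat"))
print(to_device_action("It's cold in here"))
print(to_device_action("Play some jazz"))
```

The point of the sketch is the shape of the problem: the more phrasings that resolve to the same intent, the less the user has to memorize an exact command.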

Are you a smart home person?

Not really. I’ve got a couple things that I don’t use that often, as far as smart home goes.

Do you see this as everyone’s gonna have a smart home in the future, or is this more of a novelty?

I think most people will wind up having a smart home. If you think about just the things like the Ring doorbell or Nest Cam doorbell and those things, those are really popular with people. The idea that you can interact with them by voice is interesting. If you think about the August locks and those types of things where people can do remote entry, there’s a lot of utility there for people.

Let your Airbnb guests in.

That’s right. I’ve been the victim of that in the past. There is a lot of utility there. They’re much better now. You can do these routines now where you can just say one command and it’ll do several different things like “turn on lights as well as the television,” or things like that. You can cluster them.

There’s a lot of features now which are good, but I will tell you that sometimes it just does take longer to say, “Turn off the light in the living room,” than to actually turn off the light in the living room.

I agree, as someone who constantly is telling my Google Home to turn on the TV, listen to this. It would be easier.

“I’m sorry, I can’t control that device.”

Yeah. It’s like, “Man, I should have just used my thumbs.”

Let’s talk about something that I don’t think is working, that’s voice shopping. People have made a lot of noise about how voice is the future of shopping, but it seems like most people who have smart speakers or smart assistants on their phones or whatever devices aren’t using it to buy stuff yet.

Well, yes. I think that’s largely true, but again it’s a new technology, so there’s a learning curve here.

I will say that I’ve been pretty surprised with the numbers. We’ve done several consumer surveys on this over the past year. Consistently, we’re seeing that over one in five, maybe close to one in four people, they say they’ve tried it. Now, it depends on what you’re looking at. There’s the smart speakers, there’s also the smartphone. It also depends on how you define what the shopping experience is.

Right, whether you’re actually buying something or you’re searching or asking about ...

Correct. Correct. The question, ultimately, is there’s this shopping aspect, and then there’s the transaction itself. What we have found in the consumer survey data is that a lot of people actually have purchased things, maybe because of the novelty effect, so they just said, “Okay, Alexa, buy this,” or Google Express has done really quite well. They did it through a little bit of a good promotional trick, is that they gave people $20.

Right. That’s why everyone’s purchased something worth $20.

It was amazing. You see some of these different studies. I’m like, all of the people who bought a Google Home in the fourth quarter of last year got a $20 gift certificate to Walmart, delivered for free to their house. So a lot of people bought batteries or something.

But on a daily basis thing, I saw one of the surveys you had done, this is how many people do this action monthly, ever, daily. Daily shopping’s got to be really low, or weekly shopping even. I guess that would be a better metric.

Yeah, in fact, daily and weekly are probably a rounding error, largely because, except for things like food, most people don’t shop daily. That’s also part of it.

Maybe monthly’s better ...

For this category, monthly probably is better. I do believe that that’s going to significantly increase as people start to adopt either home delivery or pickup of food more. Most of the people are not using that for that today.

The Information put out a piece a few months ago. It had said something to the effect of, “Only 2 percent of Alexa users had used it to shop in 2018.” That’s a far cry from the 20 percent, 25 percent you’re mentioning. What do you make of that?

I look at reports of that — which are unsourced — and I say, “Maybe.” I ask the question and I didn’t get a response because I wanted to understand who it was that they talked to, because they didn’t indicate that it was someone actually from the company. They indicated it was someone who had seen a briefing.

A source who’d seen it.

I wanted to know if they had actually seen the document themselves or it was someone who had seen a presentation somewhere and was relating that. Because there’s a lot of different ways that you can look at the data.

Also, the other thing I was interested in was the timing, because most people actually tried to do the purchase around the time they acquired the device. Depending on when you do it during the year, you say “in the last two quarters,” it’s gonna be a different number.

I guess if you had around the holidays, and you had that $20 off, you’re gonna have done it back in December.

Right, but if you say, “In the last six months,” and you’re taking the survey in July, you’re gonna get different readings.

What I would say is that the thing that gives me a little bit of a pause around that report is that, consistently, other people or other surveys have shown much higher numbers than that, like 5 or 6X have at least tried it. Trial and habits are different things. I don’t know what the number was referring to, but I don’t doubt that there’s some disappointment around how fast this is taking off. It’s a matter of people learning that it’s a thing.

Right, that it exists. But brands, marketing, things like that, they’re not waiting for this to take off.

They’re not.

I talked to a bunch of different companies, CPG companies, food and beverage, things like that, they’re all in for voice already. In my head that’s like they don’t want to miss the boat. They don’t want to be left behind when everyone starts buying things on their mobile phone.

Yeah. There’s a lot of debate about whether voice is a channel or it’s a UI or what it is. What we know is that when one in five people in the country — or one in four, potentially — have access through a specific media, that’s at scale, and that’s what most consumer brands want. They want to be at the places where there’s scale of users. That makes sense.

The other thing with brands is it’s a real issue for them. They haven’t seen this issue since the rise of the internet. When the internet came about, if you didn’t have a website, you literally could not be found on the internet because you only had analog content. You had to create digital content.

When we went to mobile, we didn’t really have that situation because at the very least, there was a browser on the mobile device, and so you could still be found, and you probably still would be found through search.

When you get to voice, people don’t have audio content, at least they don’t have it packaged in a way that you can access through a conversational UI. All these companies are literally silent if they don’t have a voice app. Best-case scenario, they are trusting the voice assistant to deliver the consumer that asks about them to a Wikipedia page, or a position zero.


Something. Best case. Most often that’s not controlled by the brand. Brands want to control that experience. They want to know that their message is getting through. If they don’t have a voice app, there’s no chance for them to do that. And it’s clear that this is something that is not only going to be popular with consumers, it’s clear that it is something that’s popular, and it’s a way that people are using it.

It seems that companies are using voice apps, mostly as the marketing/educational apps or skills or actions, whatever you’d like to call them. Could you give us some examples? I know Tide, for example, it tells you how to get a stain out of your shirt or whatever material. You tell them what material it is, what you got on it, and they say how to get it out. Obviously the idea is eventually, someone’s gonna go buy Tide to go get the stain out of their sneakers.

That’s right. You think about that as really just keep Tide top of mind. If I have a stain, where do I go? When you ask Alexa and Alexa says, “Tide can answer that question for you,” you’re just thinking, “Oh, you just associate Tide with stains.” They’ve captured that moment, they’ve captured that need, and they own that real estate now for their brand and the mind of consumers. They’re just reinforcing that with their Alexa skill.

Some other examples, there’s this interesting thing that Mattress Firm did recently. They include promotions in their voice app. They also have FAQs, if you think about it. But expert tips on how to buy a mattress, what’s important, those types of things. People will wind up there. Then they’ll ask for the promotions, and the promotions are designed to get people in the store.

You think about it, is that part of the buying process? Absolutely. If you show up in a mattress store, there’s a high likelihood you’re gonna leave with a mattress. Most people just don’t browse.

“I’m just browsing, browsing mattresses.”

Right, exactly. Oh, yeah, yeah. It’s not like the Apple store. People don’t just go in to look around and dream.

You’ve mentioned position zero. For shopping, if and when people do end up buying things through voice, that’s gonna be really important because if you type “black shirt” on your phone to search that, you’ll see 50 different results. If you say, “Hey Alexa,” or, “Hey Google, I want to buy a black shirt,” you’re gonna get one, maybe two options that she’s gonna read to you. How do you get to the top of this list? What does this mean for brands that are just trying to be at position zero?

No one really knows how to get to the top of the list. Position zero is a method.

Amazon knows.

Well, yeah. Nobody outside of the people who own algorithms. Eventually those algorithms will be so complicated, they won’t even know.

I had Brad Abrams on my podcast last year. It’s instructive on this topic of voice SEO. What we were able to discern is, first of all, he confirmed that when Google does this — and Amazon wasn’t really doing this in the past, so Google was the only place. Amazon’s more recently come to this recommendation concept. Google had done some testing with two recommendations, never more than two. The vast majority were one, and they were generally leaning towards that being a better experience.

People buy what they first hear. I’ve seen some studies on that. You’re more likely to accept your first offer than ...

Oh yeah. It’s like if you’re an insurance company and someone calls your call center, your conversion rate is like 40 percent. I mean, it’s crazy. That’s why people will pay $200 for a click-through in Google AdWords for insurance, because it’s worth so much money.

If you look at this, position zero is a way ... One of the things, you have a sophisticated audience in this space. Everyone says position zero. Position zero is helpful, depending on how the question is asked. But it is not the first place that the search engines look. And I say search engine being the Google Assistant and Amazon Alexa.

The algorithm that’s putting it to the top.

The algorithms are different. This is really the first major overhaul of the algorithms we’ve seen in over a decade. There are reserved terms that Google, Amazon, others have said, “Hey, we want to answer this, because we have content.” Then they look at the ability of voice apps to be able to answer the question.

Then they consider things like position zero. You’re much more likely to be able to get a hit by having your own voice app, and optimizing it to be able to answer those questions. Most people just don’t get that because they keep thinking about the text-based world, and that everything’s gonna be like that. Voice actually changes this significantly.

One concern I have now, right now you can’t advertise to be the first result, according to Amazon and Google. You can’t pay to get there like you can in the search results online.

That will change.

That will certainly change for these ad-based platforms. Right now, I think the idea is they’re trying to gain people’s trust before they subvert their trust. What’s gonna happen when there’s one option, and that option is a sponsored option?

Is that option the best option for the consumer?

Good question. No!

Well, it might be. If it is, no one’s gonna care. Everyone’s gonna be happy with it. I think eventually what we’re gonna see is ... Let’s talk about where we’re gonna go, and then we’ll move back to this place.

Eventually what’s going to happen is voice assistants are going to have agency. That’s why all of these big tech players actually care about this. Facebook, they can’t really do what they did with mobile and just say, “I’m not gonna do a phone because I don’t need to own the platform. I can be the most popular app,” because there’s this intermediary. It’s like the browser is controlling things first because there’s someone who’s basically saying, “Hey, this is what is most important.” Eventually the voice assistants are gonna get to understand our habits, preferences and likes, and they’re just gonna do things on our behalf.

If we think about something like Google Duplex, we’re asking it to do something for us. It’s going out into the world to actually do a task.

That’s where they did the demo and they said, “Schedule a hair appointment,” or ...

Schedule a hair appointment, find out store hours, those types of things. Eventually, that leads to this idea that the voice assistants will just do things for us because they’ll know what our preferences are. And in a world of free returns ...

Two-day delivery, free returns.

That’s right. It’s not a big deal. If it sees something on sale that we bought in the past or they think that we need or we’ve indicated somehow that we need, it might just show up at our door.

I don’t know. I find returning stuff really a pain.

Well, I think so, too. But eventually, that’s gonna happen. Most of the time, it’s probably gonna be right.

Yeah. I guess you could. If it’s extra toilet paper, you’re gonna use it eventually.

That’s right. If you think about it, that’s one of the first areas that Amazon’s leaning in on, and that is consumables. The idea is, as a product seller, can you get something into the shopping cart history? If it’s in the shopping cart history for a product category, your brand, then when someone asks Alexa, that’s going to be recommended. They won’t have to buy it, but it’ll be recommended.

But eventually what you’re gonna see is that the voice assistants are gonna get us on this more regular replenishment, because they see what our habits are.

It’s gonna know what we want before we do.

Let’s talk a little bit about the hardware events that just happened. Amazon, Apple, Google all released new or updated voice gadgets. Could we talk about ... Let’s first talk about Amazon’s microwave, because I think this is … a ridiculous headline thing.

Of course. Why wouldn’t we?

What’s the deal with the microwave? Why did they make a voice microwave?

It’s an incredible headline.

It’s just marketing.

I always have to think back to this idea that Amazon does start most of their strategy sessions with headlines. When they were thinking about how to push this out into the consciousness of both consumers and of device makers, around appliances, they must have thought about the idea that this would sound ridiculous.

We’d already seen voice assistants in refrigerators. So you’re not gonna get a headline out of that. We’ve already seen GE and some of the others create Alexa-enabled ovens, regular convection ovens and those types of things. What else could you do? Absolutely, I think that was part of it.

But, and I wrote about this a couple weeks ago, I really think that the important thing around the microwave is Amazon is showing people what can be done, and they’re showing people how easy it is. You and I talked about this briefly about the idea of the new chip.

Right, they unveiled a new chip. You want to explain what that means?

Oh yeah. Basically, in the past, when you wanted to add voice interaction, even when they had systems on a chip, which Amazon rolled out with Qualcomm at the beginning of this year, you had to build all these things in. There was a lot more work for manufacturers, more expensive, and it was more complicated to do.

Amazon, in their interest in making this as simple as possible for people, said, “Okay, well, why don’t we make a simpler chip that’s even less expensive?” All it really does, it has Wi-Fi connectivity and it has a simple way for your microcontroller to communicate with Alexa devices. You don’t even need to have a microphone on it.

Which is, in a way, I’m glad. I don’t want a microphone in my microwave.

Yeah, I think that’s right. Once you start to add microphones to things, it’s really complicated.

And expensive.

It’s expensive and just the engineering alone. There’s a reason why there’s six and eight microphone arrays on a lot of these smart speakers. It’s not the same as a smartphone because of the near field. You’re close to it, it’s much more forgiving.

One of the big risks was gonna be these microphones would be in all these devices that might not be well-engineered to listen to you across the room, have all this type of interference because they’re microwaves, or have a lot of metal in those types of things, which can interfere.

So what they did is they said, “Oh, let’s make something even easier.” All you have to do is you can put a button ... Anyone can put in an actuator with a button, pretty simple, inexpensive. It will connect to this chip. It will activate something or because it self ... because it’s got that Wi-Fi in it and it can actually self-integrate with or automatically integrate with Alexa devices, you can automatically start talking to it once it’s installed and has power.

Okay, so this isn’t about selling microwaves for Amazon necessarily, unless it really takes off.

They’re going to sell. They’ll wind up selling a lot of microwaves because it’s a commodity business. So you look at the microwaves, you’re like how do you decide what the microwave is? There’s very few people that know the features.

I want the one that talks to me.

Yeah. It’s one thing when you look at a commodity, sometimes you just need one thing to make it stand out.

And it’s $60. I guess that’s a pretty reasonable price.

Pretty inexpensive. I think they’re going to sell a lot of them. I don’t think they chose microwaves for that reason. I think they chose it I think in part for the micro ... or in part for the ...


The headline. I think they do know their product categories and they knew that would be one where they’d probably have a little bit of success. But in the end, I wanted them to ... or I believe they wanted to send a message to appliance makers that “you need to get onboard with this. I’m going to show you how successful this is by a product I made. I will start moving into every product category that you don’t deploy in because I think it’s this important. But I won’t if you build it first.”

So it’s more about showing other hardware makers what they can do and now can do more cheaply. It’s pushing them there as opposed to necessarily trying to ... Amazon trying to get in the microwave business.

Absolutely. It’s a reference design. I mean they’ll sell anything, but they don’t really necessarily want to be in all product lines.

Do you want to mention a few of the other hardware things that Amazon — we can talk about Apple and Google as well — but that Amazon also rolled out at this one?

Amazon’s had some really interesting things. Echo Auto is a way to ostensibly put Alexa in your car by ... It’s just a small little device with a number of microphones that you can put on your dashboard. It’s not officially available yet. They’re in a limited trial. So it’s invite only. But I think that’s going to be ... that’s going to be popular with a certain segment. It’s basically a stop-gap mechanism until they get into the dashboard. But that takes years for the product life cycle for cars.

The subwoofer, I question whether that will be popular, but this is Amazon’s attempt to say, “We don’t need to do Google Home Max. Just take your regular Echo. It’s got a good high range anyway. The treble works fine. But we’re just going to let you have this large cylinder that’s got a lot of bass and pushes a lot of air.” So that was interesting.

Echo Input is for devices that don’t have a microphone but have audio output. So it’s a simple way to bring other types of devices in.

So what would that be? What would you use that for?

Stereo system.

Okay. Got it. So I can talk to my stereo.

Yeah, exactly.

Got it.

Yeah. Then they have the new DVR system and that’s really for cord cutters. I expect that one will probably be among the most popular.


So those are really interesting. Then they did some updates and those types of things as well.

So what about Google and Apple and even Facebook, for that matter? Did they come out with anything new? It seemed like a lot of updates. Google Show or the ...

Google did not … Yeah, so the Home Hub is their smart display. It’s not as robust as, let’s say, the Lenovo Smart Display, which also has Google Assistant and as of this week has all the same features that the Home Hub has. But it’s designed to be small. It won’t do video chat. It’s really much more of a smart home aficionado type of tool. I think that’s really where it is.

Google really did not announce much in the voice space. It was really much more about Pixel. They have some new interface designs. Google’s killing it when it comes to the user interface with voice on mobile and the multi-modal through Google Assistant, which is also available on iOS. I actually ... That’s all I use for search now is Google Assistant when I’m on the phone.

I saw some prediction it was ... in the next five years, half of all searches will be through voice.

Yeah. Some people are saying it will be within two years.

It’s tough to know.

Comscore I think has it at 2020 now. They had it at 2022. I think they moved it up. But yeah, it is much better just because typing, particularly on the phone, is not good. And it’s really good. It’s very powerful.

But I will say just like in closing, Google made a ton of announcements in the first half of the year. I mean, they’re rolling out in 30 languages as opposed to five that some of their competitors are supporting. They’ve got Google Duplex, which is the bigger news; it’s going to roll out in November in four cities. So they’ve done a lot already this year. But they are not necessarily going to build as many devices as Amazon. I think they’re going to rely much more on their partner network.

Then Apple, I know Apple has made it a pain for a lot of other outside hardware makers to make devices for them.

Well, they don’t allow them to.


Is that what you mean by “a pain?”

Well no. I mean you can ... Some ... They partner with some people but some ...

Like for a charging station?



So Apple wants to do all their own hardware.


What did they announce? Anything new?

Not at their hardware event. I mean, well, they’ve got the Siri Shortcuts, which they announced originally in June and they demonstrated it more recently. It’s not really a voice solution. It’s not really an AI solution but it is clever. I think some people will enjoy that as a tool.

They do have another hardware event coming up in the next couple weeks. It’s possible they might say something about AirPods. It’s possible they may also talk about a smaller, less expensive Google or Apple HomePod. So those are possible.

But I think for Apple the key areas are the AirPods, the phone and the watch. That’s really what they’re focused on. The watch, I think, is an underappreciated voice input tool because it could be the type of thing that means we don’t have to bring our phone with us anymore everywhere. The problem has been manual input on the small screen. Voice input really takes care of that.

Right. You don’t have to tap on a little screen.

That’s right.

So everything in our homes ... We’re getting all these smart speakers. I want to talk about the privacy issue here of having just a bunch of different devices with microphones and Wi-Fi in your home. Is this just a nightmare waiting to happen?

Well, I have a theory that Americans don’t care about privacy. That Americans talk about privacy but all of their actions over the last 20 years have suggested to me that they’ll trade it for convenience all the time.

So people are going to buy Facebook’s Portal even though Facebook recently ...

I don’t know if they’ll go that far. It’s got to have utility beyond just what it does today. But I think there’s a lot of Facebook aficionados who wouldn’t have any reservation about doing that in particular if they integrated Facebook with Facebook Portal, which they have not done.

Yeah, strange. It’s just a ...

It’ll be an update.


It’s complicated to do.

I’m sure.

Yeah. People underestimate how complex it is to ... because it’s not just getting the technology right. You have to get all the use cases right and use cases are totally different than what you thought they were once you start doing the voice interaction.

That’s where something like the Echo Show really ... I thought it was astounding at how well it engineered that for voice use cases with the screen. I think that’s instructive for people who will come afterwards. So we’ll see.

As far as privacy goes, I would just say that I’m happy that the companies are doing what they’re doing and with the wake words, keeping it on device. I understand some people don’t trust that.

When you say “wake words” you mean ...

The only thing that they’re listening for is the wake word which would be “Alexa” or Google Assistant, the activation phrase.

“Hey, Google.”

“Hey, Google.” So they’re only listening for that and that’s stored locally so that doesn’t go to the cloud unless you say that. Then the speech that comes after that is in the cloud.

So what we’ve seen is there’s a lot of law enforcement that have tried to get records for like, “Oh there’s an Alexa in the house.” What they find is there’s really not a lot of information there because it’s only when people are interacting with it. But I understand why people would be concerned with that and if they are, they can just ... They can delete the app. They can get rid of their device. They can unplug it.

Throw it out the window.

They can unplug it. The one thing I will say with the early Echo, which I thought was a great move on their part, is they actually had a mechanical cut-off for the microphone. So when you click mute it actually mechanically disconnects the microphone from the device. That I think is a nice gesture towards privacy. But until we see that it’s a problem, most people are going to ignore it.

All right. In closing, I want to ask you just one more thing. Tell me the future. What’s the future of voice and where is it going? Where is it going to be five years from now? Easy question.

Okay. Well, I think that voice is not going to displace screens but it will displace the amount of time we interact with screens at least through touch and through typing. That’s just inevitable. It’s much easier than the other things that we’ve done in the past, so that’s the first thing I would say.

The second thing I’ll say is I do believe that in addition to using voice more we’re going to start using a lot of different use cases that we haven’t had in the past and that these assistants are going to start doing things for us. Google Duplex is a perfect example. We talked about this idea of agency. They’re going to do things on our behalf. Sometimes we’re going to ask them. Sometimes they’re going to do it for us. We’re going to be happy about that.

The final thing I’ll say is that a real revolution’s going to be voice interaction with screens. There’s going to be screens around us in the places that we go and we’re going to be able to use our voice and interact with them and get a personalized experience without having to carry the screen in our pocket all the time.

All right. So voice is going to take over, the voice robots are going to take over and we’re going to be happy with it.

I’m not a believer in singularity. I know there’s some very smart people who are. But yeah, I think voice is going to be very, very common but it’s not going to displace visual because we are visual people.

All right. Bret, it was great talking with you. Thanks for coming on the show.

This article originally appeared on
