

Your voice will soon become the primary way to interact with all machines

The Alexa Skills Kit did for the Amazon Echo what the iOS SDK did for the iPhone — open it up as a platform that could conceivably solve every customer demand.

YouTube user LarMots plays rock-paper-scissors with his Amazon Echo. (LarMots / YouTube)

Apple did not invent the mobile application. Phones have had apps since the first time a phone had a screen. These first apps, however, lived in a closed ecosystem; the calendar, web browser and games came preloaded and locked down. If you wanted something more, you needed to buy a new phone. App developers worked for, or were close partners of, the phone maker, and their reach was extremely limited.

Then came the iPhone, and the iOS SDK.

Suddenly, the phone was not just a phone, but a platform, a medium in which you could even launch a business. Apps could now move between phone versions (and eventually to other Apple devices, like the iPad), and the barrier to entry dropped substantially. You no longer needed to work for the phone maker or maintain a serious partnership; all you needed was a few bucks and some developer knowledge.

The brilliance of the iOS SDK was rooted in the acknowledgement that it is infeasible and unreasonable to expect Apple to develop all of the software required for a satisfying customer experience. There is simply too much to build in too many domains for any one company to handle it all. By allowing third-party individuals and organizations to develop for the iPhone with ease and simplicity, Apple created a product that could seemingly solve every customer demand across every domain, out of the box.

With the recent explosion of intelligent conversational systems throughout much of the technology world, we have again come to a similar problem space. Many of the earliest attempts at conversational bots have failed (or struggled) largely because they attempted to do too much all by their lonesome. The domain space of a general AI system is as large as (or larger than) the mobile space, and the interface is exponentially more complex.

AI needs skills

Amazon was the first company to get this right. Siri was released more than three years before the Echo debuted, but Apple failed to learn from its own success with iOS. Siri remained locked down and inaccessible to Apple’s army of third-party developers, whereas the Echo launched ready for extension. The Alexa Skills Kit would do for the Echo what the iOS SDK did for the iPhone: Make it a platform that could conceivably solve every customer demand.

In the almost two years since the Echo was launched, the Alexa Skills Store has become populated with thousands of skills from hundreds of developers, from banking and finance (Capital One, Fidelity) to transportation (Uber, Lyft) to music (Pandora, Spotify) to news (USA Today, Washington Post) to connected home (Nest, Ecobee), and so much more. A companion kit, called the Alexa Voice Service, further opened up the possibilities by allowing developers to embed Alexa in third-party devices, spreading the Amazon platform to more devices and services without requiring Amazon to do all the heavy lifting.

Alexa is now the platform; skills are the new apps

It would seem that Apple and others have taken notice; Siri recently opened up (a little) with SiriKit, and Cortana (Microsoft) and M (Facebook) now have developer options.

This paradigm shift is going to do for voice and conversational interfaces what apps did for the mobile experience. These domain-constrained skills will make bots and assistants appear to be so much smarter than they actually are. The platform will be responsible for all of the backend work (understanding what was said, breaking everything down into entities, sentiments and semantics, and classifying goals) and all of the domain-specific work will happen within the skill, developed by third parties and embedded in the platform.
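The platform/skill split described above can be sketched in code. The sketch below is a minimal illustration in Python, assuming the Alexa Skills Kit's documented JSON request/response envelope; the intent name ("RideStatusIntent") and slot name ("city") are hypothetical examples, not a real published skill. The point is that the request arrives already parsed: the platform has done the speech recognition and intent classification, and the skill only fills in the domain-specific answer.

```python
# Minimal sketch of an Alexa-style skill handler (assumed intent and slot
# names; the envelope shape follows the Alexa Skills Kit JSON interface).

def handle_request(event: dict) -> dict:
    """The platform delivers an already-classified intent plus extracted
    slots; the skill supplies only the domain-specific response text."""
    request = event.get("request", {})
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "RideStatusIntent":
            # Slot values were extracted by the platform, not the skill.
            slots = intent.get("slots", {})
            city = slots.get("city", {}).get("value", "your area")
            speech = f"Your driver is three minutes away in {city}."
        else:
            speech = "Sorry, I can't help with that yet."
    else:
        # e.g. a LaunchRequest when the user opens the skill by name.
        speech = "Welcome. What would you like to do?"

    # Standard response envelope: plain-text speech, end the session.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

Everything hard (acoustics, language understanding, goal classification) lives on the platform side of this boundary, which is why a few dozen lines like these can make an assistant appear fluent in a new domain.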

As the landscape of skills expands and deepens, voice and conversational platforms will become the primary way to interact with all machines. In the near future, you will be able to call for a ride, balance your checkbook, chat with your family, set your favorite shows to record, order groceries and book a vacation entirely with your voice from anywhere in your home, your car, your office — or from your phone.

Bryan Healey is the director of AI at Lola, an on-demand, personal travel service for hotels, flights and anything else you need for your trip. Previously, Healey was a software development manager at Amazon, where he led multiple teams totaling more than 20 engineers on the Alexa team, building large-scale data management and model-building software. Reach him @Bryan_Healey.

This article originally appeared on
