Year in Review: 2017 in Voice-First News for Alexa Developers

This was a huge year for voice-first. How big? When putting together this blog post, there were so many things that made me say, “No, that couldn’t be in the past twelve… really? Huh.” Yeah, it’s been that kind of year. And that was all before Amazon kicked off December with announcement after announcement at AWS re:Invent 2017. Let’s look back at the biggest news for voice-first developers in 2017 by starting with news from Amazon. Recaps from Google and others will come next, but we’ll begin with the biggest player in voice-first.

Wow, I’m sure I’m going to miss something here. There were so many new devices, APIs, and tools to keep up with. My takeaway from AWS re:Invent is that this was the year that Alexa grew up. We saw voice-first become dining table technology in 2016, and this year was about third-party developers keeping that growth going. New form factors, big brands embracing voice-first, and new developer tools all drove major growth over the past twelve months.

Echo Look, Echo Show, Echo Spot

Have you forgotten about the Echo Look already? It’s still available only through an invitation, and I’ve never met anyone who has had one. The selling point for this odd device was that it had a camera (the first, but we’ll see below, not the last Echo device to sport one) and AI that could help you determine the best outfit to wear. While this, combined with my dressed-down social circle, could explain why I’ve never met anyone who has the device, the Look also received negative reviews and costs more than all but one other Echo device.

That one other, more expensive, Echo device is the Echo Show. The second Echo device with a camera, this was the first one with a display. For users, the effect was clear: being able to watch videos or video chat with loved ones. For developers, this meant the new API for video skills and a new type of display element, called display templates. Right now these templates are very limited (though, with time and skill, you can display anything as an image), but interestingly, I’ve seen a survey from the Alexa Skills Kit team asking about developer interest in a more configurable way to mark up display content. There’s no guarantee that it will get through, but they’re clearly thinking about it, so let them know if you’re interested.
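To make display templates a little more concrete, here is a minimal sketch of what attaching one to a response might look like. It assumes the raw request and response are handled as plain dictionaries; the helper names and the token value are made up for illustration, and the field names follow the Display interface as I understand it, so check them against the current documentation before relying on them.

```python
def supports_display(request_envelope: dict) -> bool:
    """Return True when the requesting device reports the Display interface."""
    interfaces = (
        request_envelope.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Display" in interfaces


def body_template_directive(title: str, text: str) -> dict:
    """Build a simple text-only template directive (hypothetical helper)."""
    return {
        "type": "Display.RenderTemplate",
        "template": {
            "type": "BodyTemplate1",
            "token": "example-token",  # arbitrary identifier chosen for this sketch
            "backButton": "HIDDEN",
            "title": title,
            "textContent": {
                "primaryText": {"type": "PlainText", "text": text},
            },
        },
    }


# Only add the directive when the device can actually render it.
envelope = {"context": {"System": {"device": {"supportedInterfaces": {"Display": {}}}}}}
directives = []
if supports_display(envelope):
    directives.append(body_template_directive("Year in Review", "2017 was a big year."))
```

Checking for the interface first matters because the same skill will also be invoked from devices without a screen.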

The summer and fall brought rumors that other new Echo devices were to be released, coming to a head when Amazon stopped selling the original Echo. Not long after, we got a brand new Echo, the Echo Plus, and the Echo Spot. These complement the Echo Dot and Echo Show, plus all of the third-party Alexa devices out there.

The base Echo has been redesigned and comes in at a lower price. Now just $99, the Echo is stockier and designed to fit in more with a modern home. To make that clear, look at the different purchasing options. Gone is “any color so long as it’s black.” Now, there’s fabric, brushed silver, and wood-finished. And if you’re curious what the killer app is for voice-first devices, look at the option that bundles the Philips Hue.

The Echo Plus takes that even further. The Plus comes with a Hue bulb at no added cost, but it also includes the ZigBee hub directly in the device itself. I’d love to see Amazon’s sales numbers. If it were me and I were just starting, this is the device I’d buy. I’m looking at my Philips Hue hub taking up space beneath the television amongst a bunch of other wires and I’d jump for the opportunity to free up some space and ultimately save money. For developers, this is an indication that smart home is key for voice-first.

The Echo Spot is a device I don’t know how to place. It can either be an indication that the Echo Show hasn’t been selling very well (and Amazon suspects it’s because of the form factor) or that the Echo Show has been selling really well (and Amazon wants to try a new size). Everyone I’ve spoken with who has a Show loves it. However, developers I’ve spoken with have said that they don’t see much traction for Show-specific features. Developers need to be cognizant of the Spot because it uses the same templates as the Show, but handles them differently due to the different size.

Third-Party Alexa Devices

Developers need to remember that Alexa is a platform. This is clear in the third-party devices that have been released with Alexa. A speaker for the video game Destiny 2, the Sonos One, the beautiful and unnecessary GE Sol, and the 4th-gen Moto X were all products that came along “Alexa-enabled.” Other companies added Alexa into their existing products, like Bragi, who make the best wireless earbuds. Motorola even created a specialized Alexa mod for its Moto Z line, which incidentally cost more than buying an Echo itself.

Alexa Gadgets

Amazon moved beyond voice for output with the Echo Show and Echo Spot. They also moved beyond voice for input with the announcement of the Alexa Gadgets API. The Gadgets API is a new input mechanism to create physical items that interact with Alexa. In truth, they can provide a limited output as well. The “proof of concept” is the Echo Button. I’m very bullish on developing for the Echo Buttons, although it will always need to be just an addition and never the main star. Buttons came from the games group of Alexa, so it shows you their perspective on how it will be used, but that doesn’t mean that it has to be used only for games. The API isn’t public for developers yet, and it takes a while to wrap your mind around. Once you do, it can be fun to work with. Watch out here in early 2018 for a guide on how to build gadget skills.

Alexa for Business and Microsoft Partnership

If Alexa’s growing up, it’s long past time to enter the workforce. This came at re:Invent with the announcement of Alexa for Business. Alexa for Business features tools for setting up a host of Alexa devices at one time, managing users, assigning device-specific information (e.g. the location inside an office), and creating skills specific to an organization. These last two are most relevant to developers. Now, developers can build skills for their companies and those skills can tap into individual devices differently. This location information can be accessible by public skills, too. For most developers that’s not very interesting, but for a company that builds services for other companies this opens up a lot of doors. One example could be a way-finding skill that uses information provided by the business to direct visitors.

Speaking of business, I’m going to note the Amazon/Microsoft partnership here because Microsoft is rather “business-y” and, well, that’s my justification for not giving this its own section. An announcement that seemed so consequential at the time hasn’t moved forward in any public kind of way. Users will be able to use Cortana skills on Alexa, and Alexa skills on Cortana. What this means for developers is still unseen.

Kids Skills

Alexa may have grown up in 2017, but Alexa was also there for the kids. Starting in August, developers could build skills targeted toward kids under the age of 13. This was followed later in the fall with a developer challenge to build children-oriented skills. Not much is different for developers; you could ignore this news and nothing would change for you. But for certain developers or companies, this opens up a new opportunity to interact with children and their families, at least in the US. This will open up Alexa to a new group of skills.

Skills Growth and New Regions

Not that Alexa needed any help adding new skills. In roughly the past year, the number of third-party skills has grown five-fold, from 5,000 to 25,000. There is evidence that the rate at which new skills are added continues to slow. Anecdotally, however, I would wager that the slowdown is mostly in the “template skills” that flooded the skill store early on. Alexa has been growing internationally, too. In the past year, Alexa has announced or expanded into Canada, India, Australia, and New Zealand. Recently, Amazon has announced that they will ship to countries where Alexa previously wasn’t available, although they won’t officially support these new countries (for example, no Spanish language support just yet).

Monetization and AWS Credits

As the ecosystem has grown, developers increasingly want to know how they can make money. One way has been to use account linking, where a user creates an account on a website, enables a skill, and connects the two together. This is a lot of friction and, if user reviews are a guide, has left many frustrated users. Another way is to be rewarded directly by Amazon, if you’re a developer with a highly-engaging skill. This was released first in May, and then expanded in the fall. Later, Amazon announced that developers will be able to monetize their skills directly through paid subscriptions and one-off purchases. The monetization API is still in closed beta, but has already been used by the Jeopardy skill, which was also made free for Prime subscribers.

If most developers won’t yet be able to make money with their skills, Amazon wants to ensure that they don’t lose money. Or at least those on AWS. This was done through a $100 a month AWS credit for any developer that incurred charges over the previous month. This is enough to cover the costs of nearly all amateur skill developers—just don’t forget to actually apply it to your account, as it isn’t done automatically.

Beta Testing, Automated Testing, a New Testing Simulator, and the ASK CLI

2017 saw new functionality around testing. One was the long-requested beta testing that allowed developers to share an unpublished skill with up to 500 other people. This isn’t the private skills that people want—I don’t think that will ever come about and Alexa for Business doesn’t fulfill the wishes of most asking for it—as the beta test can only last up to 90 days. Nonetheless, this is a good way for groups of people to work together on a skill without the overhead of setting up an Alexa for Business group.

A new testing simulator appeared mid-way through December that builds upon the existing tool, adding entity resolution, dialog management, and voice input. An in-browser tool is never going to be able to fulfill the needs of everyone. That’s where the newly released Skill Management API (SMAPI) comes in. It does a lot, but one of the things that it does is provide an API for skill invocation and skill simulation. Skill invocation goes directly to the Lambda endpoint, while simulation will go through the Alexa service first. This is a boon for developers who want more automated testing capabilities.

The SMAPI also powers the ASK Command Line Interface. This might be the thing I’m most happy about at the end of 2017. I am, of course, an unabashed fan of CLI tools, but this also makes skill development so much easier. One of the things I like the most about it is that it also introduced a project structure with easy-to-follow conventions, so that moving from one project to another is much simpler than before when the skill model might have been completely separate from the skill fulfillment (and maybe not even version controlled).

SSML and Speechcons

Alexa, SHOUT! Alexa, whisper. Or speak really fast or slowly or… Amazon brought enhanced SSML capabilities to Alexa this year. This gives developers:

  • Prosody: rate, volume, and pitch of Alexa’s response
  • Emphasis: sugar on top of prosody, slowing down and increasing the volume when you want Alexa to emphasize something
  • Expletive: I can’t believe Alexa just said “***************!”
  • Whisper: I find Alexa’s whispering a little creepy, but… to each their own
  • Sub: have Alexa say something different than what’s in the text
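As a rough illustration of the list above, here is a small sketch that builds each kind of markup as a string. The helper function names are my own invention for this example; the tags and attributes are the standard SSML ones as I understand Alexa to support them, so treat the exact attribute values as something to verify against the SSML reference.

```python
from xml.sax.saxutils import escape, quoteattr


def speak(*parts: str) -> str:
    """Wrap already-built SSML fragments in the required root element."""
    return "<speak>" + "".join(parts) + "</speak>"


def prosody(text: str, rate=None, volume=None, pitch=None) -> str:
    """Change rate, volume, and/or pitch; unset attributes are omitted."""
    attrs = {"rate": rate, "volume": volume, "pitch": pitch}
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items() if value is not None
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


def emphasis(text: str, level: str = "strong") -> str:
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def expletive(text: str) -> str:
    """Mark text to be bleeped out."""
    return f'<say-as interpret-as="expletive">{escape(text)}</say-as>'


def whisper(text: str) -> str:
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def sub(text: str, alias: str) -> str:
    """Keep `text` in the markup but have it spoken as `alias`."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


output_speech = speak(
    prosody("Alexa, shout!", volume="x-loud"),
    " ",
    whisper("Alexa, whisper."),
    " ",
    sub("Al", "aluminum"),
)
```

Escaping the text before wrapping it keeps a stray ampersand or angle bracket in your content from producing invalid SSML.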

What if you want Alexa to say “bah humbug” or “ruh roh” or “kerplop,” but you want it with gusto? Well, speechcons, my friend, are what you’re looking for. My opinion? Speechcons work fine on their own or with plenty of dead-air on either side of them. Otherwise it feels jarring—think twice about whether you need them.
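If you want to try the “dead air on either side” approach, a sketch like the following may help. The function name and the default pause length are my own choices for illustration, not a recommendation from Amazon; the interjection markup is how speechcons are expressed in SSML as far as I know.

```python
from xml.sax.saxutils import escape


def speechcon(phrase: str, pause_ms: int = 400) -> str:
    """Wrap a speechcon in short pauses so it doesn't collide with nearby speech."""
    pause = f'<break time="{pause_ms}ms"/>'
    return f'{pause}<say-as interpret-as="interjection">{escape(phrase)}</say-as>{pause}'


ssml = "<speak>You forgot the presents?" + speechcon("bah humbug") + "Let's try again.</speak>"
```

Only phrases from the supported speechcon list get the special delivery, so it is worth checking the phrase against that list for your locale.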

New APIs: device address, skill events, notifications, voice recognition, and progressive response

And here it is: the meaty-techy portion of the 2017 news. New APIs that allow developers to do more with their skills. These new APIs included tools for smart home devices, a way to get the Echo device address, and access to user lists. These three won’t be useful for most skill developers, but are invaluable for devs that need that information. More widely useful are skill events. These trigger your skill’s fulfillment when your user takes certain actions related to the skill. The one I find the most interesting from a “skill analytics” perspective is the “skill disabled” event, which triggers whenever a user has decided to remove your skill. I don’t suspect users do this very often: disabling skills isn’t easy and there’s no “available storage space” to be concerned about. However, it will give you a better picture of your skill’s reach.
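Here is a minimal sketch of how a fulfillment function might pick out the skill disabled event for analytics. It assumes the request arrives as a plain dictionary and that the skill has subscribed to the event; the function name and the in-memory list standing in for an analytics store are hypothetical.

```python
SKILL_DISABLED = "AlexaSkillEvent.SkillDisabled"


def record_skill_disabled(envelope: dict, analytics_log: list) -> bool:
    """Record a disable event and return True; return False for any other request."""
    request = envelope.get("request", {})
    if request.get("type") != SKILL_DISABLED:
        return False
    user_id = envelope.get("context", {}).get("System", {}).get("user", {}).get("userId")
    analytics_log.append(
        {
            "event": "skill_disabled",
            "userId": user_id,
            "timestamp": request.get("timestamp"),
        }
    )
    return True


log = []
event = {
    "request": {"type": SKILL_DISABLED, "timestamp": "2017-12-20T10:00:00Z"},
    "context": {"System": {"user": {"userId": "amzn1.ask.account.EXAMPLE"}}},
}
handled = record_skill_disabled(event, log)
```

In a real skill the log would be a database or metrics service, and the handler would return before any speech response is built, since there is no user to talk to.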

Entity Resolution

The last one is huge for Alexa devs: entity resolution. The easiest way to think of entity resolution is as built-in synonym handling. No longer does your code need to check if a given slot matches a group of values (for example, is “grand” included in your “array of good values?”). Now “grand” would be a synonym for “good” and you can check the canonical value and branch off from there. This can also be done with phrases, so that “tossed and turned” could be a synonym for “poor” when asking the user how well she slept. Or, use entity resolution for error correction. If Alexa keeps hearing “hello” instead of “yellow,” add it as a synonym to match that canonical color value.
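In code, checking the canonical value might look something like this sketch. It assumes the slot is the plain dictionary found in the intent request and that a successful match is flagged with the status code shown; the function name and the sample slot are made up for the example.

```python
from typing import Optional


def canonical_slot_value(slot: dict) -> Optional[str]:
    """Return the canonical name of the first successful resolution, or None."""
    authorities = slot.get("resolutions", {}).get("resolutionsPerAuthority", [])
    for authority in authorities:
        if authority.get("status", {}).get("code") == "ER_SUCCESS_MATCH":
            values = authority.get("values", [])
            if values:
                return values[0]["value"]["name"]
    return None


# What the user said is in "value"; what you defined is in the resolution.
slot = {
    "name": "sleepQuality",
    "value": "tossed and turned",
    "resolutions": {
        "resolutionsPerAuthority": [
            {
                "status": {"code": "ER_SUCCESS_MATCH"},
                "values": [{"value": {"name": "poor", "id": "POOR"}}],
            }
        ]
    },
}

if canonical_slot_value(slot) == "poor":
    reply = "Sorry to hear that. Want some tips for sleeping better?"
else:
    reply = "Glad to hear it."
```

Falling back to None when nothing matched lets the skill decide whether to reprompt or use the raw spoken value.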

There was so much going on with Alexa this year that keeping up was difficult. I’m sure there are things I missed here. What were they? Let me know on Twitter. Here’s hoping that 2018 is just as feature-packed and your skills rise to the top.