Dustin Coates in Alexa

AWS re:Invent Day 2 Notes

Day two of AWS re:Invent included a couple of sessions, some time speaking with people about Algolia and our work with Alexa, and the Alexa Devs meetup, which drew a good smattering of people working on Alexa. Here are my notes from today’s sessions:

Alexa: The State of the Science

Presented by Rohit Prasad and Ashwin Ram, both of Amazon, with a recorded introduction and outro by William Shatner. A lot of meat in this session.

Prasad said that one of the “indicators of AI” is the speed of developing new skills, and the quality of those additions. He was referring here to “internal” skills, which Amazon has worked to expand, including around music, unsurprisingly the most-used Alexa feature. Of course, third-party skills enrich the user experience too, and their number has grown five-fold since the last AWS re:Invent, jumping from 5,000 to 25,000. User interaction has increased in lockstep, with five times as many weekly active skill interactions over the past year.

Amazon has focused on “deeper contextual learning” recently to improve the Alexa experience. The key word here is context. This includes context across turns, multi-modal context, evolving device context, and personal context.

  • Context across turns: Alexa remembering information across interactions. Alexa handles this in a probabilistic manner, carrying the intent and entities forward and merging them with the information from the next utterance (see the sketch after this list).
  • Multi-modal context: as Echo and Alexa-enabled devices move into different form factors, how does Alexa handle these different media? This context recognizes that a user might be referring to something shown on a screen or something spoken aloud.
  • Evolving device context: does Alexa know the device’s capabilities as the user interacts with different devices? The example here was that asking for a title on the Echo would bring up the audiobook, while asking for the same title on the Fire TV would play the movie.
  • Personal context: understanding who is speaking to Alexa, and information about that person. This is already happening in first-party skills and will be rolled out to third-party skills in 2018.
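As a rough illustration of what context across turns looks like from a third-party skill developer’s perspective today, here is a minimal sketch that carries slot values forward through session attributes in the raw Alexa request/response JSON. The slot names (“Size”, “Topping”) are hypothetical, and Alexa’s own cross-turn handling is probabilistic rather than this kind of simple deterministic merge.

```python
# Minimal sketch of remembering information across turns in a third-party skill,
# using the raw Alexa request/response JSON (no SDK). The slot names ("Size",
# "Topping") are hypothetical.

def handle_request(event):
    request = event.get("request", {})
    session_attrs = event.get("session", {}).get("attributes") or {}

    if request.get("type") == "IntentRequest":
        # Merge slots filled on this turn with slots remembered from earlier turns.
        remembered = session_attrs.get("slots", {})
        for name, slot in request.get("intent", {}).get("slots", {}).items():
            if slot.get("value"):
                remembered[name] = slot["value"]
        session_attrs["slots"] = remembered

        missing = [s for s in ("Size", "Topping") if s not in remembered]
        if missing:
            speech = "What {} would you like?".format(missing[0].lower())
            end_session = False
        else:
            speech = "Ordering a {} {} pizza.".format(
                remembered["Size"], remembered["Topping"])
            end_session = True
    else:
        speech = "Welcome! What can I get you?"
        end_session = False

    # Whatever is returned in sessionAttributes comes back on the next request,
    # which is what lets the skill "remember" earlier answers.
    return {
        "version": "1.0",
        "sessionAttributes": session_attrs,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }
```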

There were three big announcements that are relevant to Alexa skill developers:

  • Expansion of Your Voice (person recognition) to third-party developers in early 2018
  • Expansion of notifications developer preview—no extra information was provided
  • Skill discovery through “invocationless” usage

The skill discovery is a big one for users, but likely not something that third-party developers will have control over. Users can ask “Alexa, start a metronome at 100 beats per minute” and Alexa will pick a relevant third-party skill even if the user hasn’t enabled it. The user can also ask for a group of skills, like “Alexa, let’s play a game,” and Alexa will choose the best skill for that user. This is all done by first short-listing potential related skills for the request, and then matching the utterance to intents across the short-list. This feature is already rolling out.
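That two-step routing (short-list candidate skills, then match the utterance to their intents) can be pictured with a toy sketch. Everything below, from the skill catalog to the keyword scoring, is invented for illustration; this is not Amazon’s implementation, which is certainly far more sophisticated.

```python
# Toy illustration of name-free skill routing: short-list candidate skills
# for an utterance, then match it against each candidate's sample utterances.
# The catalog, keywords, and scoring are invented for illustration only.

SKILL_CATALOG = {
    "Metronome Pro": {
        "keywords": {"metronome", "beats", "tempo"},
        "intents": {"StartMetronomeIntent": ["start a metronome at {bpm} beats per minute"]},
    },
    "Trivia Night": {
        "keywords": {"game", "trivia", "quiz"},
        "intents": {"StartGameIntent": ["let's play a game", "start a trivia game"]},
    },
}

def shortlist(utterance):
    """Keep only skills whose keywords overlap the utterance."""
    words = set(utterance.lower().split())
    return {name: skill for name, skill in SKILL_CATALOG.items()
            if skill["keywords"] & words}

def route(utterance):
    """Pick the (skill, intent) whose sample utterances best overlap the request."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for name, skill in shortlist(utterance).items():
        for intent, samples in skill["intents"].items():
            for sample in samples:
                score = len(words & set(sample.lower().split()))
                if score > best_score:
                    best, best_score = (name, intent), score
    return best

print(route("start a metronome at 100 beats per minute"))
# ('Metronome Pro', 'StartMetronomeIntent')
```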

The final part of the session was the awarding of prizes to the Alexa Prize finalists, presented by Ashwin Ram.

The Alexa Prize was a university competition to build conversational skills that could sustain interactions averaging more than twenty minutes. People could connect to one of the short-listed skills at random by saying “Alexa, let’s chat.” The three finalists were the University of Washington (Sounding Board), Czech Technical University (Alquist), and Heriot-Watt (What’s Up Bot).

The winner was the University of Washington, with an average duration of over 10 minutes and a rating of 3.17 out of 5. CTU came in second with 3 minutes, 55 seconds of interaction. Heriot-Watt came in third with 4 minutes and 1 second of duration, but a lower overall score. Because the goal was an interaction time of more than twenty minutes and no team achieved that, Amazon is running the Alexa Prize again.

Building a Voice-Enabled Customer Service Chatbot Using Amazon Lex and Amazon Polly

This session addressed voice much less than I expected (very little, overall). Mostly the focus was on high-level considerations of building bots.

Most people start off by building bots for internal employee support. This is a use case I’ve never really thought of, but it makes sense. I could see us using this at Algolia, as we’re adding multiple people a week across many different offices. We already have an internal service for searching relevant content and we use Slack bots extensively (including one that helps coordinate on-site interviews, though it’s only a “push” experience).

The presenters noted that focus is important when building bots, because people will want to take them in every direction. I suspect this has to do with the text-based nature of chatbots. People have learned how much work goes into building and designing new pages for websites, but they think chatbots are as simple as “just write a new response.” However, every new feature adds new sample utterances, which can conflict with existing ones; that conflict is a reason both for focus and for testing.
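To make the utterance-conflict point concrete, here is a small sketch of a sanity check that flags any sample utterance claimed by more than one intent. The interaction model below is hypothetical; the shape (intents with sample utterances) mirrors an Alexa interaction model, and the same idea applies to Lex bots.

```python
# Flag any sample utterance that appears under more than one intent.
# The model contents are hypothetical, for illustration only.
from collections import defaultdict

interaction_model = {
    "intents": [
        {"name": "OrderStatusIntent", "samples": ["where is my order", "check my order"]},
        {"name": "CancelOrderIntent", "samples": ["cancel my order", "check my order"]},
    ]
}

def find_conflicts(model):
    seen = defaultdict(list)  # normalized utterance -> intents that claim it
    for intent in model["intents"]:
        for sample in intent["samples"]:
            seen[sample.strip().lower()].append(intent["name"])
    return {utt: intents for utt, intents in seen.items() if len(intents) > 1}

print(find_conflicts(interaction_model))
# {'check my order': ['OrderStatusIntent', 'CancelOrderIntent']}
```

A check like this is cheap to run on every model change, which is exactly when a new feature’s utterances are most likely to collide with existing ones.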