ResponseBuilder: This is what makes the Alexa Skills Kit SDK for Node.js talk
Well, just like that, it happened. We’ve had our first change in code during the Dig Deep series. Amazon made a (welcome) update to the response builder just a couple of days after this post went up. I’ll create an updated post soon. In the meantime, use this guide if you’re using the existing NodeJS SDK.
Without building a response, we can't make Alexa speak at all. It's interesting, however, to note that ResponseBuilder only came about in August of 2016, just over a year ago. While the impetus was supporting long-form audio, it also alleviated a lot of the annoying manual response building that we had to do before. We'll look at it in depth today.
As a reminder, this is the Dig Deep series, where we look line-by-line at the tools and libraries we use to build voice-first experiences. This is not the place to go for tutorials, but if you want to learn interesting little nuggets about what you use every day, off we go…
The ResponseBuilder function is there primarily so we can build our responses ourselves, either as a replacement for emitting events like :tell or for playing audio. We see the code below in full (along with a few other functions).
First off, we’ve got a couple of methods up-front.
isObject does exactly what you would expect and we covered
IsOverridden in a previous post so I won’t go over it again here. But this is the Dig Deep series, where we look at code line-by-line together, so I’d be remiss in skipping over these. Otherwise, boring! Let’s get to the fun stuff.
ResponseBuilder function is a long one, clocking in at nearly 120 lines in total. The good thing for us, though, is that it’s so long only because it’s building a long object. So there won’t be a ton of complexity here. There is, meanwhile, a lot to learn in terms of how we make Alexa talk.
First we've got the setup that's common to all response types. We set the response object to be self.response, which right now is an empty object that we've seen before.
The version is set to 1.0. This isn’t much different than what’s happening when we connect to DynamoDB, but is in my opinion much cleaner than
2012-08-10 as an API version identifier.
And, finally, we start building a response object on the response object. Yeah, the naming is a bit weird, but let's roll with it. For simplicity's sake, we'll call it the response sub-object. It has a single key/value pair to start: shouldEndSession, which defaults to true. shouldEndSession ends the session, doesn't wait for a user response, turns off the light on the top of the Echo, and will save the session to DynamoDB if you've set that up.
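To make that skeleton concrete, here's a sketch of the envelope every response starts from. The field names (version, response, shouldEndSession) come from the Alexa response JSON; the function wrapper is mine, for illustration.

```javascript
// Sketch of the base envelope every response starts from. The field
// names are from the Alexa response JSON; the wrapper is illustrative.
function baseResponse() {
  return {
    version: '1.0',
    response: {
      shouldEndSession: true // default: say something, then end the session
    }
  };
}

console.log(JSON.stringify(baseResponse()));
// {"version":"1.0","response":{"shouldEndSession":true}}
```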
Next we have an IIFE that will return our response-building methods to us. It is wrapped in an IIFE so that this keeps pointing at the response object inside those methods.
speak is the simplest one. It sets what Alexa will say to whatever is passed in as speechOutput, after wrapping it in <speak></speak> tags for SSML via a helper function we'll look at at the end. Finally, this method, like all the others, returns this to make the calls chainable.
listen is not too different from speak, except it sets the reprompt (what is said if the user doesn't respond to the initial prompt from Alexa) and specifies that the session should not be ended. Because it only sets the reprompt and not the prompt, listen should never be used on its own.
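Here's a simplified stand-in for that pattern: speak and listen as chainable methods returned from an IIFE. The method names match the SDK's; the bodies are paraphrased, not the SDK's actual code.

```javascript
// Simplified stand-in for the SDK's chainable builder. The method names
// (speak, listen) match the SDK; the bodies are paraphrased.
function makeBuilder(response) {
  return (function () {
    return {
      speak: function (speechOutput) {
        response.response.outputSpeech = {
          type: 'SSML',
          ssml: '<speak> ' + speechOutput + ' </speak>'
        };
        return this; // returning `this` is what makes the calls chainable
      },
      listen: function (repromptSpeech) {
        response.response.reprompt = {
          outputSpeech: {
            type: 'SSML',
            ssml: '<speak> ' + repromptSpeech + ' </speak>'
          }
        };
        response.response.shouldEndSession = false; // keep the session open
        return this;
      }
    };
  })();
}

var res = { version: '1.0', response: { shouldEndSession: true } };
makeBuilder(res).speak('What is your name?').listen('Please tell me your name.');
console.log(res.response.shouldEndSession); // false
```

Note how listen flips shouldEndSession to false, which is exactly why the light ring stays lit and the mic stays open.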
A card is what’s displayed inside the Alexa app on a device or on the web. It is not the same as the templates displayed on the Echo Show. It displays a title and content, plus an optional image.
This method takes three arguments: a title, content, and an optional cardImage. cardImage is an object that holds one or both image URLs (a small one and a large one).
There are two card types, and which type we’ll use depends on whether we have an image or not. The
Simple card has no image, while the
Standard card does. Both types have a
title, whereas the
Simple card displays text with
content and the
Standard card displays it with
text. I can see the argument that the text of a Standard card isn't all of its content, but the split still seems to overcomplicate things.
Finally, this is all being set as the card attribute on the response sub-object.
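A sketch of that decision, assuming the standard Alexa card image fields (smallImageUrl/largeImageUrl); this is paraphrased logic, not the SDK's exact code.

```javascript
// Paraphrase of how the card gets built: no image means a Simple card
// with `content`; an image means a Standard card with `text` and `image`.
// smallImageUrl/largeImageUrl are the Alexa card image fields.
function renderCard(cardTitle, cardContent, cardImage) {
  if (cardImage && (cardImage.smallImageUrl || cardImage.largeImageUrl)) {
    return {
      type: 'Standard',
      title: cardTitle,
      text: cardContent, // Standard cards use `text`, not `content`
      image: cardImage
    };
  }
  return {
    type: 'Simple',
    title: cardTitle,
    content: cardContent
  };
}

console.log(renderCard('Hello', 'A card with no image').type); // Simple
console.log(renderCard('Hello', 'A card with an image', {
  largeImageUrl: 'https://example.com/large.png' // hypothetical URL
}).type); // Standard
```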
This creates a card for account linking, which we’ll look at in a future post.
Now we’re having fun…
Here’s another example of Amazon giving us multiple ways to do the same thing. Be sure to thank Akshat Shah next time you see him around…
The audioPlayer method takes up to six arguments. The first one is always mandatory and is the action you wish to take: play, stop, or clearQueue. If you provide anything else, you might as well provide clearQueue, because it's the fallthrough case.
As for the remaining arguments: all of them are mandatory if you're playing audio, none are needed if you're stopping audio, and only the second (behavior) is necessary if you're clearing the queue.
Because this does the same as the next three combined, we’ll just look directly at those.
The audioPlayerPlay method (or, as I like to call it, the Audio Player, Play On method) will play a stream of long-form audio. It is, as all long-form audio capabilities are, unsupported on Fire TV.
The first argument is
behavior, which accepts one of three values:
ENQUEUE: Plays the new stream after what is currently in the queue.
REPLACE_ALL: Replaces everything in the queue, including the currently playing stream, and immediately plays the new stream.
REPLACE_ENQUEUED: Replaces everything in the queue after the currently playing stream. Does not stop the current stream.
The SDK will not throw an error if you include another value, but don’t do it. Seriously.
The second argument is
url, which is the location of the audio to stream. This must point to an HTTPS URL and can be MP3, AAC, MP4, HLS, PLS, and M3U.
The third argument is
token, which represents the stream and is 1024 characters or less. This is required because of the next argument.
The fourth argument is
expectedPreviousToken. This is, essentially, the token of the stream that should come before this one. It's used in situations where the expected behavior and the behavior triggered by the user could conflict (for example, a user saying "previous track" right as the current track is ending). It is only allowed, and is required, when the behavior is ENQUEUE. The SDK won't throw an error otherwise, but the platform will.
The last argument is
offsetInMilliseconds. It's a timestamp representing where in the stream playback should start; 0, of course, starts at the beginning. A developer might use this when a user is coming back to a certain point (an individual music track, maybe not; a recording of a concert, yes).
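Putting those five arguments together, here's a hedged sketch of the AudioPlayer.Play directive they end up building. The directive field names come from the Alexa AudioPlayer interface; the URL and token values below are invented for illustration.

```javascript
// Sketch of the AudioPlayer.Play directive assembled from the five
// arguments. Field names are from the Alexa AudioPlayer interface.
function buildPlayDirective(behavior, url, token, expectedPreviousToken, offsetInMilliseconds) {
  var stream = {
    url: url,
    token: token,
    offsetInMilliseconds: offsetInMilliseconds
  };
  // expectedPreviousToken is only allowed (and required) when enqueueing
  if (behavior === 'ENQUEUE') {
    stream.expectedPreviousToken = expectedPreviousToken;
  }
  return {
    type: 'AudioPlayer.Play',
    playBehavior: behavior,
    audioItem: { stream: stream }
  };
}

var directive = buildPlayDirective(
  'REPLACE_ALL',
  'https://example.com/episode-42.mp3', // hypothetical HTTPS stream URL
  'episode-42',                         // hypothetical token
  null,
  0                                     // start from the beginning
);
console.log(directive.playBehavior); // REPLACE_ALL
```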
audioPlayerStop stops the stream. No arguments necessary.
audioPlayerClearQueue will clear the queue according to a clearBehavior. The options for clearBehavior are 'CLEAR_ENQUEUED' and 'CLEAR_ALL'. The difference between the two is that 'CLEAR_ENQUEUED' will clear everything after the currently playing stream and continue playback, while 'CLEAR_ALL' will also clear the current stream and stop playback.
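The directive behind it is tiny; here's a sketch, with field names from the Alexa AudioPlayer interface:

```javascript
// Sketch of the AudioPlayer.ClearQueue directive.
function buildClearQueueDirective(clearBehavior) {
  return {
    type: 'AudioPlayer.ClearQueue',
    clearBehavior: clearBehavior // 'CLEAR_ENQUEUED' or 'CLEAR_ALL'
  };
}

console.log(buildClearQueueDirective('CLEAR_ALL').type); // AudioPlayer.ClearQueue
```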
Finally, we've got the SSML helper. It takes the text that we want Alexa to say, wraps it in SSML <speak></speak> tags, and sets the output type as SSML. Believe it or not, in the early days of the SDK, you had to do this yourself. Life's so much easier now.
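The helper boils down to a few lines; here's a paraphrase (the function name below is my label for it, and the SDK's exact whitespace may differ):

```javascript
// Paraphrase of the SSML wrapper: plain text in, speech object out.
function createSSMLSpeechObject(message) {
  return {
    type: 'SSML',
    ssml: '<speak> ' + message + ' </speak>'
  };
}

console.log(createSSMLSpeechObject('Hello, world').ssml);
// <speak> Hello, world </speak>
```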
That’s it for
ResponseBuilder. In the next post, we’ll examine the rest of
alexa.js. Until then…