The Text and Image Utilities in the Alexa Skills Kit Node.js SDK
In the last post in the Dig Deep series, we looked at the new template builders in the Alexa Skills Kit Node.js SDK that allow us to assemble templates for the visual display on the Echo Show. A handful of times we saw utilities that create text and image objects and, while we looked at them from a high level, we didn’t “dig deep.” We’ll do that in this post.
As a reminder, this is the Dig Deep series, where we look line-by-line at the tools and libraries we use to build voice-first experiences. This is not the place to go for tutorials, but if you want to learn interesting little nuggets about what you use every day, off we go…
First up, we’ve got the text utilities. You might use them like so:
“That’s it?” you ask yourself. “There must be more to it than that!” There is, but only very little. We’ll dig deep into the code and you’ll come out of it both wondering “why?” and being very grateful that it’s there. Life can be funny that way.
Here is the code for the text utilities (the
That’s all there is to it: three static methods.
(If you’re not familiar, static methods are methods that can be used directly off of the class rather than an instance of the class. You would say TextUtils.makeTextContent rather than const textUtilsInstance = new TextUtils; textUtilsInstance.makeTextContent.)
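To make the difference concrete, here is a minimal sketch using a hypothetical Greeter class (it is not part of the SDK) with a single static method:

```javascript
// Hypothetical class for illustration only; not part of the Alexa SDK.
class Greeter {
    // Static: lives on the class itself, not on its instances.
    static greet(name) {
        return `Hello, ${name}`;
    }
}

// Called directly off the class, no `new` required.
const greeting = Greeter.greet('Alexa');

// Instances do not pick up static methods.
const greeterInstance = new Greeter();
const instanceHasGreet = typeof greeterInstance.greet === 'function';
```

Here greeting holds the returned string, while instanceHasGreet is false because greet only exists on the class.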
Once we get to makeRichText we’re going to take a mighty detour, so we won’t go from top to bottom. Instead, we’ll start with makeTextContent.
The makeTextContent method is used in situations like ListTemplate1 where you would have three levels of text:
(Note that in the template above, there is only primary and tertiary text.)
In all templates other than ListTemplate1, the different levels of text are concatenated so there is, in reality, not much use for them.
This method returns an object with potential keys of primaryText, secondaryText, and tertiaryText. It checks each argument in order to see whether it exists and adds the matching property if so. The upshot is that if you want to skip one of the texts, as is done in the list template above, you pass in a falsy value (null, empty string; it’s your call).
The tricky thing, though, is that you are not sending in plain strings for each of these values. You are sending in text objects that you’ve created with makePlainText or makeRichText.
The makePlainText method takes in text and returns an object with two keys. The key of type has the value of 'PlainText' and the key of text has the text that was passed in as an argument. This is the point where you say “Yeah, this is really simple, but I’m glad I don’t have to create this object over and over again. Thanks Amazon!”
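As a rough illustration of the behavior described above, the two methods can be re-created as standalone functions. This is a sketch written from the description in this post, not the SDK’s actual source, and the sample strings are made up:

```javascript
// Stand-in for TextUtils.makePlainText: wraps a string in a text object.
function makePlainText(text) {
    return { type: 'PlainText', text: text };
}

// Stand-in for TextUtils.makeTextContent: only truthy arguments become keys.
function makeTextContent(primaryText, secondaryText, tertiaryText) {
    const textContent = {};
    if (primaryText) {
        textContent.primaryText = primaryText;
    }
    if (secondaryText) {
        textContent.secondaryText = secondaryText;
    }
    if (tertiaryText) {
        textContent.tertiaryText = tertiaryText;
    }
    return textContent;
}

// Skip the secondary text by passing a falsy value in its place.
const listItemText = makeTextContent(
    makePlainText('Pepperoni pizza'),
    null,
    makePlainText('$12.00')
);
```

The resulting listItemText has primaryText and tertiaryText keys, each holding a text object, and no secondaryText key at all.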
This method’s a lot less interesting than makeRichText. With makePlainText you could add text and that was it. Quite literally, the text stands alone.
But with rich text… oh boy. Bold! Line breaks! Actions! Rich text is similar to the HTML you wrote twenty years ago (although, I have to add, in reality this is XML). So forget what you’ve learned about avoiding the use of <b> for bold or <i> for italics and check out what you can do with rich text for the Echo Show.
Here’s what you can do inside rich text:
- Bold
- Italics
- Underline
- Font size
- Line break
- Inline images
- Actions
Bold, Italics, Underline, Line Break, Font Size
Use <b> for bold, <i> for italics, and <u> for underline. Got it? Got it.
Line breaks are just like you know from HTML: add a <br/> between lines. Because rich text is XML, the tag has to be closed, which is why it is written as <br/> rather than <br>.
Font size is interesting. Put away your pixels and don’t even think about reaching for your ems or rems. With the Echo Show, you’re sizing with numbers which in turn correspond to pixel sizes. What’s tricky is that there are just four numbers: 2, 3, 5, and 7. The default size is 3, equivalent to 32px.
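Putting those tags together, a rich text string might look like the sketch below. The makeRichText function here is a stand-in, assumed to mirror makePlainText with a type of 'RichText'; it is not copied from the SDK, and the promotional copy is invented:

```javascript
// Stand-in for TextUtils.makeRichText (assumed to mirror makePlainText,
// but with a type of 'RichText').
function makeRichText(text) {
    return { type: 'RichText', text: text };
}

// Rich text is XML, so every tag is closed and <br/> is self-closing.
// Size 5 is one of the four allowed font sizes (2, 3, 5, and 7).
const markup =
    '<font size="5"><b>Today only</b></font><br/>' +
    '<i>Free delivery</i> on <u>all</u> orders';

const richText = makeRichText(markup);
```

The markup stays an ordinary string; the only thing that tells Alexa to interpret the tags is the 'RichText' type on the object.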
In the previous post, I mentioned seeing a skill that “hacked” BodyTemplate1 in order to get a grid of items. The way this worked was by using rich text and, more specifically, inline images. These are set using an inline image tag.
Note that the source must be absolute (of course) and the width and height are absolute values with no unit. While the height doesn’t have a specified limit, it should fit within the Echo Show screen, which is 600px minus an unknown amount of padding on the top and bottom. The width cannot be larger than 880px, which accounts for the width of the Echo Show minus 72px of padding on the left and the right.
You are not required to add anything to your inline image tag but the src. If you’re like me, the alt attribute might seem pretty pointless, as there is no cursor with which to display a tooltip and there are no “SEO” benefits to come. Then you realize that the Echo Show comes with a screen reader and that accessibility is important, so you remember to always add the alt attribute to your inline images.
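A small helper can make it harder to forget the alt attribute. The inlineImage function below is hypothetical (it is not part of the SDK) and the URL is a placeholder; it only builds the tag string that would go inside rich text, and it deliberately insists on alt text even though the platform does not:

```javascript
// Hypothetical helper, not part of the SDK: builds an inline image tag.
// It refuses to build a tag without alt text, since the Echo Show's
// screen reader relies on it.
function inlineImage(src, alt, width, height) {
    if (!src || !alt) {
        throw new Error('inline images need both src and alt');
    }
    let tag = `<img src='${src}' alt='${alt}'`;
    // Width and height are unitless numbers and are optional.
    if (width && height) {
        tag += ` width='${width}' height='${height}'`;
    }
    return tag + ' />';
}

// Placeholder URL for illustration.
const thumbnail = inlineImage('https://example.com/cat.png', 'A sleeping cat', 200, 150);
```

This sketch does no escaping, so an alt text containing a single quote would need to be handled before it is passed in.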
Actions allow the user to interact with the skill through items on the display templates. Your template might display the user’s shopping cart and have two actions: purchase or cancel.
If a user touches “Purchase,” the Display.ElementSelected event will be sent to your fulfillment. Using the ASK Node.js SDK, you won’t listen for that event precisely. Instead you’ll listen for ElementSelected. All prefixed events are stripped of their prefix by the SDK before your handlers are called.
To determine which action was chosen, you’ll look to this.event.request.token. For the user in this example who wishes to purchase, the value will be the token assigned to the “Purchase” action.
Actions can really set apart your skill on the Echo Show, but don’t forget that Alexa is still a voice-first platform. Don’t expect that the actions will be the primary means of interaction, even when using the Echo Show. Most users will still use the Show from afar and will only touch the screen in rare circumstances.
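To tie the pieces together, here is a sketch of how the shopping cart example might read the token. The token names (purchase_token, cancel_token), the spoken replies, and the fake event are all assumptions made for illustration. In a real skill the SDK supplies this.event and you would emit a response rather than return a string:

```javascript
// Sketch of an ElementSelected handler. The token names are made up
// for this example and must match whatever your actions use.
const handlers = {
    'ElementSelected': function () {
        const token = this.event.request.token;
        if (token === 'purchase_token') {
            return 'Your order has been placed.';
        }
        if (token === 'cancel_token') {
            return 'Your cart has been emptied.';
        }
        return 'Sorry, I did not catch that selection.';
    }
};

// Fake context standing in for what the SDK provides when the user
// touches "Purchase"; it exists only so the sketch runs on its own.
const fakeContext = {
    event: { request: { type: 'Display.ElementSelected', token: 'purchase_token' } }
};
const speech = handlers.ElementSelected.call(fakeContext);
```

Note that the handler is keyed on ElementSelected, without the Display prefix, even though the request type still carries it.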
Because they’re using the Show from afar, images are a useful way to add at-a-glance information to your templates. Doing this will involve the image utilities.
The ImageUtils class serves the same purpose as the TextUtils class, but for images: building the object that will be sent to the Alexa service.
Here is an example of how it would be used:
And here’s the code powering it:
There are two methods to examine: makeImage and makeImages.
makeImage accepts the following arguments:
- URL: Must be served over HTTPS, must point to a JPEG or PNG, and the CORS settings must be configured to allow the Alexa service to access the image.
- width and height: Optional
- size: What? We specified the size with width and height. Do they expect a size in MB? No, this is a size descriptor and is a string that is one of the following:
  - X_SMALL: 480px x 320px
  - SMALL: 720px x 480px
  - MEDIUM: 960px x 640px
  - LARGE: 1200px x 800px
  - X_LARGE: 1920px x 1280px
  If you only care about the Echo Show, only provide the X_SMALL size. If any larger size is provided, it will be given precedence and scaled down.
- description: Used for vision-impaired people using the Echo Show.
If you want to provide a size string and a description, but you don’t want to specify a width or a height, set those values to something falsy (empty string, undefined, null), as the method checks for their presence before setting the attributes.
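The argument handling above can be sketched as a standalone function. This is a stand-in written from the description in this post rather than the SDK’s source. The property names (url, widthPixels, heightPixels, size, sources, contentDescription) follow the Alexa image object format, and requiring width and height together is an assumption:

```javascript
// Stand-in for ImageUtils.makeImage; not the SDK's actual source.
function makeImage(url, widthPixels, heightPixels, size, description) {
    const source = { url: url };
    // Assumption: width and height are only set when both are truthy.
    if (widthPixels && heightPixels) {
        source.widthPixels = widthPixels;
        source.heightPixels = heightPixels;
    }
    if (size) {
        source.size = size;
    }
    // The single source is wrapped in an array alongside the description.
    const image = { sources: [source] };
    if (description) {
        image.contentDescription = description;
    }
    return image;
}

// Size and description only: pass falsy values for width and height.
// The URL is a placeholder.
const logo = makeImage('https://example.com/logo.png', null, null, 'X_SMALL', 'Company logo');
```

Because null was passed for both dimensions, the single entry in logo.sources carries only url and size.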
After assembling an image object, makeImage passes it along as a single item in an array, together with the description, to makeImages.
makeImages is used by makeImage but can also be used on its own. If you use it on its own, you’ll have something like this:
Note that the description is per image set, not per image. The code that powers that is:
There’s not much interesting here, other than what we just saw where description is for all of the images and not each individual image.
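For a concrete picture of the per-set description, here is a sketch of makeImages as a standalone function, called with two sources. It is written from the description in this post, not taken from the SDK. The sources and contentDescription property names follow the Alexa image object format, and the URLs and description are placeholders:

```javascript
// Stand-in for ImageUtils.makeImages; not the SDK's actual source.
function makeImages(imgArr, description) {
    const image = {};
    // One description covers the whole set of sources.
    if (description) {
        image.contentDescription = description;
    }
    if (imgArr) {
        image.sources = imgArr;
    }
    return image;
}

// The same picture offered at two sizes, sharing a single description.
const background = makeImages(
    [
        { url: 'https://example.com/bg-small.png', size: 'X_SMALL' },
        { url: 'https://example.com/bg-large.png', size: 'LARGE' }
    ],
    'Sunset over the ocean'
);
```

Both entries in background.sources describe the same picture, which is why a single contentDescription is enough.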
Here we have it: text and image utilities for the Alexa Skills Kit SDK for Node.js. These utilities assemble the data provided to them into the object that must be sent to the Alexa platform and will be used for Echo Show templates.
That’s it for this post. Until next time…