Building an Alexa podcast skill

Disclaimer: I’m a software engineer at Amazon, but everything expressed in this post is my own opinion. 

Photo by Andres Urena

What am I building?

Essentially I’m building a podcast skill for the church I attend (Kingsgate Community Church). The aim is for people to be able to ask Alexa to play the latest sermon and have the device play back the audio. That way people can keep up to date if they miss a week at church.

What I’m building would be easy to adapt if you just wanted a generic podcast skill, and I’m keeping the back-end code on my public GitHub page.

Building the skill

Alexa skills are pretty simple to create if you use AWS (although you’re not restricted to it). There are two main parts to building a skill:

  1. Configure the skill and define the voice commands (called an interaction model)
  2. Create a function (or API) that will provide the back-end functionality for your features

Configuring the skill

Creating a skill is pretty simple: visit the Alexa Developer Portal and click on “create a skill”, where you’ll be asked some key questions.

  1. Skill type – For my podcast skill I’m using a custom skill so I can control exactly what commands and features I want to support (learn more about skill types)
  2. Name – This is what will be displayed in the skill store so it needs to be clear so that people can find your skill to enable it.
  3. Invocation Name – This is the word (or phrase) that people will say when they want to interact with your skill. If you used the invocation name “MySkill” then users would say “Alexa, ask MySkill…”. It doesn’t matter if someone else is using the same invocation name since users have to enable each skill they want to use. You should check the invocation name guidelines when deciding what to use.
  4. Global fields – The skill I’m building will support audio playback so I tick “yes” to use the audio player features.

Voice commands (a.k.a. the interaction model)

This is one of the parts people struggle with when creating Alexa skills. What you are really designing here is the interface for your voice application. You’re defining the voice commands (utterances) that users can say, and the commands (intents) these will be mapped to. If you want to capture a variable / parameter from a voice command, it can be captured in something called a ‘slot’. I find it easiest to think of intents like REST API requests: you configure the voice commands that will trigger an intent / request, and then your skill handles each intent just like you would handle API requests triggered by different button clicks.
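
For example, if I wanted an intent that jumps to a specific episode, the episode number would be captured as a slot. This intent isn’t part of my skill; it’s a hypothetical fragment of interaction model JSON, just to illustrate the idea:

    {
        "name": "PlayEpisodeIntent",
        "slots": [
            { "name": "EpisodeNumber", "type": "AMAZON.NUMBER" }
        ],
        "samples": [
            "play episode {EpisodeNumber}",
            "play sermon number {EpisodeNumber}"
        ]
    }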

The developer portal has a great new tool for managing the interaction model. You can see from the screenshot on the right that I have a number of different intents defined. Some of these are built-in Alexa intents (like AMAZON.StopIntent) and some are custom. The two main intents I’ve configured are:

  1. PlayLatestSermon – Used to fetch the latest sermon from the podcast RSS feed and start audio playback. Invoked by a user saying “Alexa, ask Kingsgate for the latest sermon”.
  2. SermonInfoIntent – Gives details of the podcast title, presenter name, and publication date. Invoked by a customer asking for details about the latest sermon.

Adding an intent is as simple as clicking the add button, selecting the intent name, and then defining the utterances (voice commands) that users can say to trigger that intent. Remember: for custom intents the user will have to prefix your chosen utterance with “Alexa, ask <invocation name>…”; for built-in Alexa intents (like stop, next, and shuffle) the user doesn’t have to say “Alexa, ask…” first. It’s important to think of this as your user interface and to pick utterances that users are likely to say without too much thought. You don’t want people to have to think about what to say, so make sure you provide lots of variations of each phrasing. When you’re done you’ll need to click save and also build the model.
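
To make that concrete, here’s roughly how the model for this skill might look in the console’s JSON editor. This is an illustrative sketch (assuming the invocation name “kingsgate”), not a copy of my live model, and the sample utterances are examples of the kind of variations you’d provide:

    {
        "interactionModel": {
            "languageModel": {
                "invocationName": "kingsgate",
                "intents": [
                    {
                        "name": "PlayLatestSermon",
                        "slots": [],
                        "samples": [
                            "play the latest sermon",
                            "the latest sermon",
                            "the most recent sermon",
                            "this week's sermon"
                        ]
                    },
                    {
                        "name": "SermonInfoIntent",
                        "slots": [],
                        "samples": [
                            "who preached the latest sermon",
                            "details of the latest sermon"
                        ]
                    },
                    { "name": "AMAZON.StopIntent", "samples": [] },
                    { "name": "AMAZON.PauseIntent", "samples": [] },
                    { "name": "AMAZON.ResumeIntent", "samples": [] }
                ]
            }
        }
    }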

Creating the function

For my skill I’m using an AWS Lambda function, which is a great way of publishing the code I want to run without having to worry about server instances, configuration, or scaling.

Creating the function is simple: log into the AWS console, go to Lambda, and click “create function”. Pick a name and a runtime (for my function I’m using Node.js 6.10, but you can also use Go, C#, Java, or Python). Once you’ve created the function it will automatically be given access to a set of AWS resources (logging, events, DynamoDB etc.). You’ll need to add a trigger (which defines the AWS components that can execute the Lambda function); since this is an Alexa skill I selected ‘Alexa Skills Kit’. If you click on the box you’ll be given the option to enter the ID of your Alexa skill (which is displayed in the Alexa Developer Portal). This adds an extra layer of protection by making sure only your Alexa skill can execute the function. I’ve also given access to CloudWatch Events, but I’ll cover that in an upcoming post about automated Lambda monitoring.
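
If you’d rather script the trigger than click through the console, the same permission can be added with the AWS SDK for Node.js. This is just a sketch under assumptions: the function name, region, statement ID, and skill ID below are placeholders you’d swap for your own.

    var AWS = require('aws-sdk');
    var lambda = new AWS.Lambda({ region: 'eu-west-1' }); // use your function's region

    // Allow the Alexa Skills Kit to invoke the function, restricted to
    // requests that carry our skill's ID
    lambda.addPermission({
        FunctionName: 'kingsgate-sermon-player',         // placeholder function name
        StatementId: 'alexa-skills-kit-trigger',
        Action: 'lambda:InvokeFunction',
        Principal: 'alexa-appkit.amazon.com',            // the Alexa Skills Kit service principal
        EventSourceToken: 'amzn1.ask.skill.REPLACE-ME'   // your skill ID from the developer portal
    }, function (err, data) {
        if (err) { console.error(err); }
        else { console.log('Trigger added: ' + data.Statement); }
    });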

The code for the Lambda function is split into three main parts:

  1. The entry point
    This sets up the Alexa SDK with your desired settings and also registers the handlers for different intents you support in different states. You can see the full entry code on Github.

    var Alexa = require('alexa-sdk');        // Alexa Skills Kit SDK for Node.js
    var constants = require('./constants');  // skill ID, DynamoDB table name, state names
    var handlers = require('./handlers');    // the intent handlers registered below

    exports.handler = (event, context, callback) => {

        var alexa = Alexa.handler(event, context);
        alexa.appId = constants.appId; // Set your skill ID here to make sure only your skill can execute this function
        alexa.dynamoDBTableName = constants.dynamoDbTable; // The DynamoDB table is used to persist everything you set in this.attributes[], on a per-user basis

        // Register the handlers you support
        alexa.registerHandlers(
            handlers.startModeIntentHandlers,
            handlers.playModeIntentHandlers
        );

        alexa.execute();
    };
    
  2. The handlers
    The handler code defines each intent that is supported in the different states (“START_MODE”, “PLAY_MODE” etc.). You can also add default code here for unhandled intents. This is a simplified version of the START_MODE handlers; you can see the full version on GitHub.

    var stateHandlers = {
    
        // Handlers for when the skill is invoked (we're running in a state called "START_MODE")
        startModeIntentHandlers : Alexa.CreateStateHandler(constants.states.START_MODE, {
    
            // This gets executed if we encounter an intent that we haven't defined
            'Unhandled': function() {
    
                var message = "Sorry, I didn't understand your request. Please say, play the latest sermon to listen to the latest sermon.";
                this.response.speak(message).listen(message);
                this.emit(":responseReady");
            },
    
            // This gets called when someone says "Alexa, open <skill name>"
            'LaunchRequest' : function () {
    
                var message = 'Welcome to the Kingsgate Community Church sermon player. You can say, play the latest sermon to listen to the latest sermon.';
                var reprompt = 'You can say, play the latest sermon to listen to the latest sermon.';
    
                this.response.speak(message).listen(reprompt);
                this.emit(':responseReady');
            },
      
            // This is when we get a PlayLatestSermon intent, normally because someone said "Alexa, ask <skill name> for the latest sermon"	
            'PlayLatestSermon' : function () {
                // Set the index to zero so that we're at the first/latest sermon entry
                this.attributes['index'] = 0;
                this.handler.state = constants.states.START_MODE;
    
                // Play the item
                controller.play.call(this);
            }
        
        }),
        ... rest of handler code
  3. The controller
    The controller is called by the handlers and takes care of interacting with the podcast RSS feed, calling the audio player to start playback on the device, and so on. The full code is too long to show here, so I’d suggest looking at the controller code in the GitHub repo; a trimmed-down sketch follows this list.
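
To give a flavour of what’s involved, here’s a minimal sketch of the constants module the earlier snippets reference, plus the rough shape of the controller’s play function. The fetchEpisode helper is a hypothetical stand-in for the RSS fetching and parsing, and the values in constants are placeholders; the real implementations are in the GitHub repo.

    // constants.js (sketch) - values referenced by the entry point and handlers
    module.exports = Object.freeze({
        appId: 'amzn1.ask.skill.REPLACE-ME',    // your skill ID from the developer portal
        dynamoDbTable: 'KingsgateSermonPlayer', // table the SDK uses to persist this.attributes[]
        states: {
            START_MODE: '',                     // the SDK's default (unnamed) state
            PLAY_MODE: '_PLAY_MODE'
        }
    });

    // controller (sketch) - fetches the feed entry and starts audio playback.
    // fetchEpisode is a hypothetical helper that downloads and parses the podcast
    // RSS feed, returning { title, url } for the item at the given index.
    var controller = {
        play: function () {
            var self = this;

            fetchEpisode(self.attributes['index'], function (err, episode) {
                if (err) {
                    self.response.speak('Sorry, I could not fetch the latest sermon.');
                    self.emit(':responseReady');
                    return;
                }

                // 'REPLACE_ALL' clears anything already playing on the device; the
                // token (here just the URL) identifies the stream in later AudioPlayer events
                self.response.audioPlayerPlay('REPLACE_ALL', episode.url, episode.url, null, 0);
                self.emit(':responseReady');
            });
        }
    };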

Once you’re happy you can upload your skill code (or use the in-line editor). Since I have a few npm dependencies I zipped my function (along with the node_modules folder) and uploaded it. You’ll also need to give the name of the handler that should be executed when the function is invoked (for mine it’s index.handler).

You can then edit the configuration of your skill and point it at the ARN of your Lambda function. You don’t have to publish the skill to start using it yourself: as long as your Alexa device is signed in to the same account as your Alexa developer account, you’ll be able to test the skill on your own device.