
Building an Alexa podcast skill

Disclaimer: I’m a software engineer at Amazon, but everything expressed in this post is my own opinion. 

Photo by Andres Urena

What am I building?

Essentially I’m building a podcast skill for the church I attend (Kingsgate Community Church). The aim is for people to be able to ask Alexa to play the latest sermon and have the device play back the audio. That way people can keep up to date if they miss a week at church.

What I’m building would be easy to adapt (if you just wanted a generic podcast skill) and I’m keeping the code for the back-end on my public GitHub page.

Building the skill

Alexa skills are pretty simple to create if you use AWS (although you’re not restricted to this). There are two main parts to building a skill:

  1. Configure the skill and define the voice commands (called an interaction model)
  2. Create a function (or API) that will provide the back-end functionality for your features

Configuring the skill

Creating a skill is pretty simple: visit the Alexa Developer Portal and click “create a skill”, and you’ll be asked some key questions.

  1. Skill type – For my podcast skill I’m using a custom skill so I can control exactly what commands and features I want to support (learn more about skill types)
  2. Name – This is what will be displayed in the skill store, so it needs to be clear enough that people can find your skill and enable it.
  3. Invocation Name – This is the word (or phrase) that people will say when they want to interact with your skill. If you used the invocation name “MySkill” then users would say “Alexa, ask MySkill…”. It doesn’t matter if someone else is using the same invocation name since users have to enable each skill they want to use. You should check the invocation name guidelines when deciding what to use.
  4. Global fields – The skill I’m building will support audio playback so I tick “yes” to use the audio player features.
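
As an aside, if you manage your skill with the ASK CLI rather than through the portal, the same audio player option lives in the skill manifest (skill.json). Here’s a trimmed sketch, assuming the standard manifest layout (this isn’t my full manifest):

    {
      "manifest": {
        "apis": {
          "custom": {
            "interfaces": [
              { "type": "AUDIO_PLAYER" }
            ]
          }
        }
      }
    }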

Voice Commands (a.k.a. the interaction model)

This is one of the parts that people struggle with when creating Alexa skills. What you are really designing here is the interface for your voice application. You’re defining the voice commands (utterances) that users can say, and the commands (intents) that these map to. If you want to capture a variable / parameter from a voice command, it can be configured as something called a ‘slot’. I find it easiest to think of intents like I would REST API requests: you configure the voice commands that will trigger an intent / request, and then your skill handles each intent just like you would handle API requests triggered by different button clicks.
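
My skill doesn’t actually need any slots, but for illustration, a hypothetical intent that captured a date would look something like this in the interaction model (using the built-in AMAZON.DATE slot type; the intent and slot names here are made up):

    {
      "name": "PlaySermonByDate",
      "samples": [
        "play the sermon from {Date}"
      ],
      "slots": [
        { "name": "Date", "type": "AMAZON.DATE" }
      ]
    }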

The developer portal has a great new tool for managing the interaction model. You can see from the screenshot on the right that I have a number of different intents defined. Some of these are built-in Alexa intents (like AMAZON.StopIntent) and I have some custom intents too. The two main intents I’ve configured are:

  1. PlayLatestSermon – Used to fetch the latest sermon from the podcast RSS feed and start audio playback. Invoked by a user saying “Alexa, ask Kingsgate for the latest sermon”
  2. SermonInfoIntent – Gives details of the podcast title, presenter name, and publication date. Invoked by a user asking Kingsgate for information about the latest sermon

Adding an intent is as simple as clicking the add button, selecting the intent name, and then defining the utterances (voice commands) that users can say to trigger that intent. Remember: for custom intents the user will have to prefix your chosen utterance with “Alexa, ask <invocation name>…”; for built-in Alexa intents (like stop, next, shuffle) the user doesn’t have to say “Alexa, ask…” first. It’s important to think about this as your user interface and pick utterances that users are likely to say without too much thought. You don’t want people to have to think about what to say, so make sure you provide lots of variations of each phrasing. When you’re done you’ll need to click save and also build the model.
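
To make that concrete, here’s a trimmed-down sketch of what the interaction model JSON might look like for a skill like this (the invocation name and utterance variations are illustrative, not my exact published model; note that audio player skills also have to support the built-in pause/resume intents):

    {
      "interactionModel": {
        "languageModel": {
          "invocationName": "kingsgate",
          "intents": [
            {
              "name": "PlayLatestSermon",
              "samples": [
                "the latest sermon",
                "play the latest sermon",
                "play the most recent sermon",
                "listen to the latest sermon"
              ]
            },
            { "name": "AMAZON.StopIntent", "samples": [] },
            { "name": "AMAZON.PauseIntent", "samples": [] },
            { "name": "AMAZON.ResumeIntent", "samples": [] }
          ]
        }
      }
    }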

Creating the function

For my skill I’m using an AWS Lambda function, which is a great way of publishing the code I want to run without having to worry about server instances, configuration, scaling, and so on.

Creating the function is simple: log into the AWS console, go to Lambda, and click “create function”. Pick the name and runtime (for my function I’m using Node.js 6.10, but you can use Go, C#, Java, or Python). Once you’ve created the function it will automatically be given access to a set of AWS resources (logging, events, DynamoDB, etc.). You’ll need to add a trigger (which defines the AWS components that can execute the Lambda function); since this is an Alexa skill I selected ‘Alexa Skills Kit’. If you click on the box you’ll be given the option to enter the ID of your Alexa skill (which is displayed in the Alexa Developer Portal). This adds extra protection to make sure only your Alexa skill can execute the function. I’ve also given access to CloudWatch Events, but I’ll cover this in an upcoming post about automated Lambda monitoring.

The code for the Lambda function is split into three main parts:

  1. The entry point
    This sets up the Alexa SDK with your desired settings and also registers the handlers for the different intents you support in different states. You can see the full entry code on GitHub.

    var Alexa = require('alexa-sdk'); // Alexa Skills Kit SDK for Node.js (v1)
    var constants = require('./constants'); // Skill ID, DynamoDB table name, state names
    var handlers = require('./handlers'); // The state handlers shown below

    exports.handler = (event, context, callback) => {

        var alexa = Alexa.handler(event, context);
        alexa.appId = constants.appId; // Set your Skill ID here to make sure only your skill can execute this function
        alexa.dynamoDBTableName = constants.dynamoDbTable; // The DynamoDB table is used to store state (everything you set in this.attributes[], on a per-user basis)

        // Register the handlers you support
        alexa.registerHandlers(
            handlers.startModeIntentHandlers,
            handlers.playModeIntentHandlers
        );

        alexa.execute();
    };
    
  2. The handlers
    The handler code defines each intent that is supported in the different states (“START_MODE”, “PLAY_MODE”, etc.). You can also have default code here for unhandled intents. This is a simplified version of the START_MODE handlers; you can see the full version on GitHub.

    var stateHandlers = {

        // Handlers for when the skill is invoked (we're running in something called "START_MODE")
        startModeIntentHandlers : Alexa.CreateStateHandler(constants.states.START_MODE, {

            // This gets executed if we encounter an intent that we haven't defined
            'Unhandled': function() {

                var message = "Sorry, I didn't understand your request. Please say, play the latest sermon to listen to the latest sermon.";
                this.response.speak(message).listen(message);
                this.emit(':responseReady');
            },

            // This gets called when someone says "Alexa, open <skill name>"
            'LaunchRequest' : function () {

                var message = 'Welcome to the Kingsgate Community Church sermon player. You can say, play the latest sermon to listen to the latest sermon.';
                var reprompt = 'You can say, play the latest sermon to listen to the latest sermon.';

                this.response.speak(message).listen(reprompt);
                this.emit(':responseReady');
            },

            // This is when we get a PlayLatestSermon intent, normally because someone said "Alexa, ask <skill name> for the latest sermon"
            'PlayLatestSermon' : function () {
                // Set the index to zero so that we're at the first/latest sermon entry
                this.attributes['index'] = 0;
                this.handler.state = constants.states.START_MODE;

                // Play the item
                controller.play.call(this);
            }

        }),
        ... rest of handler code
  3. The controller
    The controller is called by the handlers and takes care of interacting with the podcast RSS feed, calling the audio player to start playing the podcast on the device, and so on. The code is too long to show in full, so I’d suggest looking at the controller code in the GitHub repo (though there’s a rough sketch of the idea below).
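
As a rough illustration only, here’s a heavily simplified sketch of what a controller’s play function can look like using the alexa-sdk v1 response builder. It assumes the RSS feed has already been fetched and parsed into an array of { title, url } items stored in this.attributes['playlist'] (a hypothetical name; the real controller also handles feed fetching, errors, and resume offsets):

    var controller = {

        play: function () {
            // Look up the episode at the current index (0 = latest)
            var index = this.attributes['index'];
            var episode = this.attributes['playlist'][index];

            // Announce the episode, then hand the stream URL to the device's
            // audio player. The token (here the index) identifies the stream
            // in later AudioPlayer events.
            this.response.speak('Playing ' + episode.title)
                .audioPlayerPlay('REPLACE_ALL', episode.url, String(index), null, 0);

            this.handler.state = constants.states.PLAY_MODE;
            this.emit(':responseReady');
        }
    };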

Once you’re happy you can upload your skill code (or use the in-line editor). Since I have a few npm dependencies, I zipped my function (along with the node_modules folder) and uploaded it. You’ll also need to give the name of the handler that should be executed when the function is invoked (for mine it’s index.handler).

You can then edit the configuration of your skill and point it to the ARN of your Lambda function. You don’t have to publish the skill to start using it yourself; as long as your Alexa device is using the same account as your Alexa developer account, you’ll be able to test the skill on your own device.

Progress on the Kingsgate Media Player

Just a quick update on the progress of the Kingsgate media player. Lots of things have been completed, the main feature being that video playback is now supported! Here’s a video showing the app in action on my Amazon Fire TV.

There have been a lot of other internal changes too, including:

  1. Implemented dependency injection using Dagger 2
  2. Added automated builds using Travis-CI
  3. Added new HttpClient, HttpRequest and HttpResponse objects using DI

Lots more to come!

Solving a problem, with technology

I’ve been attending a fantastic church, Kingsgate, for over a year now. The level of technology that the church uses is excellent, but there is one problem.

Podcasts.

For ages podcasts have been a pain to use. Some platforms make it easy, like iOS, which has a dedicated podcast app where you can search for Kingsgate and see the podcast. It’s not a bad experience, but it’s not great either. For one, you only get the audio feed, not the video feed. The logo is also missing, and it just looks pretty crappy.

My main issue is that I want to watch the video of the sermons from church, not just listen to the audio. I have a fairly nice TV and I can stream a tonne of great programs from Netflix, Amazon Prime, and BBC iPlayer. So why is it so hard to watch the sermons from church on my TV?

I know there are RSS apps for a lot of different platforms (even the Amazon Fire TV), but I’m not really the target audience. What about my parents? Here’s how they’d have to watch the podcast of the sermons on their TV:

  1. Go to the app store of choice
  2. Search for RSS (not “dar-ess”: the letters “R”, “S”, “S”)
  3. Now install one (if you’re lucky there is only one and you don’t have to make a choice)
  4. Now go to settings and remove any default subscriptions (CNN, BBC, etc.)
  5. Click add subscription
  6. Now type “http://”
  7. Give up

There must be a better way…

I have an Amazon Fire TV, which is essentially an Android box with a better interface. So I’ve decided to create an app that lets you easily access the sermon podcasts on the Amazon Fire TV.


Right now this is totally unofficial, and in no way linked to Kingsgate. The app is currently fetching a list of available sermons. The next steps are:

  1. Full-screen media playback
  2. Keeping track of played / unplayed sermons
  3. Porting to iOS (iPad, Apple TV) and Windows 10 (desktop, tablet, Windows Phone and Xbox One)

All of the code for this is on GitHub so if you want to help out then feel free to fork the repo and send a pull request!
