Ubiquiti / UniFi Protect Outage (Aug 2020)

I wanted to share a couple of thoughts on the recent UniFi Protect outage that happened on Aug 25th 2020 as it raises a number of questions that I think Ubiquiti need to answer.

Firstly though, outages happen. It’s not a question of if, but when, and how severe. What’s important is that companies (and individual teams) expect them, plan for how to handle them, and provide quick mitigations (including proactively designing for failure modes). Having the right data and the correct processes is critical to handling outages.

The incident

The incident was described as “Elevated API Errors” that affected “AmpliFi Cloud & UniFi Protect (AmpliFi Cloud Production API)”. This translated to customers not being able to access the UniFi Protect cameras both locally and remotely.

In total there were two outages that apparently lasted for a combined 8 hours 14 mins (more on this figure later). The first was classed as a partial outage; then, 11 mins after it was “resolved”, a major outage was triggered.

Partial outage

The partial outage lasted for 6 hours, 27 minutes. The status website wasn’t updated until 4 hours into the outage. Meanwhile customers, like me, weren’t able to access devices remotely.

Major outage

The major outage was posted 11 mins after the partial outage was marked as resolved. The status updates were much faster, but the timeframe on the status website doesn’t add up: the timeline shows updates spanning 12 hours and 9 mins, which makes the total outage 14 hours and 39 mins and their rolling 30-day availability 97.96% (if you calculate this based solely on the status timeline).
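For what it’s worth, the arithmetic behind that availability figure is simple enough to sketch (the downtime number is taken from the status timeline above):

```javascript
// Back-of-the-envelope check of the rolling 30-day availability figure.
const downtimeMinutes = 14 * 60 + 39;  // 14 h 39 min of outage, per the status timeline
const windowMinutes = 30 * 24 * 60;    // rolling 30-day window = 43,200 minutes
const availability = (1 - downtimeMinutes / windowMinutes) * 100;
console.log(availability.toFixed(2) + "%"); // prints "97.97%" (97.96% if truncated rather than rounded)
```

Which suggests their published 97.96% comes from truncating, rather than rounding, the raw value.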

Plenty of customers were vocal about the issue, I personally cut support tickets after a couple of hours (as soon as I could work around a separate issue with their ticketing system).

The questions

Let’s be honest, problems like this happen, but there are a number of issues here that Ubiquiti need to address publicly, if they want their customers to have confidence in their cloud systems (a lot of their target customer base are sysadmins and tech professionals after all).

  1. Why did this outage last so long? If it was caused by a change (bad deployment etc) why wasn’t it rolled back quickly? What actions did folks have to take that took so long to get out of the door?
  2. What happened between the two incidents? Why did resolving a partial outage cause an even worse outage? Who is in charge of resolving these incidents and making the decisions about the course of action and assessing the customer impact to help teams make the right decisions?
  3. Why was the blast radius so wide? This was either an automated deployment (see #1 about rollbacks) or a manual change. Either way, why did this have global impact? Are changes not deployed into different regional stacks to isolate faults? Was it deployed too quickly, so the damage was done by the time the issue was apparent? Or, do they just have a single global system, that represents a clear single point of failure?
  4. How can customers still retain access to their local camera systems when the Ubiquiti cloud is down? Customers like the fact that footage is local, but still accessible remotely. So why do both local and remote access require cloud connectivity?
  5. Why does the timeline and availability data not align with their own summary information? How is availability being measured?

I raised my request for a root cause analysis / post-mortem in my support ticket and I was told:

Regarding your query, we don’t have any official updates as of now. When we do so it’ll available on community.ui.com

I understand your concern, but if you check the historical uptime we haven’t had any major outages before this.

I think Ubiquiti owe it to their customers to provide more analysis of this outage. It’s not about historic performance, it’s about having confidence in their engineering systems and processes to be sure that you can rely on them.

Automatic monitoring of AWS Lambda functions for an Alexa Skill

If you’re not testing your services in production yourself, then you’re letting your customers test it for you!

Automatic testing of your system means you can catch issues before your customers see them. This doesn’t mean you can skimp on unit testing or functional testing; it simply means you have an extra layer to notify you of problems before they affect users.

What to test

The Alexa Skill that I want to test is implemented as an AWS Lambda Function. The Lambda Function will be the focus of the automated testing, since that’s where my code lives and where a problem is most likely to appear. I’m not testing anything to do with voice commands, Echo devices, or the Alexa Skill backend. I’m treating those as a black box; after all, I don’t have any way of fixing or debugging issues inside those systems anyway.

When I’m monitoring the system I want to know the following:

  • Is the Lambda function online and working?
  • Are dependencies functioning correctly?
  • How long does it take to execute (latency)?

To test this I’ve decided to take a simple approach: use an automated request to my Lambda Function that synthesises a customer request, then use my monitoring systems to identify problems.

This works well for my use case because I have already implemented monitoring and logging for the Lambda Function, so I have pretty good metrics, and most users interact with my skill on a weekly basis, so this is likely to highlight problems before users find them.

How to test it

My Lambda function exposes a single handler function in Node.js so to test different behaviours you have to modify the values sent in the request payload (rather than having different APIs for each). There are a couple of different options that I can use to test my function but for simplicity and cost it’s easier to use an AWS CloudWatch Event to automatically trigger my Lambda Function at a set interval and check everything is OK.

Setting up the CloudWatch event is split into two parts:

  1. Modifying the Lambda Function to allow CloudWatch to invoke it
  2. Setting up the Event itself to call the function at a set interval

Allowing the function to be invoked is pretty simple. You modify your Lambda Function and add a new trigger from the menu on the left for “CloudWatch Events”. This means your function can be invoked either from an Alexa Skill or from CloudWatch.

You can then go into CloudWatch and add a new event. I’ve opted to trigger this event on a set schedule every 5 minutes. You can also select the target as being a Lambda function and then select the function name from the list, and also the version of the function you want to target (I always select the same version that I have live for users to make sure I’m testing what users are experiencing).

For the input of the event, I used a fixed JSON payload which specifies the intent name that I want to trigger.

  {
    "session": {
      "new": true,
      "sessionId": "SessionId.253d1fe4-9af0-45de-b767-ccfea6f0e3d4",
      "application": {
        "applicationId": "<YOUR SKILL ID HERE>"
      },
      "attributes": {},
      "user": {
        "userId": "HEALTH-CHECK-USER"
      }
    },
    "request": {
      "type": "IntentRequest",
      "requestId": "EdwRequestId.cb85ea9c-1e57-4226-83b6-f1a1d9e2eb8a",
      "intent": {
        "name": "PlayLatestSermon",
        "slots": {}
      },
      "locale": "en-GB",
      "timestamp": "2018-01-14T14:31:58Z"
    },
    "context": {
      "AudioPlayer": {
        "playerActivity": "IDLE"
      },
      "System": {
        "application": {
          "applicationId": "<YOUR SKILL ID HERE>"
        },
        "user": {
          "userId": "HEALTH-CHECK-USER"
        },
        "device": {
          "supportedInterfaces": {}
        }
      }
    },
    "version": "1.0"
  }

Once that’s in place you should have the specified intent being triggered every 5 mins!

Knowing when things go wrong

I’ve already written about monitoring Alexa Skills and creating a dashboard. I’d rather not have to keep checking graphs to know when something is going wrong. So, you can actually use CloudWatch alarms to check metrics for you and send an email when a configured threshold is breached.

When you add a CloudWatch event it automatically logs some metrics for each configured event, including successful and failed invocations. This is perfect for knowing when the health check fails. I followed the configuration options for the alarm and set it to trigger if I have three or more failed invocations in a 15 min period. I’ve also configured some other alarms on errors, and some capacity alarms. One thing that I find helpful is to set an alert both when the status is ALARM (when it goes wrong) and when it’s OK (when it recovers); that way, if you get a blip that triggers the alarm, you’ll also get a follow-up telling you it recovered.
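To make the alarm semantics concrete, here’s a small sketch of the “three or more failed invocations in a 15 min period” check that the alarm is effectively performing (the function name and inputs here are my own, purely for illustration; CloudWatch does this for you against the FailedInvocations metric):

```javascript
// Sketch of the alarm condition: 3+ failed invocations within a 15-minute window.
// failureTimestamps: epoch milliseconds of each failed invocation data point.
function alarmBreached(failureTimestamps, windowMs = 15 * 60 * 1000, threshold = 3) {
  const sorted = [...failureTimestamps].sort((a, b) => a - b);
  for (let i = 0; i + threshold - 1 < sorted.length; i++) {
    // If the 3rd failure lands within 15 minutes of the 1st, the alarm fires.
    if (sorted[i + threshold - 1] - sorted[i] <= windowMs) return true;
  }
  return false;
}

const MIN = 60 * 1000;
console.log(alarmBreached([0, 5 * MIN, 10 * MIN]));  // three failures in 10 min -> true
console.log(alarmBreached([0, 20 * MIN, 40 * MIN])); // spread over 40 min -> false
```

With a 5 minute health-check interval, three consecutive failures means roughly 15 minutes of breakage before the alert fires, which felt like the right balance between noise and detection time for my skill.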

The beauty of this approach is that you get automatic traffic testing your code at whatever interval you pick, and you get notified when something starts to misbehave, so you can catch it before your users do. That helps ensure you have a reliable system and a better experience for your users.

Building an Alexa podcast skill

Disclaimer: I’m a software engineer at Amazon, but everything expressed in this post is my own opinion. 

Photo by Andres Urena

What am I building?

Essentially I’m building a podcast skill for the Church I attend (Kingsgate Community Church). The aim is for people to be able to ask Alexa to play the latest sermon and have the device play back the audio. That way people can keep up-to-date if they missed a week at Church.

What I’m building would be easy to adapt (if you just wanted a generic podcast skill) and I’m keeping the code for the back-end on my public Github page.

Building the skill

Alexa skills are pretty simple to create if you use AWS (although you’re not restricted to this). There are two main parts to building a skill:

  1. Configure the skill and define the voice commands (called an interaction model)
  2. Create a function (or API) that will provide the back-end functionality for your features

Configuring the skill

Creating a skill is pretty simple: you can visit the Alexa Developer Portal and click on “create a skill”, where you’ll be asked some key questions.

  1. Skill type – For my podcast skill I’m using a custom skill so I can control exactly what commands and features I want to support (learn more about skill types)
  2. Name – This is what will be displayed in the skill store so it needs to be clear so that people can find your skill to enable it.
  3. Invocation Name – This is the word (or phrase) that people will say when they want to interact with your skill. If you used the invocation name “MySkill” then users would say “Alexa, ask MySkill…”. It doesn’t matter if someone else is using the same invocation name since users have to enable each skill they want to use. You should check the invocation name guidelines when deciding what to use.
  4. Global fields – The skill I’m building will support audio playback so I tick “yes” to use the audio player features.

Voice Commands (a.k.a the interaction model)

This is one of the parts that people struggle with when creating Alexa Skills. What you are really designing here is the interface for your voice application. You’re defining the voice commands (utterances) that users can say, and what commands (intents) these will be mapped to. If you want to capture a variable / parameter from the voice command, these can be configured into something called a ‘slot’. I find it easiest to think of intents like I would REST API requests. You configure the voice commands that will trigger an intent / request, and then your skill handles each intent just like you would handle different API requests coming from different buttons in a UI.

The developer portal has a great new tool for managing the interaction model. You can see from the screenshot on the right that I have a number of different intents defined. Some of these are built-in Alexa intents (like AMAZON.Stop) and I have some custom intents too. The main two intents I’ve configured are:

  1. PlayLatestSermon – Used to fetch the latest sermon from the podcast RSS feed and start audio playback. Invoked by a user saying “Alexa, ask Kingsgate for the latest sermon”
  2. SermonInfoIntent – Gives details of the podcast title, presenter name, and publication date. Invoked by a customer saying “Alexa, ask Kingsgate for the latest sermon”

Adding an intent is as simple as clicking the add button, selecting the intent name, and then defining the utterances (voice commands) that users can say to trigger that intent. Remember: for custom intents the user will have to prefix your chosen utterance with “Alexa, ask <invocation name>…”; for built-in Alexa intents (like stop, next, shuffle) the user doesn’t have to say “Alexa, ask…” first. It’s important to think about this as your user interface and pick utterances that users are likely to say without too much thought. You don’t want people to have to think about what to say, so make sure you give lots of variations of phrasings. When you’re done you’ll need to click save and also build the model.
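For illustration, here’s roughly what a fragment of the interaction model JSON looks like for the intents above (the portal manages the exact schema for you, and the sample utterances here are just examples I made up, not the skill’s real ones):

```json
{
  "intents": [
    { "name": "AMAZON.StopIntent", "samples": [] },
    {
      "name": "PlayLatestSermon",
      "slots": [],
      "samples": [
        "the latest sermon",
        "for the latest sermon",
        "to play the latest sermon"
      ]
    }
  ]
}
```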

Creating the function

For my skill I’m using an AWS Lambda Function, which is a great way of publishing the code I want to run without having to worry about server instances, configuration or scaling etc.

Creating a skill is simple, just log into the AWS control panel in go to Lambda and then click create function. Pick the name, runtime (for my function I’m using Node.js 6.10 but you can use Go, C#, Java, or Python). Once you’ve created the function it will automatically be given access to a set of AWS resources (logging, events, DynamoDB etc). You’ll need to add a trigger (which defines the AWS components that can execute the Lambda function), since this is an Alexa skill I selected ‘Alexa Skills Kit’. If you click on the box you’ll be given the option to enter the ID of your Alexa skill (which is displayed in the Alexa Developer Portal). This gives an extra protection to make sure only your Alexa skill can execute the function. I’ve also given access to Cloudwatch Events, but I’ll cover this in another up-coming post about automated lambda monitoring).

The code for the lambda function is split into three main parts

  1. The entry point
    This sets up the Alexa SDK with your desired settings and also registers the handlers for different intents you support in different states. You can see the full entry code on Github.

    exports.handler = (event, context, callback) => {
        var alexa = Alexa.handler(event, context);
        alexa.appId = constants.appId; // Set your Skill ID here to make sure only your skill can execute this function
        alexa.dynamoDBTableName = constants.dynamoDbTable; // The DynamoDB table is used to store state (everything you set in this.attributes[], on a per-user basis)
        // Register the handlers you support
        alexa.registerHandlers(stateHandlers.startModeIntentHandlers, stateHandlers.playModeIntentHandlers);
        alexa.execute();
    };
  2. The handlers
    The handler code defines each intent that is supported in the different states (“START_MODE”, “PLAY_MODE” etc). You can also have default code here for unhandled intents. This is a simplified version of the START_MODE handlers; you can see the full version on Github.

    var stateHandlers = {
        // Handlers for when the Skill is invoked (we're running in something called "START_MODE")
        startModeIntentHandlers : Alexa.CreateStateHandler(constants.states.START_MODE, {
            // This gets executed if we encounter an intent that we haven't defined
            'Unhandled': function () {
                var message = "Sorry, I didn't understand your request. Please say, play the latest sermon to listen to the latest sermon.";
                this.emit(':ask', message, message);
            },
            // This gets called when someone says "Alexa, open <skill name>"
            'LaunchRequest' : function () {
                var message = 'Welcome to the Kingsgate Community Church sermon player. You can say, play the latest sermon to listen to the latest sermon.';
                var reprompt = 'You can say, play the latest sermon to listen to the latest sermon.';
                this.emit(':ask', message, reprompt);
            },
            // This is when we get a PlayLatestSermon intent, normally because someone said "Alexa, ask <skill name> for the latest sermon"
            'PlayLatestSermon' : function () {
                // Set the index to zero so that we're at the first/latest sermon entry
                this.attributes['index'] = 0;
                this.handler.state = constants.states.START_MODE;
                // Play the item
                controller.play.call(this);
            }
            // ... rest of handler code
        })
    };
  3. The controller
    The controller is called by the handlers and takes care of interacting with the podcast RSS feed, calling the audio player to start playing the podcast on the device etc. The code is too long to show here so I’d suggest looking at the controller code in the Github repo.
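To give a flavour of what the controller does, here’s a minimal sketch of pulling the latest episode’s audio URL out of a podcast RSS feed. The function name and the deliberately naive regex parsing are mine, purely for illustration; the real controller code in the Github repo does considerably more (and a proper XML parser is a better choice for real feeds):

```javascript
// Naive sketch: extract the first <enclosure url="..."> from a podcast RSS feed.
// RSS lists the newest item first, so the first enclosure is the latest episode.
function latestEnclosureUrl(rssXml) {
  const match = rssXml.match(/<enclosure[^>]*\burl="([^"]+)"/);
  return match ? match[1] : null;
}

const sampleFeed = `
  <rss><channel>
    <item>
      <title>Latest Sermon</title>
      <enclosure url="https://example.com/sermons/latest.mp3" type="audio/mpeg"/>
    </item>
  </channel></rss>`;

console.log(latestEnclosureUrl(sampleFeed)); // https://example.com/sermons/latest.mp3
```

The handler then passes a URL like this to the Alexa audio player directives to start playback on the device.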

Once you’re happy you can upload your skill code (or use the in-line editor). Since I have a few npm dependencies, I zipped my function (and the node_modules folder) and uploaded it. You’ll also need to give the name of the handler that should be executed when the function is called (for mine it’s index.handler).

You can then edit the configuration of your skill and point it to the ARN of your lambda function. You don’t have to publish the skill to start using it yourself, as long as your Alexa device is using the same account as your Alexa developer account, then you’ll be able to test the skill on your own device.

Solving a problem, with technology

I’ve been attending a fantastic church, Kingsgate, for over a year now. The level of technology that the church uses is excellent, but there is one problem.


For ages podcasts have been a pain to use. Some platforms make it easy, like iOS, which has a dedicated podcast app where you can search for Kingsgate and see the podcast. It’s not a bad experience, but it’s not great either. For one, you only get the audio feed, not the video feed. The logo is also missing and it just looks pretty crappy.

My main issue is that I want to watch the video of the sermons from Church, not just listen to the audio. I have a fairly nice TV and I can stream a tonne of great programs from Netflix, Amazon Prime, and BBC iPlayer. So why is it so hard to watch the sermons from church on my TV?

I know there are RSS apps for a lot of different platforms (even the Amazon Fire TV) but I’m not really the target audience. What about my parents? How can they watch the podcast of the sermons on their TV?

  1. Go to the app store of choice
  2. Search for RSS (not “dar-ess”, the letters “R”, “S”, “S”)
  3. Now install one (if you’re lucky there is only one and you don’t have to make a choice)
  4. Now go to settings, and remove any default subscriptions (CNN, BBC etc)
  5. Click add subscription
  6. Now type “http://”
  7. Give up

There must be a better way…

I have an Amazon Fire TV, which is essentially an Android box with a better interface. So I’ve decided to create an app that lets you easily access the sermon podcasts on the Amazon Fire TV.


Right now this is totally unofficial, and in no way linked to Kingsgate. The app is currently fetching a list of available sermons. The next steps are:

  1. Full-screen media playback
  2. Keeping track of played / un-played sermons
  3. Porting to iOS (iPad, Apple TV) and Windows 10 (desktop, tablet, windows phone and Xbox One)

All of the code for this is on GitHub so if you want to help out then feel free to fork the repo and send a pull request!

The Red Ring of Death

The single phrase that strikes fear into the heart of any Xbox 360 owner, “The Red Ring of Death”, and it struck on Friday evening! There I was, sat playing Call of Duty 4, when without warning the screen turned to solid green. I’ve not had any problems with my Xbox or games crashing in the past, so I quickly rebooted it using the front power switch (the console was completely unresponsive). When I switched it back on I was greeted with the following image


Thankfully Microsoft have an extended warranty policy for these problems (generally referred to as a “general hardware failure”), so I’m able to send it off for repair. However, the repairs are completed in Germany, so I’m hoping that I’ll be able to get the console back in time for Christmas so that I can play Left 4 Dead (even if it is a present from me to me)…

Error 99 – Canon EF 50mm f/1.8 II

About 18 months ago I bought my wife a Canon 50mm prime lens for her birthday, and a few months later we took it on holiday to Newquay and were getting some great shots with it, when it suddenly fell apart, and I mean fell apart. We were left with the lens in two pieces, so we sent it back to Canon to be replaced.

Now about 12 months later the lens has developed another fault, only this time it’s much more subtle, and we’re getting “Error 99” displayed on the camera’s LCD screen when trying to take shots, with instructions to turn the camera off, replace the battery, and then try again.

I’ve done this several times and it only fixes the issue in around 50% of cases. I’ve also tried using different memory cards, and a different battery pack (fully charged) and it still has the same error message (intermittently). All of the other lenses are fine, so I’m confident it’s not an issue with the camera body (which is an EOS 400D).

I’ve contacted Canon support and they gave the following response

“Please be advised that “err 99” usually indicates that there is an incompatibility with a third party accessory (lens, external flash or memory card). If you experience this issue with two Canon lenses and/or with two different memory cards, this may indicate a fault with the camera itself and would require servicing.”

It’s nice to see Canon are being as helpful as ever, and blaming third party accessories, and trying to get me to pay for a full camera service… I guess I’ll have to contact the Canon UK service centre directly and see if they can service the lens and bypass the support.

Sigma 24-70mm f/2.8 EX DG

I’ve been doing some shopping (and dreaming) recently for photography equipment, and I’ve been looking for a new lens that will be great in low light and also be a good general purpose lens, especially for portrait and landscape work.

After some searching I decided to go for the Sigma 24-70mm f/2.8 EX DG lens and started to price it up. After chatting to my brother-in-law about the lens, it turned out that his copy had developed a minor fault (which meant it wasn’t good enough for him to use professionally) and he kindly offered it to me for free, which was excellent! The lens fault is pretty minor: it struggles to focus on objects that are less than 1 metre away. I’ve contacted Sigma and they’re confident they can repair the lens as part of a service for around £70. I’ve been out and about with the lens recently, and it really is excellent, especially since it has such a wide aperture (f/2.8) throughout the whole focal range.

I should be posting some more pictures on flickr soon, and on the new photoblog.
