Building a live sports platform? In this short article, Chris Wood, Spicy Mango’s CTO, explains some of the things you might want to think about.
At the end of 2018 I had the pleasure of presenting at SportsPro in Madrid. Amongst the many themes presented was a trend relating to performance, stability and reliability when serving mass audiences at scale.
In live sports, I believe what we deem as the quality of experience to the consumer is critical. Some of you may have noticed that subjects such as latency (offset from live) and the use of (or introduction of) immersive audio are hot topics at the moment. Rightfully so – these are things that really matter to worldwide audiences. Especially those with huge home cinema rooms and an addiction to gambling applications.
I’m proud to say that over the last few years, Spicy Mango has developed architectures and platforms for some significant global live sports services. In our world, QoE isn’t simply the quality of the resulting video stream, but how consumers view the end-to-end experience, often reflected in your app store ratings and NPS (Net Promoter Score).
In live sports scenarios, the moments in the run-up to the start of the event generate a peak traffic load that has a huge impact on the platform and the consumer experience. After all, what good is immersive audio if you can’t get to the game on time? Of the major outages and interruptions that have plagued big brands delivering sports content over the last few years, it’s rarely (if ever) the final element, media delivery itself, that fails.
So what should we look out for when architecting platforms for global reach and scale?
The authentication service is one of several that bear the brunt of peak loading issues. In the final moments of the run-up to any live event, analytics data usually shows a sudden surge of users heading to the service to prepare their tablets, TVs or consoles for the main event, which results in a flurry of authentication and/or session token refresh requests.
For a typical SVOD or TVOD service that sees a peak of only a few thousand requests per second, this value can be multiplied significantly with the introduction of live sports. Our experience of the Winter Games demonstrated peaks of over 500K rps (requests per second) in the final few moments before events began. It’s also worth noting that in federated or SSO architectures, passing this request load on to downstream partners may itself prove to be your bottleneck.
Next on my hit list is the Entitlements service. The Entitlements service evaluates a complex matrix of content availability (geo-blocking) and offer (package) rules in real time as content is requested by the consumer. In one of our most recent projects, 220 countries + 15 content types + day of the week variation + 4 offer types = a complicated and time-consuming rule set to parse. This results in significant load on the platform. Latency in areas such as SSL termination, or in retrieving entitlement responses from cache layers or database shards, is something to look out for.
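To give a feel for the size of that rule matrix, here is a minimal sketch, assuming the four dimensions combine independently and that the rules can be evaluated ahead of the event. The country codes, content type names, offer names and the `slow_rule_check` logic are all hypothetical placeholders, not the rule set from the project described above.

```python
from itertools import product

# All values below are illustrative placeholders, not a real rule set.
COUNTRIES = [f"C{i:03d}" for i in range(220)]
CONTENT_TYPES = [f"type{i}" for i in range(15)]
WEEKDAYS = range(7)  # 0 = Monday ... 6 = Sunday
OFFERS = ["free", "basic", "premium", "event_pass"]


def slow_rule_check(country, content_type, weekday, offer):
    """Stand-in for the expensive rule evaluation (hypothetical logic)."""
    blocked = country == "C007" and weekday == 5  # e.g. a Saturday blackout
    return offer != "free" and not blocked


def build_entitlement_table():
    """Evaluate every combination once, before traffic arrives."""
    return {
        key: slow_rule_check(*key)
        for key in product(COUNTRIES, CONTENT_TYPES, WEEKDAYS, OFFERS)
    }


TABLE = build_entitlement_table()


def is_entitled(country, content_type, weekday, offer):
    """Dictionary lookup at request time; unknown combinations are denied."""
    return TABLE.get((country, content_type, weekday, offer), False)


print(len(TABLE))  # 92400 combinations (220 * 15 * 7 * 4)
```

Whether a precomputed table like this is practical depends on how often your rights and offers change; the point is simply that the expensive parsing does not have to happen on every request.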
Don’t think that because your consumers can start the event successfully, you’re home and dry. Unlike Authentication, which often bears only a one-time peak load in the minutes before the event, the Entitlements endpoint is also hit at regular intervals throughout the event (if you’re in a hardened environment).
If linked to your Location Service and perhaps Concurrency Service, it isn’t abnormal to see polling by the player at intervals of up to 30 seconds to ensure users aren’t tunneling through VPNs or sharing credentials with friends and family. In many environments, a failed entitlement check will result in your consumers being ejected from the live stream. Maintaining 100% uptime of your entitlements endpoints is critical.
Cloud Scaling. This should be your golden egg, the solution to all of your capacity problems. Auto-Scale groups are wonderful, and we’re huge proponents of the out-of-the-box capabilities that our public and private cloud providers offer. That said, in researching for this article, it still surprised me just how many big names have been caught out here.
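As a rough back-of-the-envelope sketch of what that polling means for the endpoint, the steady request rate is simply the number of concurrent viewers divided by the polling interval. The audience figure below is hypothetical, and the calculation assumes polls are spread evenly across the interval.

```python
def polling_rps(concurrent_viewers: int, poll_interval_s: float) -> float:
    """Average requests per second if polls are spread evenly over the interval."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return concurrent_viewers / poll_interval_s


# Hypothetical audience of 3 million viewers, each polling every 30 seconds.
print(polling_rps(3_000_000, 30))  # 100000.0
```

If players start at the same moment and poll in lockstep, the real traffic will arrive in bursts well above this average, so treat the number as a floor rather than a ceiling.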
Rules and policies that monitor load in areas such as high response latency or instance CPU still take a few seconds to kick in and generate the additional compute capacity we need. Add a few more minutes of bootstrapping, attaching instances to a load balancer and signing off on health checks, and you’ve just hit an end-to-end duration of around 3-5 minutes.
Taking our entitlements use case above, you could have just missed the critical start of your live event. If the process you use to add capacity is only triggered at 75% load, you’re likely to miss the mark. Pre-warming is a great way to solve this problem. Use your event schedule to script or automate the process of adding additional capacity in the hour or so before your live event starts.
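A minimal sketch of schedule-driven pre-warming might look like the following. It only works out when to scale and how many instances to ask for; the call to your cloud provider is deliberately left out. The function name, the per-instance throughput, the headroom factor and the kick-off time are all assumptions for illustration, and the per-instance figure is something you would measure in your own load tests.

```python
import math
from datetime import datetime, timedelta


def prewarm_plan(kickoff, expected_peak_rps, rps_per_instance,
                 lead_time=timedelta(hours=1), headroom=1.25):
    """Return (time to scale, instance count) for one scheduled event.

    rps_per_instance and headroom are assumed values, not measured ones.
    """
    instances = math.ceil(expected_peak_rps * headroom / rps_per_instance)
    return kickoff - lead_time, instances


# Hypothetical event: 15:00 kick-off, 500K rps expected, 2,000 rps per instance.
kickoff = datetime(2019, 3, 2, 15, 0)
scale_at, instances = prewarm_plan(kickoff, expected_peak_rps=500_000,
                                   rps_per_instance=2_000)
print(scale_at, instances)  # 2019-03-02 14:00:00 313
```

Running this for every entry in the event schedule gives you a list of scale-up times and target sizes that a scheduler can act on, rather than waiting for a load threshold to trip.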
Location Services have been a big topic for Spicy Mango recently, and something we spoke about at the SportsPro event with our partner Digital Element. They are a critical part of the architecture, relied upon heavily for entitlements logic, but the way in which this core service is implemented can make or break the consumer experience.
It’s become fairly common knowledge that traditional IPv4 addresses are now in such short supply that TTL (time to live) values have been reduced from days to hours, meaning that a service provider may assign an IP address used in one part of a region on a Monday to an entirely different part of the region on a Tuesday. For the consumer, this often means that the IP address they were assigned at the point of subscribing may now be allocated to a region where playback is not permitted.
When deploying your location services, it’s becoming ever more critical to ensure that customer services have the ability to bypass, whitelist or adjust an IP address in real time to grant the consumer access to a live event immediately. Moving to an owned and operated Location Service will bring you untold flexibility, help alleviate the capacity limitations imposed by cloud-based solutions, and no doubt earn you a few extra points on your NPS.
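One way to picture that customer services override is a small table of corrections that is consulted before the normal IP lookup. This is only a sketch under stated assumptions: the `OVERRIDES` table, `add_override`, `resolve_country` and the `stale_geoip` stand-in are hypothetical names, and a real deployment would need persistence, expiry and an audit trail. The addresses come from a range reserved for documentation.

```python
import ipaddress

# Hypothetical in-memory override table maintained by customer services.
OVERRIDES = {}


def add_override(cidr: str, country: str) -> None:
    """Pin an address or range to a country, ahead of the normal lookup."""
    OVERRIDES[ipaddress.ip_network(cidr)] = country


def resolve_country(ip: str, geoip_lookup) -> str:
    """Use the most specific matching override, else the normal lookup."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in OVERRIDES if addr in net]
    if matches:
        best = max(matches, key=lambda net: net.prefixlen)
        return OVERRIDES[best]
    return geoip_lookup(ip)


def stale_geoip(ip: str) -> str:
    """Stand-in for a location database that has not caught up yet."""
    return "FR"


add_override("203.0.113.7/32", "GB")
print(resolve_country("203.0.113.7", stale_geoip))  # GB
print(resolve_country("203.0.113.8", stale_geoip))  # FR
```

Because the override is checked first, the correction takes effect on the consumer’s next entitlement check without waiting for the underlying location data to be refreshed.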
These subsystems represent only a fraction of the components in a typical environment, but they’re often the ones that create the most pain. For those embracing cloud, and perhaps even on-premises environments, in a build-it-yourself fashion, the challenge is yours. For those that have built platforms around multi-vendor, externally hosted SaaS products, the challenge to fortify, scale and harden may require a little more thought.
To find out how Spicy Mango may be able to help with your platform’s performance, please email us at firstname.lastname@example.org