Technology

Scaling OTT platforms for live sports


Chris Wood

7 min read

15 Apr 2019

In live sports, I believe the quality of experience delivered to the consumer is critical. Some of you may have noticed that subjects such as latency (offset from live) and the introduction of immersive audio are hot topics at the moment. Rightly so – these are things that really matter to audiences worldwide.

In live sports scenarios, the moments in the run-up to the start of the event generate a traffic peak that has a huge impact on the platform and the consumer experience. After all – what good is immersive audio if you can’t get to the game on time? Of the major outages and interruptions that have plagued big brands delivering sports content over the last few years, the cause has rarely (if ever) been the final element of media delivery.

So what should we look out for when architecting platforms for global reach and scale?

Authentication

The authentication service is one of several that bear the brunt of peak loading issues. In the final moments before any live event begins, analytics data usually shows a sudden surge of users heading to the service to prepare their tablets, TVs or consoles for the main event, resulting in a flurry of authentication and/or session token refresh requests.

For a typical SVOD or TVOD service that sees a peak of only a few thousand requests per second, this value can be multiplied significantly with the introduction of live sports. Our experience of the Winter Games demonstrated peaks of over 500K rps (requests per second) in the final few moments before events began. It’s also worth noting that in federated or SSO architectures, tunneling this request load to downstream partners may prove to be your bottleneck.

Entitlements

Next on my hit list is the Entitlements service. The Entitlements service translates a complex matrix of content availability (geo-blocking) and offer (package) rules in real time as content is requested by the consumer. In one of our most recent projects, 220 countries + 15 content types + day of the week variation + 4 offer types = a complicated and time-consuming rule set to parse. This results in significant load on the platform. Latency in areas such as SSL termination, or in retrieving entitlement responses from cache layers or database shards, is something to look out for.
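To put that rule set in perspective, here is a rough sketch of its size. It assumes the four dimensions combine independently, which is an illustrative simplification rather than a description of the real project, and the function and dictionary names are hypothetical.

```python
from math import prod

# Dimension sizes quoted in the text; treating them as fully independent
# is an assumption made purely for illustration.
DIMENSIONS = {
    "countries": 220,
    "content_types": 15,
    "days_of_week": 7,
    "offer_types": 4,
}


def rule_combinations(dimensions: dict[str, int]) -> int:
    """Number of distinct (country, content type, day, offer) cells."""
    return prod(dimensions.values())


print(rule_combinations(DIMENSIONS))  # 92400
```

Even under this simplified model there are tens of thousands of cells to evaluate, which is why the latency of the cache or database lookup behind each entitlement check matters.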

Don’t think that because your consumers can start the event successfully, you’re home and dry. Unlike Authentication, which often bears only a one-time peak load in the minutes before the event, the Entitlements endpoint is also hit at regular intervals throughout the event (if you’re in a hardened environment).

If linked to your Location Service and perhaps Concurrency Service, it isn’t unusual to see the player polling at intervals of up to 30 seconds to ensure users aren’t tunneling through VPNs or sharing credentials with friends and family. In many environments, a failed entitlement check will result in your consumers being ejected from the live stream. Maintaining 100% uptime of your entitlements endpoints is critical.
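As a back-of-the-envelope sketch of what that polling means for the Entitlements endpoint: if every player polls once per interval, the steady-state request rate is simply the number of concurrent viewers divided by the interval. The audience size below is a hypothetical figure; only the 30-second interval comes from the text.

```python
def polling_rps(concurrent_viewers: int, poll_interval_s: float) -> float:
    """Steady-state requests per second if every player polls once per interval."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return concurrent_viewers / poll_interval_s


# Hypothetical audience of one million concurrent viewers, polling every 30 seconds.
print(round(polling_rps(1_000_000, 30)))  # 33333
```

This load persists for the whole event rather than only at the start, and it assumes polls are spread evenly; players that all started at the same moment may cluster their polls and produce higher short-term peaks.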

Cloud Scaling

Cloud scaling should be your golden egg – the solution to all of your capacity problems. Auto-scaling groups are wonderful – and we’re huge proponents of the out-of-the-box capabilities that our public and private cloud providers offer. That said, in researching this article, it still surprised me just how many big names have been caught out here.

Rules and policies that monitor load indicators such as high response latency or instance CPU still take a few seconds to kick in and generate the additional compute capacity we need. Add a few more minutes of bootstrapping, attaching instances to a load balancer and signing off on health checks, and you’ve just hit an end-to-end duration of around 3-5 minutes.
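To see why that delay hurts, consider how much traffic growth the existing fleet can absorb once a scaling rule has fired. The sketch below assumes utilisation rises linearly with traffic (a simplification) and uses a trigger threshold such as 75% purely as an example; the function name is hypothetical.

```python
def growth_headroom(trigger_utilisation: float) -> float:
    """Fractional traffic growth the current fleet can absorb after the trigger fires."""
    if not 0 < trigger_utilisation <= 1:
        raise ValueError("trigger_utilisation must be in (0, 1]")
    return 1 / trigger_utilisation - 1


# With a 75% trigger, the fleet saturates once traffic grows by a further third.
print(f"{growth_headroom(0.75):.0%}")  # 33%
```

If traffic grows by more than that fraction within the 3-5 minutes it takes new instances to come into service, the platform is saturated before the extra capacity arrives.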

Taking our entitlements use case above – you could have just missed the critical start of your live event. If the process you use to add capacity is only triggered at 75% utilisation, you’re likely to miss the mark. Pre-warming is a great way to solve this problem. Use your event schedule to script or automate the process of adding capacity in the hour or so before your live event starts.
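A minimal sketch of schedule-driven pre-warming might look like the following. The event names, dates and instance counts are invented for illustration, and the step that would actually call your cloud provider to change capacity is deliberately left out; the function only works out when to scale and to what size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    name: str
    start: datetime
    expected_instances: int  # hypothetical capacity estimate for the peak


def prewarm_plan(events, lead_time=timedelta(hours=1)):
    """Return (scale_at, event name, instance count) tuples, earliest first."""
    plan = [(e.start - lead_time, e.name, e.expected_instances) for e in events]
    return sorted(plan)


# Hypothetical schedule; in practice this would come from the event calendar.
events = [
    Event("Final", datetime(2019, 4, 20, 19, 30), 400),
    Event("Semi-final", datetime(2019, 4, 18, 19, 30), 250),
]
for scale_at, name, instances in prewarm_plan(events):
    print(f"{scale_at:%Y-%m-%d %H:%M} scale to {instances} instances for {name}")
```

Each entry in the plan can then be handed to whatever scheduler or scaling API you already use, so that capacity is in service and health-checked well before the audience arrives.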

Location Services

Location Services are a critical part of the architecture – relied upon heavily for entitlements logic – but the way in which this core service is implemented can make or break the consumer experience.

Traditional IPv4 addresses are now in such short supply that TTL (time to live) values have been reduced from days to hours – meaning that a service provider may reassign an IP address used in one part of a region on a Monday to an entirely different part of the region on a Tuesday. For the consumer, this often means that the IP address they were assigned at the point of subscribing may now be allocated to a region where playback is not permitted.

When deploying your location services – it’s becoming ever more critical to ensure that customer services have the ability to bypass, whitelist or adjust an IP address in real time to grant the consumer access to a live event immediately. Moving to an owned and operated Location Service will bring you untold flexibility, help alleviate the capacity limitations imposed by cloud-based solutions, and no doubt earn you a few extra points on your NPS.
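As an illustration of the real-time override idea, here is a toy in-memory location lookup in which a customer services agent can pin an address to a country with immediate effect. The class, its method names and the country mappings are hypothetical; the address ranges are blocks reserved for documentation, and a production service would sit on a proper geo-IP data set and a shared store.

```python
import ipaddress


class LocationService:
    """Toy in-memory geo lookup with real-time per-address overrides."""

    def __init__(self, ranges):
        # ranges: iterable of (CIDR string, country code) pairs
        self._ranges = [(ipaddress.ip_network(cidr), country) for cidr, country in ranges]
        self._overrides = {}

    def set_override(self, ip, country):
        """Pin an address to a country; takes effect on the next lookup."""
        self._overrides[ipaddress.ip_address(ip)] = country

    def clear_override(self, ip):
        """Remove a previously set override, if any."""
        self._overrides.pop(ipaddress.ip_address(ip), None)

    def country_for(self, ip):
        """Return the override if present, else the first matching range, else None."""
        addr = ipaddress.ip_address(ip)
        if addr in self._overrides:
            return self._overrides[addr]
        for network, country in self._ranges:
            if addr in network:
                return country
        return None


# Documentation-reserved address blocks with made-up country mappings.
service = LocationService([("192.0.2.0/24", "GB"), ("198.51.100.0/24", "FR")])
print(service.country_for("198.51.100.7"))  # FR
service.set_override("198.51.100.7", "GB")
print(service.country_for("198.51.100.7"))  # GB
```

Checking overrides before the range data means a correction made by customer services is visible on the consumer's very next entitlement check, without waiting for the underlying geo-IP data to be refreshed.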

In conclusion

These subsystems represent only a fraction of the components in a typical environment – but they’re often the ones that create the most pain. For those embracing cloud and perhaps even on-premises environments in a build-it-yourself fashion – the challenge is yours. For those that have built platforms around multi-vendor, externally hosted SaaS products, the challenge to fortify, scale and harden may require a little more thought.




To find out more about how the Spicy Mango team could help your business, please browse the site or get in touch by dropping us an email at hello@spicymango.co.uk or tweeting us @spicymangotech.


Unlock incredible potential and value with scalable, high performing and reliable platforms and capabilities across sports, broadcast and entertainment.

Get in touch

Contact us - we don't bite

Drop us an email at hello@spicymango.co.uk or call us on +44 (0)844 848 0441 or fill out the contact form below for a friendly chat.

We don’t share your personal details with anyone
