A Better Way to Scale WebRTC

Real-Time Video Streaming with Scalability and Broadcast Quality

When explaining our groundbreaking work, we often encounter skepticism and disbelief. We’re used to hearing things like:

“What you’ve just described is impossible.”

“WebRTC was built for video conferencing and isn't suitable for broadcast use cases.”

That’s because where everyone else zigged, we zagged, which led to the creation of something different and unique. We chose not to take the easy road of copying existing open source solutions and libraries to build a franken-service of stitched-together components we don’t own.

Read on to learn more about our approach, the limitations of existing solutions, and the core principles that set Phenix apart.

The Phenix Solution

Phenix forever changed the ultra-low latency video streaming landscape by being the first company to deliver video in production for large-scale events with live loads in real time. The entire audience experiences the live event simultaneously, with less than half a second of field-to-screen latency. This enables interactivity between audience members, and between the audience and the talent behind the camera or the people attending the event in person. Phenix has built the entire video workflow, from encoding & transcoding to multiple bitrates through adaptive bitrate (ABR) video delivery, under the constraints of real-time and with the objective of global scale, allowing us to achieve lower latency at scale than any other real-time solutions provider.

Understanding WebRTC

To understand what we’ve achieved and how our approach differs from those used by other real-time video solution providers to lower latency, we have to explore the components of WebRTC. The W3C WebRTC standard consists of several protocols, including Interactive Connectivity Establishment (ICE), Session Traversal Utilities for NAT (STUN), Traversal Using Relays around NAT (TURN), the Session Description Protocol (SDP), and the Secure Real-time Transport Protocol (SRTP).

Each of these protocols does something different:

  • ICE, STUN & TURN make it possible to establish connectivity between two endpoints when Network Address Translation (NAT) is involved.
  • SDP is how two endpoints negotiate capabilities, like codecs and audio/video parameters, and describe their audio & video streams.
  • Finally, SRTP is the workhorse of WebRTC and is used to transport the audio and video content. SRTP is the secure version of RTP, which was initially proposed in 1996 and then updated in 2003.
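
To make these steps concrete, here is a minimal sketch of connection establishment using the standard browser RTCPeerConnection API. The STUN/TURN server URLs and the signaling object are illustrative assumptions, since WebRTC deliberately leaves the signaling transport up to the application:

    // Minimal sketch (TypeScript): connection establishment with the browser's
    // standard RTCPeerConnection API. The STUN/TURN URLs and the `signaling`
    // channel are placeholders; WebRTC does not standardize signaling.
    declare const signaling: { send(msg: unknown): void };

    async function establishConnection(): Promise<RTCPeerConnection> {
      const pc = new RTCPeerConnection({
        iceServers: [
          // STUN: discover our public address from behind a NAT.
          { urls: "stun:stun.example.com:3478" },
          // TURN: relay media when no direct path can be established.
          { urls: "turn:turn.example.com:3478", username: "user", credential: "secret" },
        ],
      });

      // ICE candidates trickle in as STUN/TURN probing proceeds; each one is
      // forwarded to the remote peer over the signaling channel.
      pc.onicecandidate = (event) => {
        if (event.candidate) signaling.send({ candidate: event.candidate });
      };

      // SDP offer/answer: the offer describes our codecs and media parameters;
      // the remote peer's answer is later applied via setRemoteDescription().
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      signaling.send({ sdp: pc.localDescription });

      // Once both descriptions are set and ICE succeeds, SRTP carries the media.
      return pc;
    }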

To answer the question of whether WebRTC is scalable, one must evaluate the core protocols.

ICE, STUN, TURN, and SDP are all involved in establishing the connection for audio & video transport. While the architecture and performance of these components are key to handling high join rates, connection establishment isn’t what most people are thinking of when they claim that WebRTC can’t scale.

When people challenge the scalability of WebRTC, they’re usually referring to the steady-state ability to continuously stream video to a large concurrent audience. So the question becomes: “Can you build a server network capable of streaming video and audio over SRTP to a large audience?”

To answer this question, let’s take a high-level look at the core activities of an edge server in standard HLS live streaming versus SRTP streaming. Each scenario assumes an established connection.

 

HLS (HTTP Live Streaming) Protocol

  1. Receive HTTP request for video fragment
  2. Retrieve fragment from disk or memory
  3. Send video to the client

SRTP (Secure Real-Time Transport Protocol)

  1. Wait for the next video frame to arrive from the real-time source
  2. Send the next video frame to the client

In the HLS scenario, it’s a request/response loop; in the SRTP scenario, it’s a loop forwarding frames of video. Both connections are established, persistent, and encrypted, so the resources reserved for each viewer and used to encrypt the stream are essentially the same.
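
As a rough sketch of the contrast, the two loops might look like the following. The names (readFragment, handleHlsRequest, runSrtpForwarder) are hypothetical stand-ins, not Phenix’s implementation or any particular server’s API:

    // Hypothetical sketch (TypeScript) of the two edge-server loops above.

    // HLS edge: a request/response loop. Each viewer repeatedly requests the
    // next fragment, and the server answers from memory or disk.
    declare function readFragment(id: string): Promise<Uint8Array>;

    async function handleHlsRequest(
      req: { fragmentId: string },
      res: { send(data: Uint8Array): void },
    ): Promise<void> {
      const fragment = await readFragment(req.fragmentId); // retrieve from disk or memory
      res.send(fragment);                                  // one response per request
    }

    // SRTP edge: a push loop. The server waits for each frame from the
    // real-time source and immediately forwards it to every subscriber.
    async function runSrtpForwarder(
      frames: AsyncIterable<Uint8Array>,
      subscribers: Array<{ send(frame: Uint8Array): void }>,
    ): Promise<void> {
      for await (const frame of frames) {   // block until the next frame arrives
        for (const viewer of subscribers) {
          viewer.send(frame);               // forward immediately, no segment buffering
        }
      }
    }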

You Get What You Design For

What’s different between the two scenarios is where the core code comes from. The HTTP code comes from open source projects and commercial systems where performance in serving HTTP objects was paramount, and it has been optimized continuously since the beginning of the web. The code behind most companies claiming to have a real-time streaming platform comes from open source projects designed to solve the video chat use case, where scale is not a core consideration.

Some might argue that with some tweaks, orchestration projects like Kubernetes, and the vast compute resources offered by the hyperscalers, you can put together a scalable solution. However, you end up with stacked inefficiencies and an expensive operational cost model that might work for thousands or even ten thousand viewers in close geographic proximity, but fails technically and economically for globally distributed events with hundreds of thousands or millions of viewers.

The Phenix Way

The Phenix Way, our architecture and development method, is a key differentiator and the reason the Phenix live streaming platform can scale where others are challenged. With highly experienced engineers and architects who’ve worked at companies including Akamai, Skype, and IBM, Phenix took a completely different approach.

Rather than using open source libraries and taking the quick route to market, we took our time to build our own tech stack specifically optimized to deliver broadcast-quality video to millions of viewers as quickly as possible.

The core components we built from scratch include:

  • a standards-compliant WebRTC library
  • cloud orchestration to enable high availability across cloud providers
  • machine learning and prediction models to handle flash crowds
  • real-time adaptive bitrate video delivery

We wrote every piece of these core components ourselves, with the explicit knowledge that the code would live in an environment for real-time video streaming at scale. We control every piece, and every piece knows it needs to operate within a specific time bound and to be efficient with resources.

High Availability & Multi-Cloud

With our intentional focus on ultra-low latency without sacrificing quality or scale, we’ve also architected the underlying infrastructure specifically to accommodate high availability video delivery. 

We rely on best-in-class third parties like Google Cloud Platform (GCP) and Oracle Cloud Infrastructure (OCI) for core componentry. To avoid any single point of failure, we’ve built our platform across multiple cloud providers, achieving an unmatched level of fault tolerance and global reliability using our own code. Not only is our platform architected so that the hyperscalers provide redundancy for one another; this approach also expands the ways in which we can extend our network. We’re able to deploy in any compute environment across clouds, vendors, and on-premises.

And we don't require customers to pay for bandwidth that they might consume. Our system deploys the resources it needs in real time, allowing you to optimize your budget by paying only for the bandwidth your actual audience size requires.