Following last week’s blog post titled “MCUs are dead, Long live the MCU!”, I received a lot of great feedback (both direct and from the analyst community) and requests to learn more about how we approached these challenges and innovated to address them.
So let me shed some light on our approach and how we tackled these challenges to build our virtualized, cloud-based MCU alternative.
We decomposed the MCU into a frontend that worries about protocol interoperability and a backend that does the heavy lifting of transcoding and mixing/compositing. This allows us to plug in the many protocols that exist in the video world (H.323, SIP, XMPP, Skype, etc.) and terminate them in a scalable cloud infrastructure. We then independently decode and encode the different audio and video codecs (H.261, H.263, H.264, RTVideo, G.711, G.722, SILK, iSAC, etc.), so each leg of a multi-party business video call can come from any client or device and we don't need to force users onto a specific brand. For example, supporting H.264/SVC is, for us, just like supporting any other codec, but we don't force you to use it if you don't have gear that supports it.
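To make the frontend/backend split concrete, here is a minimal sketch. It assumes each protocol frontend (SIP, H.323, XMPP, ...) hands the backend a normalized `Leg` record naming the participant's codec; the backend decodes every leg, mixes, and re-encodes a separate output for each leg in that leg's own codec. `ToyCodec`, `Leg` and `backend_mix` are hypothetical names, the "codecs" do no real compression, and the mixing shown is audio-style summing rather than video compositing.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Frame = Tuple[str, Tuple[int, ...]]  # (codec name, payload)

class ToyCodec:
    """Stand-in for a real codec (G.711, G.722, SILK, ...); it tags but does not compress."""
    def __init__(self, name: str) -> None:
        self.name = name

    def decode(self, frame: Frame) -> List[int]:
        codec, payload = frame
        if codec != self.name:
            raise ValueError(f"{self.name} cannot decode a {codec} frame")
        return list(payload)

    def encode(self, samples: List[int]) -> Frame:
        return (self.name, tuple(samples))

# Backend codec registry: adding a codec does not touch any protocol frontend.
CODECS: Dict[str, ToyCodec] = {n: ToyCodec(n) for n in ("G.711", "G.722", "SILK", "iSAC")}

@dataclass
class Leg:
    """Normalized call leg, as a protocol frontend (SIP, H.323, XMPP, ...) might hand it over."""
    participant: str
    protocol: str
    codec: str

def mix(streams: List[List[int]]) -> List[int]:
    # Sample-wise sum; a real mixer would also normalize and clip.
    return [sum(samples) for samples in zip(*streams)]

def backend_mix(legs: List[Leg], frames: Dict[str, Frame]) -> Dict[str, Frame]:
    """Decode every leg, then build one output per leg in that leg's own codec."""
    decoded = {leg.participant: CODECS[leg.codec].decode(frames[leg.participant]) for leg in legs}
    out: Dict[str, Frame] = {}
    for leg in legs:
        # Each participant receives everyone except themselves.
        others = [pcm for who, pcm in decoded.items() if who != leg.participant]
        out[leg.participant] = CODECS[leg.codec].encode(mix(others))
    return out

legs = [Leg("alice", "SIP", "G.722"), Leg("bob", "H.323", "G.711"), Leg("carol", "XMPP", "SILK")]
frames = {"alice": ("G.722", (1, 2, 3)), "bob": ("G.711", (10, 20, 30)), "carol": ("SILK", (100, 200, 300))}
out = backend_mix(legs, frames)
print(out["alice"])  # ('G.722', (110, 220, 330))
```

The point of the split is that the protocol a leg arrived on never reaches the mixing code: Alice on SIP and Bob on H.323 each get a mix encoded in their own codec.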
Firewall Traversal, Security and Encryption
We built all of the firewall traversal techniques into the cloud service so that you can connect from behind your corporate firewall and still talk securely to people inside or outside your organization. We encrypt all legs of a call by default with state-of-the-art public-key and symmetric AES encryption, because we believe this is the right thing to do and we have the horsepower to do it without breaking a sweat. I have a whole blog post on this topic, following the recent NY Times article on insecure video equipment in businesses.
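The post doesn't name the specific traversal techniques used. One standard building block for NAT/firewall traversal is STUN (RFC 5389): a client behind a NAT sends a Binding Request to a public server and learns its public address and port from the XOR-MAPPED-ADDRESS attribute of the reply. The sketch below builds the request and decodes an IPv4 reply using only Python's standard library; the function names are hypothetical, and a real client would also handle retransmission, IPv6 and attributes such as MESSAGE-INTEGRITY.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442          # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def binding_request(txn_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding Request header with no attributes."""
    txn_id = txn_id if txn_id is not None else os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txn_id)

def parse_xor_mapped_address(response: bytes) -> Tuple[str, int]:
    """Return the (public IPv4 address, port) reported in a Binding success response."""
    msg_type, length, cookie, _txn = struct.unpack("!HHI12s", response[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family 0x01 = IPv4
            x_port, x_addr = struct.unpack("!HI", value[2:8])
            port = x_port ^ (MAGIC_COOKIE >> 16)   # port is XORed with the cookie's top 16 bits
            addr = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return addr, port
        pos += 4 + attr_len + (-attr_len % 4)     # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

In use, a client would send `binding_request()` over a UDP socket to a STUN server and pass the reply to `parse_xor_mapped_address`; the XOR obfuscation exists so that middleboxes rewriting addresses in payloads don't corrupt the answer.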
The scalability of our MCU architecture is second to none. We did not build a single-box MCU and then try to find ways to cobble boxes together with scripts. Instead, we decomposed the MCU into components and built it from the ground up to scale as we grow. We use off-the-shelf systems and bring to bear the power of multi-core 64-bit x86 processors. When Intel and AMD hit a wall in processor clock speed 5 years ago, they started putting more cores on each die. You can't run Microsoft Word much faster on this new generation of processors, but the multi-stream transcoding/compositing problem we are solving is very well suited to multi-core x86, because each stream can be processed largely independently. As our service grows, we just need to rack and stack more x86 systems to keep scaling, since the architecture to enable this is already in place.
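Because each leg's transcoding is largely independent work, scaling out comes down to spreading jobs across a pool of cores. The sketch below shows one simple way to do that, greedy least-loaded placement with the largest jobs first (the classic LPT heuristic); it is an illustration under stated assumptions, not the service's actual scheduler. `place_legs` is a hypothetical name, and each leg's CPU cost is assumed to be known as a fraction of one core.

```python
import heapq
from typing import Dict, List, Tuple

def place_legs(leg_costs: Dict[str, float], servers: int,
               cores_per_server: int) -> Dict[Tuple[int, int], List[str]]:
    """Assign per-leg transcode jobs to (server, core) slots, least-loaded first.

    leg_costs maps a leg id to its assumed CPU cost as a fraction of one core.
    """
    # Min-heap of (current load, server, core): the top is always the idlest core.
    heap = [(0.0, s, c) for s in range(servers) for c in range(cores_per_server)]
    heapq.heapify(heap)
    placement: Dict[Tuple[int, int], List[str]] = {(s, c): [] for _, s, c in heap}
    # Largest jobs first (LPT) tends to balance better than arrival order.
    for leg, cost in sorted(leg_costs.items(), key=lambda kv: -kv[1]):
        load, s, c = heapq.heappop(heap)
        if load + cost > 1.0:
            # If the idlest core cannot fit this job, no core can.
            raise RuntimeError("out of capacity: rack and stack another server")
        placement[(s, c)].append(leg)
        heapq.heappush(heap, (load + cost, s, c))
    return placement

jobs = {f"leg{i}": 0.5 for i in range(9)}
try:
    place_legs(jobs, servers=1, cores_per_server=4)
except RuntimeError as err:
    print(err)
print(sum(len(v) for v in place_legs(jobs, servers=2, cores_per_server=4).values()))  # 9
```

Adding a server only enlarges the pool of slots: a job mix that overflows one 4-core server fits once `servers=2`, with no change to the placement logic.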
High Performance and Low Latency
We have optimized our code for the 64-bit x86 architecture and its SIMD multimedia instructions (the MMX, SSE and AVX extensions). This optimized code blows the pants off any DSP used in traditional hardware MCUs, and it has allowed us to achieve extremely low transcode latency, which we believe is only going to get better. Cop-out approaches like H.264/SVC claim that transcoding latency is one of the primary reasons to avoid MCUs. Our system's transcode latency is under 15 milliseconds, and it continues to shrink as we optimize further. A good video call is achievable when end-to-end latency stays below 250 milliseconds, so our transcoding latency is becoming a negligible component compared with the latency introduced by the speed of light across the globe and by jitter in the network.
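A back-of-the-envelope budget shows why a sub-15 ms transcode step is small next to propagation delay. The sketch assumes light in optical fiber covers roughly 200 km per millisecond (about two-thirds of c), a call relayed through an MCU that is 3,000 km from one caller and 7,000 km from the other, and a hypothetical 60 ms jitter buffer; capture, endpoint encode/decode and display delays are left out. `one_way_budget_ms` is a hypothetical helper.

```python
FIBER_KM_PER_MS = 200.0  # light in glass travels at roughly 2/3 c, about 200 km per ms

def one_way_budget_ms(km_caller_to_mcu: float, km_mcu_to_callee: float,
                      transcode_ms: float = 15.0, jitter_buffer_ms: float = 60.0) -> dict:
    """One-way latency for a call relayed through an MCU (network-path components only)."""
    propagation = (km_caller_to_mcu + km_mcu_to_callee) / FIBER_KM_PER_MS
    total = propagation + transcode_ms + jitter_buffer_ms
    return {
        "propagation_ms": propagation,
        "transcode_ms": transcode_ms,
        "jitter_buffer_ms": jitter_buffer_ms,
        "total_ms": total,
        "transcode_share": transcode_ms / total,
    }

budget = one_way_budget_ms(3000, 7000)
print(budget["propagation_ms"], budget["total_ms"])  # 50.0 125.0
print(f"{budget['transcode_share']:.0%}")             # 12%
```

Under these assumptions the path uses about half of the 250 ms target, and transcoding accounts for roughly 12% of it; real fiber routes are longer than great-circle distances, which only shrinks that share further.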
Low Cost and Lower Cost
Our approach allows us to run our service infrastructure at low cost and to offer the service to customers at a dramatically lower cost than current approaches. Our infrastructure costs are low because of the high performance and low cost of off-the-shelf servers. We are able to offer it to you at lower cost because of the benefits of cloud computing, namely the sharing that comes from multiplexing a large number of customers on our infrastructure, while still providing you a higher-quality, more available service than was ever possible before.
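The saving from multiplexing many customers can be illustrated with a toy calculation: capacity dedicated to each customer must cover the sum of their individual peaks, while shared capacity only has to cover the peak of the combined load. The hourly port counts below are made up, chosen so that three customers in different time zones peak at different hours; `capacity_needed` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def capacity_needed(hourly_ports: Dict[str, List[int]]) -> Tuple[int, int]:
    """Return (dedicated, shared) port capacity for the given demand curves."""
    # Dedicated: each customer provisions for its own peak hour.
    dedicated = sum(max(series) for series in hourly_ports.values())
    # Shared: one pool provisioned for the busiest hour of the aggregate load.
    shared = max(sum(hour) for hour in zip(*hourly_ports.values()))
    return dedicated, shared

demand = {
    "new_york": [10, 2, 0, 0],
    "london":   [0, 10, 2, 0],
    "tokyo":    [0, 0, 10, 2],
}
print(capacity_needed(demand))  # (30, 12)
```

Real traffic is burstier and less neatly staggered than this, so the actual saving depends on the customer mix, but the peak of the sum can never exceed the sum of the peaks.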
Feature Velocity and Flexibility
We could never have achieved such widespread interoperability with Cisco, Polycom, LifeSize, Sony, Microsoft Lync, Skype and Google Talk without the flexibility of our platform. When we find an interop issue, we debug it, fix it and can roll the fix into production in hours or days. We run our R&D team more like a web company than a traditional equipment maker. We can introduce features into production in a matter of weeks, not years. Like web companies such as Google and Facebook, we can experiment with beta features, get feedback instantly and improve.
Global Presence and Over-the-Top Internet Experience
We are building a presence in tier-1 datacenter facilities across the globe so that we can deliver our services from the point closest to end users. In doing this we are borrowing some of the innovations from my previous life in the CDN, caching and WAN optimization space. Companies like Akamai, Limelight and CacheFlow have shown ways to optimize one-way video delivery, and we are extending that to bi-directional real-time video. We are peering with the right global backbones and can optimize our routing based on real-time user-experience metrics. We have also built error-resilience schemes into our audio/video encoding, such as FEC (Forward Error Correction), temporal/spatial scaling in codecs and retransmission on low-latency hops, to make the service work well over the Internet so that you don't need private leased MPLS or other dedicated lines. We simply want you to have the best video calling experience on our service, better than anything you can achieve in a direct point-to-point call.
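As an illustration of the FEC idea, here is the simplest parity scheme: send one extra packet that is the XOR of a group of media packets, so the receiver can rebuild any single lost packet without waiting for a retransmission. Production media FEC (for example the RTP generic FEC format of RFC 5109) builds on XOR parity but adds headers, variable-length handling and multiple protection levels. The function names here are hypothetical, and packets in a group are assumed to be padded to equal length.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets: List[bytes]) -> bytes:
    # Parity packet = XOR of every packet in the group (equal lengths assumed).
    return reduce(xor_bytes, packets)

def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost packet per group")
    out = list(received)
    if missing:
        # XOR of the parity with all surviving packets leaves the lost one.
        present = [p for p in received if p is not None]
        out[missing[0]] = reduce(xor_bytes, present, parity)
    return out

packets = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(packets)
print(recover([b"abcd", None, b"ijkl"], parity))  # [b'abcd', b'efgh', b'ijkl']
```

The trade-off is bandwidth for latency: one parity packet per three media packets costs about 33% overhead here, but repairs a single loss immediately instead of after a round trip.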