The pitfalls of using WebRTC
Jan 31, 2024 12 min read

As seen in LinkedIn Pulse.

Author: Allen Drennan, Co-Founder & CTO at Cordoniq Inc.

WebRTC has become a popular choice for developers who are considering adding conferencing and collaboration to their existing product. It seems simple enough. Add the core components of video collaboration to your code and you now have a collaboration app. The question you must ask yourself is this: if WebRTC is so fit for purpose, why does everyone gravitate toward the major native video conferencing products?

It’s amazing the number of web apps on the market that have integrated WebRTC, but if you drill down into their customer support FAQs you will find they often tell their users to use something else. 

Here are some of the major issues you need to consider before using WebRTC in your product. 

Issues with supported web browsers

WebRTC was engineered to be used within the confines of a web browser. The browser itself varies widely from platform to platform, so you cannot guarantee that the attendee will have a browser with compatible video and audio codecs. This means that you must make sure everyone who joins a meeting meets a minimum browser specification, and that minimum standard is different for each platform you expect to support such as Windows, macOS, Android and iOS.
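To make the codec problem concrete, here is a minimal sketch of the check a meeting implicitly performs: find the codecs that every attendee can handle. It is a simplified model, not how a browser actually negotiates (real WebRTC sessions negotiate codecs through SDP offer/answer), and the device names and capability lists are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def common_codecs(supported: Dict[str, List[str]]) -> List[str]:
    """Return the codecs every participant supports,
    ordered by the first participant's preference."""
    participants = list(supported.values())
    if not participants:
        return []
    shared = set(participants[0])
    for codecs in participants[1:]:
        shared &= set(codecs)
    return [codec for codec in participants[0] if codec in shared]


# Hypothetical capability lists, for illustration only
# (not measured from real browsers or devices).
meeting = {
    "desktop_a": ["VP9", "VP8", "H264"],
    "desktop_b": ["VP8", "H264"],
    "old_tablet": ["H264"],
}

print(common_codecs(meeting))  # ['H264']
```

The more varied the browsers in a meeting, the smaller this shared set tends to become, and an empty result means at least one attendee cannot take part without transcoding or a fallback.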

Those using older Android phones or tablets, for example, may have incompatible browsers because the manufacturer has chosen not to provide an upgrade path for the base Android OS. This issue alone creates so many challenges, and has such an impact on the user experience and ease of use of your product, that it can be reason enough to abandon the idea of using WebRTC.

Additionally, by design, WebRTC has a very specific approach to both concurrency and events. Overall performance suffers because of the inherent design of the web browser and its asynchronous approach to concurrency, rather than the design a typical native application would use for synchronous communications. This means lower quality video at fewer frames per second while using more processing power, often loaded onto a single core of a multi-core processor.

Issues with adjusting to available bandwidth

You cannot guarantee available Internet bandwidth and processing power for users who attend meetings with their computer and webcam, so you need to be able to adjust to both dynamically. This means not only adjusting the quality of your experience for your system and Internet connection, but also adjusting your quality as it relates to other attendees and their system and bandwidth.

WebRTC doesn’t provide everything you need to handle scalable video encoding for synchronous communications. While certain aspects of scalable encoding are advancing in the draft specifications now, it will be many years before every web browser on every mobile, tablet and desktop platform supports them, if ever. Also, for scalable video encoding to be effective for video conferencing, you need a robust backend to manage the bitstream layering and perform selective forwarding.

The lack of scalable video encoding means that your video collaboration app will be a poor, or even unusable, experience for people who attend meetings with insufficient bandwidth. The usual compromise in your code is to drop the resolution and frame rate of the video to the lowest quality setting.
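One way to picture the difference is the sketch below. It compares a single encoding, which has to suit the weakest receiver, with a layered stream where a forwarding backend picks a layer per receiver. The layer ladder, bitrates and bandwidth estimates are hypothetical, and the function names are illustrative rather than part of any WebRTC API.

```python
from typing import Dict, NamedTuple, Sequence


class Layer(NamedTuple):
    name: str
    kbps: int  # approximate bitrate needed to receive this layer


# Hypothetical layer ladder; real bitrates depend on codec,
# content and encoder settings.
LAYERS = (
    Layer("180p", 150),
    Layer("360p", 500),
    Layer("720p", 1500),
)


def pick_layer(available_kbps: int, layers: Sequence[Layer] = LAYERS) -> Layer:
    """Highest layer that fits the bandwidth; falls back to the lowest layer."""
    ordered = sorted(layers, key=lambda layer: layer.kbps)
    best = ordered[0]
    for layer in ordered:
        if layer.kbps <= available_kbps:
            best = layer
    return best


def single_stream_quality(receivers: Dict[str, int]) -> Layer:
    """Without layering, one encoding must suit the weakest receiver."""
    return pick_layer(min(receivers.values()))


def per_receiver_quality(receivers: Dict[str, int]) -> Dict[str, str]:
    """With layering and selective forwarding, each receiver gets its own layer."""
    return {name: pick_layer(kbps).name for name, kbps in receivers.items()}


# Hypothetical downlink estimates in kbps.
receivers = {"fiber": 5000, "dsl": 800, "mobile": 200}

print(single_stream_quality(receivers).name)  # 180p
print(per_receiver_quality(receivers))
# {'fiber': '720p', 'dsl': '360p', 'mobile': '180p'}
```

In the single-stream case, one attendee on a weak mobile connection pulls everyone down to the lowest layer. The per-receiver case only works if something in the backend can forward different layers to different attendees.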

Issues with traversing firewalls, NATs and routers

Often people considering implementing WebRTC forget about the even greater complexity of communications. WebRTC can use a myriad of UDP and TCP ports to interconnect users, and all these ports must be available and open at the firewall. NAT traversal, in and of itself, is such a big issue with WebRTC that you end up having to deploy your own network traversal modules that support protocols such as TURN and STUN.

This infrastructure needs to exist everywhere in your middleware, otherwise users will be unable to interconnect with each other. Implementing this requires system and network administrators, updating and maintenance and more.

Additionally, a user may be able to run WebRTC in their web browser but still be unable to connect to your meeting because of NAT traversal issues on their side.
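As a rough illustration of what this infrastructure looks like from the application's side, the sketch below builds the list of STUN and TURN servers a backend might hand to its web client. The dictionary mirrors the shape of the browser's RTCConfiguration (`iceServers`, each with `urls`, `username` and `credential`), which the client would pass to `RTCPeerConnection`. The host names, ports and credentials are hypothetical placeholders, and the helper `build_ice_config` is an illustrative name, not an existing API.

```python
import json
from typing import Any, Dict, List


def build_ice_config(
    stun_urls: List[str],
    turn_urls: List[str],
    turn_username: str,
    turn_credential: str,
) -> Dict[str, Any]:
    """Build a dict in the shape of the browser's RTCConfiguration."""
    return {
        "iceServers": [
            # STUN only helps a client discover its public address.
            {"urls": stun_urls},
            # TURN relays the media when no direct path can be found.
            {
                "urls": turn_urls,
                "username": turn_username,
                "credential": turn_credential,
            },
        ]
    }


# Hypothetical hosts and credentials; substitute servers you operate.
config = build_ice_config(
    stun_urls=["stun:stun.example.com:3478"],
    turn_urls=[
        "turn:turn.example.com:3478?transport=udp",
        "turns:turn.example.com:443?transport=tcp",
    ],
    turn_username="demo-user",
    turn_credential="demo-secret",
)

print(json.dumps(config, indent=2))
```

Every server named in this list is something you have to deploy, monitor and keep patched, and the TURN relay in particular carries real media traffic, so it has to be sized and placed accordingly.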

Issues with point-to-point connections

To avoid operating middleware for NAT traversal or a backend cloud for handling WebRTC communications, sometimes developers are tempted to fall back on peer-to-peer connections with WebRTC. This only makes the situation with firewalls and NAT traversal worse because every attendee must be able to connect directly to every other attendee over multiple UDP and TCP ports.

With peer connections you end up sending your video upstream multiple times, once to each attendee. This not only increases your overall bandwidth usage but also limits how large a meeting can grow. Once again, the tradeoff is to lower the video image quality and frame rate to compensate for the lack of bandwidth.

But if you intend to build a solution that requires more than a few points of video or screen sharing, point to point connections will not be a suitable solution.
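The arithmetic behind this is simple enough to sketch. The functions below assume a full mesh in which every attendee sends one video stream to every other attendee, and compare that with uploading a single copy to a forwarding server. The per-stream bitrate is a hypothetical figure.

```python
def mesh_connections(n: int) -> int:
    """Number of direct peer links in a full mesh of n attendees."""
    return n * (n - 1) // 2


def mesh_uplink_kbps(n: int, stream_kbps: int) -> int:
    """Each attendee uploads one copy of its stream per other attendee."""
    return (n - 1) * stream_kbps


def relay_uplink_kbps(stream_kbps: int) -> int:
    """With a forwarding server, each attendee uploads a single copy."""
    return stream_kbps


STREAM_KBPS = 1000  # hypothetical per-stream video bitrate

for n in (2, 4, 8, 16):
    print(
        f"{n:>2} attendees: {mesh_connections(n):>3} links, "
        f"{mesh_uplink_kbps(n, STREAM_KBPS):>5} kbps up per attendee (mesh), "
        f"{relay_uplink_kbps(STREAM_KBPS)} kbps up per attendee (relay)"
    )
```

Under these assumptions, a meeting of 8 attendees needs 28 direct links and 7000 kbps of upstream bandwidth from each sender, which is more than many home connections can sustain.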

Issues with operating a scalable backend

If you are considering a large-scale service built around WebRTC you will need to operate your own cloud backend. That means routing communications through a dedicated data center in a secure manner. Besides having to become knowledgeable in cloud computing and system administration and the associated operating costs, you must build your own logic into the backend code so you can integrate your existing web product or service into this infrastructure. That includes some form of unified authentication and creating APIs to integrate your existing web product or service into the WebRTC hosted backend.

You will need to consider how to scale your solution and be elastic to demand. You will have to determine how to quickly deploy into multiple geographies and demographics. You have to consider how to scale up your backend data storage. Other issues will present themselves such as how do you record your meetings in the cloud using WebRTC? How do you handle security and privacy of sensitive customer communications in the cloud? Does your backend handle scalable video encoding so you can adjust to fluctuating bandwidth and improve or reduce quality accordingly?
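For a first feel of the scaling question, a back-of-the-envelope estimate can help. The sketch below assumes a forwarding backend where every attendee receives every other attendee's stream at one fixed bitrate, and sizes servers by outbound bandwidth alone. It ignores CPU, memory, recording and headroom, and every figure in it is hypothetical.

```python
import math


def room_egress_kbps(attendees: int, stream_kbps: int) -> int:
    """Server egress for one room when every attendee receives
    every other attendee's stream."""
    return attendees * (attendees - 1) * stream_kbps


def servers_needed(
    rooms: int, attendees: int, stream_kbps: int, server_egress_kbps: int
) -> int:
    """Servers required if rooms are packed by egress bandwidth alone."""
    total = rooms * room_egress_kbps(attendees, stream_kbps)
    return math.ceil(total / server_egress_kbps)


# All figures hypothetical: 100 rooms of 10 attendees at 1000 kbps
# per stream, on servers that can each push 1 Gbps.
print(room_egress_kbps(10, 1000))                # 90000
print(servers_needed(100, 10, 1000, 1_000_000))  # 9
```

Because egress grows with the square of the room size, doubling the attendees per room roughly quadruples the bandwidth bill, which is why elasticity and per-receiver quality control matter so much in the backend.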

It can quickly become an expensive proposition to build and deploy this model with your requirements.

Issues with complexity of implementation in code

At first, the idea of just modifying your web product or service seems like a manageable task. You add WebRTC components to your interface so that you can enable video and audio.

These changes, however, take a significant amount of time to integrate properly once you consider the APIs required to control and manage the experience, the integration points required for NAT traversal, and how you will handle the user experience of interacting with the cloud backend.

Impact of video conferencing on your app’s overall user experience

Since WebRTC is embedded into the web browser with your existing web application, you may find the processing overhead of adding video, especially if it involves many synchronous video users, has a negative impact on the quality of your existing web application’s user experience.

This is especially true on mobile devices and devices with less power, such as Smart TVs.   There are no simple solutions to this problem if you are using WebRTC. The browser encapsulates the WebRTC logic and oversees the prioritization, and that varies depending upon the web browser you choose and the OS platform it is operating upon.

Even worse, some implementations of WebView on Android devices are not fully compatible with WebRTC, and those components are not updated as frequently as full-fledged web browsers. You can’t really dictate to end users what web browser they want to use or expect the operating system to contain the required components for WebRTC that you require.

Limitations on the flexibility to the user experience

WebRTC is a black box in that you have limited control over how it is presented and how much control via API you can have over the presentation. While you can mix HTML-based presentations with WebRTC components, you will have limited control over WebRTC and how it presents audio and video or how it handles device selection, manipulation, or digital signal processing. If you experience a technical issue with an attendee, you are typically at their mercy to resolve the problem.

Ideally you have a rich set of APIs to control the presentation of the video and audio user experience and any screen sharing you need to manage from both the client side and the cloud backend. WebRTC just doesn’t provide this level of control.

Security concerns over implementations

WebRTC depends upon the security model of the web browser within the operating system of the device it runs upon. Instead of having a unified security model of its own, it relies on the browser's transport stack: media is encrypted by the browser's DTLS and SRTP implementation, while signaling, which WebRTC leaves to the application, is commonly carried over WebSockets. This means you need to keep up with and maintain security control over both web browser security holes and operating system related issues on every platform you intend to support and every browser revision you recommend to your customer.

Common web browsers are constantly updated to resolve security issues, but many users will utilize older web browsers on mobile devices that are not up to date with security patches. You won’t be able to guarantee a secure environment for each attendee without fully controlling the web browser they utilize.

Overall cost of implementation

When you consider all the above factors, from the amount of engineering time it takes to modify your application to include video and audio encoding, to the time it takes to implement a cloud backend to support it, along with associated systems administrative time, the cost of building a suitable solution that meets your requirements can really add up. You also have to consider the amount of time in end-user support you will require to help users with their web browsers, cameras and related hardware.

Just the time it takes for the software engineering changes and customization is significant. Cordoniq has all of these factors already covered in our framework and can provide them both quickly and economically for your business.

How Cordoniq solves these challenges

The reason so many people gravitate toward a native solution to video conferencing and collaboration is because of all the above pitfalls with WebRTC. While you may think that you are running the experience from a web browser, the vast majority of people are experiencing a native application when they launch products like Zoom™.

Cordoniq took a slightly different approach to the challenges presented by blending web and native experiences together. Instead of asking the developer to add components to their web application, product, or service to provide audio, video, and screen sharing, Cordoniq created a framework where users can integrate their existing web application into the Cordoniq framework without significant changes to their web application.  

Our framework acts as the shell around your web application but provides the native performance and reliability you require for the user experience. As a developer, you control the entire experience via API and your users are presented with a unified experience that includes both your web application and a comprehensive, secure collaboration experience.

Cordoniq implements a fully secure, TLS based approach to all client connections and communications, across all devices and operating systems regardless of how old the operating system version is or the age of the physical device. We don’t depend upon the operating system or the browser to dictate the security model.

We can do this for a fraction of the cost of building your own solution as our framework is ready to go and is highly customizable. This includes both the app experience and the backend cloud modules and related APIs. We also do not place restrictions on your usage of our framework such as maximum users, licenses, usage time or rooms. You can scale up to meet whatever your business needs.

Working with the Cordoniq framework and our team, your product could be ready for the marketplace in as little as 90 days. 

Let’s discuss how Cordoniq solves the challenges presented with WebRTC

Cordoniq implements scalable video encoding on all platforms including Windows, macOS, Android and iOS. Depending upon the processing power of the device, we dynamically adjust processor utilization for encoding and decoding, and we transmit multi-layer encoded video so that attendees with less bandwidth can still participate in a meeting.

Cordoniq uses a single TCP connection to traverse a NAT, firewall and router. All communications are routed over this connection and the framework will automatically utilize UDP if the router permits the communication. Since all communications are routed over a single port, all communications can be secured using industry best Transport Layer Security 1.3 in a uniform manner. This communication model is identical on all supported platforms and devices.

Cordoniq provides easy to implement modules and containers for Windows and Linux based servers that we can deploy for you or if you choose, that you can deploy yourself in a private cloud, hybrid cloud or public cloud. Our modules are elastic and can expand and contract to meet demand, automatically handle failover and load balancing as well as geolocation.

We provide complete control over the security model and the privacy of your data, and you choose how your information is stored. Our extensive set of APIs gives you complete control over all the aspects of operating the backend.

Cordoniq took the time to engineer low-level SIMD and NEON code to optimize the synchronous experience. We fully leverage the GPU to offload much of the rendering and processing, thereby freeing up the CPU for other requirements. This level of optimization guarantees that the video collaboration experience won’t detract from the user experience of your web application, product, or service.

Anything in the UX is possible with the Cordoniq framework. We work with our individual partners to blend in the graphical experience that best suits their needs. Almost any design methodology to synchronous UX is possible and we work with our partners to make sure the experience matches their objectives.

All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.
