🛠️ Building a video streamer from scratch

The video streamer is a core microservice in the SnapCall architecture. It manages rooms where users exchange audio and video streams over WebRTC.

Why did we start from scratch?

A couple of years ago, the SnapCall service handled audio only: the idea was to connect users through a real-time digital channel via the WebRTC API rather than the phone network. We started with standard calling APIs such as Twilio and Plivo, but quickly hit their limits: one of our first customers filtered some of the UDP ports those APIs require, and that was a deal breaker.

To bypass this limitation, we upgraded our infrastructure to FreeSWITCH, one of the best-known open-source servers for real-time communication applications using WebRTC. Our whole infrastructure was built around it, and for years it handled thousands of WebRTC calls per day.

Two years ago, the SnapCall product team decided to add video to the roadmap. FreeSWITCH was not well suited to this new challenge, mainly because it is an MCU server, and for video the resource profile of an SFU architecture is much better. After benchmarking many different solutions, we concluded that building our own video microservice was the best fit for our product roadmap and engineering resources.

Technology and frameworks selection

Choosing the right technologies to build the video server from scratch was an important step in the development process, as it had a significant impact on the performance, scalability, and maintainability of the overall SnapCall architecture. Many technologies and frameworks were available, each with its own strengths and weaknesses, and the choice depended on the specific requirements of the video feature.

We defined several working topics:

  1. SFU vs MCU architecture

  2. Video/audio codecs compatible with the WebRTC API embedded in browsers

  3. Strong and secure network stack

  4. Ability to handle a large number of concurrent users

SFU vs MCU

This great article explains the key points of both technologies. The SnapCall service is hosted on AWS, and an SFU turned out to be a very good fit because:

  • CPU and server resources are used more efficiently, which lets us scale the system better and at lower cost.

  • Our video product has a strong visual identity, and each participant can have a different video layout, which would be hard to achieve with the single composited stream an MCU produces.
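The SFU data flow can be pictured with a toy model: the server relays each incoming packet verbatim to the other participants, whereas an MCU would decode, compose, and re-encode a single stream. This is an illustrative sketch (class and method names are our own), not production code:

```javascript
// Minimal model of SFU-style forwarding: packets are relayed
// untouched, so the server never pays a transcoding cost.
class SfuRoom {
  constructor() {
    this.peers = new Map(); // peerId -> packets received so far
  }

  join(peerId) {
    this.peers.set(peerId, []);
  }

  // Forward a producer's packet to every other peer in the room.
  // Cost is O(number of peers), with no decode/encode step.
  forward(fromPeerId, packet) {
    for (const [peerId, inbox] of this.peers) {
      if (peerId !== fromPeerId) inbox.push(packet);
    }
  }

  inbox(peerId) {
    return this.peers.get(peerId);
  }
}

const room = new SfuRoom();
['alice', 'bob', 'carol'].forEach((p) => room.join(p));
room.forward('alice', 'rtp-packet-1');
// bob and carol each receive alice's packet untouched; alice does not.
```

An MCU would instead maintain one decoder per participant plus an encoder for the mixed output, which is where the CPU savings of the SFU model come from.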

We tested several SFU libraries, guided by comparative studies:

In the end, the mediasoup library was the best match for our requirements.

Video/Audio codecs

The video server must support the codecs that browsers can negotiate through the WebRTC API, such as H.264 and VP8/VP9 for video, and carry them over the Real-time Transport Protocol (RTP) to provide a good streaming experience for users.

We selected the VP8 codec for video because it offers a great compromise between quality and resource usage.

Here you will find a comparison of the main video codecs for WebRTC.

For audio, Opus was the best option because it is designed specifically for real-time communication. We prioritised quality that adapts to network conditions, since audio consumes far fewer resources than video.
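A common way to steer browsers toward a preferred codec such as VP8 is to reorder the payload types in the SDP so that codec appears first in the media line. The helper below is an illustrative sketch (the function name and the sample SDP are our own), not SnapCall's actual code:

```javascript
// Move the payload types of the preferred codec to the front of the
// m= line, so the remote side picks it during negotiation.
function preferCodec(sdp, kind, codecName) {
  const lines = sdp.split('\r\n');
  const mIndex = lines.findIndex((l) => l.startsWith(`m=${kind} `));
  if (mIndex === -1) return sdp;

  // Collect payload types mapped to the codec, e.g. "a=rtpmap:96 VP8/90000".
  const rtpmap = new RegExp(`^a=rtpmap:(\\d+) ${codecName}/`, 'i');
  const preferred = lines
    .map((l) => l.match(rtpmap))
    .filter(Boolean)
    .map((m) => m[1]);

  const parts = lines[mIndex].split(' ');
  const header = parts.slice(0, 3); // "m=video 9 UDP/TLS/RTP/SAVPF"
  const payloads = parts.slice(3);
  const reordered = [
    ...payloads.filter((p) => preferred.includes(p)),
    ...payloads.filter((p) => !preferred.includes(p)),
  ];
  lines[mIndex] = [...header, ...reordered].join(' ');
  return lines.join('\r\n');
}

const sampleSdp = [
  'm=video 9 UDP/TLS/RTP/SAVPF 98 96',
  'a=rtpmap:96 VP8/90000',
  'a=rtpmap:98 H264/90000',
].join('\r\n');

const munged = preferCodec(sampleSdp, 'video', 'VP8');
// The m= line now lists payload type 96 (VP8) first.
```

The same trick applies to audio with `preferCodec(sdp, 'audio', 'opus')`.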

Network stack

The SnapCall video streamer uses the Secure Real-time Transport Protocol (SRTP) to carry all this media safely. Since SRTP works over UDP only, TURN servers let us relay traffic over both UDP and TCP on commonly open ports, increasing the deliverability of the SnapCall service for users in all network conditions.
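On the client side, this typically translates into an `RTCPeerConnection` configuration that lists TURN relays over both UDP and TCP. The hostnames, ports, and credentials below are placeholders, not real SnapCall endpoints:

```javascript
// Hypothetical ICE server configuration: STUN for discovery, TURN
// over UDP for the normal path, and TURN over TCP on port 443 for
// networks that block UDP entirely.
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    {
      // TURN over UDP: the usual, lowest-latency relay path.
      urls: 'turn:turn.example.com:3478?transport=udp',
      username: 'user',
      credential: 'secret',
    },
    {
      // TURN over TCP on 443: traverses firewalls that filter UDP,
      // the exact situation that broke our earliest audio calls.
      urls: 'turn:turn.example.com:443?transport=tcp',
      username: 'user',
      credential: 'secret',
    },
  ],
};

// In the browser this would be passed to: new RTCPeerConnection(rtcConfig);
```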

Handling large number of concurrent users

All the previous points need to come together in working code able to handle many concurrent users at the same time without any disruption, since SnapCall is a real-time service.

We chose to run this code on a Node.js server; the following points guided our choice:

  1. JavaScript Everywhere: Node.js allows our developers to write server-side code using the same language they use for client-side code, which makes it easier for web developers to build full-stack applications.

  2. High Performance: Node.js uses the V8 JavaScript engine, which was developed by Google for use in the Chrome browser. This engine is highly optimized for performance and can handle a large number of concurrent connections, making Node.js well-suited for building high-performance servers.

  3. Scalability: Node.js is designed to handle a high number of concurrent connections and scales easily. It uses an event-driven, non-blocking I/O model, which allows it to handle a large number of connections without the need for multiple threads or processes.

  4. Large Ecosystem: Node.js has a large and active community that has developed a wide variety of modules and libraries to extend the functionality of the platform. This makes it easier to find and use existing code, which can help to speed up development time.

  5. Easy to Learn: Node.js uses JavaScript, which is a widely-used programming language and many web developers already have experience with it. This makes it easy for developers to learn and get started with Node.js.

  6. Good for real-time applications: Node.js is great for building real-time applications such as chat, games, and other collaboration tools. Its event-driven and non-blocking I/O model makes it well-suited for handling real-time communication between clients and servers.
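The event-driven model behind points 3 and 6 can be pictured as a single thread draining a queue of events coming from many connections; handlers must stay short and non-blocking, which fits a signaling and forwarding workload well. A deliberately simplified, synchronous model (event shapes and handler names are our own):

```javascript
// One thread, one loop: events from many connections are interleaved
// instead of each connection holding its own blocking thread.
function runEventLoop(events, handlers) {
  const log = [];
  for (const event of events) {
    const handler = handlers[event.type];
    if (handler) log.push(handler(event)); // each handler returns quickly
  }
  return log;
}

const handlers = {
  join: (e) => `peer ${e.peer} joined`,
  packet: (e) => `forwarded packet from ${e.peer}`,
};

const activity = runEventLoop(
  [
    { type: 'join', peer: 'alice' },
    { type: 'join', peer: 'bob' },
    { type: 'packet', peer: 'alice' },
  ],
  handlers
);
```

Real Node.js drives this loop from I/O readiness (sockets, timers) rather than an in-memory array, but the single-threaded dispatch principle is the same.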

Implementing and Testing a High-Performance Video Server

This was a crucial step in building the video server from scratch: taking the planned and designed architecture and turning it into working code, then testing the server to ensure it meets the performance, scalability, and reliability requirements.

One important aspect was ensuring the code was optimised for performance. This included using efficient algorithms, minimising resource usage, and reducing the number of network round trips. It was also important to adopt a node strategy, distributing the service closer to users to reduce latency.
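A node strategy of this kind can be sketched as routing each caller to the region with the lowest measured round-trip time. The region names and RTT values below are illustrative, not SnapCall's actual topology:

```javascript
// Pick the region with the smallest RTT from a map of measurements.
function pickRegion(rttByRegion) {
  let best = null;
  for (const [region, rtt] of Object.entries(rttByRegion)) {
    if (best === null || rtt < rttByRegion[best]) best = region;
  }
  return best;
}

// RTTs (in ms) a client in Europe might measure against each endpoint:
const measured = { 'eu-west-1': 24, 'us-east-1': 92, 'ap-southeast-1': 210 };
pickRegion(measured); // → 'eu-west-1'
```

In practice the client would probe each regional endpoint (for example with a small HTTP request or STUN binding) before joining a room.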

Testing the video server was essential to ensure the system behaves as expected and meets the performance, scalability, and reliability requirements. This included load testing, stress testing, and functional testing. Load testing verified that the server can handle the expected number of concurrent users; stress testing verified that it maintains good performance through unexpected traffic spikes; functional testing verified that it serves rooms correctly and that the video streaming quality is good.
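When reading load-test results, latency percentiles such as p50 and p95 are the figures to watch, since averages hide spikes. A small helper for summarising samples; the latency values are illustrative:

```javascript
// Nearest-rank percentile over a list of latency samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(
    sorted.length - 1,
    Math.ceil((p / 100) * sorted.length) - 1
  );
  return sorted[idx];
}

// Example round-trip latencies (ms) collected during a load test:
const latenciesMs = [12, 15, 11, 14, 90, 13, 16, 12, 14, 13];
percentile(latenciesMs, 50); // → 13
percentile(latenciesMs, 95); // → 90
```

A healthy p50 with a bad p95, as above, is exactly the pattern that load testing surfaces and averages would mask.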

Implementing and testing a high-performance video server was an iterative process that required a deep understanding of the technologies and frameworks used, as well as of the performance and scalability requirements. Regular monitoring and performance tuning help us keep the video server performing well.
