Zoom Jitsi

Editor’s Note: This post was originally published on October 23, 2018. Zoom recently started using WebRTC’s DataChannels so we have added some new details at the end in the DataChannels section.

Rube Goldberg’s Professor Butts and the Self-Operating Napkin (1931)

Zoom has a web client that allows a participant to join meetings without downloading their app. Chris Koehncke was excited to see how this worked (watch him at the upcoming KrankyGeek event!) so we gave it a try. It worked, removing the download barrier. The quality was acceptable and we had a good chat for half an hour.

Opening chrome://webrtc-internals showed only getUserMedia being used for accessing camera and microphone, but no RTCPeerConnection like a WebRTC call should have. This got me very interested: how are they making calls without WebRTC?

Why don’t they use WebRTC?

The relationship between Zoom and WebRTC is a difficult one as shown in this statement from their website:

The Jitsi folks just did a comparison of the quality recently in response to that. Tsahi Levent-Levi had some useful comments on that as well.

So let’s take a quick look at the “excellent features” under quite interesting circumstances — running in Chrome.

The Zoom Web client

Chrome’s network developer tools quickly showed two things:

  • WebSockets are used for data transfer
  • there are workers loading WebAssembly (wasm) files

The WebAssembly file names quickly led to a GitHub repository where those files, along with some of the other JavaScript components, are hosted. The files are mostly the same as the ones used in production.

Media over WebSockets

The overall design is quite interesting. It uses WebSockets to transfer the media which is certainly not an optimal choice. It is similar to using TURN/TCP in WebRTC — it has a quality impact and will not work well in quite a number of cases. The general problem of doing realtime media over TCP is that packet loss can lead to resends and increased delay. Tsahi described this over at TestRTC a while ago, showing the impact on bitrate and other things.

The primary advantage of using media over WebSockets is that it might pass firewalls where even TURN/TCP and TURN/TLS could be blocked. And it certainly avoids the issue of WebRTC TURN connections not getting past authenticated proxies. That was a long-standing issue in Chrome’s WebRTC implementation that was only resolved last year.

Data received on the WebSockets goes into a WebAssembly (WASM) based decoder. Audio is fed to an AudioWorklet in browsers that support that. From there the decoded audio is played using the WebAudio “magic” destination node.
Video is painted to a canvas. This is surprisingly smooth and the quality is quite high.

In the other direction, WebAudio captures the audio from the getUserMedia call and sends it to a WebAssembly encoder worker, and the encoded data is then delivered via WebSocket. Video capture happens with a resolution of 640×360 and, unsurprisingly, is grabbed from a canvas before being sent to the WebAssembly encoder.

The WASM files seem to contain the same encoders and decoders as Zoom’s native client, meaning the gateway doesn’t have to do transcoding. Instead it is probably little more than a WebSocket-to-RTP relay, similar to a TURN server. The encoded video is somewhat pixelated at times and Mr. Kranky even complained about staircase artifacts. While the CPU usage of the encoder is rather high (at 640×360 resolution), this might not matter as the user will simply blame Chrome and use the native client the next time.

H.264

Delivering the media engine as WebAssembly is quite interesting, as it allows for supporting codecs not supported by Chrome/WebRTC. This is not entirely novel: FFmpeg compiled with emscripten has been done many times before, and emscripten seems to have been used here as well. Delivering the encoded bytes via WebSockets allowed inspecting their content using Chrome’s excellent debugging tools and showed an H.264 payload with an RTP header and some framing:

raw RTP data sent over the Websocket
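
To make the "RTP header" part concrete, here is a minimal sketch of decoding the 12-byte fixed RTP header defined in RFC 3550. It assumes the bytes start directly at the RTP header. The additional framing Zoom puts in front is not documented here, so stripping it is left out, and the function name `parseRtpHeader` is purely illustrative.

```javascript
// Parse the fixed 12-byte RTP header (RFC 3550, section 5.1).
// Assumption: `bytes` is a Uint8Array that begins at the RTP header,
// i.e. any application-specific framing has already been removed.
function parseRtpHeader(bytes) {
  if (bytes.length < 12) {
    throw new Error("too short for an RTP header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const first = view.getUint8(0);
  const second = view.getUint8(1);
  const csrcCount = first & 0x0f;
  return {
    version: first >> 6,                  // should be 2 for RTP
    padding: (first & 0x20) !== 0,
    extension: (first & 0x10) !== 0,
    csrcCount,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f,           // dynamic, negotiated out of band
    sequenceNumber: view.getUint16(2),    // network byte order (big endian)
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    // Where the payload starts when no header extension is present.
    payloadOffset: 12 + 4 * csrcCount,
  };
}
```

In Chrome’s WebSocket frame inspector the binary frames can be copied out and fed to a helper like this to check whether the version, sequence number, and timestamp fields look plausible.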

— Nils Ohlmeier (@nilsohlmeier) September 5, 2019

chrome://webrtc-internals tells us a bit more about how this works. See a dump here which can be imported with my tool here.

No STUN/TURN = fallback to WebSockets

Looking at the RTCPeerConnection configuration, the most notable thing is that iceServers is configured as an empty array, which means no STUN or TURN servers are used:

This makes sense as TCP fallback is most likely still provided by the previous WebSocket implementation, so no TURN infrastructure is required just for the web client. Indeed, when UDP is blocked the client falls back to WebSockets and tries to establish a new RTCPeerConnection every ten seconds.

Standard PeerConnection setup but with SDP munging

There are two PeerConnections, each of which creates an unreliable DataChannel, one labelled ZoomWebclientAudioDataChannel, the other labelled ZoomWebclientVideoDataChannel. Using two connections for this isn’t necessary but was probably easier to fit into the existing WebSocket architecture.
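
A setup like this could look roughly as follows. This is a sketch, not Zoom’s code: the function name `createMediaChannels` is made up, the constructor is passed in so the snippet also runs outside a browser, and the exact DataChannel options are an assumption (unordered delivery with zero retransmits is one common way to get an unreliable channel).

```javascript
// Create one peer connection per media type, each with an unreliable DataChannel.
// `PeerConnection` is injected; in a browser you would pass the global
// RTCPeerConnection constructor.
function createMediaChannels(PeerConnection) {
  const labels = [
    "ZoomWebclientAudioDataChannel",
    "ZoomWebclientVideoDataChannel",
  ];
  return labels.map((label) => {
    // Empty iceServers: no STUN or TURN servers, so only host candidates.
    const pc = new PeerConnection({ iceServers: [] });
    // Assumption: the exact options are not known. Unordered delivery
    // with no retransmissions behaves much like plain UDP.
    const channel = pc.createDataChannel(label, {
      ordered: false,
      maxRetransmits: 0,
    });
    channel.binaryType = "arraybuffer";
    return { pc, channel };
  });
}

// Browser usage (not executed here):
// const [audio, video] = createMediaChannels(RTCPeerConnection);
```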

After that, createOffer is called, followed by setLocalDescription. While this is pretty standard, there is an interesting thing going on here: the a=ice-ufrag line is changed before setLocalDescription is called, replacing Chrome’s rather short username fragment (ufrag) with a lengthy uuid. This is called SDP munging and is generally frowned upon due to potential interoperability issues. Surprisingly, Firefox allows this, which means more work for Nils and his Mozilla team. We’ll discuss what the purpose of this might be below.
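
For readers unfamiliar with SDP munging, the string manipulation involved is small. The sketch below shows one way to do it. The helper name `mungeUfrag` and the example uuid are hypothetical, and how Zoom actually generates its value is not known.

```javascript
// Replace every a=ice-ufrag line in an SDP blob with a new username fragment.
// A replacer function is used so that characters like "$" in the new value
// are never interpreted as replacement patterns.
function mungeUfrag(sdp, newUfrag) {
  return sdp.replace(/^a=ice-ufrag:.*$/gm, () => "a=ice-ufrag:" + newUfrag);
}

// Browser usage (not executed here), with `uuid` being any string the
// application chooses:
// const offer = await pc.createOffer();
// await pc.setLocalDescription({
//   type: offer.type,
//   sdp: mungeUfrag(offer.sdp, uuid),
// });
```

Because the modified SDP is handed to setLocalDescription, the browser has to accept a ufrag it did not generate itself, which is exactly the part that makes munging fragile across browsers.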

Simple server setup

Next we see the answer from the server. It uses ice-lite, which is common for servers because it is easy to implement, and contains a single host candidate. That candidate is then also added via addIceCandidate, which is a bit superfluous but explains the double occurrence in Nils’ screenshot. The answer also specifies a=setup:passive, which means the browser acts as the DTLS client, probably to reduce server complexity.
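
The properties mentioned above can be read straight out of the answer SDP. The following helper is a minimal sketch for doing that when looking at a webrtc-internals dump. The name `summarizeAnswer` is made up and the helper does no validation beyond simple prefix matching.

```javascript
// Extract the ICE and DTLS properties discussed above from an SDP answer.
function summarizeAnswer(sdp) {
  const lines = sdp.split(/\r?\n/);
  const setupLine = lines.find((line) => line.startsWith("a=setup:"));
  const setup = setupLine ? setupLine.slice("a=setup:".length) : null;
  return {
    // ice-lite: the server only answers connectivity checks.
    iceLite: lines.includes("a=ice-lite"),
    // Candidates embedded in the SDP itself.
    candidates: lines.filter((line) => line.startsWith("a=candidate:")),
    setup,
    // A passive answerer waits for the DTLS ClientHello, so the
    // offerer (here the browser) acts as the DTLS client.
    browserIsDtlsClient: setup === "passive",
  };
}
```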

Quite noticeable is that we see the same server-side port 8801 both in Nils’ screenshot and in the dump we gathered. This is no coincidence: Zoom’s native client uses the same port. This means that all UDP packets have to be demultiplexed into sessions, which is typically done by creating an association between the incoming UDP packets and the ufrag of the STUN requests from those packets. This is probably also the reason for munging the ufrag. This is a bit silly: demultiplexing works just as well with just the server-side ufrag.
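
To illustrate the last point, here is a sketch of single-port demultiplexing keyed only on the server-side ufrag. It is a generic illustration of the technique, not a description of Zoom’s server. The class and method names are invented, and STUN parsing is assumed to have happened already. It relies on the ICE rule (RFC 8445) that the USERNAME of a connectivity check is the receiver’s ufrag, a colon, then the sender’s ufrag.

```javascript
// Route packets arriving on one UDP port to per-session state.
class UfragDemuxer {
  constructor() {
    this.sessionsByUfrag = new Map();   // server-side ufrag -> session
    this.sessionsByAddress = new Map(); // "ip:port" of the client -> session
  }

  // Called when the server hands out an answer containing `serverUfrag`.
  addSession(serverUfrag, session) {
    this.sessionsByUfrag.set(serverUfrag, session);
  }

  // Called for each STUN binding request. `username` is the USERNAME
  // attribute, formatted as "<server ufrag>:<client ufrag>".
  onStunRequest(username, remoteAddress) {
    const serverUfrag = username.split(":")[0];
    const session = this.sessionsByUfrag.get(serverUfrag);
    if (session !== undefined) {
      // Remember the source address so later non-STUN packets
      // (DTLS, SCTP) from it can be routed without a ufrag.
      this.sessionsByAddress.set(remoteAddress, session);
    }
    return session;
  }

  // Called for every other packet from a known address.
  route(remoteAddress) {
    return this.sessionsByAddress.get(remoteAddress);
  }
}
```

As long as the server picks a unique ufrag per session, nothing on the client side needs to be changed for this lookup to work.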

Traffic inspection does not reveal anything new

Inspecting the traffic is a bit more complicated. Since it’s JavaScript, one can use the prototype override approach used by the WebRTC-Externals extension to snoop on any call to RTCDataChannel.prototype.send. Unsurprisingly, the payload of the packets is the same as the one sent via WebSockets.
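
The prototype override approach boils down to wrapping the original method. The sketch below shows the general pattern. It is not the actual WebRTC-Externals code, and `snoopOnSend` is a made-up helper that takes the prototype as a parameter so it can be tried outside a browser as well.

```javascript
// Wrap `proto.send` so every call is reported to `onSend` before the
// original method runs. Returns a function that undoes the override.
function snoopOnSend(proto, onSend) {
  const originalSend = proto.send;
  proto.send = function (...args) {
    // `this` is the channel instance the page called send() on.
    onSend(this, args[0]);
    return originalSend.apply(this, args);
  };
  return () => {
    proto.send = originalSend;
  };
}

// Browser usage (not executed here); must run before the page's own
// code sends anything:
// snoopOnSend(RTCDataChannel.prototype, (channel, data) => {
//   console.log(channel.label, data);
// });
```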

This update to the Zoom web client will, as Nils pointed out, most likely increase the quality that was limited by the transfer via WebSockets over TCP quite a bit. While the client is now using WebRTC, it continues to avoid using the WebRTC media stack.

{“author”: “Philipp Hancke“}
