published 10/24/2024
To WASM, or not to WASM
I’m losing sleep over whether the web client should be written in Rust or TypeScript. I need your opinion, my beautiful little rubber duckies.
Frontend
But first, I’m going to spew my own opinion. The UI layer will absolutely be written in TypeScript using a flavor-of-the-month web framework. I’m not going to use Yew or any other React clone written in Rust.
Why? It’s pretty simple, I need frontend contributors. I don’t think it’s fair to ask a frontend savant to learn Rust and deal with the restrictions imposed by the language. Unpopular opinion: a UI should be flashy and not safe.
Additionally, JavaScript web frameworks are more mature and battle-tested in production. I’m not suggesting that you can’t use a Rust frontend library, just that it won’t be nearly as polished or feature-complete.
“@kixelated ur dumb” if you disagree.
My plan is to create a <moq-karp> custom element that abstracts away the underlying implementation, similar to the <video> tag.
Then you can use whatever frontend framework you want to interface with the UI-less element.
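For example, the element could be created and controlled from vanilla TypeScript (or any framework) like this. The attribute and event names here are hypothetical, just to illustrate the shape of the API:

```typescript
// Hypothetical usage sketch: the real attribute/event names are still TBD.
const player = document.createElement("moq-karp");
player.setAttribute("src", "https://relay.example.com/broadcast/demo");

// The element has no UI of its own, so any framework can wrap it.
player.addEventListener("error", (event) => console.error("playback failed", event));

document.body.appendChild(player);
```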
What is up for debate is the language used to power the media experience behind this custom element. And yes, there’s going to be a LOT of links to MDN in this post.
Background
Before we get started, let’s talk about the code that has already been written.
- moq-rs is a Rust library that handles networking and media containers.
- moq-js is a TypeScript library that does networking, media containers, encoding/decoding, and capture/rendering. It also contains this blog post because splitting the library from the website was too difficult for me.
- moq-web is a Rust proof-of-concept that uses WebAssembly to decode and render media. It barely works.
The decision that needs to be made is whether to continue with moq-js or switch to moq-web.
Can you help me decide?
Threads
I know “thread” is a spooky word to JavaScript developers, but the library will need to run in multiple threads:
- Main Thread
- WebWorker Thread
- AudioWorklet Thread
The Main Thread needs to initialize the WebWorker and AudioWorklet of course. Unfortunately, AudioContext also requires the main thread for some reason. This is a bit frustrating because we will occasionally need to reinitialize it too, like when changing the audio frequency (or device?).
The WebWorker Thread will perform any networking, decoding/encoding, and rendering. This is done via WebTransport, WebCodecs, and OffscreenCanvas, respectively. Our application code doesn’t need to be fast because the heavy processing (ex. encoding) is handled by browser APIs.
Finally, the AudioWorklet Thread is a nightmare. In order to render audio with minimal latency, you use an AudioWorklet that runs on the audio thread. This is completely sandboxed and triggers a callback every ~2ms to request the next 128 audio samples.
Main Thread
Let’s cover the easiest decision first.
The small amount of code backing the custom element should be written in TypeScript. It will initialize the WebWorker, AudioContext, and AudioWorklet. The custom element API will be a simple shim that performs a postMessage to the worker.
The benefit of TypeScript here is that it’s simple. It also means we can offload the slow WASM initialization to the background. In theory, we could even establish the connection to the MoQ server while the WASM module is loading.
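Here’s a rough sketch of that shim; the worker/worklet file names and message shapes are made up, but the structure is what I have in mind:

```typescript
// Sketch of the main-thread shim; file names and message formats are hypothetical.
class MoqKarp extends HTMLElement {
  #worker = new Worker(new URL("./worker.js", import.meta.url), { type: "module" });
  #audio?: AudioContext;

  async connectedCallback() {
    // Kick off the (slow) WASM load and the MoQ connection in the background worker.
    this.#worker.postMessage({ type: "connect", url: this.getAttribute("src") });

    // AudioContext and the AudioWorklet module have to be created on the main thread.
    this.#audio = new AudioContext({ latencyHint: "interactive" });
    await this.#audio.audioWorklet.addModule(new URL("./worklet.js", import.meta.url));
  }

  disconnectedCallback() {
    this.#worker.postMessage({ type: "close" });
    this.#audio?.close();
  }
}

customElements.define("moq-karp", MoqKarp);
```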
WebWorker Thread
Now this decision is trickier.
WASM in a WebWorker makes a lot of sense because the worker is already sandboxed. The API you use to communicate with a WebWorker is very similar to the API you would use to communicate with a WASM module.
My biggest concern is the performance. WASM will undoubtedly be slower even when written in the 🦀 language.
Why? Well, one thing you can do with WebWorkers but can’t do with WASM yet is zero-copy. That means every byte received over the network (WebTransport) would need to be copied into WASM memory and then immediately copied back out to decode (WebCodecs). We might even need another copy in each direction to render every frame, although fortunately I think this can be avoided thanks to WebCodecs’ hardware offload.
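To make the extra hop concrete, here’s a sketch contrasting the two paths; `wasm.alloc`, `wasm.decode`, and the exported `memory` are hypothetical stand-ins for whatever the Rust module would actually expose:

```typescript
// Pure TypeScript: bytes from WebTransport go straight into WebCodecs.
// (Real code would parse frame boundaries and timestamps out of the container first.)
async function decodeDirect(stream: ReadableStream<Uint8Array>, decoder: VideoDecoder) {
  const reader = stream.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    decoder.decode(new EncodedVideoChunk({ type: "delta", timestamp: 0, data: value }));
  }
}

// With WASM in the middle: copy into linear memory, and the module has to copy
// back out again to call WebCodecs. `wasm.alloc`/`wasm.decode` are hypothetical exports.
async function decodeViaWasm(stream: ReadableStream<Uint8Array>, wasm: any) {
  const reader = stream.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    const ptr = wasm.alloc(value.byteLength);
    new Uint8Array(wasm.memory.buffer, ptr, value.byteLength).set(value); // copy in
    wasm.decode(ptr, value.byteLength); // ...which copies back out for WebCodecs
  }
}
```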
But am I being paranoid and over-optimizing? Almost certainly. There are other reasons to keep the worker in TypeScript: again, more potential contributors and faster startup times. Not to mention that interfacing with JavaScript from Rust is still pretty dreadful; some things, like callbacks, don’t translate well.
But if written in Rust, then much of the code would be shareable with native apps. We would still use native APIs to perform networking, encoding, and rendering, but the logic would be the same. Surely there’s justification for a more complicated but reusable web library?
Another “@kixelated ur dumb” in chat please.
AudioWorklet Thread
Finally, my least favorite component: audio. I barely understand how beeps and boops are encoded into bytes and samples. But I do know how to output audio samples on the web, and it’s painful.
To achieve minimum latency, you need to create an AudioWorklet that runs on a dedicated audio thread. You can think of it like a WebWorker but even more sandboxed. This worklet runs a callback every ~2ms to gather the next 128 audio samples.
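In code, the worklet side is just a class with a `process` callback; a bare-bones sketch (the processor name is hypothetical):

```typescript
// Runs inside the AudioWorkletGlobalScope, on the dedicated audio thread.
class Renderer extends AudioWorkletProcessor {
  process(_inputs: Float32Array[][], outputs: Float32Array[][]): boolean {
    const channel = outputs[0][0]; // 128 samples per call, per channel
    // ...fill `channel` with decoded samples here (see the ring buffer below)...
    return true; // keep the processor alive
  }
}

registerProcessor("moq-renderer", Renderer);
```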
So we first need to get decoded audio samples to this AudioWorklet.
There is a postMessage API but apparently the lowest latency approach is to code a ring buffer… in JavaScript.
I’m not kidding, you use SharedArrayBuffer and Atomics in a language that doesn’t even have a dedicated integer type.
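Here’s a stripped-down sketch of what that ring buffer looks like, assuming a single producer (the worker) and a single consumer (the worklet); the capacity and layout are purely illustrative:

```typescript
// Single-producer/single-consumer ring buffer over a SharedArrayBuffer.
// Layout: two Int32 indices (read, write), followed by float32 samples.
const CAPACITY = 48000; // ~1 second at 48kHz, purely illustrative

const sab = new SharedArrayBuffer(8 + CAPACITY * 4);
const header = new Int32Array(sab, 0, 2); // [0] = read index, [1] = write index
const samples = new Float32Array(sab, 8, CAPACITY);

// Worker side: append decoded samples after the write index.
function produce(decoded: Float32Array) {
  let write = Atomics.load(header, 1);
  for (const sample of decoded) {
    samples[write % CAPACITY] = sample;
    write++;
  }
  Atomics.store(header, 1, write);
}

// AudioWorklet side: fill the next 128-sample quantum, or silence on underflow.
function consume(output: Float32Array) {
  let read = Atomics.load(header, 0);
  const write = Atomics.load(header, 1);
  for (let i = 0; i < output.length; i++) {
    output[i] = read < write ? samples[read++ % CAPACITY] : 0;
  }
  Atomics.store(header, 0, read);
}

// (A real implementation also has to handle overflow, index wraparound, and
// getting the SharedArrayBuffer into the worklet in the first place.)
```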
Fun fact: number is secretly a float64, so math gets wonky above 2^53 unless you use a BigInt.
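A quick demonstration:

```typescript
// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) silently lose precision.
console.log(2 ** 53 === 2 ** 53 + 1);      // true (!)
console.log(2n ** 53n === 2n ** 53n + 1n); // false; BigInt stays exact
```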
Right now, we render the samples unmodified. However, in the future, we’ll need to perform some audio processing for features like echo cancellation and AI voice changers (gotta raise funding somehow). This part seems miserable to do in JavaScript as it involves Uint8Array and DataView, which are surprisingly terrible APIs. I’m sure there are audio processing libraries written in Rust or C/C++ that we could leverage instead.
From my limited research, it does seem possible to run WASM inside of an AudioWorklet.
However, SharedArrayBuffer and Atomics are not yet supported in WASM so some additional latency is unavoidable.
But I imagine postMessage only incurs a few milliseconds at most, and it likely doesn’t matter when the network jitter buffer is already closer to 100 milliseconds.
In fact, this might be a great problem turned solution, because SharedArrayBuffer requires quite annoying cross-origin isolation headers (COOP/COEP) that would be nice to remove. That requirement might be lifted in the future but I wouldn’t hold my breath. It’s also just a huge headache to interface with a fixed-size SharedArrayBuffer given the uneven arrival of data over the network.
So I’m definitely leaning towards Rust here. “@kixelated ur dumb”
Decision Time
I’m leaning towards Rust for the WebWorker and AudioWorklet threads, and using TypeScript for the UI and Main thread. What do you think?
Good luck leaving a comment on this blog post. I haven’t bothered to code that feature yet. So instead join the Discord. You already know what to type.