Tech Notes: Notes on WebAssembly

Notes on WebAssembly

June 19, 2022

I've now spent roughly a year working with WebAssembly. Much like my notes from when I first stepped into TypeScript, here's some "advanced beginner" things I have learned about WebAssembly, with a particular emphasis on browsers and C++.

If you're the kind of person who likes to poke at an example, I've made a WebAssembly file viewer called "weave" that lets you interactively explore the contents of .wasm files. You can play with some demo files here.

In particular, note that you can click into the "code" section, and from there into a function body, and from there even label the otherwise-unlabelled parameters/locals to aid in reading the code.

One of the things that attracted me to work at Figma was a chance to learn about and work with WebAssembly. It turns out Figma is a pretty good environment for this. Figma is a combination of "native to the web" — a lot of the app written in the ordinary React/TypeScript stack — while also using a surprising quantity of low-level C++ code. Figma documents as edited in your browser are rendered using GPU shaders managed by a C++ scene graph, which you'd probably only ever notice if you thought about how smooth zooming is! This post goes into it a bit.

The one-paragraph intro. WebAssembly ("Wasm") is an instruction format + virtual machine that is now native to browsers. It is designed to allow running code from languages other than JavaScript on the web safely, but there are broadening applications of it elsewhere. In an abstract sense it's more or less the same idea as Native Client, the JVM, or CLR, though of course also different in many ways. There are interesting applications of Wasm outside of browsers but I'll leave them out of scope for this post.

How does it work? There's a serialization format and bytecode etc. But another way to look at is that it runs pretty similarly to how JS executes: sandboxed from the OS, typically with an optimizing compiler preprocessing the code you ship. In that view, as compared to JS, it is just faster to deserialize and easier to optimize. The other big difference from JS is that there is no ambient standard library or browser API.

The spec. The Wasm spec is a real delight, succinct and specific. It warms my PL enthusiast heart. In writing tools like weave I found the format quite sensible to work with.

Simple architecture. The Wasm machine is (I think?) much simpler than comparable systems. It is a stack machine, no registers; instructions generally push and pop 32/64-bit ints and floats onto that stack. (Edit: the right way to view this is that the stack is a serialization of the intent of the code, while the runtime will surely optimize it into something that uses registers; see this comment.) Code is an array of functions, which may call each other by array index. The hosting environment can register functions in that array as well, to let Wasm code call out. Memory is a flat array of bytes indexed by 32-bit addresses. Memory, the call stack, and code are all separate from one another — "Harvard architecture".

Beyond that, there is no explicit notion of objects, garbage collection, pointers, strings, arrays, modules, etc.; your Wasm functions can call each other, do math, read and write memory, and that's pretty much it.

The text format. The spec defines a text format that is 1:1 with the bytecode, rendered using s-expressions. Godbolt (aka "Compiler Explorer") also lets you view Clang-generated Wasm output; here's a simple "add" function. Unfortunately that output is not the text format, it is just a thing I think made up by Clang. But if you're familiar with godbolt already you probably won't have much of a problem with this. (Weave, my Wasm viewer linked above, also renders instructions in a nonstandard way, because it's focused on a textual presentation of the original file contents.)

Performance. Is WebAssembly magic performance pixie dust? (short answer: no) is a good, detailed take on the performance question. A higher-level way to answer that same question is that in performance work, usually architectural changes matter more than optimizing loops. With that said, Wasm gives you fairly low-level control, so relative to JS it's pretty ideal for running code where you care about things like efficiency of memory layout, and it's an easier compilation target for other languages.

Pointers. Wasm memory is currently limited to 4gb, which means pointers are only 4 bytes. Meanwhile, v8 on 64-bit platforms uses 64-bit pointers, which means they have to go to great lengths to not have 8-byte pointers eat all your memory. (I found this blog post "Handles are the better pointers" really changed my perspective in this area, worth your time. Relatedly, in my n2 experiment I realized that by limiting to "only" 4 billion files at a time, I'd halve this size all the graph handles by switching them to 32 bits.)

An amusing side note: Wasm is limited to 4gb but a given file can declare a lower limit, which means that some code must represent the number "the current memory limit". This number actually requires a 64-bit integer to represent it, because an unsigned 32-bit integer only goes up to 2**32-1, one less than the maximum possible value! Chrome got this wrong and had some bugs in this area, which means in Figma's current experiments with using the 4gb limit we actually limit to 4gb minus one page to avoid those bugs.

Safety. Wasm is sandboxed, yet you can run C++ atop it. How can that work? The spec has a whole section on soundness that is establishing memory safety in the Wasm semantics, but that actually is not answering the safety question you might have.

Here's a thing that is obvious to me now but I think wasn't obvious when I started. C++ has its own notion of the stack and heap, in that in C++ you can e.g. take the address of something on the stack. This translates to Wasm by C++ stack variables existing in the Wasm-addressable memory, in the same way the C++ stack exists in memory. That memory is totally under the control of the C++ code. Meanwhile, Wasm runtime structures, including the call stack, are separate and are effectively invisible to C++. In summary, C++ memory errors like buffer overflows can still stomp on C++ memory, but that memory is distinct from any Wasm structures such as the call stack, so all a broken Wasm program can do is mess up its own memory, not the host's.

Escaping JavaScript. Does Wasm mean you can now write your favorite language instead of JavaScript, or run your favorite program in a browser? Sort of, but not really. It's true it's possible to cross-compile things to Wasm, but it was already possible to cross-compile many things to JS anyway. In other words, Wasm makes things more efficient but it doesn't expose any new semantic consequences or behaviors, other than the ordinary consequences of things getting faster.

The reasons not-JS in the browser doesn't often work out are vaguely the same whether you target JS or Wasm, with a phenomenon I like to call Probst's paradox: software and programming languages that were not written to run on the web to begin with will have their own assumptions about the environment that make them run poorly on the web. For many languages the assumption is that the large language runtime or libraries or a garbage collector are freely available, and for many programs the assumption is that things like "files" or "the screen" or "stdout" exist. Of course, almost anything can be made to work with enough hammering.

(You can see a similar dynamic play out when people make crappy ports of webapps to phones. You can make good apps using web technologies on a phone — Libby, for example, is really great! But if you want to reuse the work you put into making some old webapp by running it on a phone, the usual outcome is not great, because the webapp wasn't written with phones in mind.)

It's interesting to me how C (as distinct from e.g. Go or Python, etc.) ends up being one of the better languages for targeting Wasm in this respect, because C is designed for running in places where there isn't much of a runtime (like no_std in Rust). But note even to run C under Wasm you typically need to end up shipping an implementation of malloc in your Wasm bundle.

The hosting environment. When running in a browser, Wasm memory is just an ArrayBuffer. Within Wasm code is free to scribble whatever it wants on that memory. To call out to the environment (e.g. the browser), it can invoke functions that are exposed to it, but the only types Wasm knows about are numbers.

This means to pass a string or structure out, you pass an address, and from the JS side you write let x = memory[address]; to read bytes from memory. (That isn't even pseudocode, see Memory.buffer!) There are many ways to bind these together but all of them involve cost and frequently copying. For example, to convert a Wasm string to a JS String object — which you must do to invoke any browser APIs that involve strings — you must pass a slice of the memory array bytes through a TextDecoder or some equivalent. This isn't free.

In all, in theory you can have JS call Wasm call JS and thread call stacks through any language, but the boundary between makes it more expensive than you probably want. And Wasm doesn't make you magically able to mix two non-JS languages any more than you were already able to mix them outside of Wasm.

I'll emphasize here that Wasm in a browser runs synchronously in the same thread as JS, not as a separate worker. When the stars align such that your devtools work, this means that e.g. stack traces can trace backwards through JS and Wasm frames in the same stack. (It's of course possible to run Wasm in a worker too.)

Can it crash? In general, Wasm code can freely do whatever it wants to its own memory, so language-level problems — including even up to "dereferencing a null pointer" — are legal. Writing to a null pointer is just writing to the memory array at offset zero, as legal as any other write.

But bad code in Wasm still can crash at runtime! In the spec they call it a trap. For one simple example of a trap, imagine code that accesses memory[n] for some n greater than your memory size. You can see some traps specified here.

However, Wasm has a "validation" pass when it loads that verifies many properties such as "all direct function calls refer to valid function indices", so many of those sorts of potentially invalid behavior are screened out at load time. This validation is part of why Wasm can execute efficiently at runtime while still being secure. For another example of this, jumps in Wasm are structured in an interesting way such that they always refer to a block boundary.

At Figma we encountered traps in a subtle way related to C++ virtual calls, but I have a lot more to say about C++ and Wasm in particular, so I plan to go into all that in another post. (Edit: as promised, here's Wasm and C++.)

HN discussion.