JavaScript heap size

May 04, 2009

A real pain point for those of us developing on Linux is that v8 is 32-bit x86 only and our development machines are 64-bit. This makes our entire build 32-bit. You can certainly make it work, but it's painful to get going and strace prints the wrong info and 64-bit plugins won't be linkable and gdb's debug info is wrong and you're the only process loading 32-bit GTK so your startup is slow and 32-bit IME modules are set up wrong on Ubuntu etc. etc.

I was chatting with a v8 developer about a 64-bit implementation and he had an interesting comment, which I'll now share with you*. One of the fears of 64-bit code is that all your pointers double in size, which means you get half the cache, twice the memory usage, etc.

He argued that the proper implementation strategy, therefore, was to continue using 32-bit pointers for JavaScript and track a sort of 64-bit "base pointer" that these 32-bit pointers were relative to. This would cap each JavaScript instance at 4 GB of memory, which on the one hand I hope no web page would ever hit but on the other does seem a bit artificial.
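To make the base-pointer idea concrete, here is a minimal sketch of how I picture it. This is my own toy model, not anything from v8: the `Heap` class, its `compress`/`decompress` methods, and the example base address are all made up for illustration.

```python
# Toy model of "32-bit pointers relative to a 64-bit base".
# Everything here is hypothetical; it is not how v8 is implemented.

MAX_OFFSET = 2**32  # a 32-bit offset can address at most 4 GB


class Heap:
    """One 64-bit base address; objects are referenced by 32-bit offsets."""

    def __init__(self, base):
        self.base = base

    def compress(self, address):
        # Store only the distance from the base, which must fit in 32 bits.
        offset = address - self.base
        if not 0 <= offset < MAX_OFFSET:
            raise ValueError("address is outside this heap's 4 GB window")
        return offset

    def decompress(self, offset):
        # Recover the full 64-bit address when the pointer is actually used.
        return self.base + (offset & 0xFFFFFFFF)


heap = Heap(base=0x7F0000000000)  # assumed base address, chosen arbitrarily
addr = heap.base + 0x1234
off = heap.compress(addr)
assert heap.decompress(off) == addr
```

The stored pointers stay four bytes wide, so the cache and memory footprint match the 32-bit build, and the price is one addition per dereference plus the 4 GB ceiling.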

His response to that worry about the cap, which this whole post was lead-in for, is that when you have that much memory it's difficult to actually make use of all of it from a language like JavaScript because it's single-threaded. That is, the time it takes to read in and out (or more importantly, process) that much data starts to make apps that would use more of it impractical. He suggested that part of the reason 64-bit Java was important was that there were these heavyweight servers with multiple cores that wanted to run a bunch of Java threads in the same heap simultaneously.

There's an interesting parallel here to servers. Say you can stick 10 terabytes of disk in a single machine. It ends up not being too useful as a server unless it's archival, as in most of that data isn't accessed, because you're ultimately limited by disk bandwidth. Just trying to stream the contents of the disks at a sequential 50 MB/sec** would take more than two days. So when you're trying to serve real-time data off a disk (like, say, a search engine might) you're better off having more machines with smaller disks.
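The arithmetic behind that estimate is easy to check. This is a quick sketch assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the helper name `stream_seconds` is just something I made up for the example.

```python
# Back-of-the-envelope check of the disk streaming estimate.
# Assumes decimal units; binary units shift the result by a few percent.

TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 86400


def stream_seconds(total_bytes, bytes_per_second):
    """Time to read everything once at a constant sequential rate."""
    return total_bytes / bytes_per_second


days = stream_seconds(10 * TB, 50 * MB) / SECONDS_PER_DAY
days_4x = stream_seconds(10 * TB, 200 * MB) / SECONDS_PER_DAY  # four times faster

print(f"{days:.1f} days at 50 MB/sec")
print(f"{days_4x * 24:.0f} hours at 200 MB/sec")
```

Under those assumptions it comes out to about 2.3 days at 50 MB/sec, and still about 14 hours if the disks are four times faster, which is a long time to wait on data you want to serve in real time.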

* In my typical "take something intelligent someone else said and twist it through my poor understanding" fashion you've maybe come to expect from this blog.

** I'm way behind in my knowledge of reasonable numbers for modern hardware but even if I'm off by a factor of four the point stands.