Binary size and other treemaps
(This is kinda old at this point, but I may as well post it since I haven't touched it in a while.)
We build Chrome as one huge binary. This has a number of benefits
outside the scope of this post, but one drawback is that we end up with
a single enormous file without much insight into where the space is
going. You can look at correlated factors like .o file size, but you
never know what the linker is going to throw away or optimize. objdump
and friends can tell you the relative sizes of sections, but that is
still a pretty blunt instrument.
nm can give you per-symbol sizes. This is how I first discovered, for
example, that our translate feature ships a 1 MB language model data
table. But what interests me more than single large symbols is the
aggregate cost of modules.
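As a rough illustration of the per-symbol view, here is a minimal Python sketch that parses the output of GNU nm --print-size --size-sort (one defined symbol per line: address, size, type, name, with address and size in hex) and picks out the largest symbols. The sample output, symbol names, and function names are hypothetical; this is not the script used for the Chrome analysis.

```python
from typing import List, NamedTuple


class Symbol(NamedTuple):
    name: str
    size: int
    kind: str  # nm's one-letter symbol type, e.g. T (text) or r (read-only data)


def parse_nm_sizes(output: str) -> List[Symbol]:
    """Parse `nm --print-size` lines of the form 'address size type name'."""
    symbols = []
    for line in output.splitlines():
        # maxsplit=3 keeps demangled names that contain spaces in one field.
        parts = line.split(maxsplit=3)
        # Undefined symbols and symbols without a size have fewer fields; skip them.
        if len(parts) != 4:
            continue
        _address, size_hex, kind, name = parts
        symbols.append(Symbol(name=name, size=int(size_hex, 16), kind=kind))
    return symbols


def largest(symbols: List[Symbol], n: int) -> List[Symbol]:
    """Return the n biggest symbols, largest first."""
    return sorted(symbols, key=lambda s: s.size, reverse=True)[:n]


# Hypothetical nm output for illustration only.
SAMPLE = """\
0000000000401000 0000000000000040 T main
0000000000600000 0000000000100000 r kLanguageModel
0000000000402000 0000000000000200 t HelperFunction
                 U malloc
"""
```

Sorting in Python rather than relying on --size-sort keeps the sketch correct even if the flag is omitted.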
It turns out that nm can also emit the source file and line number each
symbol came from, though it takes a long time to compute. (At one point I
dug into nm's source and found a comment about this; my memory is months
old at this point, but it was something like "this is slow, but we don't
do it often.") With paths to files, we can map bytes in the binary back to
the directory structure of the source.
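Here is a minimal sketch of that mapping, assuming the output format of GNU nm -S -l, where each symbol's file:line location is appended after a tab. It credits each symbol's size to its source file and to every ancestor directory, which yields the per-directory totals a treemap needs. The function names and sample paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_line(line):
    """Split one `nm -S -l` line into (size, source_path), or None if it lacks either."""
    fields, _, location = line.partition("\t")
    parts = fields.split(maxsplit=3)
    if len(parts) != 4 or not location:
        return None
    size = int(parts[1], 16)
    path = location.rsplit(":", 1)[0]  # drop the trailing ":line"
    return size, path


def sizes_by_directory(output):
    """Total symbol bytes per source file and per directory."""
    totals = defaultdict(int)
    for line in output.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        size, path = parsed
        p = PurePosixPath(path)
        # Credit the bytes to the file and every ancestor directory, so each
        # node's total covers everything beneath it.
        for node in (p, *p.parents):
            totals[str(node)] += size
    return dict(totals)


# Hypothetical sample output; real paths come from the binary's debug info.
SAMPLE = (
    "0000000000600000 0000000000100000 r kLanguageModel\t/src/chrome/browser/translate/model.cc:12\n"
    "0000000000401000 0000000000000040 T main\t/src/chrome/app/main.cc:5\n"
    "0000000000402000 0000000000000200 t Helper\t/src/chrome/app/helper.cc:30\n"
    "                 U malloc\n"
)
```

Symbols that nm cannot attribute to a file are simply dropped here; a fuller tool might bucket them under an "unknown" node instead.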
With a directory structure in hand, I next turned to visualization. I've used treemaps before for looking at disk space (in particular, on Linux, Baobab is built into Gnome and its ring chart is quite nice), but how could I share the results with my coworkers? I looked around the web but found mostly one-offs, plus the larger JavaScript Infovis Toolkit, whose UI I found frustrating and clunky.
I said to my office: "I bet I could hack something decent up in a few minutes." Ojan responded: "I bet it'll take you a few hours to get 80% there, and then a week to have it be useful." And he was pretty much spot on.
But the end result is that I have published a web-navigable treemap of our binary size. (You can see some other discussion of it on Hacker News.) It breaks the size down by directory; other breakdowns, such as by namespace, are not hard to do.
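A namespace breakdown can be sketched the same way from demangled names (nm -C) paired with sizes. This rough Python sketch, with hypothetical function names, takes the first top-level scope of each name, ignoring any :: that appears inside template arguments or parameter lists. It does not handle corner cases such as operator< or operator-> at global scope.

```python
from collections import defaultdict


def top_level_scope(demangled):
    """Return the first '::'-separated scope of a demangled C++ name, or '(global)'."""
    depth = 0
    for i, ch in enumerate(demangled):
        if ch in "<(":
            depth += 1  # entering template arguments or a parameter list
        elif ch in ">)":
            depth -= 1
        elif depth == 0 and demangled.startswith("::", i):
            return demangled[:i]
    return "(global)"


def sizes_by_namespace(symbols):
    """Sum (demangled_name, size) pairs by top-level scope."""
    totals = defaultdict(int)
    for name, size in symbols:
        totals[top_level_scope(name)] += size
    return dict(totals)
```

Recursing on the remainder of each name would give a nested namespace tree, the same shape as the directory tree.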
I also published the treemapping widget separately. It
was fun to write, a combination of intuition and of reading a
paper and implementing its algorithm. It's pretty straightforward
and works on both WebKit and Gecko (though I haven't checked Gecko
recently and may have accidentally broken it, and I also rely on
WebKit transitions for gratuitous but brief visual effects). I spent
an embarrassing amount of time fiddling with getting the spacing
right; it turns out that sizing divs when borders are present is
still pretty fiddly, even with the CSS box-sizing: border-box
property.
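The post doesn't say which paper the widget follows. One well-known treemap layout is the squarified algorithm of Bruls, Huizing and van Wijk ("Squarified Treemaps", 2000), which greedily adds items to a row along the shorter side of the remaining rectangle for as long as doing so doesn't worsen the row's worst aspect ratio. Below is a minimal Python sketch of that algorithm for a single level of the tree, not the widget's actual code; Rect, worst_ratio, place_row and squarify are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def worst_ratio(row: List[float], side: float) -> float:
    """Worst aspect ratio among `row`'s areas laid out along an edge of length `side`."""
    total = sum(row)
    return max(
        max(side * side * a / (total * total), (total * total) / (side * side * a))
        for a in row
    )


def place_row(row, x, y, w, h, out):
    """Lay `row` along the shorter edge of (x, y, w, h); return the leftover rectangle."""
    total = sum(row)
    if w >= h:
        # The shorter edge is vertical: fill a column of width `thickness` on the left.
        thickness = total / h
        cy = y
        for a in row:
            out.append(Rect(x, cy, thickness, a / thickness))
            cy += a / thickness
        return x + thickness, y, w - thickness, h
    # The shorter edge is horizontal: fill a strip of height `thickness` on top.
    thickness = total / w
    cx = x
    for a in row:
        out.append(Rect(cx, y, a / thickness, thickness))
        cx += a / thickness
    return x, y + thickness, w, h - thickness


def squarify(sizes: List[float], rect: Rect) -> List[Rect]:
    """Squarified layout; `sizes` must be non-empty, positive and sorted descending."""
    total = sum(sizes)
    # Scale sizes into areas that exactly fill `rect`.
    areas = [s * rect.w * rect.h / total for s in sizes]
    out: List[Rect] = []
    x, y, w, h = rect
    row: List[float] = []
    for area in areas:
        side = min(w, h)
        if not row or worst_ratio(row + [area], side) <= worst_ratio(row, side):
            row.append(area)  # adding it keeps the row at least as square
        else:
            # Adding it would make the row less square: commit the row, start a new one.
            x, y, w, h = place_row(row, x, y, w, h, out)
            row = [area]
    if row:
        place_row(row, x, y, w, h, out)
    return out
```

For a nested treemap such as the per-directory view, you would call squarify on a directory's children inside the rectangle assigned to that directory, and recurse.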
Since then I've used the same widget for looking into our test timings; you can see a snapshot of the map. With this in hand I knew which tests were most problematic and cut down test runtime by a lot. (Coincidentally, while grinding away at tests I also discovered that much of our test flakiness was caused by a single bug, so a lot of the red you see on those old charts is now fixed. But that's a story for another time.)