Globals and singletons are already well-known as a design antipattern, but they have an interesting additional cost. Consider a global (I include file-level static in this category) value that has initialization code. That code must be run at startup (which leads to the static initialization order fiasco, though that is not the point of this post).
Because this initialization code is run at startup, before even
main() is entered, it is in the critical path for startup. It turns
out that even simple code must be paged in off disk, which can lead to
disk seeks, and disk seeks murder your startup performance.
This is not hypothetical: with ChromeOS we found that innocuous-seeming static initializers in Chrome were actually affecting the bottom line of startup performance. (Note: that observation comes from a coworker; I'm not sure whether he was using a non-SSD machine at the time or if it also happens on SSDs. Just guessing, but paging in more code, especially code that is non-contiguous, must have some non-zero cost even on the SSDs that ChromeOS relies upon.)
Because of this cost we attempt to track static initialization on our performance bots and prevent new checkins from adding more. (Ideally we'd remove them all but progress is slow.) I recently looked into how this works and I thought it'd be useful to write it down before I forget.
How constructors are implemented
The compiler creates, for each object file, a function that runs the
constructors for that file's globals. Pointers to these functions are
collected in a table at link time. At startup,
__do_global_ctors_aux iterates through the table and calls each
function. (Here's a nice page that walks through the
disassembly.) Conceptually, to judge the cost of all static
constructors you might want to do something like sum the size of
all of these functions, but for our purposes we care about disk seeks;
even doing more work in a single static constructor is fine if we
reduce the total number of functions paged in, which means the size
of the constructor table is the statistic of interest.
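To make the mechanism concrete, here is a minimal sketch of the kind of code that ends up with an entry in that table. The names (Tracer, InitLog, ConstructorsRun) are made up for illustration; the point is only that both constructors have already run by the time main() starts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical log of constructor runs, for illustration only.
// It lives in a function-local static so that it exists before any
// file-level constructor tries to append to it.
std::vector<std::string>& InitLog() {
  static std::vector<std::string> log;
  return log;
}

// A type whose constructor has a side effect, so it cannot be
// replaced by a constant baked into the binary.
struct Tracer {
  explicit Tracer(const char* name) { InitLog().push_back(name); }
};

// File-level objects with constructors. The compiler bundles their
// construction into one generated function for this file, and that
// function is what gets called at startup.
static Tracer g_first("first");
static Tracer g_second("second");

// Number of constructors that have run so far.
std::size_t ConstructorsRun() { return InitLog().size(); }
```

Calling ConstructorsRun() as the first statement of main() should report 2: within a single file the objects are constructed in definition order, before main() gets a chance to do anything.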
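The table of functions shows up as the .ctors section of the
executable (on newer toolchains the equivalent table may be the
.init_array section instead). You can dump the table with commands
like the following (note that the first entry is -1; the rest are
addresses):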
$ objdump --full-contents --section=.ctors path/to/binary
or in gdb,
(gdb) x/1000xg &__CTOR_LIST__
The gdb output is perhaps more useful, since it decodes the little-endian values for you. (N.b. the "g" in "x/1000xg" prints 64-bit words; use "w" instead on a 32-bit build.)
For a Chrome binary I glanced at, the ctor list appears to be in address order, which means you can see how much of the resulting binary the constructors span by subtracting the first entry from the last. From my random debugging build: 30 MB, not good.
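The arithmetic is simple enough to do by eye, but as a sketch, here is how you might summarize a dumped table. SummarizeCtorTable and CtorStats are hypothetical names, and I'm assuming the dump has the layout described above: a leading -1 sentinel, then addresses, possibly followed by a 0 terminator. It takes the minimum and maximum rather than the first and last entries so it doesn't depend on the list being sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a dumped constructor table.
struct CtorStats {
  std::size_t count;         // number of constructor functions
  std::uint64_t span_bytes;  // distance between lowest and highest address
};

// `raw` holds the 64-bit words as dumped. Assumption: all-ones (-1)
// marks the start of the table and 0 marks the end; both are skipped.
CtorStats SummarizeCtorTable(const std::vector<std::uint64_t>& raw) {
  std::vector<std::uint64_t> addrs;
  for (std::uint64_t entry : raw) {
    if (entry == UINT64_MAX || entry == 0) continue;
    addrs.push_back(entry);
  }
  if (addrs.empty()) return {0, 0};
  auto mm = std::minmax_element(addrs.begin(), addrs.end());
  return {addrs.size(), *mm.second - *mm.first};
}
```

The count is the number the performance bots care about; the span is just a rough hint at how scattered those functions are across the binary.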
Constructors versus static initialization
Note that data that is initialized to a constant is implemented in a different way: the constant value can just be placed in the right place at compile time, so there is no cost. In contrast, C++ objects that have constructors involve code and must be computed at runtime. You'll also sometimes encounter code that initializes variables with function calls (like we did with the mysterious IcedTea crash).
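A small sketch of the difference, with made-up names (kProductName, g_product_name, ComputeDefaultTimeoutMs are all hypothetical):

```cpp
#include <string>

// Constant initialization: these values are known at compile time, so
// they are simply laid out in the binary's data. No startup code runs.
static const int kBufferSize = 4096;
static const char kProductName[] = "chrome";
static_assert(kBufferSize == 4096, "usable in constant expressions");

// Dynamic initialization: std::string has a constructor, so this
// file-level object needs code to run at startup.
static const std::string g_product_name = "chrome";

// Hypothetical helper standing in for any runtime computation.
int ComputeDefaultTimeoutMs() { return 30 * 1000; }

// Initialized by a function call. Unless the optimizer manages to fold
// the call into a constant, this also needs startup code.
static const int g_timeout_ms = ComputeDefaultTimeoutMs();
```

Swapping a global std::string for a plain character array, as in the first group, is often the cheapest way to make one of these initializers disappear.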
You might also notice that constant static data can be shared between multiple instances of the same executable, while memory that is initialized at runtime is private to each process; see my post about how memory works for more on that.
I noticed with some interest that the Go programming language, designed in part by compiler hackers, neatly sidesteps some of the above problems: by defining initialization order carefully ("The importing of packages, by construction, guarantees that there can be no cyclic dependencies in initialization.") and by only allowing simple values as constant initializers. See their manual for more.
What to do about it
Mozilla hackers have found that Linux is pathologically bad in how it runs the resulting ctor list, and it looks like they have at least considered fixing that manually. We have chatted about doing the same, but fundamentally I believe the way to keep startup fast is to do less. See also my earlier post about performance.
It appears that the generated functions that run these constructors
get names starting with _GLOBAL__I_. This means a call like
$ nm out/Debug/chrome | grep _GLOBAL__I
will dump a list of all files that have a global constructor. Go delete some code!
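When the global can't simply be deleted, one common way to get it out of the startup path is to construct it on first use instead. This is a minimal sketch of that general technique, not a description of what Chrome does; DefaultPorts and PortForScheme are hypothetical names.

```cpp
#include <map>
#include <string>

// Before (sketch): a file-level map with a constructor, built at
// startup whether or not anything ever looks at it.
//   static const std::map<std::string, int> g_ports = {...};

// After: the map is a function-local static, so it is constructed the
// first time DefaultPorts() is called rather than before main().
const std::map<std::string, int>& DefaultPorts() {
  static const std::map<std::string, int> ports = {
      {"http", 80},
      {"https", 443},
  };
  return ports;
}

// Returns the default port for `scheme`, or -1 if it is unknown.
int PortForScheme(const std::string& scheme) {
  const std::map<std::string, int>& ports = DefaultPorts();
  std::map<std::string, int>::const_iterator it = ports.find(scheme);
  return it == ports.end() ? -1 : it->second;
}
```

The trade-off is a small guard check on every call to DefaultPorts(), and the work still happens eventually if the code is ever reached. But it no longer contributes an entry to the constructor table, which is the statistic we care about here.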