Embedding the current revision in your product's binary

November 20, 2014

Today's post was written by my coworker and longtime collaborator Nico Weber. Reading this made me suddenly remember fighting with making these scripts work on multiple platforms across svn, git, and git-svn.

It's sometimes useful if a program can say "this program was built at rXXXX". For example, about:version in Chrome used to tell you the SVN revision your build of Chrome is from, clang --version will list which SVN revision Chromium's clang is built off of (Xcode's clang doesn't do this), and so on. If you know that a certain bug is fixed at rYYYY, you can use this feature to quickly discover if your binary has this fix already.

It's an obvious feature, but it's actually a bit tricky to implement if you don't want it to slow down your build. Clang's Make-based build has a Makefile that shells out to a script to get the current revision. The script calls svnversion, git svn info, or git log -1, depending on the checkout. The Makefile then does the same again to get the revision of the llvm directory, concatenates both strings, and compares the result to the contents of a file containing these two numbers from the last build. If they differ, it overwrites the file it just read. The "Version.o" file has a dependency on this file, so Version.o gets rebuilt if it changes. This means that every time you run make, a bunch of scripts get run, hitting the disk a few times. Clang's Make-based build uses recursive Make, so incremental builds aren't all that fast anyway, and this is good enough. LLVM also has a CMake-based build though, which can write ninja files, and ninja doesn't allow shelling out from its manifest files.
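The compare-then-overwrite step above can be sketched in Python. This is a minimal illustration, not the code LLVM's Makefile actually runs: the helper name write_if_changed, the file name, and the revision strings are all made up for the example.

```python
import os
import tempfile


def write_if_changed(path, content):
    """Write content to path only if it differs from what is already there.

    Returns True if the file was (re)written, False if it was left untouched.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                return False  # same revisions as last build: keep the old mtime
    except FileNotFoundError:
        pass  # first build: there is no previous revision file yet
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


if __name__ == "__main__":
    # Hypothetical revision strings; a real script would ask the VCS for them.
    clang_rev, llvm_rev = "r1234", "r1230"
    with tempfile.TemporaryDirectory() as tmp:
        rev_file = os.path.join(tmp, "revisions.txt")  # hypothetical file name
        print(write_if_changed(rev_file, clang_rev + " " + llvm_rev))  # True
        print(write_if_changed(rev_file, clang_rev + " " + llvm_rev))  # False
```

Leaving the file untouched when nothing changed is what keeps Version.o from being rebuilt on every run, since its dependency keeps its old timestamp.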

Before discussing that case, let's review how the same feature was done in Chromium. Chromium used to use a system similar to LLVM's Make-based build: the lastchange target (which also gets the current revision from the VCS and writes it to a file) used to depend on build/util/lastchange.always, a file that doesn't exist and never got created. So the lastchange target used to be built every time. Again, we used to use make, which wasn't all that fast anyhow.
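To see why depending on a file that never exists forces the step to run every time, here is a toy up-to-date check in Python. It is a simplified model of timestamp-based builds, not the actual logic of make, gyp, or ninja, and must_run is a hypothetical helper.

```python
import os


def must_run(output, inputs):
    """Toy up-to-date check: decide whether the step producing output must run."""
    if not os.path.exists(output):
        return True  # nothing built yet
    out_mtime = os.path.getmtime(output)
    for dep in inputs:
        if not os.path.exists(dep):
            # A dependency like lastchange.always is never created,
            # so the step can never be considered up to date.
            return True
        if os.path.getmtime(dep) > out_mtime:
            return True  # dependency is newer than the output
    return False
```

In this model, any step with a permanently missing dependency returns True on every invocation, which is exactly the "always build" behavior the lastchange target relied on.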

(I think this used to be the story only on Linux and Windows. On Mac, we used to use a "postbuild" and store the revision in the app bundle's Info.plist every time after the app bundle got built. So this wasn't a problem on OS X. Since Xcode had postbuilds, I taught the gyp ninja generator about postbuilds on OS X, but we feel that postbuilds aren't a great feature in general, so we didn't make them work on other systems.)

When Chrome switched to ninja on Linux, running the lastchange script on every build was annoying. Originally, ninja would always rebuild all downstream dependencies, so lastchange would write the revision to a file on every build, and since chrome transitively depends on that revision file somehow, chrome would relink on every build, even if nothing had changed. To prevent this from happening, Evan Martin [ed: that's me!] added this to the ninja generator:

# Chrome-specific HACK.  Chrome runs this lastchange rule on
# every build, but we don't want to rebuild when it runs.
if 'lastchange' not in input: ...

This made builds fast, but also meant that your embedded revision would be out of date when using ninja. Peter Collingbourne taught ninja about "restat rules" (build edges that can cancel themselves if the output they produce is identical to the output they produced last time), and Ami Fischman changed gyp's ninja generator to make all custom rules restat rules. So when the contents of the revision file didn't change, the build edge cancelled itself and Chrome no longer relinked on each build, even without the above hack. But the script to check the revision would still run on each build.
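The restat idea can be modeled in a few lines of Python. This is only a toy: ninja's real implementation is more involved, and run_with_restat is a hypothetical name. The sketch assumes the action leaves its output file untouched when the content would be identical, and that the output exists once the action has run.

```python
import os


def run_with_restat(output, action):
    """Run action, then re-stat output to see whether it really changed.

    Returns True if dependents of output need rebuilding, False if the
    edge "cancelled itself" by leaving the output's mtime unchanged.
    """
    before = os.stat(output).st_mtime_ns if os.path.exists(output) else None
    action()  # still runs on every build, e.g. the revision script
    after = os.stat(output).st_mtime_ns
    return after != before
```

Note that the action itself still executes each time; restat only stops the change from propagating downstream.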

In a seminal CL, Scott Graham realized that people always have to run gclient sync, and moved generation of the revision file from the build system into a gclient hook. (Back then, we had only a few hooks, mostly one to run gyp, and people often didn't run hooks, so this was less obvious then than it is now.) This was the first time that ninja chrome wouldn't run one step on every build, since the revision file would be written when you ran gclient sync, the only time your repo's revision actually changed. (Richard Coles cleaned this up more a bit later.)

This works fairly well for Chromium, but LLVM doesn't have a script you have to run to sync: if you want the latest code, you just svn up or git pull, so Chromium's approach didn't quite work. LLVM's CMake build used to generate the revision file every time CMake ran, which is equivalent to doing it when running gyp. But people usually don't run cmake manually; it reruns itself when your CMakeLists.txt changes. Not every change touches CMakeLists, so your revision file was usually only mostly up-to-date.

A previous attempt to fix this was to move the revision file generation into the build system and make it depend on a non-existent file, like chromium's 'lastchange' rule of yore, but that meant that ninja always ran that edge and the build never got clean (just like in Chromium), so that quickly got reverted.

Yesterday, Apple's Jordan Rose landed a change to LLVM that solves this problem in a pretty clever way: he moved the generation of the revision file back into the build system, but instead of making it depend on a non-existent file so that it runs on every build, it depends on .git/logs/HEAD or .svn/entries, files that the VCSs update every time you sync. With this, the revision update script runs exactly when your revision has actually changed.

(Relying on implementation details of your VCS comes with its own drawbacks, of course: if a VCS changes its internal files, this will break. And this already happened: SVN 1.7 uses .svn/wc.db instead of 1.6's .svn/entries, so Jordan already had to land one patch to make this approach actually work in practice. Still, this is probably the solution with the best tradeoffs.)
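As a rough illustration of the dependency-file idea, here is a Python sketch that picks the VCS file to depend on and checks whether the revision file is older than it. This is not Jordan's CMake change; the helper names are hypothetical. Checking wc.db before entries is a precaution, on the assumption that a newer SVN working copy might still contain an entries file.

```python
import os

# Files, relative to the checkout root, that the VCS rewrites on every sync.
VCS_MARKERS = [
    os.path.join(".git", "logs", "HEAD"),  # git checkouts
    os.path.join(".svn", "wc.db"),         # SVN 1.7
    os.path.join(".svn", "entries"),       # SVN 1.6
]


def find_vcs_marker(checkout_root):
    """Return the first marker file that exists under checkout_root, or None."""
    for rel in VCS_MARKERS:
        candidate = os.path.join(checkout_root, rel)
        if os.path.exists(candidate):
            return candidate
    return None


def revision_file_is_stale(revision_file, checkout_root):
    """True if the revision file should be regenerated."""
    marker = find_vcs_marker(checkout_root)
    if marker is None:
        return False  # no recognizable checkout: nothing to compare against
    if not os.path.exists(revision_file):
        return True  # never generated
    # Regenerate only if the VCS has touched its marker since the last run.
    return os.path.getmtime(marker) > os.path.getmtime(revision_file)
```

A build system expresses the same thing declaratively by listing the marker file as an input of the revision step, so the script only runs after a sync.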